submitted 1 year ago* (last edited 1 year ago) by DrNeurohax@kbin.social to c/piracy@lemmy.dbzer0.com

(As part of the Reddit migration, any time I'm only able to find info on Reddit, I'm reposting it to kbin/Lemmy.)

TL;DR - To get the page's OCR text from Newspapers.com, replace /image/ with /newspage/ in the URL of the page with the thumbnail.

EDIT: @godless pointed out that some libraries have access to Newspapers.com through a Library Edition portal. My local library has several newspaper archives, and I figured the first couple would be the most complete. Nope, but there was Newspapers.com Library Edition access buried below the fold. That worked!

Bonus tip - Also search for current info on close family members. The Spokeo hit came from searching his mother's name, and Spokeo is too dumb to understand that deceased people don't move with their families to future homes. It treated his records as if he were living ("Current" address, phone numbers, etc. were listed, even though they were for his sister, who's still alive).

And here's my rant/vent/story...

I was looking for an obituary in that nebulous early '90s time period where only some info is digitized. His family's having a memorial for him next week and I was hoping to bring a pic of the newspaper from his birthday and deathday, along with the obit. I had a general idea of the date of death, knew the city and funeral home, and his name minus middle initial. Sites like legacy.com refused to return a match. Even the state and county records sites were useless.

After a couple hours, I had only 2 partial hits. Bing Chat (yeah, I was surprised, too) said it found the obit, but it was locked behind a paywall. The newspaper that had it (which I checked earlier) said nothing was there. It appears that the obits are available going back to 2004. Dates before that were supposedly available in the paper's archive. The archive was 404. Or, rather, the entire domain was 404.

The second hit was on Spokeo - one of those obnoxious sites that gives partial info and then wants you to subscribe to 3 different levels of service. But, from there I got his middle initial and the exact birth and death dates. That info helped.

I eventually made it to Newspapers.com, which threw up a paywall, but indicated it had the info. I did the usual: checking the source and CSS, reader mode, incognito, etc. Judging by the CSS, it looked like the image was probably there. Nope. The only info I could find on getting through that barrier was on Reddit. It doesn't lead to the paper image, only to the OCR text. Just replace /image/ with /newspage/ in the URL of the page with the thumbnail.
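For anyone who wants to script the swap, here's a minimal sketch in Python. The function name and the example URL (including the numeric ID) are made up for illustration. All I actually know is the /image/ to /newspage/ replacement described above, so treat the URL shape as an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def to_newspage_url(url: str) -> str:
    """Swap the /image/ path segment for /newspage/ (hypothetical helper)."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if "image" not in segments:
        raise ValueError("URL has no /image/ path segment")
    # Replace only the first exact "image" segment, leaving the rest
    # of the path, the query string, and the fragment untouched.
    segments[segments.index("image")] = "newspage"
    return urlunsplit(parts._replace(path="/".join(segments)))


# Example with a made-up page ID (assumed URL shape, not a real page):
print(to_newspage_url("https://www.newspapers.com/image/123456789/"))
```

Splitting on path segments instead of doing a blind string replace keeps it from touching "image" if it shows up somewhere else, like in a query parameter.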

Good. It existed and was exactly where I was expecting through the whole search. Now to get the paper image that the text was extracted from... nope. Gotta sign up.

One last thing to try again, since Newspapers.com gave me the exact PAGE NUMBER.

I tried looking into the archives of the paper available in the library's database. It appears most obits (the non-newsworthy ones) were excluded. My hypothesis is that the paper sold the archives to a site that stipulated the obits must be excluded from other sources. It's the only explanation I can come up with.

So, looks like I'll be visiting the library Monday to see if they have microfiche of the paper. WTF is going on that I can't find a major metropolitan newspaper's obit section in 2023? I can find 15 million pictures of influencers' breakfasts, but a 2x2 inch shred of paper is completely inaccessible. Not even a torrent out there of this stuff because who the fuck would make it hard to find an old newspaper?

(Forgot to mention that I used Google, Bing, DDG, and SearXNG. Bing was the most helpful, Google the least helpful.)

This shit right here is why I pirate - "great" business models. If there was a torrent of the entire decade's worth of that newspaper, it would have been easier to download that, compared to jumping through all these hoops.

DrNeurohax@kbin.social 9 points 1 year ago

Thank you so much for this! It reminded me to revisit my library's general resources and look specifically for which archive collections they had available. I'm 1 state over, so I figured there was a good chance we would have Newspapers.com Library Edition access here.

The main/default collection my library sent me to was no help, but they had a Newspapers.com Library Edition portal listed further down. Final-fucking-ly got it. I really, really appreciate the help.

godless@latte.isnot.coffee 3 points 1 year ago

Awesome, glad it worked!!!

this post was submitted on 23 Jul 2023
61 points (98.4% liked)
