submitted 1 year ago* (last edited 1 year ago) by Historical_General@lemmy.world to c/piracy@lemmy.dbzer0.com

I saw someone else asking how you'd download protected, view-only Drive documents. That reminded me of another annoying characteristic of PDFs: they're 'protected'.

How do you deal with PDFs that are inherently uncustomisable and have fixed formatting? I appreciate that KOReader and other readers can reflow text, but I'd prefer not to rely on that; EPUB, TXT, or any other customisable format would be better.

Any good methods of PDF to text/epub out there?

[-] kniescherz@feddit.de 3 points 1 year ago

No easy ones I know of. Have done it but it was a pain.

  1. Crop the pages so that headers, footnotes and page numbers aren't part of the book anymore. Use Briss for that.

  2. Convert to HTML. I'm not sure which software is best for this, though.

  3. Clean up manually. That is the most work, mostly newlines and stuff. Don't even try without regex.
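To give a rough idea of what the regex cleanup in step 3 can look like, here is a minimal Python sketch. It assumes the text has already been extracted from the PDF into a plain string; the function name `clean_extracted_text` and the specific rules are illustrative assumptions, not a method anyone in this thread described, and real books will need extra rules on top.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Tidy text extracted from a PDF (hypothetical helper, for illustration only)."""
    # Normalise Windows line endings.
    text = raw.replace("\r\n", "\n")
    # Drop trailing spaces/tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Rejoin words hyphenated across a line break: "format-\nting" -> "formatting".
    # Caveat: this also merges genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A single newline is treated as a hard wrap inside a paragraph: make it a space.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = "The format-\nting is fixed\nper page.\n\n\n\nNext   paragraph."
    print(clean_extracted_text(sample))
```

The order matters: the de-hyphenation has to run before single newlines are turned into spaces, otherwise the split words can no longer be detected.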

I don't bother anymore. For smaller pages I do the cropping and read on a Kindle; larger ones I read on a tablet.

Have a look at this forum for more in-depth knowledge: https://www.mobileread.com/forums/forumdisplay.php?f=184

[-] Historical_General@lemmy.world -2 points 1 year ago

Others are recommending Calibre, which I'll try, even though I share your pessimism. Have you ever tried it?

[-] kniescherz@feddit.de 2 points 1 year ago

Yes I tried it and did not get any usable results.

[-] Historical_General@lemmy.world 1 points 1 year ago

In his own words lol:

To re-iterate PDF is a really, really bad format to use as input. If you absolutely must use PDF, then be prepared for an output ranging anywhere from decent to unusable, depending on the input PDF.

https://manual.calibre-ebook.com/conversion.html

[-] kniescherz@feddit.de 4 points 1 year ago

Wow who downvotes you?

That quote is pretty spot on. I feel that there might be well-formatted, simple PDFs which could be converted, but as soon as the layout gets busier or the PDF isn't well structured behind the scenes, it gets messy.

[-] Historical_General@lemmy.world 3 points 1 year ago* (last edited 1 year ago)

I think some idiots/bots from another instance did it over a political topic lol. They've done it to a few comments. We haven't left reddit apparently.

Do you think lemmy admins could see the downvoters and check if they're using bots?

[-] mexicancartel@lemmy.dbzer0.com 2 points 1 year ago

Too many people downvoted your post too idk why

[-] Historical_General@lemmy.world 1 points 1 year ago

Yeah I noticed that.

I just checked, there are 40 downvotes (78 upvotes). That's weird. I checked other posts and they don't seem to have anything like that; I saw none in the double digits.

I might just have to keep a separate account for posting politics, which is sad but necessary so that my posts/comments are ranked according to utility and not just downvoted by angry nerds. I'll probably make a post about this somewhere too.

this post was submitted on 30 Jul 2023
44 points (67.2% liked)
