this post was submitted on 19 Sep 2025

Context: my father is a lawyer and therefore has a bajillion PDF files that were digitised and stored on a server. I have an idea of how to run OCR on all of them.

But after that, how can I make them easily searchable? (Keep in mind that, unfortunately, the directory structure is important information for classifying the files, e.g. you may have a path like clientABC/caseAV1/d.pdf.)
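
To make the path requirement concrete, here is a minimal sketch of the metadata that would need to travel with each file's OCR text. It assumes every file sits under a client folder and then a case folder, as in the example path; the function name `path_metadata` and the field names are hypothetical, not part of any particular tool.

```python
from pathlib import PurePosixPath


def path_metadata(relative_path: str) -> dict:
    """Split a path like client/case/file.pdf into searchable fields.

    Assumption: the first folder is the client and the second is the case.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 3:
        raise ValueError(f"expected client/case/file, got: {relative_path!r}")
    return {
        "client": parts[0],
        "case": parts[1],
        # Any folders between the case folder and the file are kept as-is.
        "subfolders": list(parts[2:-1]),
        "filename": parts[-1],
    }


print(path_metadata("clientABC/caseAV1/d.pdf"))
```

Whatever search tool ends up being used, it would need to store fields like these next to the extracted text so that results can be filtered by client or case.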

Father_Redbeard@lemmy.ml 1 point 1 week ago

Would Papra work for you? Personally, I like it better than Paperless-NGX, which others have mentioned. But I'll admit I'm not sure it'll fit your use case, as I'm feeding it newly scanned documents rather than importing an existing file/folder hierarchy.