submitted 10 months ago* (last edited 10 months ago) by Red1C3@lemmy.world to c/programming@programming.dev

Long story short, I want to build a system that reorders some components in a document file (be it a docx or odt, I don't have a hard constraint atm).

So my input is a document file, and I need to be able to approximate the number of pages it takes up. I also need the height of individual components (like a single paragraph or a table), so that I have the data I need to rearrange them and make the document use fewer pages.

I don't have a hard constraint on the programming language of the tool either (Python preferred), but I'd prefer not to embed LibreOffice into my system.

Also I'm willing to hear other solutions (maybe my input is not the optimal thing I can use for this problem).

Thanks in advance!

[-] take6056@feddit.nl 5 points 10 months ago

I would look into a library that does manipulation of odt (or docx). Code whatever algorithm you need to do the restructuring. Now you're left with an in-memory representation of the document, from which you can hopefully figure out how many pages it spans, or which you can save to a temporary file.
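To make the "in-memory representation" idea a bit more concrete, here is a minimal sketch that uses only the Python standard library. It assumes a docx input (a zip archive whose main content is in `word/document.xml`) and simply lists the top-level paragraphs and tables of the body. The function name `list_body_blocks` is made up for illustration. It does not compute heights or page counts, since those depend on layout, which this does not do.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_body_blocks(docx_bytes):
    """Return (kind, text) for each top-level paragraph ("p") or table ("tbl").

    Hypothetical helper: it only reads the structure, it does no layout.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    body = root.find(f"{{{W_NS}}}body")
    blocks = []
    for child in body:
        kind = child.tag.split("}")[-1]  # strip the namespace prefix
        if kind in ("p", "tbl"):
            # Concatenate every text run nested inside this block
            text = "".join(t.text or "" for t in child.iter(f"{{{W_NS}}}t"))
            blocks.append((kind, text))
    return blocks
```

Reordering would then mean moving these child elements around inside the body and writing the archive back out. A dedicated docx/odt library will probably handle the edge cases (section properties, numbering, images) better than raw XML.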

All depends really on how feature rich the odt libraries are and/or how deep you want to dive into the spec.

I feel like this is an XY problem. Is there an underlying issue you're trying to resolve?

[-] Red1C3@lemmy.world 3 points 10 months ago

Yeah, my main issue is figuring out how many pages it spans. I've looked at some docx and odt libs, and none seemed to have an API for getting the number of pages or the height of a component (except for stuff with fixed heights like images...).

The underlying issue is that I want to create an exam paper that uses the fewest sheets of paper possible per exam, so I guess that at the very least I should be able to get the height of each question and rearrange the questions (using an algorithm) in a way that uses fewer sheets.
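Once per-question heights are available from somewhere, the rearranging part looks like a bin-packing problem. Below is a minimal sketch of the first-fit decreasing heuristic. It is not guaranteed to be optimal, and it assumes that every question fits on a single page, that the order of questions is free to change, and that heights and the page height are in the same unit. The name `pack_questions` is made up for illustration.

```python
def pack_questions(heights, page_height):
    """Assign questions to pages with first-fit decreasing.

    heights: list of question heights (same unit as page_height).
    Returns a list of pages, each a list of question indices.
    """
    # Place the tallest questions first
    order = sorted(range(len(heights)), key=lambda i: heights[i], reverse=True)

    pages = []      # question indices on each page
    remaining = []  # free height left on each page

    for i in order:
        h = heights[i]
        if h > page_height:
            raise ValueError(f"question {i} is taller than a page")
        for p, free in enumerate(remaining):
            if h <= free:
                # First page with enough room wins
                pages[p].append(i)
                remaining[p] -= h
                break
        else:
            # No existing page fits, so start a new one
            pages.append([i])
            remaining.append(page_height - h)
    return pages


# Example: five questions on pages of height 10
print(pack_questions([6, 5, 4, 3, 2], 10))  # [[0, 2], [1, 3, 4]]
```

The hard part is still getting trustworthy heights, since those depend on the renderer (fonts, margins, line breaking). The packing step itself is cheap to rerun whenever the estimates change.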

[-] ericjmorey@programming.dev 2 points 10 months ago

Use Google Apps Script to open the document in Google Docs, read the number of pages that Google Docs renders, close the document, then delete the document (optional).

[-] Red1C3@lemmy.world 1 points 10 months ago

I need to automate the process so I can use it inside an algorithm; this is far from practical.

[-] ericjmorey@programming.dev 0 points 10 months ago* (last edited 10 months ago)

My suggestion was to automate the process with Google Apps Script, as part of your algorithm. You haven't given a lot of details about what you actually want to do, but for what you did give, Google Apps Script would let you automate the task.

this post was submitted on 13 Jan 2024
11 points (100.0% liked)
