26
parquet vs csv
(lemmy.ml)
Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!
Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.
Hope you enjoy the instance!
Rules
Follow the wormhole through a path of communities !webdev@programming.dev
Parquet is really used for big data batch data processing. It's columnar-based file format and is optimized for large, aggregation queries. It's non-human readable so you need a library like apache arrow to read/write to it.
I would use parquet in the following circumstances (or combination of circumstances):
Since the data is columnar-based, doing queries like
select sum(sales) from revenue
is much cheaper and faster if the underlying data is in parquet than csv.The big advantage of csv is that it's more portable. csv as a data file format has been around forever, so it is used in a lot of places where parquet can't be used.