Analyzing Data 180,000x Faster with Rust
(willcrichton.net)
Welcome to the Rust community! This is a place to discuss the Rust programming language.
Referring to lazy Python as the "baseline" is a joke.
But isn't it kind of obvious that if you can get a 180,000x improvement, the baseline probably wasn't very impressive to begin with? Still, that doesn't take away from the fact that the optimizations were impressive, or that it was interesting to read about them.
I think your last sentence has one negation too much.
If it was interesting to read about it, then the criticism did not take away that the optimizations were impressive.
Fixed it... I come from a language culture where we like our negations :) Also, I'm not a native English speaker, so combine the two and you're in for a ride!
Yeah, this one really had me scratching my head:
The joke is the "with Rust" part, as if you couldn't do it otherwise. Maybe C would be only 179,999x faster, or Fortran 180,001x (numbers made up). Python could probably be made 60,000x faster as well.
Yet it's a fine baseline. The actual speedup from switching to Rust was 8x; the rest came from changing data structures, using SIMD, parallelism, and batching.
I think it's a great baseline. In an academic context, Python (and perhaps MATLAB) is extremely common for data analysis. I doubt many people would port their code to another language unless it were strictly needed, as in the article. Showing how to "simply" speed up code, as the article does, is a great way to snag a speedup: even if you don't profile anything yourself, you can just replicate the steps from the article.
Having done this kind of work myself as part of research, and knowing people who went from developer jobs to research jobs, I can safely say that scientists generally do not write good code, regardless of language. An article like this gives good steps to take from start to finish, and would be a valuable tool in a possible transition to better code.
I can absolutely confirm. I work in a specialized industry where our R&D team of PhDs writes quick-and-dirty code in Fortran. That's what they know, and it's what they're most productive with.
Our production code is in Python. We took one of their solutions and made it much faster, and the main improvement was restructuring the code so it didn't need everything in memory at once. Basically, they were processing data in 4D space but only needed 3D worth of data at a time, and most of the runtime was spent on memory allocation. By simply changing how the code iterated, we drastically reduced both memory usage and allocation, which also let us parallelize it and improve performance further.
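The restructuring described above (walk the 4D data one 3D volume at a time instead of materializing everything at once, then parallelize over volumes) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the commenter's code, which was in Python: the layout (volumes stored contiguously in one flat slice), the hypothetical functions `process_volume` and `process_4d`, and the per-volume computation (a variance) are all made up for illustration. It uses only the standard library's `std::thread::scope` rather than a crate such as rayon.

```rust
use std::thread;

/// Hypothetical per-volume computation: fills `scratch` with each voxel's
/// deviation from the volume mean, then returns the volume's variance.
fn process_volume(volume: &[f64], scratch: &mut Vec<f64>) -> f64 {
    let n = volume.len() as f64;
    let mean = volume.iter().sum::<f64>() / n;
    // `clear` keeps the existing capacity, so the buffer is reused
    // across volumes instead of being reallocated each time.
    scratch.clear();
    scratch.extend(volume.iter().map(|v| v - mean));
    scratch.iter().map(|d| d * d).sum::<f64>() / n
}

/// Processes a 4D array stored as consecutive 3D volumes of `vol_len`
/// elements each. Only one volume-sized scratch buffer per thread is
/// allocated, and volumes are split across up to `n_threads` threads.
fn process_4d(data: &[f64], vol_len: usize, n_threads: usize) -> Vec<f64> {
    assert!(vol_len > 0 && n_threads > 0 && data.len() % vol_len == 0);
    let n_vol = data.len() / vol_len;
    let mut results = vec![0.0; n_vol];
    // Volumes per thread, rounded up so every volume is covered.
    let per_thread = ((n_vol + n_threads - 1) / n_threads).max(1);

    thread::scope(|s| {
        for (data_chunk, out_chunk) in data
            .chunks(per_thread * vol_len)
            .zip(results.chunks_mut(per_thread))
        {
            s.spawn(move || {
                // One scratch buffer per thread, sized for a single 3D volume.
                let mut scratch = Vec::with_capacity(vol_len);
                for (volume, out) in data_chunk.chunks(vol_len).zip(out_chunk.iter_mut()) {
                    *out = process_volume(volume, &mut scratch);
                }
            });
        }
    });
    results
}

fn main() {
    // 4 volumes of 2x2x2 = 8 voxels each; volume k holds (0..8) * (k + 1).
    let vol_len = 8;
    let data: Vec<f64> = (0..4 * vol_len)
        .map(|i| ((i % vol_len) * (i / vol_len + 1)) as f64)
        .collect();
    let variances = process_4d(&data, vol_len, 2);
    println!("{:?}", variances); // prints [5.25, 21.0, 47.25, 84.0]
}
```

The design choice mirrors the comment: peak memory is bounded by one volume-sized buffer per thread rather than a full 4D intermediate, and because each volume is independent, splitting the outer dimension across threads is straightforward.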
They're paid to come up with the algorithms; I'm paid to make them run fast in production. We looked into Rust, but for this project Python got us well within a reasonable running time, and Rust would have required retraining a lot of our team. It's still on the table if we need more performance.