Because grocery stores don't make that data accessible to third-party developers. Otherwise someone would build what you're suggesting, and the stores would risk you shopping elsewhere.
Bah, the data is on their websites, figure out how to collect it.
Go ahead and try scraping an arbitrary list of sites without an API and let me know how that goes. It would be a constant maintenance headache, especially if you're talking about anything other than the larger chains that have fairly standardized sites.
I bet an AI could do it
I don't think you understand how AIs work
@Dogyote @Zetaphor
I've been doing web scraping in my job for 6 years. Yes, it's a constant headache: sites keep updating their layouts and improving their anti-bot protections. But it can be done, and some companies are doing it (on a very big scale). It's just not realistic for most open-source projects to invest that much effort into the constant updates. Some do, though: youtube-dl is basically web scraping and it stays pretty up-to-date. It's just very rare.
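To make the maintenance burden concrete, here is a minimal sketch of what one per-store scraper module might look like. The URL, headers, and CSS selectors are purely hypothetical; every selector is tied to one specific page layout, so any redesign or anti-bot change breaks it, which is exactly the constant-update problem described above.

```python
# Minimal sketch of a per-store price scraper (hypothetical URL and selectors).
import requests
from bs4 import BeautifulSoup

def scrape_prices(search_url: str) -> list[dict]:
    resp = requests.get(
        search_url,
        headers={"User-Agent": "Mozilla/5.0"},  # many sites block default clients
        timeout=10,
    )
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    items = []
    # ".product-tile", ".product-name", ".price" are placeholders; each store
    # needs its own hand-written selectors, and they go stale constantly.
    for tile in soup.select(".product-tile"):
        name = tile.select_one(".product-name")
        price = tile.select_one(".price")
        if name and price:
            items.append({
                "name": name.get_text(strip=True),
                "price": price.get_text(strip=True),
            })
    return items
```

And that is only one store; an app covering many grocery chains needs one of these per site, each maintained separately.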
@Dogyote @Zetaphor
And we also explored the AI option; it always turned out to be unrealistic. Either you scrape the content and send it to an AI to parse out the info, but then you're paying for every scrape (or running a powerful rig nonstop) and the results are still hit and miss. Or you let the AI generate the code for the scraping module, which is still not ideal: it constantly hallucinated elements that weren't there.
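For what it's worth, the first approach (scrape the page, then have a model pull out the structured fields) looks roughly like the sketch below. It assumes the OpenAI Python client; the model name, prompt, and size cap are illustrative, and every call costs money, which is the "paying for every scrape" problem mentioned above.

```python
# Sketch of the "send the scraped page to an AI to parse" approach.
# Model name and prompt are illustrative assumptions, not a recommendation.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_prices(page_html: str) -> list[dict]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable model could be used
        messages=[
            {"role": "system",
             "content": "Extract every product name and price from the HTML. "
                        "Reply with a JSON array of objects with keys "
                        "\"name\" and \"price\"."},
            {"role": "user", "content": page_html[:100_000]},  # crude size cap
        ],
    )
    # The model may still hallucinate or misformat items, so the output
    # needs validation before it's trusted.
    return json.loads(response.choices[0].message.content)
```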