[-] mm_maybe@sh.itjust.works 1 points 2 days ago

I think that there are some people working on this, and a few groups that have claimed to do it, but I'm not aware of any that actually meet the description you gave. Can you cite a paper or give a link of some sort?

[-] mm_maybe@sh.itjust.works 5 points 6 days ago

It's 100% this. Politics is treated like a sport in the USA; the only thing that matters is your side winning, and which side you root for is largely dictated by location and family history. This is encouraged by the private news media, which intentionally covers election campaigns this way to boost ratings and ad revenue. Social media only made it worse, because it made abstract identity dimensions, such as political affiliation, feel more salient to people than their everyday lives.

[-] mm_maybe@sh.itjust.works 16 points 1 week ago

"Model sizes are larger than their training sets"

Excuse me, what? You think Huggingface is hosting hundreds of checkpoints, each of which is several times the size of its training data, which runs to terabytes or petabytes of disk space? I don't know whether I agree with the compression argument myself, for other reasons, but your retort is objectively false.
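The arithmetic here is easy to sanity-check. A minimal sketch, using illustrative round numbers (parameter count, token count, and bytes-per-token are all assumptions, not measurements of any particular model):

```python
# Back-of-envelope comparison of checkpoint size vs. training-set size.
# All figures below are illustrative assumptions.

def checkpoint_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Size of a checkpoint at the given precision (fp16 = 2 bytes/param)."""
    return n_params * bytes_per_param

params = 7e9                # a 7B-parameter model (assumed)
train_tokens = 2e12         # ~2 trillion training tokens (assumed)
bytes_per_token = 4         # rough average for text on disk (assumed)

model_size = checkpoint_bytes(params)       # ~14 GB
data_size = train_tokens * bytes_per_token  # ~8 TB

ratio = model_size / data_size
print(f"checkpoint: {model_size/1e9:.0f} GB, training set: {data_size/1e12:.0f} TB")
print(f"checkpoint is ~{ratio:.4f}x the training set")  # far below 1
```

Whatever plausible numbers you plug in, the checkpoint comes out orders of magnitude smaller than the training set, which is the point of the rebuttal.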

[-] mm_maybe@sh.itjust.works 13 points 1 week ago

I'm getting really tired of saying this over and over on the Internet and getting either ignored or pounced on by pompous AI bros and boomers, but this "there isn't enough free data" claim has never been tested. The experiments that have come close (look up the early Phi and StarCoder papers, or the CommonCanvas text-to-image model) suggest that the claim is false, by showing that a) models trained on small, well-curated datasets can match and even outperform models trained on lazily curated large web scrapes, and b) models trained solely on permissively licensed data can perform on par with at least the earlier versions of models trained more lazily (e.g. StarCoder 1.5 performing on par with Code-Davinci).

But yes, a social network or other organization with access to a bunch of data that it owns or has licensed could almost certainly fine-tune a base LLM trained solely on permissively licensed data and get a tremendously useful tool, one that would probably be safer and more helpful than ChatGPT for that organization's specific business, at vastly lower risk of copyright claims or toxic generated content, for that matter.
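The license-aware curation step behind models like StarCoder can be sketched in a few lines. This is a hypothetical illustration, not the actual pipeline; the record layout and field names here are assumptions:

```python
# Hypothetical sketch of permissive-license filtering during dataset curation.
# The "license" field name and the corpus records are made up for illustration.
PERMISSIVE = {"mit", "apache-2.0", "bsd-3-clause", "cc0-1.0"}

def keep(record: dict) -> bool:
    """Keep only records whose declared license is in the allowlist."""
    return record.get("license", "").lower() in PERMISSIVE

corpus = [
    {"text": "fn main() {}", "license": "MIT"},
    {"text": "secret sauce", "license": "proprietary"},
    {"text": "print('hi')", "license": "Apache-2.0"},
]
curated = [r for r in corpus if keep(r)]
print(len(curated))  # 2
```

Real curation adds deduplication, quality scoring, and opt-out handling on top, but the core idea is exactly this: train only on what you are allowed to use.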

[-] mm_maybe@sh.itjust.works 65 points 1 week ago

The problem with your argument is that it is 100% possible to get ChatGPT to produce verbatim extracts of copyrighted works. This has been suppressed by OpenAI in a rather brute-force kind of way, by prohibiting the prompts that have been found so far to do this (e.g. the infamous "poetry poetry poetry..." ad infinitum hack), but the possibility is still there, no matter how much they try to plaster over it. In fact there are some people, much smarter than me, who see technical similarities between compression technology and the process of training an LLM, calling it a "blurry JPEG of the Internet". The point being: you wouldn't allow distribution of a copyrighted book just because you compressed it in a ZIP file first.
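The ZIP analogy is easy to demonstrate: lossless compression shrinks the text, but the original comes back verbatim on demand. A minimal sketch using Python's standard `zlib` module (the sample text is a stand-in for a copyrighted passage):

```python
import zlib

# Hypothetical stand-in for a copyrighted passage.
text = b"It was the best of times, it was the worst of times. " * 100

compressed = zlib.compress(text, level=9)
restored = zlib.decompress(compressed)

assert restored == text             # the original is reproduced verbatim
print(len(text), len(compressed))   # the compressed copy is far smaller
```

Nobody would argue the compressed bytes are a transformative new work just because they look nothing like the original; the question with LLMs is whether lossy, "blurry" reproduction changes that conclusion.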

[-] mm_maybe@sh.itjust.works 13 points 1 week ago

What irks me most about this claim from OpenAI and others in the AI industry is that it's not based on any real evidence. Nobody has tested the counterfactual approach they claim wouldn't work, yet the experiments that came closest--the first StarCoder LLM and the CommonCanvas text-to-image model--suggest that it would in fact have been possible to produce something very nearly as useful, and in some ways better, with a more restrained training-data curation approach than scraping outbound Reddit links.

All that aside, copyright clearly isn't the right framework for understanding why what OpenAI does bothers people so much. It's really about "data dignity", a relatively new moral principle not yet protected by any single law. Most people feel that they should have control over what data is gathered about their activities online, as well as what is done with that data after it's been collected. Even if they publish or post something under a Creative Commons license that permits derived uses of their work, they'll still get upset if it's used as an input to machine learning, and this is true even if the resulting generative models are created not for commercial reasons but only for personal or educational purposes that clearly constitute fair use. I'm not saying that OpenAI's use of copyrighted work is fair; I'm saying that even in cases where the use is clearly fair, there's still a perceived moral injury, so I don't think it's wise to lean too heavily on copyright law if we want to find a path forward that feels just.

[-] mm_maybe@sh.itjust.works 42 points 1 month ago* (last edited 1 month ago)

This is insane. What on earth could possibly constitute adult content on a navigation app? Are they going to start age-gating points of interest? Like, if the destination address is a strip club in Virginia, do you have to verify your age?

Edit: looks like it was added back. Maybe just a mistake? https://fosstodon.org/@organicmaps/112982935184446753

[-] mm_maybe@sh.itjust.works 20 points 1 month ago

Actually, it's because most AI just sucks at either seeing or drawing the faces of people of color. This is a very well-known issue that was exposed by Timnit Gebru, one of the most formidable critics of the generative AI craze. Worth a follow on Mastodon: https://dair-community.social/@timnitGebru

[-] mm_maybe@sh.itjust.works 5 points 1 month ago

Personally, from puberty through most of high school, I never approached any girls because I never had to--in my social setting it was totally normal and acceptable for girls to ask boys out, and I guess I was approachable enough, despite not being very popular, to get asked a fair amount. Yet I still had this culturally inherited concept that it should be me who was doing the asking, or at least learning how to do it. So I struggled with this a lot in my teenage years... part of the problem was that I also didn't have the stereotypical physical attraction to women, but a more personality-based one, so a lot of time was wasted trying to convert close platonic friendships into romantic relationships (maybe that works for some people, but not in my experience).

Ultimately, the woman I married was someone I approached without knowing her beforehand, but only after like weeks of making very mutual, not at all creepy eye contact walking back and forth on the same paths in the music center of our college campus. I think that's sort of the bare minimum: some signs of shared context, some shared interest or hobby, some smoke signals indicating mutual attraction. And for a minute I still thought maybe we were meant to be just friends, but obviously I was wrong, and part of how I know that is the way we met: with clear physical attraction established. This was all before dating apps, and I think I can see their appeal from that perspective; they reduce the uncertainty about "what kind of relationship is this going to become" to some extent.

Anyways, though, I'm pretty sure that I didn't have to put myself through the suffering of trying to "be a man" and build up the confidence to ask women out, because I was lucky to have a milieu in which they could comfortably take the lead. Heck, my wife might have been the one to break the tension eventually if it wasn't me. What I do regret, for sure, is the platonic friendships I lost with women during that phase in which I felt I was "supposed" to be converting them into girlfriends. The kind of reasoning that draws a direct line from changes in who is doing the asking out (for a variety of reasons) to male loneliness is really not helping any young men who are similarly confused.

[-] mm_maybe@sh.itjust.works 16 points 1 month ago

I remember a beloved fish-and-chips restaurant in the area where I grew up that had, in addition to fun cartoons of a clam introducing various dishes, smoke stains all along the edge of the ceiling. It was that bad... funny to think that it was soon after smoking was banned that the place closed down--maybe it never actually tasted good but nobody could tell??

[-] mm_maybe@sh.itjust.works 52 points 1 month ago

Be me

Early adopter of LLMs ever since a random tryout of Replika blew my mind and I set out to figure out what the hell was generating its responses

Learn to fine-tune GPT-2 models and have a blast running 30+ subreddit parody bots on r/SubSimGPT2Interactive, including some that generate weird surreal imagery from post titles using VQGAN+CLIP

Have nagging concerns about the industry that produced these toys, start following Timnit Gebru

Begin to sense that something is going wrong when DALLE-2 comes out, clearly targeted at eliminating creative jobs in the bland corporate illustration market. Later, become more disturbed by Stable Diffusion making this, and many much worse things, possible, at massive scale

Try to do something about it by developing one of the first "AI Art" detection tools, intended for use by moderators of subreddits where such content is unwelcome. Get all of my accounts banned from Reddit immediately thereafter

Am dismayed by the viral release of ChatGPT, essentially the same thing as DALLE-2 but text

Grudgingly attempt to see what the fuss is about and install Github Copilot in VSCode. Waste hours of my time debugging code suggestions that turn out to be wrong in subtle, hard-to-spot ways. Switch to using Bing Copilot for "how-to" questions because at least it cites sources and lets me click through to the StackExchange post where the human provided the explanation I need. Admit the thing can be moderately useful and not just a fun dadaist shitposting machine. Have major FOMO about never capitalizing on my early adopter status in any money-making way

Get pissed off by Microsoft's plans to shove Copilot into every nook and cranny of Windows and Office; casually turn on the Olympics and get bombarded by ads for Gemini and whatever the fuck it is Meta is selling

Start looking for an alternative to Edge despite it being the best-performing web browser by many metrics, as well as despite my history with "AI" and OK-ish experience with Copilot. Horrified to find that Mozilla and Brave are doing the exact same thing

Install Vivaldi, then realize that the Internet it provides access to is dead and enshittified anyway

Daydream about never touching a computer again despite my livelihood depending on it

[-] mm_maybe@sh.itjust.works 17 points 3 months ago

I am a consultant who sometimes writes code to do certain useful things as part of larger systems (parts of which may be commercial or GPL), but my clients always try to impose contract terms saying that anything I develop immediately becomes theirs, which limits my ability to reuse it in my next project. I can to some extent circumvent this if I find a way to publish the work, or some essential part of it, under an MIT license. I'm never going to make money off of my code directly; at best it's middleware, and my competitors don't use the same stack, so I'm not giving them any real advantage. I don't see how I'm sabotaging myself in this situation; if anything, the MIT license is a way of securing my freedom, and it benefits my future customers as well, since I don't have to rebuild from scratch every time.

Hi Lemmy, I used to run GPT-2 bots for fun on Reddit. At my peak I had about 30 of them. Reddit admins didn't like them, but I thought they were fun.

mm_maybe

joined 1 year ago