Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’

Stopthatgirl7@lemmy.world · 9 months ago

grue@lemmy.world · 9 months ago

The project creator doesn’t mince words:

> wordfreq was built by collecting a whole lot of text in a lot of languages. That used to be a pretty reasonable thing to do, and not the kind of thing someone would be likely to object to. Now, the text-slurping tools are mostly used for training generative AI, and people are quite rightly on the defensive. If someone is collecting all the text from your books, articles, Web site, or public posts, it’s very likely because they are creating a plagiarism machine that will claim your words as its own.
>
> So I don’t want to work on anything that could be confused with generative AI, or that could benefit generative AI.
>
> OpenAI and Google can collect their own damn data. I hope they have to pay a very high price for it, and I hope they’re constantly cursing the mess that they made themselves.

Solumbran@lemmy.world · 9 months ago

Seems pretty mild and reasonable, to be honest.

kn33@lemmy.world · 9 months ago

Yeah, it seems really restrained for someone who has to end a project they’ve put so much effort into.

Randomgal@lemmy.ca · 9 months ago

NGL sounds like a butthurt dude. Emotional arguments without logic.

Croquette@sh.itjust.works · 9 months ago

I’d be fucking butthurt as well if my pet project was being destroyed by mega corpos for a shitty generative thief AI.

JaggedRobotPubes@lemmy.world · 9 months ago

This does not say wonders about reading comprehension.

SirQuackTheDuck@lemmy.world · 9 months ago (edited)

Imagine being an author whose sole income is writing books.

Here comes an AI that ~~stole~~ indexed your work and is asked by a customer of OpenAI to summarise your books. It does so perfectly, and the requester is free to use the result, since they think it’s AI-generated and doesn’t require attribution.

You receive nothing in return.

Good luck making a living.

Edit: stole to indexed, added edit note