It’s kind of crazy to see it all on there for individual download. It should only be available in bulk imo, to act more as an archive and not a pirate site.
Ignoring the fact that training an AI is insanely transformative and definitely fair use, people would not get any kind of pay. The data is owned by websites and corporations.
If AI training were to be highly restricted, Microsoft and Google would just pay each other for the data and pay the few websites they don’t own (Stack, GitHub, Reddit, Shutterstock, etc). A bit of money would go to publishing houses and record companies, but not enough for the actual artist to get anything over a few dollars.
And they would happily do it, since they would be the only players in the game and could easily overcharge for a product that is eventually going to replace 30% of our workforce.
Your emotional, short-sighted response kills all open source and literally gives our economy to Google and Microsoft. They become the sole owners of AI tech. Don’t be stupid, please. They want you to be mad; it literally only helps them.
The purchased service is internet. I should be able to use it how I want, including supplying it to other devices through my phone. This is the equivalent of Netflix not letting us cast onto TVs.
Not sure what you are defending here, this is clearly unethical and gross corporate behavior.
Most of the data is scraped, so it’s not up to the website. You can’t give a list of citations since it isn’t a search engine. It doesn’t know where the information comes from, and it’s highly transformative: it melds information from hundreds if not thousands of different sources.
If it worked only with volunteer work, there would simply be not enough data.
Any law restricting data use in AI is only going to benefit corporations. There isn’t a solution for individual content creators. You can’t pay them for the drop in the bucket they add; the logistics are insane. You can let them opt out, but then you need to do the same for whole websites, which leads to a corporate hellscape where three companies own our whole economy since they are the only ones who can train AIs.
Models need vast amounts of data. Paying individual users isn’t feasible, and like you said, most of it can be scraped.
The only way I see this working is if scraped content is a no-go and you pay the website, publishing house, record company, etc, which kills any open source solution and doesn’t really help any of the users or creators that much. It also paves the way for certain companies to own a lot of our economy as we move towards an AI-driven society.
It’s definitely a hot mess but the way I see it, the more restrictive we are with it, the more gross monopolies we create for no real gains.
It’s because certain companies are stirring the pot and manipulating. They want people mad so they can put restrictions on training AI, to stifle the open source scene.
To avoid being sued? The Internet Archive shouldn’t be acting like a new-age LimeWire. I hate record companies as much as the next guy, but I use torrents and youtube-dl. No need for the Internet Archive to be offering the service at such risk.
They hold a lot of important stuff, I just don’t want open season to be declared on suing them. Pick your battles kind of moment.