• philpo@feddit.org · 11 up · 1 day ago

    A friend of mine worked on the team that wrote the EU AI legislation. He is a fucking genius and so are his colleagues. There is little chance he can simply “change the definition of open source”. He might be able to challenge the EU definition in court and postpone paying, but he will pay.

    The Brussels bureaucracy is absolutely fed up with US tech bro antics by now, and both Microsoft and Google have already learned their lesson. Zuckerberg’s Meta still tries to resist, but he will fall as well.

    Funnily enough, this is absolutely sped up by their antics in the US, as it leads to more and more lawmakers here realising that European societies need to be protected from them the same way they need to be protected from China.

  • cyd@lemmy.world · 13 up, 3 down · edited · 2 days ago

    Aww come on. There’s plenty to be mad at Zuckerberg about, but releasing Llama under a semi-permissive license was a massive gift to the world. It gave independent researchers access to a working LLM for the first time. For example, Deepseek got their start messing around with Llama derivatives back in the day (though, to be clear, their MIT-licensed V3 and R1 models are not Llama derivatives).

    As for open training data, it’s a good ideal, but I don’t think it’s a realistic possibility for any organization that wants to build a workable LLM. These things use trillions of documents in training, and no matter how hard you try to clean the data, there’s definitely going to be something lawyers can find to sue you over. No organization is going to open themselves up to the liability. And if you gimp your data set, you get a dumb AI that nobody wants to use.

  • Ulrich@feddit.org · 340 up · edited · 3 days ago

    Money? Is it money?

    clicks article

    For Meta, it’s all about the money.

    Shocking.

    • LillyPip@lemmy.ca · 128 up · edited · 3 days ago

      I taught myself programming in the 80s, then worked my way up from waitress and line cook to programmer, UX designer, and design lead, to the point of being in the running for an Apple design award in the 2010s.

      But I cared more than anything about making things FOR people. Making life easier. Making people happy. Making software that was a joy to use.

      Then I got sick with something that’s neither curable nor easily manageable.

      Now I’m destitute and have to choose between medicine and food, and I’m staring down homelessness. (eta I was homeless from age 16-18, and I won’t do that again now, with autoimmune dysautonomia and in my mid-50s, even if the alternative is final.)

      Fuck these idiots who bought their way into nerd status (like Musk) or had one hot idea that took off and didn’t have to do anything after (this fucking guy). Hundreds or thousands of designers and programmers made these companies, and were tossed out like trash so a couple of people can be rock stars, making more per hour than most of us will see in a lifetime.

      Slay the dragons.

        • futatorius@lemm.ee · 21 up · edited · 3 days ago

          His “idea” was about how to monetize a concept already in existence on MySpace, facilitated by completely ignoring any ethical constraints. That, and a snobbery-based product launch through the Ivies.

        • LillyPip@lemmy.ca · 9 up · 3 days ago

          You’re right. I forgot about the lawsuit and settlement (for $65m). They’re both frauds.

      • Strider@lemmy.world · 3 up · 2 days ago

        I’m sorry you had to go through this and are suffering. There are people who can (literally) feel your pain; I hope that can give some comfort.

        I’m lucky to be in Europe; otherwise I would very likely be dead, or broke.

    • don@lemm.ee · 18 up, 1 down · 3 days ago

      The time it took me to reach this conclusion, after seeing the headline, is measured in quectoseconds.

  • Phoenixz@lemmy.ca · 13 up · 2 days ago

    Looking at any picture of mark suckerberg makes you believe that they are very much ahead with AI and robotics.

    Either way, fuck Facebook, stop trying to ruin everything good in the world.

  • 3aqn5k6ryk@lemmy.world · 77 up, 1 down · 3 days ago

    I don’t give a fuck what you want, Mark. Nobody does. What I want is for you to fuck off.

  • fuzzy_feeling@programming.dev · 58 up, 7 down · 3 days ago

    Meta’s Llama models also impose licensing restrictions on its users. For example, if you have an extremely successful AI program that uses Llama code, you’ll have to pay Meta to use it. That’s not open source. Period.

    open source != no license restrictions

    According to Meta, “Existing open source definitions for software do not encompass the complexities of today’s rapidly advancing AI models. We are committed to keep working with the industry on new definitions to serve everyone safely and responsibly within the AI community.”

    i think he’s got a point, tho

    is ai open source, when the training data isn’t?
    as i understand it, right now: yes, it’s enough that the code is open source. and i think that’s a big problem

    i’m not deep into ai, so correct me if i’m wrong.

    • airglow@lemmy.world · 9 up · edited · 2 days ago

      Software licenses that “discriminate against any person or group of persons” or “restrict anyone from making use of the program in a specific field of endeavor” are not open source. Llama’s license doesn’t just restrict Llama from being used by companies with “700 million monthly active users”, it also restricts Llama from being used to “create, train, fine tune, or otherwise improve an AI model” or being used for military purposes (although Meta made an exception for the US military). Therefore, Llama is not open source.
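      The criteria above can be sketched as a crude screening check. This is a toy illustration: the red-flag phrases and clause strings below are hypothetical paraphrases, not Llama’s actual license text.

```python
# Toy sketch: screen license clause strings against two Open Source
# Definition criteria: OSD #5 (no discrimination against persons or
# groups) and OSD #6 (no discrimination against fields of endeavor).
# The phrases and clauses below are hypothetical paraphrases.
RED_FLAGS = ("monthly active users", "may not use", "military")

def osd_violations(clauses):
    """Return clauses that restrict who may use the software, or for what."""
    return [c for c in clauses if any(flag in c.lower() for flag in RED_FLAGS)]

llama_like_clauses = [
    "Licensees with over 700 million monthly active users must request a separate license",
    "You may not use the model outputs to improve any other AI model",
]

# A single restrictive clause is enough to fail the definition.
for clause in osd_violations(llama_like_clauses):
    print("not open source:", clause)
```

      Either clause alone would already disqualify the license; the definition is pass/fail, not a matter of degree.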

      • Syntha@sh.itjust.works · 2 up, 3 down · 2 days ago

        The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources

        So as I understand it, under the OSI definition of the word, anything distributed under a copyleft licence would not be open source.

        So all software with GNU GPL, for example.

        • airglow@lemmy.world · 9 up · edited · 2 days ago

          That’s incorrect. GPL licenses are open source.

          The GPL does not restrict anyone from selling or distributing GPL-licensed software as a component of an aggregate software distribution. For example, all Linux distributions contain GPL-licensed software, as the Linux kernel is GPLv2.

    • umbraroze@lemmy.world · 11 up · 2 days ago

      Open source software doesn’t, by definition, place restrictions on usage.

      The license must not restrict anyone from making use of the program in a specific field of endeavor.

      Clauses like “you can use this software freely except in specific circumstances” fly against that. Open source licenses usually have very little to say about what the software should be used for, and usually just as an affirmation that you can use the software for whatever you want.

    • TimeSquirrel@kbin.melroy.org · 16 up, 3 down · 3 days ago

      I don’t think any of our classical open licenses from the 80s and 90s were ever created with AI in mind. They are inadequate. An update or new one is needed.

      Stallman, spit out the toe cheese and get to work.

    • GoodEye8@lemm.ee · 3 up, 4 down · 3 days ago

      I understand it the same way, and I think there’s a lot of gray area, which makes it hard to just say “the data also needs to be open source for the code to be open source”. What would that mean for PostgreSQL? Does it magically turn closed source if I don’t share what’s in my db? What would it mean for every open source software that stores and uses that stored data?

      I’m not saying the AI models shouldn’t be open source, I’m saying reining in the models needs to be done very carefully, because it’s very easy to overreach and open up a whole other can of worms.

  • Kompressor@lemmy.world · 24 up · 3 days ago

    Desperately trying to tap into the general trust/safety feel that open source software typically has. Trying to muddy the waters because they’ve proven they cannot be trusted whatsoever.

    • kava@lemmy.world · 6 up, 1 down · edited · 3 days ago

      when the data used to train the AI is copyrighted, how do you make it open source? it’s a valid question.

      one thing is the model or the code that trains the AI. the other thing is the data that produces the weights which determines how the model predicts

      of course, the obligatory fuck meta and the zuck and all that, but there is a legal conundrum here we need to address that doesn’t fit into our current IP legal framework

      my preferred solution is just to eliminate IP entirely

      • buddascrayon@lemmy.world · 3 up · 2 days ago

        when the data used to train the AI is copyrighted, how do you make it open source? it’s a valid question.

        It is actually possible to reveal the source of training data without showing the data itself. But I think this goes a bit deeper, since I’ll bet all of my teeth that the training data they’ve used is literally the 20 years of Facebook interactions and entries that they have just chilling on their servers. Literally 3+ billion people’s lives are the training data.

        • kava@lemmy.world · 1 up · 1 day ago

          Literally 3+ billion people’s lives are the training data.

          yep. I never thought about it but you’re absolutely right. that is Facebook’s “competitive advantage” that the other AI companies don’t have.

          although that’s only part of it. I’m sure they also use web scraping, novels, movie transcripts, college textbooks, research papers, newspapers, etc.

      • jacksilver@lemmy.world · 8 up · 3 days ago

        I mean, you can have open source weights, training data, and code/model architecture. If you’ve done all three it’s an open model, otherwise you state open “component”. Seems pretty straightforward to me.
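        A sketch of that component-wise framing, assuming a hypothetical release-metadata schema (nothing here reflects a real registry or any actual vendor’s release format):

```python
from dataclasses import dataclass

@dataclass
class ModelRelease:
    """Which components of an AI release are open (hypothetical schema)."""
    open_weights: bool
    open_training_data: bool
    open_architecture: bool

    def label(self) -> str:
        parts = {
            "weights": self.open_weights,
            "training data": self.open_training_data,
            "architecture": self.open_architecture,
        }
        opened = [name for name, is_open in parts.items() if is_open]
        if len(opened) == len(parts):
            return "open model"  # all three components released
        if not opened:
            return "closed model"
        return "open " + " + ".join(opened)  # name only what is open

# A Llama-style release: weights and architecture public, data withheld.
print(ModelRelease(True, False, True).label())  # -> open weights + architecture
```

        The point of the sketch is that “open model” is reserved for the all-three case; anything less gets labeled by its open components.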

        • kava@lemmy.world · 3 up · 2 days ago

          Yes, but that model would never compete with the models that use copyrighted data.

          There is an unfathomably large ocean of copyrighted data that goes into the modern LLMs, from scraping the internet to transcripts of movies and TV shows to tens of thousands of novels, etc.

          That’s the reason they are useful. If it weren’t for that data, it would be a novelty.

          So do we want public access to AI or not? How do we wanna do it? Zuck’s quote from the article, “our legal framework isn’t equipped for this new generation of AI”, I think has truth to it.

          • jacksilver@lemmy.world · 3 up · 2 days ago

            I mean using proprietary data has been an issue with models as long as I’ve worked in the space. It’s always been a mixture of open weights, open data, open architecture.

            I admit that it became more obvious when images/videos/audio became more accessible, but everything from facial recognition to pose estimation has used proprietary datasets to build models.

            So this isn’t a new issue, and from my perspective not an issue at all. We just need to acknowledge that not all elements of a model may be open.

            • kava@lemmy.world · 1 up · 2 days ago

              So this isn’t a new issue, and from my perspective not an issue at all. We just need to acknowledge that not all elements of a model may be open.

              This is more or less what Zuckerberg is asking of the EU: to acknowledge that parts of it cannot be opened, but that the fact that the code is open means it should qualify for certain benefits that open source products would qualify for.

      • FooBarrington@lemmy.world · 5 up · 3 days ago

        when the data used to train the AI is copyrighted, how do you make it open source?

        When part of my code base belongs to someone else, how do I make it open source? By open sourcing the parts that belong to me, while clarifying that it’s only partially open source.

        • kava@lemmy.world · 2 up · 2 days ago

          This is essentially what Llama does, no? The reason they are attempting a clarification is because they would be subject to different regulations depending on whether or not it’s open source.

          If they open source everything they legally can, then do they qualify as “open source” for legal purposes? The difference can be tens of millions if not hundreds of millions of dollars in the EU according to Meta.

          So a clarification on this issue, I think, is not asking for so much. I hate Facebook as much as the next guy, but this is like 5-minute-hate material.

          • FooBarrington@lemmy.world · 3 up · 2 days ago

            If they open source everything they legally can, then do they qualify as “open source” for legal purposes?

            No, definitely not! Open source is a binary attribute. If your product is partially open source, it’s not open source, only the parts you open sourced.

            So Llama is not open source, even if some parts are.

            • kava@lemmy.world · 2 up · 2 days ago

              I agree with you. What I’m saying is that perhaps the law can differentiate between “not open source” “partially open source” and “fully open source”

              right now it’s just the binary yes/no. which again determines whether or not millions of people would have access to something that could be useful to them

              i’m not saying change the definition of open source. i’m saying for legal purposes, in the EU, there should be some clarification in the law. if there is a financial benefit to having an open source product available then there should be something for having a partially open source product available

              especially a product that is as open source as it could possibly legally be without violating copyright

              • FooBarrington@lemmy.world · 2 up · 2 days ago

                Open source isn’t defined legally, only through the OSI. The benefit is only from a marketing perspective as far as I’m aware.

                Which is also why it’s important that “open source” doesn’t get mixed up with “partially open source”, otherwise companies will get the benefits of “open source” without doing the actual work.

                  • kava@lemmy.world · 1 up · edited · 2 days ago

                  It is defined legally in the EU

                  https://artificialintelligenceact.eu/

                  https://artificialintelligenceact.eu/high-level-summary/

                  There are different requirements if the provider falls under “Free and open licence GPAI model providers”

                  Which is legally defined in that piece of legislation

                  otherwise companies will get the benefits of “open source” without doing the actual work.

                  Meta has done a lot for Open source, to their credit. React Native is my preferred framework for mobile development, for example.

                    Again, I fully acknowledge they are a large evil megacorp, but without large evil megacorps we would not have open source as we know it today. There are certain realities we need to accept based on the system we live in. Open source only exists because corporations benefit from this shared infrastructure.

                    Our laws should encourage this type of behavior, not restrict it. By limiting the scope, the EU gives Meta less incentive to open source the code behind its AI models. We want the opposite. We want to incentivize it.