Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

0x815@feddit.de · 1 year ago

Phanatik@kbin.social · edit-2 1 year ago

I wasn’t talking about copyright law with regard to the model itself.

I was talking about what is/isn’t grounds for plagiarism. I strongly disagree with the idea that artists and art bots go through the same process. They don’t and it’s reductive to claim otherwise. It negatively impacts the perception of artists’ work to assert that these models can automate a creative process which might not even involve looking at other artists’ work because humans are able to create on their own.

A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate.

This is why ChatGPT required the internet to do what it does (the privacy violation is another big concern there). The model needed vast quantities of information to be sufficiently trained because language is difficult to decipher. Languages evolved by getting in contact with other languages and organically making new words. ChatGPT will never invent a new word because it’s not intelligent, it is merely imitating intelligence.

BlameThePeacock@lemmy.ca · 1 year ago

“A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate.”

No, they really can’t. Go look at a 1-year-old’s first attempt at “art”, because it’s nothing more than random smashing of colour on paper. A computer could easily generate such “work” as well with no training data at all. They’ve seen art at that point, and still can’t replicate it because they need much more training first.

Humans require books (or teachers who read books) to learn how to read and write. That is “vast quantities of information” being consumed to learn how to do it. If you had never seen or heard of a book, you wouldn’t be able to write a novel. It’s also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice).

Phanatik@kbin.social · 1 year ago

Once again, being reductive about artists’ work. Jackson Pollock’s entire career was smashing colours on a canvas. If you want to argue that Pollock had to look at thousands of paintings before making his, I honestly can’t take you seriously at that point.

“A computer could easily generate such “work” as well with no training data at all.”

Yes and in the eyes of its creators, that was deemed a failure which is why Midjourney and Dall-E are the way they are. These bots don’t want to create art, they want to imitate it.

Children have barely any experiences and can still create something. You might not deem it worthy of calling it art but they created something despite their limited knowledge and life experience.

Of course, you’d need books to read and write. The words have to be written and you need to see the words in written form if you also want to write them. But one thing you don’t take into account is handwriting, another thing that is unique to every individual. Some have worse handwriting than others, and with practice (like any muscle) it can be improved, but you don’t need to have seen handwritten text before writing it yourself. You only need to be taught how to hold a pen and you can write.

Novels are complex structures of language just like poetry. In order to write novels, you have to consume novels because it’s well understood that to find your own narrative voice you must see how others express theirs. Stories are told in unique ways and it’s crucial as a writer to understand and break these concepts down. Intention and purpose form a core part of storytelling and an LLM cannot and will not be able to express those things.

They’re written in certain ways because the author intended them to be that way, such as Cormac McCarthy deciding to be very minimalist with his punctuation. I would love to see you make the case that an LLM, without being specifically prompted to do so, would make that stylistic decision. An LLM can’t make that decision because unless you specify a style it is aware of, it won’t organically do it.

I am also a writer. I’ve written a short story. One of my stylistic choices is that I don’t use dialogue tags like “said”. An LLM won’t make that choice because it isn’t designed to do so; it won’t decide to minimise its use of dialogue tags to improve the flow of the narrative unless you tell it to.

“It’s also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice).”

Yes, in order to learn a spoken language you have to have heard it. However, languages evolve over time. You develop regional accents and dialects. All of the UK speaks English but no two towns speak the same way.

BlameThePeacock@lemmy.ca · 1 year ago

Jackson Pollock didn’t create paintings; Jackson Pollock’s art was storytelling and showmanship.

“Yes, in order to learn a spoken language you have to have heard it. However, languages evolve over time. You develop regional accents and dialects. All of the UK speaks English but no two towns speak the same way.”

Just like different models have their own patterns of writing…

You’re thinking about LLMs like they’re equivalent to multiple people (or groups of people), but each LLM is equivalent to a single person. The training and resulting function of each one is as distinct as an individual human.

I could raise one of my children to perform the exact same functions as an LLM or art creation tool. Give them exactly the same image/text sets that these models are trained on, and have them practice for a decade or two. Then I could tell them “Hey I need a picture of an orange rabbit riding a bike” and they could draw me one, or write a story about the same topic. There’s clearly no copyright infringement in that process, so why would it be different for creating a machine to do the same thing?

Phanatik@kbin.social · 1 year ago

An LLM or art creation tool is barely equatable to one person. The difference between a child and an art creation tool is that you could show a child a single picture of a bunny, a bike and a carrot, then ask them to draw an orange bunny riding a bike, and they could draw something resembling that. An art bot would require hundreds to thousands of images of each object to understand what it is before it can even make a reasonable attempt. The level of training required isn’t even comparable.

At least the child’s drawing will have some personality in it; every output from an art bot ends up looking soulless. The reason for that is simple: an art bot only imitates what it’s been trained on, whereas an artist draws on inspiration before applying the two things an art bot will never have: intent and purpose.

BlameThePeacock@lemmy.ca · 1 year ago

You’re missing the training even a child has received to reach the state where they could do that. If you raised a child to 5 years completely by themselves in an empty room they wouldn’t be able to draw anything at all, let alone something based on pictures. The act of drawing a variation on a bunny from a picture requires they learn and practice fine motor skills, and it requires them to have an understanding of animals.

Humans get literally 150,000+ hours of training time before we even let them try to become an adult.

Phanatik@kbin.social · 1 year ago

Sure but the training isn’t an algorithm deciding probabilities. Children do not 100% express themselves based on environment. On one side you have nature and the other you have nurture.

An example:
The FBI’s studies into serial killers uncovered that these people, even though they have been influenced by their environment to become what they are, respond to external stimuli in an abnormal way, which is what leads them down that path to begin with.

A child learns how language and creativity are expressed before attempting to express themselves. These bots aren’t built to deal with this expression because, at their core, they are statistical models. A bot looks at a sentence as a series of variables to determine what comes next. The sentence itself could be nonsensical but the bot doesn’t know that; it’s using the probabilities it’s been trained on to construct the sentence.
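
To make “using the probabilities it’s been trained on” a bit more concrete, here’s a rough toy sketch. To be clear, this is a made-up bigram counter, not how ChatGPT or any real LLM is actually built (those are neural networks working on tokens); the function names and the tiny corpus are invented purely for illustration. It only shows the general idea of picking the next word from learned frequencies with no notion of meaning.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probs(counts, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()} if total else {}

def generate(counts, start, length=5):
    """Greedily append the most probable next word; meaning is never checked."""
    out = [start]
    for _ in range(length):
        probs = next_word_probs(counts, out[-1])
        if not probs:
            break  # the toy model has never seen anything follow this word
        out.append(max(probs, key=probs.get))
    return " ".join(out)

# Hypothetical training text, invented for this example.
corpus = "the rabbit rides the bike and the rabbit eats the carrot"
model = train_bigrams(corpus)

print(next_word_probs(model, "the"))  # "rabbit" is the most likely follower
print(generate(model, "the"))
```

The generated sentence can loop or stop making sense, and nothing in the code notices, because all it has are counts of what followed what in its training text.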

You might say bots have their own way of expressing themselves, but I would say that’s something we’re projecting onto the bot rather than something it is demonstrating itself. I’m sure it’s very cute when it apologises for making a mistake but that apology isn’t sincere; it’s been programmed to respond that way when it thinks you’re pointing out its mistakes. It’s merely imitating a sense of remorse rather than displaying actual remorse.