Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.

  • El Barto@lemmy.world · 1 year ago · +30/−9

    These are machines, though, not human beings.

    I guess I’d have to be an author to find out how I’d feel about it, to be fair.

      • FaceDeer@kbin.social · 1 year ago · +6/−2

        If an AI “reproduces” a work it was trained on, that’s a failure of the AI. Why would anyone spend millions of dollars and devote oodles of computing power to build something that just does what a simple copy/paste operation can accomplish?

        When an AI spits out something that’s too close to one of the works in its training set, that’s called “overfitting,” and it is considered an error to be corrected. Most overfitting that’s been detected has been the result of duplication in the training set: when you hammer an AI image generator in training with thousands of copies of the Mona Lisa, it eventually goes “alright, I get it already, when you say ‘Mona Lisa’ you want that exact pattern!” and will try its best to replicate that pattern when you ask it to later. That’s why training sets need to be de-duplicated.
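        The de-duplication step described above can be sketched as exact-match removal by content hash. This is a minimal illustration, not any lab's actual pipeline (real training pipelines also do near-duplicate detection, e.g. MinHash; the function name here is made up):

        ```python
        import hashlib

        def dedupe_exact(samples):
            """Keep only the first occurrence of each exact-duplicate text."""
            seen = set()
            unique = []
            for text in samples:
                digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
                if digest not in seen:
                    seen.add(digest)
                    unique.append(text)
            return unique

        corpus = ["Mona Lisa caption", "Mona Lisa caption", "unique passage"]
        print(dedupe_exact(corpus))  # → ['Mona Lisa caption', 'unique passage']
        ```

        With the thousands of Mona Lisa copies collapsed to one, the model sees the image with roughly the same frequency as everything else and has less incentive to memorize it verbatim.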

        AIs are meant to produce new things.

    • kromem@lemmy.world · 1 year ago · +3/−1

      Did you write a comment on Reddit before 2015? If so, your copyrighted content was used without your permission to train today’s LLMs, so you absolutely get to feel one way or another about it.

      The idea that these authors were somehow the backbone of the models, when any individual contribution was like spitting in the ocean, and when the model weights would have treated 100 pages of Twilight fan fiction as equivalent to 100 pages of Twilight itself, is honestly one of the negative impacts of the extensive coverage these suits are getting.

      Pretty much everyone who has ever written anything indexed online is a tiny part of today’s LLMs.

      • El Barto@lemmy.world · edited 1 year ago · +2

        Thank you for your reply.

        On a completely separate note, it’s funny to think that there exists Twilight fan fiction when Twilight itself started as fan fiction.

        Edit: I dun goofed.

        • kromem@lemmy.world · 1 year ago · +2

          Pretty sure it’s the other way around.

          Fifty Shades of Grey started out as Twilight fanfiction before becoming its own thing.

          AFAIK Twilight was always just its own pulp fiction.

    • sab@lemmy.world · 1 year ago · +4/−2

      I don’t think anyone is faulting the machines for this, just the people who instruct the machines to do it.

    • Shurimal@kbin.social · 1 year ago · +15/−17

      These are machines, though, not human beings.

      What’s the difference? On the most fundamental level it’s all the same.

      • brygphilomena@lemmy.world · 1 year ago · +15/−3

        A human, regardless of how many books they read, will have personal experiences that are undeniably unique to themselves. They will interpret the works they read differently from one another based on their worldly experiences. Their writing, no matter how many books they read and are inspired by, will always be influenced by their own personal lives. They can experience love, hate, heartbreak, empathy, sadness, and happiness.

        This is something an LLM does not have, and in my opinion it is a massive distinguishing factor. So on a “fundamental” level, it is not the same. It is nowhere near the same.

        • originalucifer@moist.catsweat.com · 1 year ago · +2/−4

          do you really think we are that far off… from giving foundational memory and motivation layers to these LLMs, something that could mimic… or even… generate the kinds of thoughts you’re describing?

          i don’t think so. you seem to imply its impossibility; i expect its inevitability. the human brain will not be a black box forever… it still exists in a world of physics we can emulate, even if rudimentarily.

      • Wander@kbin.social · 1 year ago · +15/−5

        Unless you think there’s no difference between killing a person and closing a program, I think we can agree they should be treated differently in the eyes of the law.

        And so there’s a difference between a person reading a book and being inspired by it, and someone writing a program that automatically transforms the book into data that can create new books.

        • jennraeross@lemmy.world · 1 year ago · +5/−1

          Please do not take this as support of AI use of copyrighted works (I don’t support it), but as far as I can tell, yes, we are machines. This rant is just me being aspie atm, so feel free to ignore it.

          We are thinking machines, programmed by our genetics, predispositions, experiences, and circumstances. A two-part explanation of how humans are merely products of their circumstances was once put forward to me. The first part is that humans can do anything, but only the thing we want to do most.

          For instance, a common rebuttal is that people can choose to go to the gym even when they find the experience of exercise undesirable. However, when that happens, it’s merely a case of other wants outweighing the want to not go to the gym; typically, they want to be fit.

          We want to not spend money, but usually we want to not risk going to jail for stealing even more. We want to not work overtime, but sometimes we want the extra cash more than that.

          The second part of the argument is that we can’t choose what we want. When someone talks themselves out of the slice of cheesecake, they aren’t changing what they want, they’re resolving said want against the larger want they have to lose weight.

          And if we make decisions by our wants, while said wants are not decided by us, then despite appearances we are little more than complex automata.