OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

  • Blapoo@lemmy.ml
    link
    fedilink
    English
    arrow-up
    78
    arrow-down
    10
    ·
    1 year ago

    We have to distinguish between LLMs

    • Trained on copyrighted material and
    • Outputting copyrighted material

    They are not one and the same

    • Tetsuo@jlai.lu
      link
      fedilink
      English
      arrow-up
      2
      arrow-down
      2
      ·
      1 year ago

      Output from an AI has just been recently considered as not copyrightable.

      I think it stemmed from the actors strikes recently.

      It was stated that only work originating from a human can be copyrighted.

      • Anders429@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 year ago

        Output from an AI has just been recently considered as not copyrightable.

        Where can I read more about this? I’ve seen it mentioned a few times, but never with any links.

    • TwilightVulpine@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      1 year ago

      Should we distinguish it though? Why shouldn’t (and didn’t) artists have a say if their art is used to train LLMs? Just like publicly displayed art doesn’t provide a permission to copy it and use it in other unspecified purposes, it would be reasonable that the same would apply to AI training.

      • Blapoo@lemmy.ml
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 year ago

        Ah, but that’s the thing. Training isn’t copying. It’s pattern recognition. If you train a model “The dog says woof” and then ask a model “What does the dog say”, it’s not guaranteed to say “woof”.

        Similarly, just because a model was trained on Harry Potter, all that means is it has a good corpus of how the sentences in that book go.

        Thus the distinction. Can I train on a comment section discussing the book?