• bobburger@fedia.io
      arrow-up
      67
      ·
      8 months ago

      To be fair, it’s a pretty terrible dataset. The AI is just going to say “this” to every question you ask.

    • Ebby@lemmy.ssba.com
      arrow-up
      15
      ·
      8 months ago

      Perhaps, but not worth buying if you can’t make profit or keep it from your competition.

      60M is for almost 20 years of data, but once it’s ingested, Google will only want new content. Next year, it’ll be more like 3M, if the dataset isn’t poisoned by bots and the AI fad hasn’t collapsed. Reddit will struggle with finances again and users will suffer. At least that’s my prediction.

        • Ebby@lemmy.ssba.com
          arrow-up
          6
          ·
          8 months ago

          Haha! Wow, I guess so. I’ll keep some shelf space available in the geezer museum next to 3D TVs, deep fakes, fidget spinners, and my pogs. :D

        • Barbarian@sh.itjust.works
          arrow-up
          6
          ·
          8 months ago

          It currently looks very much like a bubble. After the dot com bubble, the internet didn’t go away, but most companies died off and all the stupid monetisation went bankrupt.

          We may be seeing something similar.

    • qjkxbmwvz@startrek.website
      arrow-up
      12
      ·
      8 months ago

      I wonder if Google’s unlimited legal budget plays a role. Not a lawyer, so probably way off here…

      But, for example, reddit’s success in part depends on Google ingesting their data — reddit shows up in Google searches all the time, which can only happen if Google uses reddit’s content. So reddit telling Google “you can’t use our content” doesn’t work, and they need to say something like, “you can use our content for search results but you can’t consume it as training data.”

      This is a pretty straightforward statement/request/demand, but one could imagine Google lawyers maliciously complying and throwing their hands up dramatically, claiming “well we use some amount of AI in our search results, so if we can’t use your content for AI training then we can’t risk using it for search results.” Which would, I imagine, really, really hurt reddit (no Google results would be catastrophic I suspect).

      So, perhaps the “low” 60M figure is just Google using their leverage.

      Or not. As a random person on the Internet, I can say I’m probably not contributing anything meaningful here…

    • GBU_28@lemm.ee
      English
      arrow-up
      6
      ·
      8 months ago

      How quickly you forget that half of it is just “I also choose this guy’s wife” and “the narwhal bacons at midnight”.

    • trolololol@lemmy.world
      arrow-up
      2
      arrow-down
      1
      ·
      8 months ago

      Considering it’s all full of Nazis and bots, and if you get to filter all of them out you’re left with reposts and low quality memes followed by comments that represent the hostile side of each of us… I’d say anything over $5 is a good deal for spez.

      Now, I hope Google uses this data exclusively for detecting inappropriate answers. Can you imagine it giving answers based on the endless threads of “I’m not your mate, bro; I’m not your bro, dude…”?