• NevermindNoMind@lemmy.world
    40 upvotes · 10 months ago

    For those who haven’t read the article, this is not about hallucinations; it is about how AI can be used maliciously. Researchers used GPT-4 to create a fake data set from a fake human trial, and the result was convincing. Only closer statistical scrutiny showed that the data was faked, for example more patient ages ending in 7 or 8 than would be likely in a real sample. The article points out that most peer review does not go that deep into the data to try to spot fakes. The issue here is that a malicious researcher could use AI to generate fake data supporting whatever theory they want and, in theory, get it published in a peer-reviewed journal.

    I don’t have the expertise to assess how much of a problem this is. If someone were that determined, couldn’t they already fake data by hand? Does AI just make it easier to do, or is it better at it, thereby increasing the risk? I don’t know, but it’s an interesting data point as we as a society think about what AI is capable of and how it could be used maliciously.

  • BT_7274@lemmy.world
    38 upvotes, 2 downvotes · 10 months ago

    How many times do we have to play this game before people realize it’s not a researcher, lawyer, doctor, or anything that has to rely on facts and established, valid data?

    It’s a next-word generator that’s remarkably good at sounding human. Yes, this can often lead to accurate-sounding information, but it doesn’t actually “know” anything. Not in any sense that could be relied on.

  • ArugulaZ@kbin.social
    8 upvotes · 10 months ago

    Wow, it’s getting more and more human every day in its thought process! I wonder when it’ll start coming up with its own conspiracy theories?

  • andrew_bidlaw@sh.itjust.works
    4 upvotes · 10 months ago

    Wilkinson … has examined several data sets generated by earlier versions of the large language model, which he says lacked convincing elements when scrutinized, because they struggled to capture realistic relationships between variables.

    This revealed a mismatch in many ‘participants’ between designated sex and the sex that would typically be expected from their name. Furthermore, no correlation was found between preoperative and postoperative measures of vision capacity and the eye-imaging test. Wilkinson and Lu also inspected the distribution of numbers in some of the columns in the data set to check for non-random patterns. The eye-imaging values passed this test, but some of the participants’ age values clustered in a way that would be extremely unusual in a genuine data set: there was a disproportionate number of participants whose age values ended with 7 or 8.

    This has “it’s 2 am and the homework is due this morning” energy. It seems they were careless and probably thought their data wouldn’t be studied at all. Relationships between columns are where forgeries like these will always suffer. It takes a good amount of understanding to fake those relationships, and LLMs lack it unless a human explicitly guides them to take the relationships into account. Otherwise they invent their own, where post-op condition may depend on the patient’s last name and 8s and 9s are the most popular choices for the second digit of an age.
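
    For anyone curious what checks like the ones quoted above might look like, here is a rough sketch. It is not the method Wilkinson and Lu actually used, which the article excerpt does not spell out. It assumes last digits of ages should be roughly uniform (only approximately true for real patients), and the function names and sample numbers are made up for illustration.

```python
from collections import Counter
from math import sqrt

# Chi-square critical value for 9 degrees of freedom at alpha = 0.05.
CHI2_CRIT_DF9_P05 = 16.919


def last_digit_chi2(values):
    """Chi-square statistic of last digits against a uniform 0-9 distribution.

    Assumption: in genuine data each last digit shows up about equally often.
    """
    counts = Counter(int(v) % 10 for v in values)
    expected = len(values) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical ages: every value ends in 7 or 8, as in an exaggerated forgery.
suspicious_ages = [27, 38, 47, 58, 67, 28, 37, 48, 57, 68] * 5
# Hypothetical ages with every last digit equally represented.
plausible_ages = list(range(20, 80))

for label, ages in [("suspicious", suspicious_ages), ("plausible", plausible_ages)]:
    stat = last_digit_chi2(ages)
    verdict = "flag for review" if stat > CHI2_CRIT_DF9_P05 else "no digit anomaly"
    print(f"{label}: chi2 = {stat:.1f} -> {verdict}")

# Hypothetical pre-op and post-op scores; related measures from the same
# patients would normally be expected to correlate.
pre_op = [0.2, 0.4, 0.5, 0.7, 0.9]
post_op = [0.3, 0.5, 0.7, 0.8, 1.0]
print(f"pre/post correlation: r = {pearson_r(pre_op, post_op):.2f}")
```

    A high chi-square value or a near-zero correlation between measures that should be related is only a reason to look closer, not proof of fabrication.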