Google blasted for AI that refuses to say how many Jews were killed by the Nazis

VirtualOdour@sh.itjust.works · 8 months ago

kromem@lemmy.world · 8 months ago

Given you’re one of the more rational commenters on Lemmy I’ve seen, you might be interested in why this is such an issue.

Large language models are stochastic: their output can vary randomly from run to run, but in practice mostly among answers that are about equally probable. If you ask “where are we going to go on this sunny day?” it might answer “the beach” one time and “a park” another.

But when the options are not equally probable in the training data, the models end up collapsing on the most likely answer, because they have no memory between invocations. That is, after all, what they were trained to predict.

For example, if you ask Google’s LLM to give you a random number between one and ten, you’ll get the number seven every single time. Humans are biased toward the number 7 (followed by 3) over numbers like 4, and the model picks up that pattern. Since it has no memory between invocations, it goes with the most represented option and doesn’t vary it at all on initial requests (it will vary once there’s a chat history, though).
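To make the collapse concrete, here is a rough sketch of sampling from a biased distribution at different temperatures. It is not how Google’s model actually decodes; the weights are made up (only the ordering, with 7 first and 3 second, follows the point above), and `sample` and `tally` are hypothetical helper names. The idea is just that when sampling is sharpened toward the most likely answer, independent requests with no shared history all land on the same number.

```python
import math
import random
from collections import Counter

# Made-up weights for how often people pick each "random" number from 1 to 10.
# Assumption for illustration only: 7 is most common, then 3.
WEIGHTS = {1: 3, 2: 6, 3: 14, 4: 8, 5: 10, 6: 8, 7: 28, 8: 10, 9: 7, 10: 6}


def sample(weights, temperature, rng):
    """Draw one answer after dividing the log-weights by the temperature."""
    if temperature == 0:
        # Greedy decoding: always return the single most likely answer.
        return max(weights, key=weights.get)
    answers = list(weights)
    scaled = [math.log(weights[a]) / temperature for a in answers]
    top = max(scaled)  # subtract the max for numerical stability
    probs = [math.exp(s - top) for s in scaled]
    return rng.choices(answers, weights=probs, k=1)[0]


def tally(weights, temperature, n=1000, seed=0):
    """Simulate n independent requests (no memory between them) and count answers."""
    rng = random.Random(seed)
    return Counter(sample(weights, temperature, rng) for _ in range(n))


if __name__ == "__main__":
    for t in (1.0, 0.1, 0):
        print(t, tally(WEIGHTS, t).most_common(3))
```

At temperature 1.0 the answers spread out roughly in proportion to the weights; as the temperature drops toward 0, nearly every independent request returns 7.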

So what happens when you ask for a description of a doctor? By default, you get a white male every single time. This wouldn’t be an issue if the model varied its output stochastically in line with the (biased) probabilities in the training data, but it can’t do that for demographics any better than it can for numbers between one and ten.

Obviously an intervention is needed, and various teams are all working on ways to do that. Google initially gave instructions to specifically add diversity to every prompt showing people, which was kind of like using a buzzsaw where a scalpel was needed. It will get better over time, but there are going to be edge cases that need addressing along the way.

In terms of the Holocaust query, that topic is often adjacent to conspiratorial denialism, which is connected to a host of other opinions no one (other than Gab) wants in an LLM or voice assistant. So here too we’re almost certainly looking at overly broad attempts to silence neo-Nazi denialism propaganda and not some sort of intended censorship of the actual history.

winterayars@sh.itjust.works · 8 months ago

we’re almost certainly looking at overly broad attempts to silence neo-Nazi denialism propaganda and not some sort of intended censorship of the actual history.

And that’s probably what the NY Post is actually upset about.

iarigby@lemmy.world · 8 months ago

terrific explanation, thank you

SkyezOpen@lemmy.world · 8 months ago

Any idea why they don’t just apply LLMs to natural language processing? “Turn the living room lights off and bedroom lights on” should be pretty simple to parse, yet my assistant has a breakdown any time I do anything more than one command at a time.

kromem@lemmy.world · 8 months ago

It’s expensive and slow, especially to do well and to connect to third-party system calls like “turn_off_lights(["living room"])”.
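For a sense of what “connecting to system calls” involves, here is a minimal sketch of the receiving side, assuming the model has been asked to reply with a JSON list of calls. Only the `turn_off_lights` name comes from the example above; `turn_on_lights`, `HANDLERS`, `dispatch`, the JSON shape and the room names are all hypothetical, and the model’s reply is a hand-written stand-in rather than real output.

```python
import json

# Pretend home state; room names are illustrative.
lights = {"living room": True, "bedroom": False}


def turn_off_lights(rooms):
    for room in rooms:
        lights[room] = False


def turn_on_lights(rooms):  # hypothetical counterpart, not named in the thread
    for room in rooms:
        lights[room] = True


# Allow-list of calls the assistant may make.
HANDLERS = {"turn_off_lights": turn_off_lights, "turn_on_lights": turn_on_lights}


def dispatch(model_output):
    """Run each call in a JSON list, raising on any name that is not allow-listed."""
    calls = json.loads(model_output)
    for call in calls:
        name = call["name"]
        if name not in HANDLERS:
            raise ValueError(f"unknown call: {name}")
        HANDLERS[name](call["rooms"])
    return [call["name"] for call in calls]


# Hand-written stand-in for what a model might return for
# "Turn the living room lights off and bedroom lights on".
EXAMPLE_OUTPUT = (
    '[{"name": "turn_off_lights", "rooms": ["living room"]},'
    ' {"name": "turn_on_lights", "rooms": ["bedroom"]}]'
)

if __name__ == "__main__":
    print(dispatch(EXAMPLE_OUTPUT), lights)
```

The dispatch step itself is cheap; the expensive and slow part is getting a model to produce that structured output reliably for every phrasing, which this sketch skips entirely.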