

Among the tested models, GPT-4 Turbo ranked highest with 46% accuracy, while Llama-3.1-8B scored the lowest at 33.6%.
“The main takeaway from this study is that LLMs, while impressive, still lack the depth of understanding required for advanced history,” said del Rio-Chanona. “They’re great for basic facts, but when it comes to more nuanced, PhD-level historical inquiry, they’re not yet up to the task.”
I’m sorry, you fucking what? How about you test the world’s population on PhD-level history and see if you get 46%? Are you fucking kidding me? You’re telling me this machine is half accurate on PhD history and you’re tryna act like that doesn’t just make your entire history department fucking useless? At most, you have 5 years until it’s better at the job than actual humans trained for it, because it’s already better than the public at large.
I’mma be honest, English has no business making fun of any other language. English is not a language; it’s three languages standing on each other’s shoulders in a trenchcoat.