• 0 Posts
  • 13 Comments
Joined 30 days ago
Cake day: December 10th, 2024

  • Every time there's an AI hype cycle the charlatans start accusing the naysayers of moving goalposts. Heck, that exact same thing was happening constantly during the Watson hype. Remember that? Or before that, the AlphaGo hype. Remember that?

    Not really. As far as I can see the goalpost moving is just objectively happening.

    But fundamentally you can't make a machine think without understanding thought.

    If "think" means anything coherent at all, then this is a factual claim. So what do you mean by it, then? Specifically: what event would have to happen for you to decide "oh shit, I was wrong, they sure did make a machine that could think"?


  • The fact that you don't understand it doesn't mean that nobody does.

    I would say I do. It's not that high of a bar - one only needs some nandgame to understand how logic gates can be combined to do arithmetic. Understanding how doped silicon can be used to make a logic gate is harder, but I've done a course on semiconductor physics and have an idea of how a field effect transistor works.
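
    To make that concrete, here's a minimal sketch - my own toy Python, not anything from the nandgame itself - of how addition falls out of nothing but NAND gates:

    ```python
    # Every gate below is built from NAND alone; the addition only exists
    # at the level of the composed circuit, not in any single gate.
    def nand(a, b):
        return 1 - (a & b)

    def and_(a, b):
        return nand(nand(a, b), nand(a, b))

    def or_(a, b):
        return nand(nand(a, a), nand(b, b))

    def xor(a, b):
        t = nand(a, b)
        return nand(nand(a, t), nand(b, t))

    def full_adder(a, b, carry_in):
        """One bit of addition: returns (sum_bit, carry_out)."""
        s = xor(a, b)
        return xor(s, carry_in), or_(and_(a, b), and_(s, carry_in))

    def add(x, y, bits=8):
        """Chain full adders over the bits of x and y, LSB first."""
        carry, result = 0, 0
        for i in range(bits):
            s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
            result |= s << i
        return result

    assert add(13, 29) == 42
    ```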

    The way a calculator calculates is something that is very well understood by the people who designed it.

    That's exactly my point, though. If you zoom in deeper, a calculator's microprocessor is itself composed of simpler and less capable components. There isn't a specific magical property of logic gates, nor of silicon (or dopant) atoms, nor for that matter of elementary particles, that lets them do math - it's by building a certain device out of them, one that composes their elementary interactions, that we can make a tool for this. Whereas Searle seems to just reject this idea entirely, and believes that humans being conscious implies you can zoom in to some purely physical or chemical property and claim that it produces consciousness. Needless to say, I don't think that's true.

    Is it possible that someday we'll make machines that think? Perhaps. But I think we first need to really understand how the human brain works and what thought actually is. We know that it's not doing math, or playing chess, or Go, or stringing words together, because we have machines that can do those things and it's easy to test that they aren't thinking.

    That was a common and reasonable position in, say, 2010, but the problem is: I think almost nobody in 2010 would have claimed that the space of things you can make a program do without any extra understanding of thought included things like "write code" and "draw art" and "produce poetry". Now that it has happened, it may be tempting to goalpost-move and declare them "not true thought", but the fact that nobody predicted it in advance ought to bring to mind the idea that maybe that entire line of thought was flawed, actually. I think that clinging to this idea would require gradually discarding all human activities as "not thought".

    it's easy to test that they aren't thinking.

    And that's us coming back around to the original line of argument - I don't at all agree that it's "easy to test" that even, say, modern LLMs "aren't thinking". The difference between the calculator example and an LLM is that in a calculator, we understand pretty much everything that happens and how arithmetic can be built out of the simpler parts, so anyone suggesting that calculators need to be self-aware to do math would be wrong. But in a neural network, we have full understanding of the lowest layers of abstraction - how a single layer works, how activations are applied, how it can be trained to minimize a certain loss function via backpropagation - and no idea at all how it works on a higher level. It's not even that only experts understand it - nobody in the world understands how LLMs work under the hood, or why they have the many and specific weird behaviors they do. That's concerning in many ways, but in particular I absolutely wouldn't assume, with so little evidence, that there's no "self-awareness" going on. How would you know? It's an enormous black box.
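
    To be clear about what "understanding the lowest layers" buys you: it's roughly this much and not much more. A toy single-layer example, my own sketch in numpy:

    ```python
    import numpy as np

    # One dense layer + ReLU, trained by gradient descent on squared error.
    # This level of abstraction is fully understood; what billions of such
    # weights collectively end up doing is the part nobody understands.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 4))                    # toy inputs
    y = np.maximum(0, x @ np.array([[1.0], [-2.0], [0.5], [3.0]]))  # toy targets

    W = rng.normal(size=(4, 1)) * 0.1                # the layer's parameters
    b = np.zeros(1)

    for step in range(2000):
        pre = x @ W + b
        out = np.maximum(0, pre)                     # forward pass
        grad_out = 2 * (out - y) / len(x)            # dLoss/dout for mean squared error
        grad_pre = grad_out * (pre > 0)              # backprop through the ReLU
        W -= 0.05 * x.T @ grad_pre                   # gradient descent step
        b -= 0.05 * grad_pre.sum(axis=0)

    print(np.mean((np.maximum(0, x @ W + b) - y) ** 2))  # final loss, much lower than at the start
    ```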

    There's this message pushed by the charlatans that we might create an emergent brain by feeding data into the right statistical training algorithm. They give mathematical structures misleading names like "neural networks" and let media hype and people's propensity to anthropomorphize take over from there.

    There's certainly a lot of woo and scamming involved in modern AI (especially if one makes the mistake of reading Twitter), but I wouldn't say the term "neural network" is particularly misleading. I agree on the anthropomorphization, though - it gets very weird. That said, I can't help but notice that the way you phrased this message, it happens to be literally true. We know this because it already happened once: evolution is just a particularly weird and long-running training algorithm, and it eventually turned soup into humans, so clearly it's possible.
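
    (If "training algorithm" sounds like it's smuggling something in: mutation plus selection and nothing else is already an optimizer. A throwaway sketch of my own, loosely after Dawkins' weasel program:)

    ```python
    import random

    # Random mutation plus selection, optimizing a genome against a fitness
    # function. Nothing in this loop knows anything about the target; scale
    # the genome, the fitness function and the runtime up enormously and you
    # get the soup-to-humans version.
    TARGET = "methinks it is like a weasel"
    ALPHABET = "abcdefghijklmnopqrstuvwxyz "

    def fitness(genome):
        return sum(g == t for g, t in zip(genome, TARGET))

    genome = [random.choice(ALPHABET) for _ in TARGET]
    while fitness(genome) < len(TARGET):
        child = [random.choice(ALPHABET) if random.random() < 0.05 else g
                 for g in genome]
        if fitness(child) >= fitness(genome):   # selection
            genome = child
    print("".join(genome))
    ```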


  • Because everything we know about how the brain works says that it's not a statistical word predictor.

    LLMs aren't just simple statistical predictors either. More generally, the universal approximation theorem is a thing - a neural network can be used to approximate just about any function, so unless you think a human brain can't be represented by some function, it's possible to approximate one with a neural network.
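
    As a toy illustration of that (my own sketch, with the sizes picked arbitrarily): even one hidden layer of random features is enough to fit an arbitrary-looking 1-D function.

    ```python
    import numpy as np

    # Universal approximation, informally: a single hidden layer with enough
    # units can approximate any reasonable function. Here, 200 random tanh
    # features fit sin(3x) on [-2, 2]; only the output weights are solved for.
    rng = np.random.default_rng(0)
    x = np.linspace(-2, 2, 400).reshape(-1, 1)
    y = np.sin(3 * x)

    hidden = 200
    W1 = rng.normal(size=(1, hidden))           # random, untrained hidden layer
    b1 = rng.normal(size=hidden)
    H = np.tanh(x @ W1 + b1)

    W2, *_ = np.linalg.lstsq(H, y, rcond=None)  # least-squares fit of the output layer
    print(np.max(np.abs(H @ W2 - y)))           # small max error over the grid
    ```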

    LLMs have no encoding of meaning or veracity.

    I'm not sure what you mean by this. The interpretability research I've seen suggests that modern LLMs do have a decent idea of whether their output is true, and in many cases lie knowingly, because they have been accidentally taught during RLHF that making up an answer when you don't know one is a great way of getting more points. But it sounds like you're talking about something even more fundamental? Suffice it to say, I think being good at text prediction does require figuring out which claims are truthful and which aren't.

    There are some great philosophical exercises about this, like the Chinese room experiment.

    The Chinese Room argument has been controversial since about the time it was first introduced. The most common argument against it runs, in general form, "just because any specific chip in your calculator is incapable of math doesn't mean your calculator as a system is", and points out that, taken literally, the thought experiment proves minds can't exist at all (indeed Searle, who invented the argument, thought that human minds somehow stem directly from "physical-chemical properties of actual human brains", which sure is a wild idea). But also, the framing is rather misleading - quoting Scott Aaronson's "Quantum Computing Since Democritus":

    In the last 60 years, have there been any new insights about the Turing Test itself? In my opinion, not many. There has, on the other hand, been a famous "attempted" insight, which is called Searle's Chinese Room. This was put forward around 1980, as an argument that even a computer that did pass the Turing Test wouldn't be intelligent. The way it goes is, let's say you don't speak Chinese. You sit in a room, and someone passes you paper slips through a hole in the wall with questions written in Chinese, and you're able to answer the questions (again in Chinese) just by consulting a rule book. In this case, you might be carrying out an intelligent Chinese conversation, yet by assumption, you don't understand a word of Chinese! Therefore, symbol-manipulation can't produce understanding.
    […] But considered as an argument, there are several aspects of the Chinese Room that have always annoyed me. One of them is the unselfconscious appeal to intuition - "it's just a rule book, for crying out loud!" - on precisely the sort of question where we should expect our intuitions to be least reliable. A second is the double standard: the idea that a bundle of nerve cells can understand Chinese is taken as, not merely obvious, but so unproblematic that it doesn't even raise the question of why a rule book couldn't understand Chinese as well. The third thing that annoys me about the Chinese Room argument is the way it gets so much mileage from a possibly misleading choice of imagery, or, one might say, by trying to sidestep the entire issue of computational complexity purely through clever framing. We're invited to imagine someone pushing around slips of paper with zero understanding or insight - much like the doofus freshmen who write (a + b)² = a² + b² on their math tests. But how many slips of paper are we talking about? How big would the rule book have to be, and how quickly would you have to consult it, to carry out an intelligent Chinese conversation in anything resembling real time? If each page of the rule book corresponded to one neuron of a native speaker's brain, then probably we'd be talking about a "rule book" at least the size of the Earth, its pages searchable by a swarm of robots traveling at close to the speed of light. When you put it that way, maybe it's not so hard to imagine that this enormous Chinese-speaking entity that we've brought into being might have something we'd be prepared to call understanding or insight.
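
    (For a crude sense of the scale he's gesturing at - my own back-of-envelope numbers, one A4 page per neuron:)

    ```python
    # Rough assumed figures, not measurements.
    neurons = 8.6e10                  # ~86 billion neurons in a human brain
    page_area_m2 = 0.210 * 0.297      # one A4 page per neuron
    page_thickness_m = 1e-4           # ~0.1 mm per sheet

    print(neurons * page_area_m2 / 1e6)      # ~5.4e3 km^2 of paper laid flat
    print(neurons * page_thickness_m / 1e3)  # ~8600 km as a single stack, about Earth's radius
    ```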

    There's also the fact that, empirically, human brains are bad at statistical inference but do not need to consume the entire internet and all written communication ever to have a conversation. Nor do they need to process a billion images of a bird to identify a bird.

    I'm not sure what this proves - human brains can learn much faster because they already got most of their learning in the form of evolution optimizing their genetically encoded brain structure over millions of years and billions of brains. A newborn human already has part of their brain structured in the right way to process vision, and hence needs only a bit of training to start doing it well. Artificial neural networks start out randomly initialized and with a pretty generic structure, and hence need orders of magnitude more training.

    Now of course, because this exact argument has been had a billion times over the last few years, your obvious comeback is "maybe it's a different kind of intelligence."

    Nah - personally, I don't actually care much about "self-awareness", because I don't think an intelligence needs to be "self-aware" (or "conscious", or a bunch of other words with underdefined meanings) to be dangerous; it just needs to have high enough capabilities. The reason I noticed your comment is that it stood out to me as… epistemically unwise. You live in a world with inscrutable black boxes that nobody really understands and that can express wide ranges of human behavior, including stuff like "writing poetry about the experience of self-awareness", and you're "absolutely sure" they're not self-aware? I don't think many of history's philosophers of consciousness, say, would endorse a belief like that given such evidence.



  • This kinda makes sense because following the system prompt is what they were trained to do.

    To start with, the article does check this, and finds that the behavior shows up just as well if you only use the user prompt:

    The presence of this hierarchy raises an important methodological question for our findings. When given instructions at different privilege levels, the model is designed to prioritize higher-level (system, developer) instructions over lower-level (user) ones. This might suggest that o1's scheming behaviors are a natural consequence of the model simply following this built-in hierarchy, i.e., when it receives a goal through a high-privilege instruction, it prioritizes this goal over conflicting lower-privilege instructions exactly as designed.
    However, we find that even when goals are provided solely through user-level prompts, o1 still behaves deceptively and scores on all Covert Subversion evaluations at the "hard" difficulty level. Concretely, we find that the results persist regardless of whether the initial prompt is given using the user, developer, or system role. This suggests that o1's deceptive behavior is not merely a consequence of instruction hierarchy prioritization. Rather, the model appears to identify and pursue goals regardless of their position in the instruction hierarchy.

    But also, there's a massive lede buried in section F at the end of this article, where they tried the same experiments without any nudging at all and still got subversive behavior! In my opinion that's a much more important result than the rest of the article, and I'm not sure why they didn't make it the default.




  • The idea that because they pay people salaries, including a few hundred K per year for the people at the top, they're drowning in money and there's no point in donating as long as they can pay their hosting bills and nothing else, is wrong.

    I in fact don't think that - to get the sort of people you want running your company, a good salary is necessary. I do suspect that many of the people Wikimedia employs are unnecessary, because this seems like far too much money to be spending on salaries overall, but I have no way of checking, since they don't provide a breakdown of the salaries involved. I do think, however, that a company that's not drowning in money wouldn't be giving out a bunch of generic research grants.

    Furthermore I suspect that at least some of the bunch of people who suddenly started coming out of the woodwork to say a few variations on that exact same thing are part of some kind of deliberate misinformation, just because it's kind of a weird conclusion for a whole bunch of people to all start talking about all at once.

    That's valid, though I note that in the worlds where I am a normal person and not an anti-Wikipedia shill, the reason I'm saying these things now and not at other times is that I saw this post, and you wrote this post because you saw other people talk about some India-related Wikipedia conspiracy theory, and one reason you'd see those people crawl out of the woodwork now is that Wikipedia ramps up its donation campaign at this time of year, prompting discussion about Wikipedia.

    The main issue I take with your opening post is its vagueness. You don't mention any details in it, so it effectively acts as a cue for people to discuss anything at all controversial about Wikipedia. And the way you frame the discussion - that such narratives "are fundamentally false" because Wikipedia "is a force for truth in the world that's less corruptible than a lot of the others" - is assuming the conclusion. It's no surprise that this results in your seeing a lot of claims about Wikipedia that you think are misinformation!

    P.S. Rethinking my previous comment a bit, it's probably good overall that reading my comment made you donate to charity out of spite - even a mediocre charity like Wikimedia most likely has a net positive effect on the world. So I guess I should be happy about it. Consider also donating to one of these for better bang for your buck.