• bionicjoey@lemmy.ca
    English
    arrow-up
    69
    ·
    3 months ago

    Makes sense. AAVE is mostly a spoken thing, LLMs are mostly trained on the corpus of written text on the internet and in books. It’s pretty rare for people to write in an AAVE style in those contexts.

    • givesomefucks@lemmy.world
      English
      arrow-up
      9
      arrow-down
      29
      ·
      3 months ago

      Except it has no difficulty reading and understanding AAVE, because people use it online frequently…

      Like, the article makes that abundantly clear, but everyone commenting just read the headline and assumed what it meant was it couldn’t understand it…

      • bionicjoey@lemmy.ca
        English
        arrow-up
        23
        arrow-down
        1
        ·
        3 months ago

        I never said it can’t understand it. I am agreeing with the notion that it has a bias against using it.

        • givesomefucks@lemmy.world
          English
          arrow-up
          4
          arrow-down
          22
          ·
          3 months ago

          You said it’s rarely used online, which just isn’t true.

          But like even this:

          > I am agreeing with the notion that it has a bias against using it

          I’m not sure if you understand the bias is against users who use AAVE, or if you’re saying an LLM doesn’t want to use AAVE.

          Maybe you did understand everything, and you’re just being vague.

          But almost everything you said could be interpreted multiple ways.

          • sugar_in_your_tea@sh.itjust.works
            English
            arrow-up
            12
            arrow-down
            1
            ·
            3 months ago

            Well, if the training data is largely standard English, AAVE could look like less educated English, because it doesn’t follow the normal rules and conventions. And there’s probably a higher correlation between AAVE use and lower means and/or education because people from the black community who have higher means and/or education probably use standard English more often because that’s how they’re trained.

            So I don’t think this is evidence about the model being “racist” or anything of that nature, it’s just the model doing model things. If you type in AAVE, chances are higher that you fit the given demographic, because that’s likely what the training data shows.

            So, I guess I don’t really see the issue here? This just sounds like people thinking the model does more than it does. The model merely matches input text to data in the model. That’s it. There’s no “understanding” here, it’s just matching inputs to outputs.

              • Mac@mander.xyz
                English
                arrow-up
                1
                ·
                3 months ago

                There are times when it’s acceptable and even admirable to be offended on someone else’s behalf.
                I’m not sure this is one of those times.

  • Ghyste@sh.itjust.works
    English
    arrow-up
    29
    arrow-down
    1
    ·
    3 months ago

    They can’t possibly encounter much of it in training material… Of course they’re not going to like it.

    • givesomefucks@lemmy.world
      English
      arrow-up
      8
      arrow-down
      12
      ·
      3 months ago

      What?

      It trains off social media, and even white kids use AAVE online. And kids make the most social media comments.

      A lot of times when someone posts a text screenshot and everyone talks about how kids talk crazy, it’s just a patois of AAVE mixed in with “regular” English.

      It should be able to “read” it fine.

      The bias part (as clearly stated in the article…) is when you ask an LLM to describe the person who would phrase something in AAVE, and the LLM replies back with stereotypes about Black people.

      So it can read and interpret it fine, it just has a bias against people who talk like that.

      • TexMexBazooka@lemm.ee
        English
        arrow-up
        1
        arrow-down
        1
        ·
        3 months ago

        LLMs don’t have a bias against anyone; it’s literally just data. And those models are by and large fed with traditionally grammatically correct data. They don’t understand dialects. You’re looking soooooo hard for something to be offended over.

        • givesomefucks@lemmy.world
          English
          arrow-up
          1
          ·
          3 months ago

          If you’re going to revive a 3+ day old thread…

          At least read the article first so you have a clue what other people were talking about

  • Lvxferre@mander.xyz
    English
    arrow-up
    24
    arrow-down
    5
    ·
    edit-2
    3 months ago

    I’m not from the USA, not black, and not a native English speaker, but due to Linguistics I can give you guys some further info.

    AAE (African-American English), in a nutshell, is a group of English varieties used by some speakers from the USA and Canada. In a lot of aspects they resemble geographical varieties, like the ones you’d see in plenty of other languages, but there’s a key difference: they aren’t used by people “of a certain region”, but rather by people “of a certain race” (black people).

    This is mostly but not completely spoken (hence the term AAVE - the “V” stands for “vernacular”); it also affects the way that those people use the written language. So you often see AAE features in written English, like:

    • Negative concord - for example, “I don’t want to hear nothing about this shit, man.”
    • Habitual-be - for example, “They be talking about this every day.”
    • bits of non-standard spelling, due to phonetic differences
    • expressions and vocab typically used primarily by black people

    What the article is saying is that LLMs are biased against those features. It’s a rather strong bias, and it wasn’t observed for the geographical variety used as a reference (Appalachian English). In other words: the LLM has been fed racist babble, and now it’s regurgitating it.

    • givesomefucks@lemmy.world
      English
      arrow-up
      9
      arrow-down
      5
      ·
      3 months ago

      > Since they’re vernacular you’ll mostly hear them being spoken, they aren’t really written

      AAVE is commonly “written” now because most writing is texts and social media comments. So even if they luck out and learn “proper” English, people still going to type on their phones the same way they talk.

      Even for white kids, most of Gen Z slang is just taken from AAVE. When older people complain about not being able to read zoomer slang in texts or comments, it’s just heavily influenced by AAVE.

      There’s been bleed over for centuries, but with the Internet and social media it’s merging faster, which is common for dialects of people that interact frequently.

      • Lvxferre@mander.xyz
        English
        arrow-up
        9
        ·
        3 months ago

        Warning: I’ve edited the comment that you’re replying to. I’m saying this for the sake of transparency, as you’re clearly quoting the earlier version.

        The key here is that AAVE is not written, but AAE is. That “V” is for vernacular, it excludes written English by definition.

        Now, I’m not sure if those white kids are using AAE or simply borrowing things from AAE into their written English. I simply don’t have data on that.

        > There’s been bleed over for centuries, but with the Internet and social media it’s merging faster, which is common for dialects of people that interact frequently

        Varieties merging or splitting is rarely the result of just more contact between people; it’s all about identity. If things are happening as you described them, it’s simply that those white kids stopped seeing black people as “the others”, to see them as “part of the same group as us”.

        • givesomefucks@lemmy.world
          English
          arrow-up
          4
          arrow-down
          5
          ·
          3 months ago

          > That “V” is for vernacular, it excludes written English by definition.

          Yeah. But most people “write” online like they speak…

          https://commonwealthtimes.org/2021/02/18/aave-is-not-your-internet-slang-it-is-black-culture/

          If people followed rules about language, yeah, vernacular would just be spoken speech. But that’s not how it works. The rules are made to reflect what people are doing. The rules don’t control what people do.

          So yes, while the word vernacular commonly meant only spoken words, there ain’t nothing stopping nobody from typing like they speak.

          And people been doing it for a long time

          • Lvxferre@mander.xyz
            English
            arrow-up
            12
            ·
            3 months ago

            > Yeah. But most people “write” online like they speak…

            That’s a common misconception.

            While your written and spoken varieties do interact a fair bit, no, people don’t “write like they speak”. Not even online.

            And that is not simply an “ackshyually”. A lot of AAVE features simply don’t transpose into writing - like prosody, non-rhoticity, /ɪ/-breaking, /äɪ/-monophthongisation… at most you can consciously approximate them in writing, but they won’t be there.

            > If people followed rules about language, yeah, vernacular would just be spoken speech. But that’s not how it works. The rules are made to reflect what people are doing.

            That is not about people following/not following “rules”, it’s about nomenclature - it’s exactly the reason why “AAE” and “AAVE” are necessary as separate terms.

            • treefrog@lemm.ee
              English
              arrow-up
              2
              arrow-down
              4
              ·
              3 months ago

              More and more people are using speech-to-text. And it does show how differently people speak than they write (apparently I never say the “be” in “because”, for example).

              But it also means that LLMs aren’t only being fed text, but also speech converted into text.

              • Lvxferre@mander.xyz
                English
                arrow-up
                4
                ·
                3 months ago

                For me it’s like “holy fuck… do I eat so fucking many vowels???” It reached the point that I eventually gave up using speech-to-text with Portuguese on my cell phone; I go straight for Italian because at least then it gets me right.

                > But it also means that LLMs aren’t only being fed text, but also speech converted into text.

                That might be part of the issue causing the bias shown in the article.

            • givesomefucks@lemmy.world
              English
              arrow-up
              4
              arrow-down
              9
              ·
              3 months ago

              > at most you can consciously approximate them into writing, but they won’t be there.

              A lot of the difficulty older white people have with it is that it’s spelled phonetically to maintain those things.

              I gave you a link, lots of people have talked about this, it’s not just some idea I came up with.

              You’re still talking like language has to follow the rules.

              That’s backwards. The rules change to follow the language.

              Ain’t you old enough to have heard “ain’t ain’t a word because it ain’t in the dictionary”?

              Well, now it is.

              And now the dictionary lists “figuratively” as one of the definitions for “literally”.

              Insist on following rules, and the dictionary wouldn’t update.

              I don’t know how to put it any more plainly, I’m sorry if you still don’t understand.

              • Lvxferre@mander.xyz
                English
                arrow-up
                8
                arrow-down
                1
                ·
                3 months ago

                > You’re still talking like language has to follow the rules.

                That is clearly false. Refer to what I said in the very comment that you’re replying to: “That is not about people following/not following “rules”, it’s about nomenclature”.

                Please stop misrepresenting what I said.

                > I gave you a link, lots of people have talked about this, it’s not just some idea I came up with.

                You’re implying that I claimed that you came up with this. I did not.

                The link does not contradict what I said. It’s simply using a different nomenclature, using the acronym “AAVE” for the whole instead of strictly the vernacular varieties.

                The informative content there (i.e. beyond definitions) is mostly accurate, but contrary to what you’re implying, I am not contradicting it.

                > I don’t know how to put it any more plainly, I’m sorry if you still don’t understand

                Emphasis mine. Drop the passive-aggressiveness; the one here not understanding shit is you, as shown by the fact that you’re consistently distorting what I said.

                I’m not bothering further with you. Go put words in someone else’s mouth.

    • yamanii@lemmy.world
      English
      arrow-up
      2
      ·
      3 months ago

      I see, that’s very different from most countries, I imagine? People often speak in their own local dialect; here a northeasterner would informally speak a completely different Portuguese than someone from the south, no matter the race.

      • Lvxferre@mander.xyz
        English
        arrow-up
        3
        ·
        3 months ago

        Yup, it’s atypical even in the rest of the Americas. I think that the nearest equivalent in Portuguese would be the quilombola dialects, but even then it’s way off - because those dialects are still geographically associated with their respective quilombos, not just with race.

  • Grimy@lemmy.world
    English
    arrow-up
    17
    arrow-down
    3
    ·
    3 months ago

    So for those that didn’t read the article, it basically explains how LLMs have negative associations with AAE. When asked to associate words with phrases written in AAE, they used words like “aggressive”. When given a normal English phrase and the same phrase in AAE, and then asked what jobs would suit the person, the LLM gave low-income jobs for the AAE statement and broader options for the normal English one.

    It’s a serious problem because people that naturally write in AAE are most likely getting worse results. It stems mostly from old racist newspaper articles and similar things.

    • thedirtyknapkin@lemmy.world
      English
      arrow-up
      9
      ·
      3 months ago

      i bet it’s honestly more from like 4chan and other modern online racist communities, where they would mock aave with racist caricatures. agree with the rest, but if it’s related to aave then i doubt the old newspapers were the source.

    • TexMexBazooka@lemm.ee
      English
      arrow-up
      2
      arrow-down
      2
      ·
      3 months ago

      > It’s a serious problem because people that naturally write in AAE are most likely getting worse results

      Person using LLM built on grammatical rules of the English language has subpar results when operating outside of those rules. More at 6.

    • TheRealKuni@lemmy.world
      English
      arrow-up
      10
      ·
      3 months ago

      Essentially, yes. Ebonics isn’t inherently offensive or inappropriate, as far as I can tell, but it has connotations that are not attached to AAE. Linguists avoid the term today, and modern uses of it tend to be derogatory.

      Source

  • randon31415@lemmy.world
    English
    arrow-up
    8
    arrow-down
    1
    ·
    3 months ago

    African Americans have a weak bias against writing in African American English -> Colleges have a weak bias against accepting African Americans as graduate students -> Academic texts have a strong bias for text written by graduate students -> LLM training data has a bias for academic texts -> LLMs have a strong bias for writing like their training data.

    The error occurs upstream a bit, don’t point at the coders.

    • TexMexBazooka@lemm.ee
      English
      arrow-up
      1
      arrow-down
      2
      ·
      edit-2
      3 months ago

      Writing in AAVE is silly, just like someone from the Deep South including southern drawl in their writing would be, or someone from Boston spelling “car keys” as “kha kees”.

      So

      > African Americans have a weak bias against writing in African American English -> Colleges have weak bias against accepting African Americans as graduate students

      Is a bit of a jump. Someone writing in AAVE probably wouldn’t get accepted to college, because the written word is supposed to transcend dialects and follow a set of rules to be universally understandable.

  • madcat@lemm.ee
    English
    arrow-up
    17
    arrow-down
    23
    ·
    3 months ago

    Because there is no such thing as “African American English”. There is proper English and then there is slang.

        • Lvxferre@mander.xyz
          English
          arrow-up
          12
          ·
          3 months ago

          It’s kind of off-topic, but also on-topic:

          > The Queen/king and no one else.

          King Charles uses a variety called Received Pronunciation, but both of his sons (William and Harry) use Standard Southern British instead. Geoff Lindsey has a video on the differences.

          As such, once William rises to the throne, what’s considered “the King’s English” will change. And, alongside it, what plenty of people in the UK consider as standard English will change too.

    • Lvxferre@mander.xyz
      English
      arrow-up
      10
      arrow-down
      2
      ·
      3 months ago

      What you call “proper English” (or “proper” any other language) is merely an arbitrary construct. It is not set in stone.

      That applies to all levels of a language, by the way, not just vocabulary (“slang”).

      • madcat@lemm.ee
        English
        arrow-up
        1
        ·
        2 months ago

        Slang is slang. It’s always used verbally. I am not sure why someone would expect an LLM to generate proper slang. Not sure at all how stating that fact makes one a “bigot”.

    • Wanderer@lemm.ee
      English
      arrow-up
      2
      arrow-down
      2
      ·
      3 months ago

      It’s bad enough the Americans are too stupid to use the proper one that we have to have two.

      But people talking incorrectly is not a reason to write like that. Unless it’s a character speaking or whatever.