Bots are running rampant. How do we stop them from ruining Lemmy?

Buttflapper@lemmy.world · edit-2 8 months ago


tal@lemmy.today · edit-2 8 months ago

Just pay $10 for every account that you want to create

So, making identities expensive helps. It’d probably filter out some. But, look at the bot in OP’s image. The bot’s operator clearly paid for a blue checkmark. That’s (checks) $8/mo, so the operator paid at least $8, and it clearly wasn’t enough to deter them. In fact, they chose the blue checkmark because the additional credibility was worth it; X doesn’t mandate that they get one.

And it will also deter humans. I don't personally care much about the $10, because I like this environment, but that kind of up-front barrier is going to keep a lot of people from trying a system at all. And financial transactions often come with privacy issues, because a lot of governments get really twitchy about money laundering via anonymous transactions.

EDIT: I think that maybe a better route is to try to give users a "credibility score". So it's not a binary "in" or "out"; instead, other people can see some kind of automated assessment of, for example, how likely a person is to be a bot.

thinks more

I mean, this is just spitballing, but it could even be done not at a global level but at a per-other-user level. Like, okay, suppose you have what amounts to a small neural network, right? The instance computes a bunch of statistics about each user, like account age, stuff like that, and then provides those to the client. But it doesn't decide how much those metrics matter in whether the other user should see that post; it just provides the raw data. Those are your inputs to the neural net.

Then the other user can have their own set of classifications. Maybe just "hide", but maybe also something like "bot" or "political activism" or whatever. The client takes those input metrics from the instances, trains the neural net to produce those client-side classifications, and then auto-tags users based on that. That's gonna be a pain to try to defeat, because the bot operator can't even see how they're being scored; there's no single hurdle that they've either gotten over or not.
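To make the per-user classifier idea a bit more concrete, here's a minimal Python sketch under several assumptions: a single logistic unit stands in for the "small neural network", the three metrics (`account_age`, `posts_per_hour`, `fraction_of_posts_with_links`) are hypothetical and assumed to arrive from the instance already scaled to 0..1, and `LocalClassifier` and `auto_tag` are illustrative names, not any Lemmy API. The reader's own tags are the training labels.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

class LocalClassifier:
    """One client-side label (here "bot") learned from the reader's own tags.

    Assumption: a single logistic unit stands in for the "small neural network".
    The instance only supplies raw metrics; the weights never leave the client.
    """

    def __init__(self, n_inputs: int):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0

    def predict(self, x) -> float:
        return sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))

    def train(self, examples, labels, lr=1.0, epochs=2000) -> None:
        # Full-batch gradient descent on the average logistic (log) loss.
        n = len(examples)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.weights)
            grad_b = 0.0
            for x, y in zip(examples, labels):
                err = self.predict(x) - y
                grad_b += err / n
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi / n
            self.bias -= lr * grad_b
            self.weights = [w - lr * g for w, g in zip(self.weights, grad_w)]

def auto_tag(clf, users, label="bot", threshold=0.5):
    """Return {username: label} for every user scored at or above the threshold."""
    return {name: label for name, x in users.items() if clf.predict(x) >= threshold}

# Hypothetical instance-provided metrics, pre-scaled to 0..1:
# [account_age, posts_per_hour, fraction_of_posts_with_links]
examples = [
    [0.05, 0.90, 0.80], [0.10, 0.80, 0.90], [0.02, 0.95, 0.70],  # tagged "bot" by me
    [0.80, 0.10, 0.10], [0.60, 0.20, 0.05], [0.90, 0.05, 0.20],  # tagged "not a bot"
]
labels = [1, 1, 1, 0, 0, 0]

clf = LocalClassifier(n_inputs=3)
clf.train(examples, labels)

# Hypothetical accounts the reader hasn't tagged yet.
unseen = {"newacct42": [0.075, 0.85, 0.85], "longtimer": [0.70, 0.15, 0.075]}
print(auto_tag(clf, unseen))
```

Because the weights only exist on the reader's client, two readers looking at identical instance metrics can end up tagging the same account differently, which is what would make the scoring hard for a bot operator to probe.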

But you don’t want to make every end user train a neural net from scratch. Hmm.

So maybe what you do is let users create their own scores and expose those to other users, right? I think I read that BlueSky does something like that; it was working on letting users create "curated feeds" for other users. They're doing something simpler, with no machine learning, but that has a drawback: it means you have to spend more time determining whether a score is any good. So, okay. Say I'm gonna try to score users based on whether or not I think they're bots. I have the option to make that score publicly available. Other users can "subscribe" to that metric, and when they do, a new input node is added to their local classifier. Like, "Don's Bot List".

But I don’t have to subscribe to Don’s Bot List, and even if I do, it doesn’t mean that I automatically consider that other user a bot. Don’s rating is just an input into whether my own classifier considers them a bot. If I regularly disagree with Don, even if I’m subscribed to his list, my local neural net will slash the importance of his rating. If I agree with Don unless some other input to my classifier’s neural net is triggered, then the classifier can learn that.
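Here's a rough sketch of how subscribing to someone's published list could add an input node whose importance is then learned locally. It again assumes a single logistic unit instead of a real neural net; `SubscribableClassifier`, `posts_in_bursts` and `dons_bot_list` are hypothetical names, and the training data is invented so that Don's ratings don't line up with the reader's own tags at all.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

class SubscribableClassifier:
    """Hypothetical client-side "bot" classifier; each subscribed list adds one input."""

    def __init__(self, input_names):
        self.input_names = list(input_names)
        self.weights = [0.0] * len(self.input_names)
        self.bias = 0.0

    def subscribe(self, list_name: str) -> None:
        # New input node with zero weight: the list has no influence until
        # training shows that its ratings agree with this reader's own tags.
        self.input_names.append(list_name)
        self.weights.append(0.0)

    def predict(self, x) -> float:
        return sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))

    def train(self, examples, labels, lr=1.0, epochs=2000) -> None:
        # Full-batch gradient descent on the average logistic (log) loss.
        n = len(examples)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.weights)
            grad_b = 0.0
            for x, y in zip(examples, labels):
                err = self.predict(x) - y
                grad_b += err / n
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi / n
            self.bias -= lr * grad_b
            self.weights = [w - lr * g for w, g in zip(self.weights, grad_w)]

    def weight_of(self, name: str) -> float:
        return self.weights[self.input_names.index(name)]

clf = SubscribableClassifier(["posts_in_bursts"])  # hypothetical instance metric
clf.subscribe("dons_bot_list")                     # hypothetical published list

# Each row: [posts_in_bursts, Don's rating]; label 1 = "I tagged this user a bot".
# Invented data: my tags follow posting bursts, and Don's ratings are unrelated to them.
examples, labels = [], []
for bursts in (1.0, 0.0):
    for don in (1.0, 0.0):
        bots = 3 if bursts else 1
        examples += [[bursts, don]] * 4
        labels += [1] * bots + [0] * (4 - bots)

clf.train(examples, labels)
print(round(clf.weight_of("posts_in_bursts"), 2), round(clf.weight_of("dons_bot_list"), 2))
```

After training, the weight on `dons_bot_list` ends up essentially zero while `posts_in_bursts` carries the decision: the subscription stays in place but stops mattering. If Don's ratings did track the reader's tags, the same training would give his input a large weight instead.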

QuadratureSurfer@lemmy.world · 8 months ago

Yep, exactly this. It might deter some small-time bot creators, but it won't stop larger operations and may even help them seem more legitimate.

If anything, my favorite idea comes from this xkcd:

https://xkcd.com/810/

Dark Arc · 8 months ago

Yeah, BlueSky has this concept of user moderation lists. It's effectively like subscribing to an adblock filter. Some lists might block things by pattern (e.g., you could have one that blocks anything involving spiders), and others might block specific accounts (e.g., you could have one that blocks users who are known to cause problems, are prone to vulgar language, etc.).
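A minimal sketch of the adblock-style idea, covering both kinds of list mentioned above (pattern-based and account-based). This is not Bluesky's actual list format or API; `ModerationList`, `visible_posts`, and the example accounts are all hypothetical.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ModerationList:
    """A hypothetical subscribable filter list, in the spirit of an adblock list."""
    name: str
    blocked_accounts: set[str] = field(default_factory=set)
    blocked_patterns: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Compile once; case-insensitive so "Spider" and "SPIDERS" both match.
        self._regexes = [re.compile(p, re.IGNORECASE) for p in self.blocked_patterns]

    def blocks(self, author: str, text: str) -> bool:
        return author in self.blocked_accounts or any(r.search(text) for r in self._regexes)

def visible_posts(posts, subscriptions):
    """Keep only the (author, text) posts that none of the subscribed lists block."""
    return [(author, text) for author, text in posts
            if not any(lst.blocks(author, text) for lst in subscriptions)]

# Hypothetical lists and accounts.
arachnophobia = ModerationList("No spiders", blocked_patterns=[r"\bspiders?\b", r"\btarantulas?\b"])
known_trolls = ModerationList("Known troublemakers", blocked_accounts={"troll@example.social"})

posts = [
    ("alice@example.social", "Found a huge spider in the bath"),
    ("troll@example.social", "Nice thread, would be a shame if..."),
    ("bob@example.social", "Bots are running rampant"),
]
for author, text in visible_posts(posts, [arachnophobia, known_trolls]):
    print(author, "-", text)
```

Subscribing is just adding a list to the reader's `subscriptions`; the lists themselves stay shared and reusable, the same way one adblock filter serves many browsers.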

I think the problem with credibility scores in general, though, is that they're sort of like a "social score" from Black Mirror. Real people can get caught in the net of "you look like a bot", and, conversely, bot operators could design algorithms that game the metrics to look like they're not bots (possibly even more convincingly than some of the real people).

This is kind of what led me down the route of bringing things back into the physical world. Like, once you have things going back through the normal systems … you arguably do lose some level of anonymity, but you also gain back some guarantees of humanity.

It doesn’t need to be the level of “you’ve got a government ID and you’re verified to be exactly you with no other accounts” … just “hey, some number of people in the real world, that are subject to the respective nation’s laws, had to have come into contact with a real piece of mail.”

Maybe that just turns into the slowest UDP network in existence. However, I think it has a real chance of making it easier to detect real people (i.e., folks who share an address with only a few other accounts). The virtual mailbox the other person mentioned has 3,000 addresses. If you assume 5 people per mailing address is normal, that's 15,000 bots total before things start getting fishy, and only if you've evenly distributed them across all of those addresses. If you've got 3,000 accounts at the same address, that's very fishy. Addresses also change a lot less frequently than IP addresses, so a physical address ban is a much stricter deterrent.
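The arithmetic here, plus the "3,000 accounts at one address" check, sketched in Python. The 5-accounts-per-address threshold is the comment's own working assumption rather than a measured figure, and the helper names, addresses, and verification records are hypothetical.

```python
from collections import Counter

# Working assumption from the comment, not a measured figure.
NORMAL_ACCOUNTS_PER_ADDRESS = 5

def max_accounts_before_suspicion(n_addresses: int,
                                  per_address: int = NORMAL_ACCOUNTS_PER_ADDRESS) -> int:
    """Accounts an operator could spread evenly over n_addresses without any address looking odd."""
    return n_addresses * per_address

def suspicious_addresses(account_addresses: dict[str, str],
                         per_address: int = NORMAL_ACCOUNTS_PER_ADDRESS) -> dict[str, int]:
    """Return {address: account_count} for addresses used by more accounts than expected."""
    counts = Counter(account_addresses.values())
    return {addr: n for addr, n in counts.items() if n > per_address}

print(max_accounts_before_suspicion(3_000))  # the virtual-mailbox example: 15000

# Hypothetical verification records: account -> address that received the mailed code.
records = {f"user{i}": f"{i % 40} Elm St" for i in range(120)}  # 3 accounts per address
records |= {f"bot{i}": "Suite 100, 1 Mailbox Plaza" for i in range(3_000)}
print(suspicious_addresses(records))
```

The evenly spread case only shows up in the first function: each of those 3,000 addresses would sit at or under the threshold, which is why the per-address check catches the lazy operator but not the careful one.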