Bot Hunting Is All About the Vibes

At the heart of every bot-detection tool is a human’s gut feeling—and all the messiness that comes with it.

Christopher Bouzy is trying to stay ahead of the bots. As the person behind Bot Sentinel, a popular bot-detection system, he and his team continuously update their machine learning models out of fear that they will get “stale.” The task? Sorting 3.2 million tweets from suspended accounts into two folders: “Bot” or “Not.”

To detect bots, Bot Sentinel’s models must first learn what problematic behavior is through exposure to data. By feeding the model tweets sorted into two distinct categories—bot or not a bot—Bouzy’s team lets it calibrate itself and, in theory, home in on what he thinks makes a tweet problematic.
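At its simplest, that kind of two-bucket training amounts to fitting a text classifier on human-labeled examples. The sketch below is only an illustration of the idea; the tweets, labels, and model choice are assumptions, not Bot Sentinel’s actual pipeline.

# Illustrative sketch only: a tiny two-class text classifier trained on
# hand-labeled tweets, in the spirit of the "Bot" / "Not" sorting described above.
# The example tweets, labels, and model choice are assumptions, not Bot Sentinel's code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "Click here for FREE crypto!!! #giveaway",   # labeled problematic ("Bot")
    "Had a great hike with the kids today",      # labeled fine ("Not")
    "RT this to win $500 RT RT RT",              # labeled problematic
    "Anyone have a good sourdough recipe?",      # labeled fine
]
labels = [1, 0, 1, 0]  # 1 = problematic, 0 = not

# Turn raw text into word-frequency features, then fit a classifier on the labeled pairs.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(tweets, labels)

print(model.predict(["WIN A FREE IPHONE click the link"]))  # expected: [1]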

Training data is the heart of any machine learning model. In the burgeoning field of bot detection, how bot hunters define and label tweets determines the way their systems interpret and classify bot-like behavior. According to experts, this can be more of an art than a science. “At the end of the day, it is about a vibe when you are doing the labeling,” Bouzy says. “It’s not just about the words in the tweet; context matters.”

He’s a Bot, She’s a Bot, Everyone’s a Bot 

Before anyone can hunt bots, they need to figure out what a bot is—and that answer changes depending on who you ask. The internet is full of people accusing each other of being bots over petty political disagreements. Trolls are called bots. People with no profile picture and few tweets or followers are called bots. Even among professional bot hunters, the answers differ.

Bot Sentinel is trained to weed out what Bouzy calls “problematic accounts”—not just automated accounts. Indiana University informatics and computer science professor Filippo Menczer says the tool he helps develop, Botometer, defines bots as accounts that are at least partially controlled by software. Kathleen Carley is a computer science professor at the Institute for Software Research at Carnegie Mellon University who has helped develop two bot-detection tools: BotHunter and BotBuster. Carley defines a bot as “an account that is run using completely automated software,” a definition that aligns with Twitter’s own. “A bot is an automated account—nothing more or less,” the company wrote in a May 2020 blog post about platform manipulation.

Just as the definitions differ, the results these tools produce don’t always align. An account flagged as a bot by Botometer, for example, might come back as perfectly humanlike on Bot Sentinel, and vice versa.

Some of this is by design. Unlike Botometer, which aims to identify automated or partially automated accounts, Bot Sentinel is hunting accounts that engage in toxic trolling. According to Bouzy, you know these accounts when you see them. They can be automated or human-controlled, and they engage in harassment or disinformation and violate Twitter’s terms of service. “Just the worst of the worst,” Bouzy says.

Botometer is maintained by Kaicheng Yang, a PhD candidate in informatics at the Observatory on Social Media at Indiana University who created the tool with Menczer. The tool also uses machine learning to classify bots, but when Yang is training his models, he’s not necessarily looking for harassment or terms of service violations. He’s just looking for bots. According to Yang, when he labels his training data he asks himself one question: “Do I believe the tweet is coming from a person or from an algorithm?”

How to Train an Algorithm

Not only is there no consensus on how to define a bot, but there’s also no single clear criterion or signal that any researcher can point to that accurately predicts whether an account is a bot. Bot hunters believe that exposing an algorithm to thousands or millions of bot accounts helps a computer detect bot-like behavior. But the apparent objectivity of any bot-detection system is muddied by the fact that humans still have to make judgment calls about what data to use to build it.

Take Botometer, for example. Yang says Botometer is trained on tweets from around 20,000 accounts. While some of these accounts self-identify as bots, the majority are manually categorized by Yang and a team of researchers before being crunched by the algorithm. (Menczer says some of the accounts used to train Botometer come from data sets from other peer-reviewed research. “We try to use all the data that we can get our hands on, as long as it comes from a reputable source,” he says.)

There’s a mystical quality in the way Yang speaks about how the team trains the Random Forest, the supervised machine-learning algorithm at the core of Botometer. “When I ask other people to label accounts, I don’t give them too many specific directions,” Yang says. “There are signals in bots that are hard to describe but that humans notice.” In other words, the Botometer team is trying to bake in some of the human instincts that allow people to detect who’s human and who’s not.

After these accounts are labeled, Botometer’s model crunches more than a thousand features of each category of account, according to Menczer. For instance, the model looks at how many of each part of speech appear in the text of a tweet. It also considers sentiment, when the account was created, and how many tweets or retweets it has. Time is also a factor, says Menczer. “How often does an account tweet? How many times in a day? How many times in a week? What is the distribution of the interval?” If an account is tweeting at all hours of the day without enough downtime to sleep, for example, it could be a bot. These inputs, among others, carefully calibrate the ensemble of decision trees that dictates how the model evaluates accounts it is unfamiliar with. “So it’s a little bit complicated,” Menczer says.
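A rough sketch of that kind of pipeline, with hand-crafted account features feeding an ensemble of decision trees, might look like the following; the feature set and toy numbers are assumptions for illustration, not Botometer’s actual thousand-plus features.

# Minimal sketch, not Botometer's code: a few hand-crafted account features
# (tweet volume, timing regularity, account age, followers, retweet ratio)
# feeding a random forest, i.e. an ensemble of decision trees.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row describes one account:
# [tweets_per_day, std_dev_hours_between_tweets, account_age_days, followers, retweet_ratio]
X = np.array([
    [480, 0.1,   30,    12, 0.98],   # tweets nonstop at fixed intervals -> labeled bot
    [  6, 5.2, 2100,   340, 0.20],   # sporadic activity, older account  -> labeled human
    [310, 0.3,   15,     4, 0.95],   # labeled bot
    [ 11, 7.8, 3650,  1200, 0.35],   # labeled human
])
y = np.array([1, 0, 1, 0])           # labels assigned by human annotators

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score an unseen account; the model returns a bot probability, not a verdict.
new_account = np.array([[220, 0.2, 45, 20, 0.90]])
print(forest.predict_proba(new_account)[0, 1])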

The tools are also evolving. The Botometer you can use today is the fourth version of the tool, according to Menczer, and it’s trained using new data sets that account for changes in bot behavior. “We add new data sets, we add new features. Sometimes we remove features that we don’t think are as useful anymore,” he says.

The Botometer team recently realized that bot accounts were frequently using AI-generated photos in their Twitter bios. They learned that the eyes on these fake faces follow a pattern: They’re in the same position. Incorporating images of faces that are created by an algorithm into Botometer’s training data and labeling them as bots could eventually help the tool flag accounts that use similar images in their bios.

Flawed Human Nature

Despite the work that goes into creating these tools, the bot-hunting field is not without detractors. Darius Kazemi, an engineer at Meedan, a nonprofit that works in the misinformation space, is not shy about his skepticism of bot-detection software. “I think the very premise of bot detection is flawed, and I don’t think it’s going to get better,” he says. Part of the reason for this, Kazemi says, is that “problematic content” is not a standardized metric.

For Kazemi, bot hunting boils down to trust and ideology. “If you are ideologically aligned with the bot developers, then these tools will give you the signal you are looking for,” he says.

Bouzy and Yang express the same concerns about bias, and they have implemented measures to counter it. Bot Sentinel is largely trained with tweets from users that Twitter has already deemed problematic, using Twitter’s own policies as a benchmark. “We still use our judgment when labeling tweets, but at least we have a starting point,” Bouzy says. “We do our best to limit the bias, but unfortunately, no system is perfect. However, we believe Bot Sentinel is the most accurate publicly available tool to identify disruptive and problematic accounts.”

Botometer tries to have as many researchers as possible labeling tweets to mitigate Yang’s own biases. The team also seeds training data with nontraditional inputs. “For instance, we purchase fake followers that we know are bots and use those accounts to train the model,” Yang says. “We also can vet our model by seeing if accounts flagged as bots eventually get suspended.” All of this data is made publicly available and open for inspection. “We try different ways to make it as solid as possible.”
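The suspension check Yang mentions can be sketched as a simple retrospective comparison; the account IDs below are placeholders, and this shows only the rough shape of such a check, not Botometer’s evaluation code.

# Rough sketch of the retrospective check described above: of the accounts a model
# flagged as bots, how many did Twitter later suspend? Account IDs are placeholders.
flagged_as_bots = {"acct_001", "acct_002", "acct_003", "acct_004"}
later_suspended = {"acct_001", "acct_003", "acct_777"}

overlap = flagged_as_bots & later_suspended
share = len(overlap) / len(flagged_as_bots)
print(f"{len(overlap)}/{len(flagged_as_bots)} flagged accounts ({share:.0%}) were later suspended")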

Menczer says the controversy over bot detection often lies in human biases—people trust such tools wholeheartedly or expect them to do something beyond their capabilities. “A tool can be useful, but it has to be used in the right way,” he says. Just as these tools shouldn’t be used as proof that someone you follow is a bot, Menczer says, it’s also incorrect to conclude that errors in the system are proof that it doesn’t work at all.

Lousy With Bots

Regardless of what these bot-hunting models have learned to detect, it’s clear that they are detecting something. Bot Sentinel and Botometer have become the go-to tools for misinformation researchers, and both claim to have a track record of successfully flagging accounts before Twitter suspends them.

Kazemi is still not sold on the value of bot detection. “It’s measuring something,” he says. “But the real question is whether you can make useful decisions based on signals from these services. I’d say no.”

Menczer admits that bot-detection tools are not always accurate but says they don’t have to be perfect to be useful. “Yes, there are going to be some mistakes—for sure. That’s the nature of machine learning, right?” he says. “Yes, the tool makes mistakes. That doesn’t mean that it’s useless. But also the problem is hard, so you shouldn’t just use the tool blindly.”

This area of research is also relatively new and rapidly evolving—as are the bots. Carnegie Mellon’s Carley emphasizes that researchers have focused on Twitter bots because they’re public and therefore accessible. But Twitter bots are not alone. And without tools that can identify bots at scale, and stamp out the nefarious ones, the internet will become more overrun than it already is.

Update 9-30-22, 4:25 pm ET: This article has been updated to clarify that Bot Sentinel is trained to identify problematic accounts, not simply automated or partially automated accounts.

Update 10-3-22, 12:30 am ET: We clarified a paragraph describing an example of a feature Botometer could develop using the eye position of AI-generated bio images.