How to open comments and not drown in spam



When your job is to create something beautiful, you don't especially need to talk about it: the result is there for everyone to see. But if your job is erasing graffiti from fences, nobody notices your work as long as the fences look decent, or until you erase the wrong thing.

Any service where users can leave a comment or a review, send a message, or upload pictures sooner or later faces the problem of spam, fraud, and obscene language. This cannot be avoided, but it can and must be fought.

My name is Mikhail, and I work on the Anti-Spam team, which protects users of Yandex services from these problems. Our work is rarely noticeable (and that's a good thing!), so today I'll talk about it in more detail. You will find out in which cases moderation is useless and why precision is not the only measure of its effectiveness. We'll also talk about profanity, using the example of cats and dogs, and why it is sometimes useful to "think like a scammer."

Yandex has more and more services where users publish their own content. You can ask a question or write an answer on Yandex.Q, discuss neighborhood news on Yandex.Rayon, or share the traffic situation in conversations on Yandex.Maps. But as a service's audience grows, it becomes attractive to scammers and spammers. They come and flood the comments: they offer easy money, advertise miracle remedies, and promise social benefit payouts. Because of spammers, some users lose money, while others lose the desire to spend time on a scruffy, spam-ridden service.

And this is not the only problem. We strive not only to protect users from scammers, but also to create a comfortable atmosphere for communication. If people run into obscene language and insults in the comments, they are very likely to leave and never return. So this, too, is something we have to be able to fight.

Clean Web


As often happens with us, the first developments were born in Search, in the part that fights spam in search results. About ten years ago, the task arose of filtering adult content for family search and for queries that do not call for answers from the 18+ category. That is how the first hand-built dictionaries of pornography and profanity appeared; analysts kept them up to date. The main task was to classify queries into those where adult content is acceptable and those where it is not. For this, labeled data was collected, heuristics were built, and models were trained. These were the first developments in filtering inappropriate content.

Over time, UGC (user-generated content) began to appear at Yandex: messages written by the users themselves, which Yandex only publishes. For the reasons described above, many messages could not be published sight unseen; moderation was required. So we decided to create a service that would protect all Yandex UGC products from spam and cybercriminals, building on Search's best practices for filtering inappropriate content. The service was called Clean Web.

New tasks and help from tolokers


At first, only simple automation worked for us: services sent us texts, and we ran profanity dictionaries, porn dictionaries, and regexes over them; analysts assembled all of these by hand. But over time the service came to be used in more and more Yandex products, and we had to learn to deal with new problems.

Often, instead of a review, users publish a meaningless string of letters while trying to farm achievements; sometimes they advertise their own company in reviews of a competitor; and sometimes they simply mix up organizations and write in a review of a pet store: "Perfectly cooked fish!" Perhaps someday artificial intelligence will learn to grasp the meaning of any text perfectly, but for now automation sometimes copes worse than humans.

It became clear that we could not do without manual labeling, so we added a second step to our pipeline: manual review by a person. Texts in which the classifier saw no problems were sent there. You can easily imagine the scale of such a task, so we relied not only on assessors but also turned to the "wisdom of the crowd," asking tolokers (crowd workers on Yandex.Toloka) for help. It is they who help us spot what the machine missed, and thereby teach it.
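To make the scheme concrete, here is a minimal sketch of that two-step pipeline. The function names and the toy rule are illustrative, not the real service:

```python
# A toy two-step moderation pipeline: automatic classifiers catch clear
# violations, and everything they do not flag goes to manual review.
def violates_rules(text: str) -> bool:
    """Stand-in for the dictionary/regex classifiers of that period."""
    return "easy money" in text.lower()

def send_to_manual_review(text: str) -> str:
    # Assessors and tolokers label the text; their verdicts also become
    # training data that teaches the machine what it missed.
    return "queued for human review"

def moderate(text: str) -> str:
    if violates_rules(text):
        return "rejected"               # the classifier saw a violation
    return send_to_manual_review(text)  # the classifier saw no problems

print(moderate("Click here for easy money!"))  # rejected
print(moderate("Nice little cafe"))            # queued for human review
```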

Smart caching and LSH


Another problem we ran into when working with comments is spam, or rather its volume and speed of spread. When the audience of Yandex.Rayon began to grow rapidly, spammers arrived. They learned to get around our regexes by slightly altering the text. The spam was still found and deleted, of course, but at Yandex's scale an unacceptable message shown for even 5 minutes could be seen by hundreds of people.



Of course, this did not suit us, so we built smart caching of texts based on LSH (locality-sensitive hashing). It works like this: we normalize the text, throw out the links, and cut it into n-grams (sequences of n letters). Then hashes of the n-grams are computed, and from them the LSH vector of the document is built. The point is that similar texts, even slightly modified ones, turn into similar vectors.
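As a rough illustration, here is a sketch of the idea using SimHash, one common LSH scheme over character n-grams. The post does not say which LSH family we actually use, so treat this choice as an assumption:

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase, throw out links and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def char_ngrams(text: str, n: int = 3) -> list[str]:
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def simhash(text: str, bits: int = 64) -> int:
    """Similar texts get fingerprints with a small Hamming distance."""
    weights = [0] * bits
    for gram in char_ngrams(normalize(text)):
        h = int.from_bytes(hashlib.md5(gram.encode()).digest()[:8], "big")
        for bit in range(bits):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit, w in enumerate(weights) if w > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")
```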

This solution let us reuse the verdicts of classifiers and tolokers for similar texts. During a spam attack, as soon as the first message passed review and landed in the cache with a spam verdict, all new similar messages, even modified ones, received the same verdict and were deleted automatically. Later we learned to train and automatically retrain spam classifiers, but this "smart cache" has stayed with us and still helps us a lot.
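Continuing the sketch above, a verdict cache on top of such fingerprints might look like this (a linear scan for clarity; a real system would bucket fingerprints so lookups stay fast):

```python
class VerdictCache:
    """Toy cache that reuses verdicts for near-duplicate texts."""
    def __init__(self, max_distance: int = 4):
        self.entries: list[tuple[int, str]] = []  # (fingerprint, verdict)
        self.max_distance = max_distance

    def lookup(self, text: str):
        fp = simhash(text)
        for cached_fp, verdict in self.entries:
            if hamming(fp, cached_fp) <= self.max_distance:
                return verdict            # reuse the earlier verdict
        return None

    def store(self, text: str, verdict: str) -> None:
        self.entries.append((simhash(text), verdict))

cache = VerdictCache()
cache.store("Earn $500 a day from home, click here", "spam")
# A slightly reworded copy can still hit the cached spam verdict:
print(cache.lookup("Earn $500 a day from your home! click here"))
```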

Classifier of good texts


Barely pausing for breath after the fight with spam, we realized that 95% of our content was being moderated manually: classifiers respond only to violations, and most texts are fine. We were loading up the tolokers, who in 95 cases out of 100 gave the verdict "everything is OK." We had to do an unusual job: build classifiers of good content; fortunately, enough labeled data had accumulated by then.

The first classifier looked like this: we lemmatize the text (reduce words to their base form), throw out all the function words, and apply a pre-built "dictionary of good lemmas." If all the words in a text are "good," the whole text contains no violations. On different services this approach immediately automated 25% to 35% of the manual labeling. Of course, the approach is not ideal: it is easy to combine several innocent words into a very offensive statement. But it let us quickly reach a good level of automation and bought time to train more complex models.
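A minimal sketch of that first classifier, with a toy English dictionary and lemmatizer standing in for the real Russian ones:

```python
# Anything not covered by the good-lemma dictionary falls through to
# manual review; only all-good texts are auto-published.
GOOD_LEMMAS = {"staff", "friendly", "food", "tasty", "place", "cozy", "be"}
FUNCTION_WORDS = {"the", "a", "an", "and", "is", "was", "very", "so"}

def lemmatize(word: str) -> str:
    # Toy stand-in: real code would use a morphological analyzer.
    return word.lower().strip(".,!?")

def is_definitely_good(text: str) -> bool:
    lemmas = (lemmatize(w) for w in text.split())
    content = [l for l in lemmas if l and l not in FUNCTION_WORDS]
    return all(l in GOOD_LEMMAS for l in content)

print(is_definitely_good("The staff is very friendly"))  # True: auto-publish
print(is_definitely_good("The staff is very rude"))      # False: human review
```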

The next versions of the good-text classifiers included linear models, decision trees, and combinations of the two. For labeling rudeness and insults, for example, we are trying the BERT neural network. What matters here is grasping the meaning of a word in context and the links between words in different sentences, and BERT handles this well. (By the way, colleagues from Yandex.News recently described how they use the technology for a non-standard task: finding errors in headlines.) As a result, we managed to automate up to 90% of the flow, depending on the service.
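As an illustration of the BERT direction, here is how one might score texts for insults with the Hugging Face transformers library. The checkpoint and toolkit are my assumptions, and the classification head below is untrained, so real use would require fine-tuning on labeled data like that described above:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

name = "bert-base-multilingual-cased"   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=2)                 # 0 = ok, 1 = insult

texts = ["you are wonderful", "you are a complete idiot"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
print(logits.softmax(dim=-1))  # per-class probabilities (head is untrained)
```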

Precision, recall, and speed


To keep improving, you need to understand whether the automatic classifiers, and changes to them, actually bring benefit, and whether the quality of manual checks is degrading. For this we use two metrics: precision and recall.

Precision is the share of correct verdicts among all "bad content" verdicts. The higher the precision, the fewer false positives. If you do not watch precision, then in theory you could delete all the spam and obscenity, along with half of the good messages. On the other hand, if you rely on precision alone, the best technology would be one that catches no one at all. That is why there is also recall: the share of detected bad content in the total volume of bad content. These two metrics balance each other.
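In the standard notation, with TP the bad content correctly caught, FP the good content wrongly flagged, and FN the bad content missed, the two metrics look like this:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct share of "bad" verdicts
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of bad content caught
    return precision, recall

# Toy numbers: 90 bad texts caught, 10 good texts wrongly flagged,
# 30 bad texts missed.
print(precision_recall(tp=90, fp=10, fn=30))  # (0.9, 0.75)
```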

To measure them, we sample the entire incoming stream of each service and hand the samples to assessors for expert evaluation and comparison with the machine's decisions.

But there is another important indicator.

I wrote above that hundreds of people can see an unacceptable message in as little as 5 minutes. So we also count how many times we showed people bad content before hiding it. This matters because it is not enough to work accurately; you also need to work fast. And when we were building our defense against profanity, we felt this in full.
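A toy version of that speed metric: count the impressions of each bad item that happened before its takedown. The log schema here is purely illustrative:

```python
from datetime import datetime

impressions = [  # (item_id, shown_at)
    ("msg1", datetime(2020, 1, 1, 12, 0)),
    ("msg1", datetime(2020, 1, 1, 12, 3)),
    ("msg1", datetime(2020, 1, 1, 12, 9)),
]
hidden_at = {"msg1": datetime(2020, 1, 1, 12, 5)}  # takedown times

bad_views = sum(1 for item, ts in impressions
                if item in hidden_at and ts < hidden_at[item])
print(bad_views)  # 2 impressions happened before the item was hidden
```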

Anti-profanity, using the example of cats and dogs


A brief lyrical digression. Someone may say that swearing and insults are not as dangerous as malicious links and not as annoying as spam. But we strive to maintain a comfortable environment for communication between millions of users, and people do not like to return to places where they have been insulted. Not for nothing is the ban on obscene language and insults written into the rules of many communities, including Habr. But I digress.

Profanity dictionaries cannot cope with all the richness of the Russian language. Even though there are only four main obscene roots, a myriad of words can be built from them that no regex can catch. On top of that, part of a word can be written in transliteration, letters can be replaced with similar-looking combinations or rearranged, asterisks can be added, and so on. Sometimes, without context, it is fundamentally impossible to tell whether the user meant an obscene word. We respect Habr's rules, so we will demonstrate this not with live examples but with cats and dogs.



"Lyau," said the cat. But we understand that the cat really said a different word...

We started thinking about "fuzzy matching" algorithms for our dictionaries and about smarter preprocessing: we handled transliteration, glued together text split by spaces and punctuation, looked for patterns, and wrote separate regexes for them. This approach produced results, but it often reduced precision without delivering the desired recall.
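That preprocessing amounted to rules like the following sketch. The substitution table is illustrative, and, per the cats-and-dogs convention, the examples are harmless:

```python
import re

LOOKALIKES = str.maketrans({"0": "o", "@": "a", "$": "s", "3": "e", "1": "i"})

def deobfuscate(text: str) -> str:
    text = text.lower().translate(LOOKALIKES)  # undo lookalike substitutions
    text = re.sub(r"[\s.*\-_]+", "", text)     # glue spaces, stars, dots
    text = re.sub(r"(.)\1{2,}", r"\1", text)   # collapse loooong repeats
    return text

print(deobfuscate("d 0 g"))    # -> "dog"
print(deobfuscate("c.a.t"))    # -> "cat"
print(deobfuscate("dooooog"))  # -> "dog"
```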

Then we decided to "think like the people doing the swearing." We began introducing noise into the data ourselves: rearranging letters, generating typos, replacing letters with similar-looking ones, and so on. The initial labels were obtained by applying the profanity dictionaries to large text corpora. If you take one sentence and distort it in several ways, you get many sentences, so the training sample can be multiplied tens of times. All that remained was to train, on the resulting pool, some more or less smart model that takes context into account.
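A sketch of that noise injection, again on harmless words; the distortion operators mirror the tricks listed above:

```python
import random

LOOKALIKE_SUBS = {"o": "0", "a": "@", "s": "$", "e": "3", "i": "1"}

def swap_adjacent(word: str) -> str:
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def substitute_lookalikes(word: str) -> str:
    return "".join(LOOKALIKE_SUBS.get(c, c) for c in word)

def insert_star(word: str) -> str:
    if len(word) < 2:
        return word
    i = random.randrange(1, len(word))
    return word[:i] + "*" + word[i:]

def augment(word: str, n: int = 5) -> set[str]:
    """Generate up to n distorted variants of one word."""
    ops = [swap_adjacent, substitute_lookalikes, insert_star]
    return {random.choice(ops)(word) for _ in range(n)}

random.seed(42)
print(augment("dog"))  # e.g. {'odg', 'd*og', 'd0g', 'do*g'}
```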



It is too early to talk about a final solution. We are still experimenting with approaches to this problem, but we can already see that a simple character-level convolutional network of a few layers significantly outperforms dictionaries and regexes: it manages to raise both precision and recall.
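For reference, a character-level CNN of "a few layers" can be as small as the following PyTorch sketch; the real architecture and layer sizes are not disclosed here, so these numbers are arbitrary:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Two convolutional layers over character embeddings, max-pooled."""
    def __init__(self, vocab_size: int = 128, emb: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.convs = nn.Sequential(
            nn.Conv1d(emb, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(64, 2)  # 0 = clean, 1 = obscene

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(char_ids).transpose(1, 2)  # (batch, emb, seq_len)
        x = self.convs(x).max(dim=2).values       # global max pooling
        return self.head(x)

model = CharCNN()
batch = torch.randint(0, 128, (4, 40))  # 4 texts of 40 character ids
print(model(batch).shape)               # torch.Size([4, 2])
```

Because such a model sees characters rather than whole dictionary words, it can keep catching a word after letter swaps and lookalike substitutions.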

Of course, we understand that there are always ways to get around even the most advanced automation, especially when it is such a tempting game: write it so that the dumb machine does not understand. Here, as in the fight against spam, our goal is not to eradicate the very possibility of writing something obscene; our task is to make the game not worth the candle.

It is easy to open up the ability to share opinions, communicate, and comment. It is much harder to achieve a safe, comfortable environment and respect for people. And without that, no community will develop.
