Comparison of Russian rap scenes using R and Text Mining techniques. Noize MC, Oxxxymiron, Uncle Zhenya. Episode 2

R. Text Mining. Rap. Episode 2

This article continues the earlier piece "Comparison of Russian rap scenes using R and Text Mining techniques. Noize MC and Kasta vs Pharaoh and Morgenshtern". This time I will take a detailed look at the work of Noize MC and Oxxxymiron. I want to stress, however, that this is not a contest between the two. The purpose of this article is not to show which of them is cooler, but to convey the depth and diversity of their music, which we have the opportunity to enjoy in real time. We are very lucky to be able to follow their successes and go to their concerts. Unlike the first part, there is no head-to-head comparison here and no such sharp contrast.

As before, the analysis was done using R, Python and an API. The details are in the first part, so I will not repeat them here.

Anyone even slightly familiar with the work of Noize MC and Oxxxymiron will agree that the words these artists use are bound to differ a great deal, because the themes of their songs differ. Many songs by Oxxxymiron, an Oxford graduate with a degree in medieval English literature, contain references to religion and history. Take the track "Ivory Tower". Few people know that this metaphor was first used in the biblical Song of Solomon: "Your neck is like a tower of ivory". Allegorically it denotes a realm of high aspirations, far from the bustle of the world and its worries. It is therefore not surprising that many of his lyrics seem hard to grasp, so much so that listeners turn to Anatoly Wasserman for help in decoding them.

Noize MC, on the other hand, aims his work at a wider audience, so the language of his music is clear and familiar to a very large number of people. Many of Ivan's lyrics (Ivan is Noize MC's real name) are also written "here and now" and describe events that were current at the time of writing. For example, the track "Mercedes S-666" was written in the wake of the 2010 accident on Leninsky Prospekt involving Lukoil vice president Anatoly Barkov and two women, Olga Alexandrina and Vera Sedelnikova, both of whom died in the crash. The Moscow traffic police declared that Alexandrina and Sedelnikova were to blame. Eyewitnesses claimed otherwise.

To begin with, as in the previous article, I counted the total number of words: 56,473 for Noize MC (157 songs) and 16,540 for Oxxxymiron (39 songs). For Oxxxymiron I took his two official albums plus mixtape number 2. The first mixtape was excluded, since in almost all of its tracks Oxxxymiron performs only one verse.

This is what the number of unique words looks like after removing stop words.


As you can see, Noize MC and Oxxxymiron share only 2,209 words in their lyrics. More than 50% of each artist's vocabulary is unique to him, which clearly points to a difference in their authorial styles. I would venture to suggest that Oxxxymiron's count of unique words would be even higher if his number of albums and tracks were a little closer to Noize MC's. For comparison, Leo Tolstoy's "Anna Karenina" contains 12,752 unique words out of 253,311.
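The vocabulary comparison above comes down to set operations on the distinct words of each artist. Here is a minimal Python sketch of that idea. It is not the code used for the article: the stop-word list, the helper names and the toy lyrics are all made up for illustration, and a real run would use a full Russian stop-word list and probably lemmatization.

```python
import re

# Hypothetical, tiny stop-word list; a real analysis needs a full Russian list.
STOP_WORDS = {"the", "a", "and"}

def tokenize(text):
    """Lowercase the text and return runs of letters as tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())

def vocabulary(songs):
    """Return the set of distinct non-stop-word tokens across all songs."""
    return {tok for song in songs for tok in tokenize(song) if tok not in STOP_WORDS}

def compare_vocabularies(songs_a, songs_b):
    """Count words used only by artist A, only by artist B, and by both."""
    vocab_a, vocab_b = vocabulary(songs_a), vocabulary(songs_b)
    shared = vocab_a & vocab_b
    return {
        "only_a": len(vocab_a - shared),
        "only_b": len(vocab_b - shared),
        "shared": len(shared),
    }

# Toy lyrics invented for this example, not real song texts.
artist_a = ["the city sleeps and the rain falls", "rain on the empty street"]
artist_b = ["a tower above the city", "ivory tower and empty halls"]
result = compare_vocabularies(artist_a, artist_b)
print(result)
```

The three numbers correspond to the three regions of a two-set Venn diagram: words unique to each artist and the words they share.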

To see the most popular words of Noize MC and Oxxxymiron easily and clearly, I gathered them into word clouds.


And here are the words they have in common.


Then some natural questions came up. Which word should be considered more popular and memorable for a given artist? Which words characterize his work more strongly: those he repeated many times but in a small number of songs, or those he used, say, only once per track but in a larger number of tracks?

It is very hard to give an unambiguous answer. In the first article the word "tyr" turned out to be the most popular for Kasta, but people familiar with their work immediately pointed out that it can hardly be called a defining word for the group, since nearly all of its occurrences come from a single track, "Tyrim". Some listeners may never play the track containing the most frequent word, while others will know the artist only by that song. For me, for example, Kasta will always be associated with a line from the song "Around the Noise" ("Don't fuss, everything's nishtyak").

If we instead take a word that appears in more tracks, the likelihood that listeners will hear it and connect it with the artist's work is much higher.

Both approaches have a right to exist, and each has its strengths and weaknesses. To give readers the complete picture, I therefore analyzed the lyrics of Noize MC and Oxxxymiron in both ways.
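The two ways of ranking words can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name and the toy token lists are hypothetical, and the input is assumed to be already tokenized with stop words removed.

```python
from collections import Counter

def count_words(songs):
    """songs: list of token lists, one list per song.

    Returns two counters:
      by_total - how many times each word occurs across all songs
      by_songs - in how many songs each word occurs at least once
    """
    by_total = Counter()
    by_songs = Counter()
    for tokens in songs:
        by_total.update(tokens)       # every occurrence counts
        by_songs.update(set(tokens))  # each song counts at most once per word
    return by_total, by_songs

# Toy data: "tyr" is repeated a lot in one song, "noise" appears in every song.
songs = [
    ["tyr", "tyr", "tyr", "tyr", "noise"],
    ["noise", "city"],
    ["noise", "rain"],
]
by_total, by_songs = count_words(songs)
print(by_total.most_common(1))  # the most repeated word
print(by_songs.most_common(1))  # the word found in the most songs
```

In this toy data the two rankings disagree about the top word, which is exactly the "Tyrim" effect described above.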

This is how the most commonly used words of Noize MC and Oxxxymiron compare. The first ranking shows the words used most often overall, the second the words that appear in the largest number of songs. Stop words are excluded.



If you study the tables carefully, it becomes clear that most of the words are common ones that do not shape the style of the text. Some words, however, stand out against this background, and they are what makes each author's style unique.

To understand how the lyrics of Noize MC and Oxxxymiron differ from other texts written in Russian, I compared the most frequently used words (before removing stop words) with the same statistics from the Russian National Corpus. This reference system, built on a collection of Russian texts in electronic form, contains more than 50,000 documents. The ranking was compiled from 192,689,044 word forms.


As expected, the most popular words turned out to be prepositions, conjunctions, particles, pronouns and the like. Noize MC and Oxxxymiron even use these words in almost the same percentages as the tens of thousands of other works in the corpus.
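A comparison like this needs relative rather than absolute frequencies, because the corpus is thousands of times larger than any discography. A rough Python sketch follows. The function names are hypothetical, and the reference percentages in the example are invented placeholders, not real figures from the Russian National Corpus.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Return each word's share of all tokens, in percent."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: 100.0 * n / total for word, n in counts.items()}

def compare_with_reference(tokens, reference, top=3):
    """List the top words with their own share and the reference share.

    reference: dict word -> percent in the reference corpus (None if absent).
    """
    own = relative_frequencies(tokens)
    # Sort by descending share, then alphabetically to keep ties stable.
    ranked = sorted(own, key=lambda w: (-own[w], w))[:top]
    return [(w, own[w], reference.get(w)) for w in ranked]

# Toy tokens (stop words kept on purpose) and made-up reference shares.
tokens = ["and", "in", "and", "not", "and", "i", "in", "and", "on", "i"]
reference = {"and": 3.5, "in": 3.0}
table = compare_with_reference(tokens, reference, top=2)
print(table)
```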

To analyze the similarity or difference of texts more accurately, it is not enough to look at individual words and their frequencies. It is also important to consider the combinations these words form: the so-called bigrams, trigrams and so on. After all, the same vocabulary can be used to build sentences and phrases with quite different meanings. Looking at which combinations the words form allows a more confident conclusion about similarity or difference.
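For readers who have not met n-grams before, here is a small Python sketch of how bigrams and trigrams can be extracted and counted. The function names and the toy songs are assumptions made for this example. N-grams are built per song, so that the last word of one song is never paired with the first word of the next.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

def count_ngrams(songs, n=2):
    """Count n-grams over a list of token lists, one list per song."""
    counts = Counter()
    for tokens in songs:
        counts.update(ngrams(tokens, n))
    return counts

# Toy token lists invented for illustration.
songs = [
    ["i", "do", "not", "know"],
    ["i", "do", "not", "care"],
]
bigram_counts = count_ngrams(songs, n=2)
trigram_counts = count_ngrams(songs, n=3)
print(bigram_counts.most_common(2))
print(trigram_counts.most_common(1))
```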

This is what the most popular bigrams of Noize MC and Oxxxymiron look like. Again, I compared them with data from the Russian National Corpus.


Once again, as with the plain comparison of word forms, the word combinations of the artists and of the Russian National Corpus are very similar, but there are prominent items that set apart the themes and style of each artist.

A very important and contentious point for me was how to measure the breadth and diversity of the authors' vocabulary. How can this be done without turning to dictionaries to interpret the meanings of words and determine their subject matter? Is the versatility of an artist's work determined by the total number of words in it? Or is the number of unique words the key? In the first case, an artist could simply repeat the same words in every song and win on volume alone. In the second, many of the unique words could all appear in the first n songs, after which the artist keeps recycling them. As you can see, both approaches come with many caveats.

Therefore, I assumed that the frequency with which artists use unique words across their songs can tell us about breadth. The more unique words that are used in only a few songs, the more confidently one can say that the topics differ. Alternatively, the performer is a master of synonyms, and then the themes are the same but the words are different, which is also undoubtedly good, because it shows a broad command of the Russian language.

Below is a table showing how many words were used in how many songs. For example, the word "punks" was used in only one song, though perhaps several times. The more words that were used in only one track, the higher the uniqueness. For convenience, I called this measure the "Index of Uniqueness of Words". The higher the value, the more unique and diverse the text.

To make this concrete, here is an example from the table: Noize MC used 5,451 unique words in only one track each (possibly several times within it), 1,467 unique words in two tracks, and so on. He used 12 unique words in more than 40 tracks.
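The table described above can be built by counting, for every unique word, the number of songs it appears in, and then grouping words by that number. The Python sketch below shows one way to do it. The article gives no exact formula for the index, so the definition here (the share of unique words that occur in exactly one song) is an assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter

def song_spread(songs):
    """Map 'number of songs' -> 'how many unique words occur in exactly that many songs'.

    songs: list of token lists, one list per song.
    """
    songs_per_word = Counter()
    for tokens in songs:
        songs_per_word.update(set(tokens))  # repeats inside one song do not count
    return Counter(songs_per_word.values())

def uniqueness_index(songs):
    """Assumed definition: share of unique words that occur in exactly one song."""
    spread = song_spread(songs)
    total_words = sum(spread.values())
    return spread[1] / total_words if total_words else 0.0

# Toy data: "punks" and "tower" live in one song each, "city" is in all three.
songs = [
    ["punks", "city", "city"],
    ["city", "rain"],
    ["city", "rain", "tower"],
]
spread = song_spread(songs)
index = uniqueness_index(songs)
print(dict(spread), index)
```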


As you can see, the distribution across groups is broadly similar for the two artists. Just over 60% of Noize MC's unique words and as many as 75% of Oxxxymiron's were used in only one track.

It would be interesting to compare these figures with, for example, pop music, where the range of themes is narrower. Rap, after all, began as protest music: performers raise topics that are difficult for themselves and for society, try to make sense of them, or share their reasoning. Pop music is designed more to entertain and relax listeners, so it is simpler.

But I want to emphasize that I am in no way comparing rap with pop here. I am presenting the results of an analysis of the work of two talented artists, Noize MC and Oxxxymiron.

Much, if not almost everything, has now been said about words, their number and their uniqueness. But what else can affect how a listener perceives the lyrics? For rap artists it is, of course, the speed of delivery. The speed and clarity with which words are pronounced directly affect how well the text is perceived and understood.

Below is the delivery speed in words per second. You can also see the songs with the largest number of words, as well as the tracks with the highest "reading" speed.


Noize MC's average delivery rate is 1.77 words per second. This was to be expected, since many of his songs include "traditional" singing, which stretches out the pronunciation of words. His style is also not pure rap or hip-hop, but more often a mixture of rock and rap.


Oxxxymiron's average is higher than his colleague's: 2.55 words per second.
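The speed figures are simple arithmetic: the number of words in a track divided by its length in seconds. A short Python sketch is given below. It rests on assumptions: the article does not say whether the average is a mean of per-track rates or total words over total time, so the sketch uses the mean of per-track rates, and the track numbers are invented. Note that dividing by the full track length also counts intros and instrumental breaks, which lowers the rate.

```python
def words_per_second(word_count, duration_seconds):
    """Delivery rate of one track: words divided by track length in seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return word_count / duration_seconds

def average_rate(tracks):
    """Mean of per-track rates; tracks is a list of (word_count, duration_seconds)."""
    rates = [words_per_second(words, seconds) for words, seconds in tracks]
    return sum(rates) / len(rates)

# Invented example: a 3-minute track with 360 words and a 200-second track with 300.
tracks = [(360, 180), (300, 200)]
fastest = max(tracks, key=lambda t: words_per_second(*t))
print(average_rate(tracks), fastest)
```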

The track XXX Shop should probably be excluded from these statistics, since two of its verses are in English and are performed by other artists. However, we listen to tracks as a whole, without splitting them by performer, and Noize MC also has a lot of collaborations.

Based on the analysis, a few things can be said with confidence. First, both authors make confident use of all the riches the Russian language offers. Second, most of the words in their songs are common ones that are also popular with other authors, yet several word forms and bigrams characteristic only of them can be singled out. Third, the music of Noize MC and Oxxxymiron differs in style, in subject matter and in vocabulary. And it is definitely music that deserves attention.

I also hope that the methods presented here for analyzing performers' lyrics will seem useful and accessible to you. The analysis of music, rap included, should differ from the usual analysis of literary works. With literature, the emphasis is on sentence length, the number of syllables per word, the number of words per sentence, the number of nouns, adjectives, turns of phrase and so on. In my opinion this makes little sense for rap, since sentences merge into one whole during delivery. Words are pronounced at great speed, and what matters is at least keeping up with what the performer is saying.

Comment and criticize. The more feedback there is, the faster and more effectively we can improve the existing methods of analyzing musical works.

Bonus: Uncle Zhenya

Uncle Zhenya. Few people are familiar with his work, but he is a unique artist, and that uniqueness shows in his lyrics. They are complex in structure and dense with meaning: references to Nietzsche and Castaneda, images from mythology, wordplay and compositional refrains. I advise everyone to get at least a little acquainted with his work.

The review of his lyrics will be short, since it was added as a bonus at the request of trawl. From words to deeds.

I managed to find 14 Uncle Zhenya tracks with lyrics. They contain 10,064 words, or 5,756 after removing stop words. The number of unique words is 2,750. Here is a word cloud of the most popular ones.


Of course, hip-hop is a single word, but during text processing every word form is split into tokens, so "hip" and "hop" are counted separately.
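The effect is easy to reproduce. The Python sketch below compares a simple tokenizer that keeps only runs of letters with a hyphen-aware variant. Both regular expressions and the sample line are assumptions made for illustration and are not the tokenizer used for the article.

```python
import re
from collections import Counter

# Runs of letters only: a hyphen acts as a separator.
SIMPLE_TOKEN = re.compile(r"[^\W\d_]+")
# Letters optionally joined by single hyphens: "hip-hop" stays one token.
HYPHEN_AWARE_TOKEN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def tokenize(text, keep_hyphens=False):
    """Lowercase the text and split it into tokens with the chosen pattern."""
    pattern = HYPHEN_AWARE_TOKEN if keep_hyphens else SIMPLE_TOKEN
    return pattern.findall(text.lower())

# Invented sample line, not a real lyric.
line = "Hip-hop is alive, hip hip hooray"
split_tokens = tokenize(line)
joined_tokens = tokenize(line, keep_hyphens=True)
print(Counter(split_tokens)["hip"], Counter(split_tokens)["hop"])
print(joined_tokens)
```

With the simple tokenizer, any standalone "hip" in the lyrics makes the count of "hip" exceed the count of "hop".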

This is what the most popular words and the words used in the most tracks look like.


Interestingly, "hip" was used once more than "hop".

And this is how Uncle Zhenya distributed his vocabulary across his lyrics. He used 72% of the 2,750 unique words in only one track each (possibly several times within it), which again may point to a variety of topics in his work. Overall, his figures are very similar to Oxxxymiron's.


And finally, I want to show the songs with the most words and the highest reading speed.


Uncle Zhenya's speed is even higher than Oxxxymiron's.

the end
