The Prince Said, or Where Was the War? My Experience Studying “War and Peace”

I still haven’t read Leo Tolstoy’s epic novel “War and Peace”: at school it was not interesting because of the author’s “verbiage”, and now there is somehow never time to start such a voluminous work.

However, I decided that it was worth studying ...


Preparation


I could have skipped cleaning out extraneous words and symbols (Latin part numbers, footnote numbers and some of the commentary), since against almost 400 thousand words of the novel’s text an error of even a thousand words would not skew the data, but I decided to do minimal text preparation anyway.

Part of the file preparation program
from collections import Counter
import matplotlib.pyplot as plt
import numpy as np

# filename = input("File name: ")

filename = "war_and_peace.txt"  # placeholder: the actual path is missing from the original listing
with open(filename, 'r') as file:
    text = file.read()
text = text.replace("\n", " ")
# Strip brackets, double quotes and basic punctuation
for ch in '[]",.?!()':
    text = text.replace(ch, "")
text = text.lower()
words_untill = text.split()  # all words of the novel in their original order
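The chain of `replace` calls can also be written as a single regular expression. This is only a minimal sketch, not the original program: the function name `tokenize` is hypothetical, and it assumes that a word is a run of letters with optional internal hyphens. Unlike the original preparation, it also drops digits such as footnote numbers.

```python
import re

def tokenize(text):
    """Lowercase the text and return its words, keeping internal hyphens."""
    # [^\W\d_]+ matches a run of Unicode letters
    # (word characters minus digits and the underscore).
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())

sample = 'He said: "War, [1] and peace!" (so-called)'
print(tokenize(sample))
```

Because the pattern works on Unicode letters, the same function can be applied to the Russian text without changes.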



As a person who works constantly with numbers, I became interested in the following questions:

1. The longest word in the novel


Having learned from my wife that Lev Nikolayevich was quite the graphomaniac, I decided to find out what long words Tolstoy had invented for the novel.

So, the top 3 longest words.

First place (27 letters and a hyphen in the Russian original) was shared by the words supernaturally beautiful, supernaturally refined and irresistibly charming:

... Just as a good head waiter serves up as something supernaturally beautiful that piece of beef which you would not want to eat if you saw it in a dirty kitchen, so this evening Anna Pavlovna served her guests first the viscount, then the abbé, as something supernaturally refined ...

... The Frenchman is self-confident because he considers himself personally, both in mind and body, irresistibly charming to both men and women. The Englishman is self-confident on the grounds that he is a citizen of the most comfortable state in the world, and therefore, as an Englishman, he always knows what he needs to do and knows that everything he does as an Englishman is undoubtedly good. The Italian is self-confident because he is excitable and easily forgets himself and others ...

Second place (25 letters and a hyphen) went to the word monotonously diverse:

... The hussars did not look back, but at every sound of a passing cannonball, as if on command, the whole squadron, with its monotonously diverse faces, holding its breath while the ball flew by, rose in the stirrups and sank back again ...

Third place (24 letters) went to the word Excellency. Unlike the previous ones, this word occurs eight times, as a form of address to Field Marshal Mikhail Illarionovich Kutuzov.

Part of the program for finding the longest word
from collections import Counter
import matplotlib.pyplot as plt
import numpy as np

words = text.split()  # split the prepared text into words
words = sorted(words, key=len, reverse=True)  # longest words first

for i in range(3):  # print the top 3
    print(words[i].ljust(30), len(words[i]))
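Sorting the full word list keeps repeated words, so a long word that occurs several times can occupy more than one of the top slots. A minimal sketch of one way around this, counting distinct words first (the helper name `longest_words` is hypothetical and not part of the original program):

```python
from collections import Counter

def longest_words(words, top=3):
    """Return (word, length, occurrences) for the `top` longest distinct words."""
    counts = Counter(words)
    # Longest first; ties are broken alphabetically so the order is stable.
    ranked = sorted(counts, key=lambda w: (-len(w), w))
    return [(w, len(w), counts[w]) for w in ranked[:top]]

demo = "a bb ccc ccc dddd dddd dddd".split()
print(longest_words(demo))
# [('dddd', 4, 3), ('ccc', 3, 2), ('bb', 2, 1)]
```

Returning the number of occurrences alongside the length makes it easy to see, for example, that a word occurs eight times.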


2. The most commonly used word


Beforehand, I cleared the list of one- and two-letter words in order to remove prepositions and short pronouns from the comparison loops. After the first iteration it turned out that not a single three-letter word (sword, evil, rear, etc.) made it into the top 10, so I successively cleared the list of three-letter words and, after further experiments, of four-letter words as well.

Part of a short word cleaning program

words2 = []  # words longer than four letters
for i in range(len(words)):
    if len(words[i]) > 4:
        words2.append(words[i])
    else:
        break  # words is sorted by length, so every remaining word is short as well


There were not many nouns in the list of most frequently used words, so I also had to remove from the novel’s word list the words “only”, “when”, “so”, “now”, “this”, “which”, “because”, “again”, “suddenly”, “very”, “nothing”, “his”.
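The two filtering steps described here, dropping short words and dropping hand-picked function words, can be combined in one pass that does not depend on the list being sorted by length. A minimal sketch under those assumptions; the function name `top_words` and its parameters are hypothetical:

```python
from collections import Counter

def top_words(words, min_len=5, stop_words=(), n=10):
    """Count words of at least `min_len` letters that are not stop words."""
    stop = set(stop_words)
    kept = [w for w in words if len(w) >= min_len and w not in stop]
    return Counter(kept).most_common(n)

demo = "prince said that prince spoke again and again said prince".split()
print(top_words(demo, min_len=4, stop_words={"again", "that"}, n=2))
# [('prince', 3), ('said', 2)]
```

The default `min_len=5` mirrors the "longer than four letters" rule used in the article.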

Part of the program for finding the most popular words

words_counts = Counter(words2)
n = []
pop_word = []
for word, count in words_counts.most_common(10):  # top 10
    n.append(count)
    pop_word.append(word)
    print(word.ljust(20), count)


As a result, the TOP-10 popular words:

1. said (masculine form) - 1411
2. prince - 952
3. time - 544
4. Andrey - 500
5. spoke (masculine form) - 464
6. princess - 435
7. said (feminine form) - 424
8. people - 391
9. Natasha - 376
10. people - 372

Since the search did not take word forms into account, I had to find all forms of the word “prince” separately. After this refinement, PRINCE took first place in the top with 1435 mentions in the novel, ahead of the verb SAID (1411).

Search all forms of the word PRINCE

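prince_stem = "..."  # Russian stem of "prince"; the literal is missing from the original listing
n4 = []  # 1 if the word is a form of "prince", otherwise 0
form_n4 = []
for i in range(len(words_untill)):
    if prince_stem in words_untill[i]:
        n4.append(1)
        form_n4.append(words_untill[i])
    else:
        n4.append(0)
print("PRINCE - " + str(len(form_n4)))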
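The same substring search is reused for other words, so it can be wrapped in a small helper. This is a sketch only: the name `mention_flags` and the `exclude` parameter are hypothetical, and the English demo words stand in for the Russian stems, which are not preserved in the listing.

```python
def mention_flags(words, stem, exclude=()):
    """Return a 0/1 list marking words that contain `stem` but none of `exclude`."""
    return [
        1 if stem in w and not any(x in w for x in exclude) else 0
        for w in words
    ]

demo = "the prince met the princess and princes".split()
flags = mention_flags(demo, "prince", exclude=("princess",))
print(flags, sum(flags))
# [0, 1, 0, 0, 0, 0, 1] 2
```

The 0/1 list has one entry per word of the text, so it can later be summed over sections of the novel.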


As the list shows, the masculine verb forms SAID (1411) and SPOKE (464) are more common in the novel than the feminine form SAID (424), which suggests that men speak about 4.5 times more than women in the novel (cue the accusations of sexism against Lev Nikolayevich), and the Princess (435) appears much less often than the Prince.
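The ratio can be checked directly from the counts in the top 10. This is a quick sketch that assumes only the three verb forms listed there are compared:

```python
masculine = 1411 + 464  # "said" and "spoke", masculine forms
feminine = 424          # "said", feminine form
ratio = masculine / feminine
print(round(ratio, 2))
# 4.42
```

The result is slightly below the rounded figure of 4.5 but of the same order.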

I was also curious how society regarded Natalya Ilyinichna Rostova, aka Natasha Rostova. Throughout the novel she remained Natasha, even though by the end of the novel Natalya Rostova had become the wife of Pierre Bezukhov. In all its forms, Natasha occurs in the text 591 times, while forms of the names Natalya and Natalie occur only 9 times.

3. Where was the war in the novel?


Despite the title, the word “war” in all its forms occurs only 278 times in the novel.

Search all forms of the word WAR

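war_stem = "..."  # Russian stem of "war"; the literal is missing from the original listing
not_war = "..."  # substring of unrelated words to exclude; also missing from the original
n3 = []  # 1 if the word is a form of "war", otherwise 0
form_n3 = []
for i in range(len(words_untill)):
    if war_stem in words_untill[i] and not_war not in words_untill[i]:
        n3.append(1)
        form_n3.append(words_untill[i])
    else:
        n3.append(0)
print(form_n3)
print("WAR - " + str(len(form_n3)))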


I divided the whole novel into sections of 10 thousand words and decided to trace the mentions of the words “prince”, “Natasha” and “war” over the course of the novel.

Splitting the novel into sections of 10 thousand words

In the flag lists, “0” means the word is not a match and “1” means it is.

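m1 = []
m2 = []
m3 = []
m4 = []
i = 0  # start from the first word
while i < len(n1):
    m1.append(sum(n1[i: i+10000]))  # n1 is built in a part of the program not shown here
    m2.append(sum(n4[i: i+10000]))  # "prince"
    m3.append(sum(n3[i: i+10000]))  # "war"
    m4.append(sum(nata1[i: i+10000]))  # "Natasha" (nata1 is also built elsewhere)
    i = i + 10000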
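The sectioning step can be written once and applied to any of the 0/1 lists. A minimal sketch, with the hypothetical helper name `section_sums`:

```python
def section_sums(flags, size=10000):
    """Sum a 0/1 flag list over consecutive sections of `size` words."""
    return [sum(flags[i:i + size]) for i in range(0, len(flags), size)]

demo = [1, 0, 1, 1, 0, 0, 1]
print(section_sums(demo, size=3))
# [2, 1, 1]
```

The last section may be shorter than `size`; it is still counted, so no words at the end of the text are lost.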


image

The histogram shows that after the surge in descriptions of the war, the princes are mentioned less and less toward the end of the novel, while Natasha is mentioned more and more.

An inverse correlation is clearly visible in the distribution of mentions of the words “war” and “Natasha”: the less war, the more Natasha.
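An inverse relationship of this kind can be quantified with the Pearson correlation coefficient of the per-section counts. The article does not say how (or whether) a coefficient was computed, so the following is only a sketch with made-up numbers; the function name `pearson` is hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up counts per section, for illustration only.
war = [30, 20, 10, 5]
natasha = [5, 10, 20, 30]
print(pearson(war, natasha))  # close to -1: strong inverse correlation
```

A value near -1 corresponds to "the less war, the more Natasha", a value near 0 to no clear linear relationship.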

image

An inverse correlation is also clearly visible in the distribution of mentions of the words “prince” and “Natasha”.

image

The distribution of mentions of the words “prince” and “war” shows no clear correlation. When little is said about the war, the princes tend not to be mentioned either, but this does not explain the large number of mentions of “princes” where “war” is absent.

image

So the correlation needs to be tracked as the narrative develops.

image

As the graph shows, the correlation is high only in the middle of the novel, where the war takes place; elsewhere it is low. From this it can be concluded that the use of the words “prince” and “war” is not consistently correlated over the course of the novel.
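The article does not show how the correlation over the course of the novel was computed. One plausible approach is a sliding window over the per-section counts. The sketch below makes that assumption; the names `pearson` and `rolling_correlation`, the window size and the demo numbers are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN when either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def rolling_correlation(x, y, window=5):
    """Correlation of x and y inside each window of consecutive sections."""
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]

# Made-up counts: the two series move together at first, then apart.
prince = [1, 2, 3, 4, 5, 6]
war = [1, 2, 3, 3, 2, 1]
print(rolling_correlation(prince, war, window=3))
```

With these numbers the coefficient starts at 1 and ends at -1, which is the kind of change over time that a single overall coefficient would hide.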

Conclusions


  1. You need to read the classics!!!
  2. If you want to read about the war, and not about love, then read the first part of the first volume and the third volume.
  3. If you want to read about how princes lived in peacetime, then the second volume is perfect.
  4. If you are interested in love in the absence of war, then you should read the fourth volume.
