Phrase sentiment analysis using neural networks

Hello everyone!

Everyone who goes through higher education without being expelled eventually reaches the thesis-writing stage, and I was no exception. I wanted to build something interesting and explore territory that was new to me, so I turned to neural networks and artificial intelligence in general. The task I tackled with them is sentiment (tonality) analysis of text, something already widely used in various monitoring systems. In this article I will try to describe how I solved it.

In short, the goal is to determine whether a phrase carries a positive or a negative connotation. I should say right away that this problem can be solved in several ways, not only with neural networks: you can, for example, build dictionaries in which words are marked with their sentiment, and so on (there are plenty of such methods on Habr), but each of them deserves its own article, so we will leave their review for later.

Data


My first task was collecting and preprocessing the training data. A good dataset for this problem is the corpus of short texts by Y. Rubtsova: tweets collected from Twitter and already split into negative and positive sentences. What is especially convenient is that it all comes in CSV format.

Training preparation


Pay attention to the form the data comes in: plenty of emoticons, links, unnecessary characters, and mentions. None of this carries useful information and it only gets in the way of training; in addition, everything in Latin script has to be removed. So the text needs to be preprocessed.

import re

def preprocessText(text):
    # lower-case and normalise "ё" to "е"
    text = text.lower().replace("ё", "е")
    # strip links and @-mentions
    text = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', ' ', text)
    text = re.sub(r'@[^\s]+', ' ', text)
    # keep only Cyrillic letters and digits; Latin words and other symbols are dropped
    text = re.sub(r'[^а-яА-Я1-9]+', ' ', text)
    # collapse repeated spaces
    text = re.sub(r' +', ' ', text)
    return text.strip()

We run every sentence from the file through this function: convert it to lower case, replace "ё" with "е", and simply strip out links, mentions, and English words, since they carry no useful meaning here. In short, we normalise the sentences, cleaning out the "garbage" that is useless for training.
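A quick sanity check of the function on a made-up tweet (the example string is mine, not from the corpus):

sample = "Ещё сижу на паре http://example.com @somebody so boring :("
print(preprocessText(sample))
# -> "еще сижу на паре"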

Tools


Of course, if you have a supercomputer at home, you can scroll past this section in search of the interesting part. Everyone else I would point to Google Colab, a service that lets you run Jupyter notebooks (if you have never heard of those, a search engine will help) right in the browser, with all the work done on a virtual machine in the cloud.
A session is limited to 12 hours of runtime - you can finish earlier - after which everything is reset.

We write our beautiful code


Like any other newcomer to machine learning, I chose Python: it is simple, and there is a whole cloud of libraries for it.

First, we have the package manager execute one important command; I will explain what it is for a little later.

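The command itself is a screenshot in the original post; judging by the explanation further down, it is most likely the installation of the tensorflow-text package, along these lines:

# run in a Colab cell; the package name is inferred from the text below
!pip install tensorflow-text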

Next, we import the libraries that we will use for training the network and preparing the data; I think many of them are familiar to you.

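The imports are also a screenshot in the original; based on the code and tools used throughout the article, the set is roughly the following (a sketch, not the author's exact cell):

import re
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text   # registers the ops the multilingual sentence encoder needs
from tensorflow import keras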

Finally to the point.

So why did we download and import the Tensorflow-text library? The thing is that phrases cannot be "fed" to the network in the form we read them in. This is where word embedding comes in - a term I have not found an adequate translation for and doubt one even exists. Roughly speaking, it means mapping a word to a vector. This is explained well here.

We need to convert entire sentences into vectors, so we use a ready-made solution from Google: the Universal Sentence Encoder.

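This step is shown as a screenshot in the original; a minimal sketch of what it likely looks like with the multilingual Universal Sentence Encoder from TF Hub (the exact model URL is my assumption, check the hub page):

import tensorflow_hub as hub
import tensorflow_text   # needed for the SentencePiece ops used by the model

# the multilingual model handles Russian; URL assumed
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

# every sentence becomes a fixed-size 512-dimensional vector
vectors = embed(["всем привет", "сегодня плохой день"]).numpy()
print(vectors.shape)   # (2, 512)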

You can download it from TensorFlow Hub here. There are, by the way, plenty of other interesting ready-made solutions there that you can use when training a neural network, so as not to reinvent everything yourself.


All tweets are labelled with a class, negative or positive. We create a pandas DataFrame in which they are sorted by class (in the screenshot from the original post the negative ones simply did not fit on the screen).

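A sketch of how the corpus could be loaded into such a DataFrame (the file names and the text-column index are my assumptions about the Rubtsova corpus, which ships positive and negative tweets as separate CSV files):

import pandas as pd

# file names and column layout are assumed; adjust to the actual corpus files
pos_raw = pd.read_csv("positive.csv", sep=";", header=None)
neg_raw = pd.read_csv("negative.csv", sep=";", header=None)

positive = pd.DataFrame({"text": pos_raw[3], "label": 1})   # good
negative = pd.DataFrame({"text": neg_raw[3], "label": 0})   # bad

df = pd.concat([positive, negative], ignore_index=True)
df["text"] = df["text"].apply(preprocessText)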

The data is ready - let's get to the model itself. For this we use the Keras framework.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential, load_model

# a simple feed-forward classifier on top of the sentence embeddings
model = tf.keras.Sequential()

model.add(
  tf.keras.layers.Dense(
    units=256,
    input_shape=(X_train.shape[1], ),
    activation='relu'
  )
)
model.add(
  tf.keras.layers.Dropout(rate=0.5)
)

model.add(
  tf.keras.layers.Dense(
    units=128,
    activation='relu'
  )
)
model.add(
  tf.keras.layers.Dropout(rate=0.5)
)

model.add(tf.keras.layers.Dense(2, activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy']
)

history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=16,
    validation_split=0.1,
    verbose=1,
    shuffle=True
)

model.evaluate(X_test, y_test)

A little about the model itself. It has input, hidden, and output layers.

Each layer has its own activation function.

A little explanation: in artificial neural networks, a neuron's activation function determines its output signal based on an input signal or a set of input signals. You can read more here; there are, by the way, many different ones for different tasks, but we will use only two.

The first two layers get the ReLU activation function. The output layer gets Softmax.

Besides the layers themselves, you may notice the word "Dropout". What is it? Oddly enough, in addition to the problem of underfitting, when the network's predictions are simply wrong, there is the problem of overfitting: the model explains only the examples from the training set well, adapting to them instead of learning to classify examples it has never seen. In other words, on new data your beautiful model, which did its job superbly before, simply "falls apart" and starts to surprise you unpleasantly. Dropout deals with this by "switching off" neurons with a given probability, so that they stop participating in the training step. The results of many such thinned networks are then effectively averaged (every time a neuron is excluded, you get a new network).

By the way, a great article for those who are interested.

Now we can start training!

Train on 53082 samples, validate on 5898 samples
Epoch 1/10
53082/53082 [==============================] - 12s 223us/sample - loss: 0.5451 - accuracy: 0.7207 - val_loss: 0.5105 - val_accuracy: 0.7397
Epoch 2/10
53082/53082 [==============================] - 11s 213us/sample - loss: 0.5129 - accuracy: 0.7452 - val_loss: 0.5000 - val_accuracy: 0.7523
Epoch 3/10
53082/53082 [==============================] - 11s 215us/sample - loss: 0.4885 - accuracy: 0.7624 - val_loss: 0.4914 - val_accuracy: 0.7538
Epoch 4/10
53082/53082 [==============================] - 11s 215us/sample - loss: 0.4686 - accuracy: 0.7739 - val_loss: 0.4865 - val_accuracy: 0.7589
Epoch 5/10
53082/53082 [==============================] - 11s 214us/sample - loss: 0.4474 - accuracy: 0.7889 - val_loss: 0.4873 - val_accuracy: 0.7616
Epoch 6/10
53082/53082 [==============================] - 11s 216us/sample - loss: 0.4272 - accuracy: 0.8004 - val_loss: 0.4878 - val_accuracy: 0.7603
Epoch 7/10
53082/53082 [==============================] - 11s 213us/sample - loss: 0.4081 - accuracy: 0.8111 - val_loss: 0.4986 - val_accuracy: 0.7594
Epoch 8/10
53082/53082 [==============================] - 11s 215us/sample - loss: 0.3899 - accuracy: 0.8241 - val_loss: 0.5101 - val_accuracy: 0.7564
Epoch 9/10
53082/53082 [==============================] - 11s 215us/sample - loss: 0.3733 - accuracy: 0.8315 - val_loss: 0.5035 - val_accuracy: 0.7633
Epoch 10/10
53082/53082 [==============================] - 11s 215us/sample - loss: 0.3596 - accuracy: 0.8400 - val_loss: 0.5239 - val_accuracy: 0.7620
6554/6554 [==============================] - 0s 53us/sample - loss: 0.5249 - accuracy: 0.7524
[0.5249265961105736, 0.752365]

So, 10 epochs have passed. For those who are not familiar with the concept at all, here is a definition from the Internet: an epoch is one iteration of the training process that presents every example from the training set and, optionally, checks the quality of training on a validation set. In other words, all of our data went through the whole process 10 times.

Result



Of course, the network will be needed more than once, so it would be nice to know how to save it for posterity, so that it does not have to be retrained every time.

The structure is saved in JSON format, and the weights are written to an h5 file.
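The saving code is shown as a screenshot in the original; the standard Keras way to do exactly this (the file names here are mine) is:

# serialize the architecture to JSON and the weights to HDF5
with open("model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("model_weights.h5")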

Search engines are full of guides on how to do the reverse - initialize the network from these files - so I will not describe it.
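Still, for reference, a minimal sketch of that reverse step, assuming the file names from the snippet above:

from tensorflow.keras.models import model_from_json

# rebuild the architecture, then load the trained weights
with open("model.json") as f:
    restored = model_from_json(f.read())
restored.load_weights("model_weights.h5")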

Using the predict method, let's try to find out the network's opinion on the sentiment of two obviously different phrases. True, they first need to be converted to vector form, but we already know how to do that with the ready-made encoder.
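The phrases in the original post appear only in a screenshot, so here is a sketch with two phrases of my own, reusing the encoder from above:

# two deliberately contrasting phrases (my own examples, not the author's)
phrases = ["сегодня отличный день, все замечательно",
           "ужасное настроение, все очень плохо"]
vectors = embed(phrases).numpy()
print(model.predict(vectors))   # a probability pair per phrase; which index is
                                # negative/positive depends on how the labels were encoded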

At the output we see two numbers per phrase - the probabilities that it belongs to the "negative" and "positive" classes. I think the difference is clearly visible) In the end, the network did a fine job of relating the phrases to their classes.

Conclusion


In closing, I want to say that mastering modern tools for building neural networks and using them to solve simple problems turns out to be quite easy, provided you correctly work out the steps needed and read a bit of theory. I will note that I had seen several articles on sentiment analysis on Habr, but it was still interesting to try something simpler and without a huge wall of text, although the theory, of course, still has to be studied :)

You can find the code here; if you star the project, that would be great. If you need the files with the weights and the network structure, or the processed data, write in the comments and I will add them to the repository.
