The error backpropagation algorithm using Word2Vec as an example

Since I had significant difficulty finding an explanation of error backpropagation that I liked, I decided to write my own post, using the Word2Vec algorithm as the example. My goal is to explain the essence of the algorithm on a simple but non-trivial neural network. Besides, word2vec has become so popular in the NLP community that it is useful to focus on it.


This post is connected with another, more practical one that I recommend reading: it covers a direct implementation of word2vec in Python. Here we will focus mainly on the theoretical part.


Let's start with the things that are necessary for a true understanding of backpropagation. Besides machine learning concepts such as the loss function and gradient descent, two ingredients from mathematics come in handy:

- partial derivatives of multivariable functions;
- the chain rule.
If you are familiar with these concepts, the rest will be straightforward. If you have not mastered them yet, you will still be able to follow the basics of backpropagation.


First, I want to define the concept of backpropagation; if its meaning is not entirely clear from the definition, it will be explained in more detail in the following paragraphs.


1. What is the backpropagation algorithm?


Within a neural network, the only parameters involved in training, that is, in minimizing the loss function, are the weights (here I mean weights in the broad sense, including biases). The weights change at each iteration until we reach the minimum of the loss function.


Backpropagation is the tool that tells us how the weights should change at each iteration.
To see how it works, it is enough to look at gradient descent itself.
For simplicity, suppose the loss function depends on only two weights, w1 and w2.



Figure 1. Gradient descent over a loss function of two weights, w1 and w2.


The goal of training is to find the values of w1 and w2 that minimize the loss function.


This is done with gradient descent: at each iteration we compute the partial derivatives ∂L/∂w1 and ∂L/∂w2 and shift the weights in the direction opposite to the gradient. The step size is controlled by the learning rate η, a positive parameter chosen in advance.
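To make this concrete, here is a minimal Python sketch of gradient descent for a toy loss that depends on two weights w1 and w2 (the particular loss function is just an illustration, not anything from word2vec yet):

```python
import numpy as np

# A toy loss L(w1, w2) = (w1 - 2)^2 + (w2 + 1)^2 with its minimum at (2, -1)
def loss(w):
    return (w[0] - 2) ** 2 + (w[1] + 1) ** 2

def grad(w):
    # Partial derivatives dL/dw1 and dL/dw2
    return np.array([2 * (w[0] - 2), 2 * (w[1] + 1)])

w = np.array([0.0, 0.0])   # initial weights
eta = 0.1                  # learning rate

for _ in range(100):
    w -= eta * grad(w)     # move against the gradient

print(w, loss(w))          # w ends up near (2, -1), the loss near 0
```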


2. Word2Vec


The network I will use to explain backpropagation is the one behind word2vec, the algorithm that turns words into vectors; as already mentioned, word2vec is extremely popular in the NLP community.


In a nutshell, word2vec represents words as vectors: it learns a table of shape [N, 3], where N is the number of words in the vocabulary and, in this small example, each word is mapped to a 3-dimensional vector. The vectors are arranged so that words used in similar contexts, and therefore with related meanings, lie close to each other, while unrelated words lie far apart. This property is what makes word2vec embeddings useful.


There are two flavors of the word2vec algorithm: Continuous Bag-of-Words (CBOW) and skip-gram. In the CBOW model the context words are used to predict the central word, while in skip-gram the central word is used to predict its context.
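To make the difference between the two flavors concrete, here is a small sketch (the helper function and window size are my own choices for illustration) that builds the training pairs for both models from a toy sentence:

```python
def training_pairs(tokens, window=1):
    """Build (context, target) pairs for CBOW and (center, context) pairs for skip-gram."""
    cbow, skipgram = [], []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        cbow.append((context, center))                  # context words -> central word
        skipgram.extend((center, c) for c in context)   # central word -> each context word
    return cbow, skipgram

cbow, skipgram = training_pairs("I like playing football".split())
# cbow     -> [(['like'], 'I'), (['I', 'playing'], 'like'), ...]
# skipgram -> [('I', 'like'), ('like', 'I'), ('like', 'playing'), ...]
```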


Below we go through both models. We start with the simplest version of word2vec and then build up from it.


3. The CBOW model with one context word


Let us start with the simplest form of the CBOW model, in which a single context word is used to predict a single target word. The network looks as follows:

Figure 2. The Continuous Bag-of-Words model with one context word.


- the input x and the output are word vectors in one-hot encoding;
- the activation function of the hidden layer is the identity, a = 1 (in other words, no nonlinearity is applied there);
- the activation function of the output layer is the Softmax.


Recall that with one-hot encoding a word is represented by a vector whose length equals the size of the vocabulary; the vector consists of zeros except for a single 1 at the position corresponding to that word.


For example, for a vocabulary of six words [w1, w2, w3, w4, w5, w6]:
OneHot(w4) = [0, 0, 0, 1, 0, 0]
OneHot([w1, w4]) = [1, 0, 0, 1, 0, 0]
OneHot([w1, w5, w6]) = [1, 0, 0, 0, 1, 1]
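A minimal sketch of such an encoding (the vocabulary of placeholder words and the helper function are only for illustration):

```python
def one_hot(words, vocabulary):
    """Encode one word (or several words at once) over a fixed vocabulary."""
    vec = [0] * len(vocabulary)
    for w in words:
        vec[vocabulary.index(w)] = 1
    return vec

vocab = ["w1", "w2", "w3", "w4", "w5", "w6"]
print(one_hot(["w4"], vocab))              # [0, 0, 0, 1, 0, 0]
print(one_hot(["w1", "w4"], vocab))        # [1, 0, 0, 1, 0, 0]
print(one_hot(["w1", "w5", "w6"], vocab))  # [1, 0, 0, 0, 1, 1]
```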


The weights between the input and the hidden layer form a matrix W of size V×N, and the weights between the hidden and the output layer form a matrix W′ of size N×V, where V is the size of the vocabulary and N is the size of the hidden layer (that is, the dimension of the word vectors that word2vec will produce).


The output y of the network is compared with the expected output t, the one-hot encoding of the word we want the network to predict; the mismatch between the two is what training will reduce.


That is all we need to know about the architecture; let us look at a concrete example.


Suppose we want to train word2vec on the following toy corpus:
"I like playing football"
We will use the simplified CBOW model with one context word, shown in Figure 2.
The corpus contains 4 distinct words, so V = 4; we also choose a hidden layer of size N = 2.



The vocabulary is:


Vocabulary = ["I", "like", "playing", "football"]


Both the word fed to the network and the word it has to predict are taken from this vocabulary, and both are represented with one-hot encoding.



Note that the only trainable parameters of this network are the two matrices W and W′; after training, the rows of W are exactly the word vectors produced by word2vec.


3.1 The loss function


Given a one-hot encoded input vector x, the forward pass through the network of Figure 2 is:


$$\begin{aligned} h &= W^T x \\ u &= W'^T h = W'^T W^T x \\ y &= \mathrm{Softmax}(u) = \mathrm{Softmax}(W'^T W^T x) \end{aligned}$$


where h is the hidden layer, u is usually called the score vector, and y is the output of the network, a probability distribution over the vocabulary.


The training data consist of pairs (wt, wc) of a target word wt and a context word wc. The input of the network is the one-hot encoding of the context word wc.


The expected output t is the one-hot encoding of the target word wt (this is the vector the network should ideally produce).


Since the output layer is a softmax, a natural loss function is the cross-entropy, that is, minus the log-probability assigned to the target word:


$$L = -\log P(w_t \mid w_c) = -\log y_{j^*} = -\log[\mathrm{Softmax}(u_{j^*})] = -\log\left(\frac{\exp u_{j^*}}{\sum_i \exp u_i}\right),$$


where j* is the index of the target word wt in the vocabulary.
Expanding the logarithm of the fraction, we obtain Eq. (1):


$$L = -u_{j^*} + \log \sum_i \exp(u_i). \qquad (1)$$


Let us now compute this loss in a concrete numerical example.


"I like play football", , "I" "like", , x=(1,0,0,0) β€” "I", Λ†y=(0,1,0,0), "like".


As is usual when training word2vec, we initialize the weights with random values. Suppose the 4×2 matrix W is


$$W = \begin{pmatrix} -1.38118728 & 0.54849373 \\ 0.39389902 & -1.1501331 \\ -1.16967628 & 0.36078022 \\ 0.06676289 & -0.14292845 \end{pmatrix}$$


and the 2×4 matrix W′ is


$$W' = \begin{pmatrix} 1.39420129 & -0.89441757 & 0.99869667 & 0.44447037 \\ 0.69671796 & -0.23364341 & 0.21975196 & -0.0022673 \end{pmatrix}$$


"I like" :


$$h = W^T x = \begin{pmatrix} -1.38118728 \\ 0.54849373 \end{pmatrix}$$



$$u = W'^T h = \begin{pmatrix} -1.54350765 \\ 1.10720623 \\ -1.25885456 \\ -0.61514042 \end{pmatrix}$$



$$y = \mathrm{Softmax}(u) = \begin{pmatrix} 0.05256567 \\ 0.7445479 \\ 0.06987559 \\ 0.13301083 \end{pmatrix}$$


The second component of y is the probability that the network assigns to the word "like", so the loss is


$$L = -\log P(\text{"like"} \mid \text{"I"}) = -\log y_2 = -\log(0.7445479) = 0.2949781.$$


The same value follows directly from Eq. (1):


$$L = -u_2 + \log \sum_{i=1}^{4} \exp(u_i) = -1.10720623 + \log[\exp(-1.54350765) + \exp(1.10720623) + \exp(-1.25885456) + \exp(-0.61514042)] = 0.2949781.$$


, "like play", , .


3.2 Backpropagation for the CBOW model


So far the network has only made a prediction; to train it, we need a rule that tells us how to change W and W′ so that the loss decreases. Backpropagation provides exactly this rule.


The loss in Eq. (1) is a function of the weights W and W′, and gradient descent requires the derivatives ∂L/∂W and ∂L/∂W′.


Note that Eq. (1) depends on W and W′ only through the components of the score vector u = [u1, ..., uV], that is,


$$L = L(u(W, W')) = L(u_1(W,W'), u_2(W,W'), \dots, u_V(W,W')).$$


Applying the chain rule for functions of several variables, we get:


$$\frac{\partial L}{\partial W'_{ij}} = \sum_{k=1}^{V} \frac{\partial L}{\partial u_k}\frac{\partial u_k}{\partial W'_{ij}} \qquad (2)$$



$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{V} \frac{\partial L}{\partial u_k}\frac{\partial u_k}{\partial W_{ij}}. \qquad (3)$$


Let us now evaluate the two sums in Eqs. (2) and (3).


Start with Eq. (2). The element W′_ij of the matrix W′ connects neuron i of the hidden layer with neuron j of the output layer, and therefore it affects only the score u_j (and hence only y_j), as illustrated in Figure 3(a).



Figure 3. (a) The output y_j is connected to the hidden neuron h_i through a single element W′_ij of the matrix W′. (b) In contrast, the input x_k is connected to all N hidden neurons through the elements W_k1, …, W_kN of the matrix W.


Consequently, among the derivatives ∂u_k/∂W′_ij only the one with k = j is non-zero; all the others vanish.


Equation (2) therefore collapses to Eq. (4):


$$\frac{\partial L}{\partial W'_{ij}} = \frac{\partial L}{\partial u_j}\frac{\partial u_j}{\partial W'_{ij}} \qquad (4)$$


The first factor, ∂L/∂u_j, is obtained by differentiating Eq. (1); the result is Eq. (5):


$$\frac{\partial L}{\partial u_j} = -\delta_{jj^*} + y_j := e_j \qquad (5)$$


where δ_{jj*} is the Kronecker delta: it equals 1 when j = j*, that is, for the output neuron of the target word, and 0 otherwise.


Equation (5) defines the prediction error e_j of the j-th output neuron; collecting these errors into a vector e (of the same size V as the output layer) will let us write the final results in a compact form.


The second factor in Eq. (4) is given by Eq. (6):


$$\frac{\partial u_j}{\partial W'_{ij}} = \sum_{k=1}^{V} W_{ki}\, x_k \qquad (6)$$


Substituting Eqs. (5) and (6) into Eq. (4), we obtain Eq. (7):


$$\frac{\partial L}{\partial W'_{ij}} = (-\delta_{jj^*} + y_j)\left(\sum_{k=1}^{V} W_{ki}\, x_k\right) \qquad (7)$$


The derivative ∂L/∂W_ij takes a little more work: the input x_k reaches every output y_j through all the elements of W, as shown in Figure 3(b), so the sum over k in Eq. (3) does not reduce to a single term. To find ∂u_k/∂W_ij, first write u_k explicitly in terms of the elements of W and W′:


$$u_k = \sum_{m=1}^{N}\sum_{l=1}^{V} W'_{mk}\, W_{lm}\, x_l\,.$$


When we differentiate with respect to W_ij, only the term with l = i and m = j survives, which gives Eq. (8):


$$\frac{\partial u_k}{\partial W_{ij}} = W'_{jk}\, x_i\,. \qquad (8)$$


Substituting Eqs. (5) and (8) into Eq. (3), we obtain Eq. (9):


$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{V} (-\delta_{kk^*} + y_k)\, W'_{jk}\, x_i \qquad (9)$$


Equations (7) and (9) can be written in a compact vector form. Equation (7) becomes


$$\frac{\partial L}{\partial W'} = (W^T x) \otimes e \qquad (10)$$


where ⊗ denotes the outer product of two vectors.


and Eq. (9) becomes:


$$\frac{\partial L}{\partial W} = x \otimes (W' e) \qquad (11)$$


3.3 Updating the weights


Having the gradients (7) and (9), or equivalently their vector forms (10) and (11), we can update the weights by a step of gradient descent. Choosing a learning rate η > 0, we set:


$$W_{\text{new}} = W_{\text{old}} - \eta\,\frac{\partial L}{\partial W}, \qquad W'_{\text{new}} = W'_{\text{old}} - \eta\,\frac{\partial L}{\partial W'}$$
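As a sanity check, here is a minimal numpy sketch of one such training step for the single-context-word CBOW model, putting together the forward pass, the error (5), the gradients (10) and (11), and the update rule (the learning rate value here is arbitrary):

```python
import numpy as np

def train_step(W, W_prime, x, j_star, eta=0.1):
    """One gradient-descent step for CBOW with a single context word."""
    h = W.T @ x                        # hidden layer
    u = W_prime.T @ h                  # scores
    y = np.exp(u) / np.exp(u).sum()    # softmax output

    e = y.copy()                       # prediction error, Eq. (5): e_j = y_j - delta_{j j*}
    e[j_star] -= 1.0

    dW_prime = np.outer(h, e)          # Eq. (10): (W^T x) outer e
    dW = np.outer(x, W_prime @ e)      # Eq. (11): x outer (W' e)

    W -= eta * dW                      # gradient-descent update
    W_prime -= eta * dW_prime
    return W, W_prime
```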


3.4 The training loop


Let us put the pieces together. Training proceeds pair by pair: we take a (context, target) pair from the corpus, run the forward pass, compute the prediction error, obtain the gradients from Eqs. (10) and (11), and update W and W′. Going through all the pairs of the corpus several times gradually drives the loss down, and the rows of W become the word vectors we were after.


4. CBOW with a multi-word context


The multi-word CBOW model is a direct generalization of the network we have just analyzed: instead of a single context word, the input now consists of C context words, each one-hot encoded, as shown in Figure 4. This is the form of CBOW actually used in word2vec, and its analysis repeats the previous section almost verbatim.



Figure 4. The CBOW model with a multi-word context.


The equations of the multi-word CBOW model differ from those of the single-word CBOW only in the definition of the hidden layer, which is now an average over the context words:


$$\begin{aligned} h &= \frac{1}{C} W^T \sum_{c=1}^{C} x^{(c)} = W^T \bar{x} \\ u &= W'^T h = \frac{1}{C}\sum_{c=1}^{C} W'^T W^T x^{(c)} = W'^T W^T \bar{x} \\ y &= \mathrm{Softmax}(u) = \mathrm{Softmax}(W'^T W^T \bar{x}) \end{aligned}$$


where we introduced the 'averaged' input vector x̄ = Σ_{c=1}^{C} x^{(c)} / C.


The loss function keeps the same form as before; it is simply conditioned on all C context words:


$$L = -\log P(w_o \mid w_{c,1}, w_{c,2}, \dots, w_{c,C}) = -u_{j^*} + \log\sum_i \exp(u_i). \qquad (12)$$


As before, the chain rule gives:


$$\frac{\partial L}{\partial W'_{ij}} = \sum_{k=1}^{V}\frac{\partial L}{\partial u_k}\frac{\partial u_k}{\partial W'_{ij}} \qquad (13)$$



$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{V}\frac{\partial L}{\partial u_k}\frac{\partial u_k}{\partial W_{ij}}. \qquad (14)$$


These derivatives are computed exactly as in the single-word CBOW case, with x replaced by x̄. For W′_ij we get


$$\frac{\partial L}{\partial W'_{ij}} = \sum_{k=1}^{V}\frac{\partial L}{\partial u_k}\frac{\partial u_k}{\partial W'_{ij}} = \frac{\partial L}{\partial u_j}\frac{\partial u_j}{\partial W'_{ij}} = (-\delta_{jj^*}+y_j)\left(\sum_{k=1}^{V} W_{ki}\,\bar{x}_k\right) \qquad (15)$$


and for W_ij:


$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{V}\frac{\partial L}{\partial u_k}\frac{\partial}{\partial W_{ij}}\left(\frac{1}{C}\sum_{m=1}^{N}\sum_{l=1}^{V} W'_{mk}\sum_{c=1}^{C} W_{lm}\, x^{(c)}_l\right) = \frac{1}{C}\sum_{k=1}^{V}\sum_{c=1}^{C}(-\delta_{kk^*}+y_k)\, W'_{jk}\, x^{(c)}_i. \qquad (16)$$


In terms of the averaged input x̄ the results read:


$$\frac{\partial L}{\partial W'_{ij}} = (-\delta_{jj^*}+y_j)\left(\sum_{k=1}^{V} W_{ki}\,\bar{x}_k\right) \qquad (17)$$



$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{V}(-\delta_{kk^*}+y_k)\, W'_{jk}\,\bar{x}_i. \qquad (18)$$


As before, Eqs. (17) and (18) can be put in vector form. Equation (17) becomes:


$$\frac{\partial L}{\partial W'} = (W^T \bar{x}) \otimes e \qquad (19)$$


and Eq. (18):


$$\frac{\partial L}{\partial W} = \bar{x} \otimes (W' e) \qquad (20)$$


These are the same expressions as for the single-word CBOW model, with x replaced by the averaged input x̄; ⊗ again denotes the outer product.
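The only change with respect to the single-word code sketched earlier is the averaging of the context vectors; a possible version (again just a sketch, with illustrative names):

```python
import numpy as np

def cbow_gradients(W, W_prime, context_xs, j_star):
    """Gradients (19)-(20) for CBOW with C one-hot context vectors in context_xs."""
    x_bar = np.mean(context_xs, axis=0)   # averaged input vector
    h = W.T @ x_bar
    u = W_prime.T @ h
    y = np.exp(u) / np.exp(u).sum()

    e = y.copy()                          # prediction error e = y - t
    e[j_star] -= 1.0

    dW_prime = np.outer(h, e)             # Eq. (19)
    dW = np.outer(x_bar, W_prime @ e)     # Eq. (20)
    return dW, dW_prime
```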


5. Skip-gram


The skip-gram model is the mirror image of CBOW: the input is the central word, and the network has to predict its C context words. The architecture is shown in Figure 5.



Figure 5. The skip-gram model.


The equations of the skip-gram model are:


$$\begin{aligned} h &= W^T x \\ u_c &= W'^T h = W'^T W^T x, \qquad c = 1, \dots, C \\ y_c &= \mathrm{Softmax}(u_c) = \mathrm{Softmax}(W'^T W^T x), \qquad c = 1, \dots, C \end{aligned}$$


Since all the score vectors u_c (and hence all the outputs) are computed from the same h and the same W′, they are identical: y_1 = y_2 = ⋯ = y_C. Each of them, however, is compared with a different context word, and the loss function becomes:


$$L = -\log P(w_{c,1}, w_{c,2}, \dots, w_{c,C} \mid w_o) = -\log \prod_{c=1}^{C} \frac{\exp(u_{c,j_c^*})}{\sum_{j=1}^{V} \exp(u_{c,j})} = -\sum_{c=1}^{C} u_{c,j_c^*} + \sum_{c=1}^{C} \log \sum_{j=1}^{V} \exp(u_{c,j})$$


where j_c^* is the index of the c-th true context word in the vocabulary. In the skip-gram model the loss therefore depends on C×V score components:


$$L = L(u_1(W,W'), u_2(W,W'), \dots, u_C(W,W')) = L(u_{1,1}(W,W'), u_{1,2}(W,W'), \dots, u_{C,V}(W,W'))$$


and the chain rule now contains a sum over the context index as well:


$$\frac{\partial L}{\partial W'_{ij}} = \sum_{k=1}^{V}\sum_{c=1}^{C}\frac{\partial L}{\partial u_{c,k}}\frac{\partial u_{c,k}}{\partial W'_{ij}}$$



$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{V}\sum_{c=1}^{C}\frac{\partial L}{\partial u_{c,k}}\frac{\partial u_{c,k}}{\partial W_{ij}}.$$


The derivative ∂L/∂u_{c,j} is computed exactly as before and gives the prediction error of the c-th output:


$$\frac{\partial L}{\partial u_{c,j}} = -\delta_{jj_c^*} + y_{c,j} := e_{c,j}$$


Repeating the same steps as for CBOW, for W′_ij we obtain:


$$\frac{\partial L}{\partial W'_{ij}} = \sum_{k=1}^{V}\sum_{c=1}^{C}\frac{\partial L}{\partial u_{c,k}}\frac{\partial u_{c,k}}{\partial W'_{ij}} = \sum_{c=1}^{C}\frac{\partial L}{\partial u_{c,j}}\frac{\partial u_{c,j}}{\partial W'_{ij}} = \sum_{c=1}^{C}(-\delta_{jj_c^*}+y_{c,j})\left(\sum_{k=1}^{V} W_{ki}\, x_k\right)$$


and for W_ij, in the same way:


$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{V}\sum_{c=1}^{C}\frac{\partial L}{\partial u_{c,k}}\frac{\partial}{\partial W_{ij}}\left(\sum_{m=1}^{N}\sum_{l=1}^{V} W'_{mk}\, W_{lm}\, x_l\right) = \sum_{k=1}^{V}\sum_{c=1}^{C}(-\delta_{kk_c^*}+y_{c,k})\, W'_{jk}\, x_i.$$


Thus, the final expressions for the skip-gram gradients are:


$$\frac{\partial L}{\partial W'_{ij}} = \sum_{c=1}^{C}(-\delta_{jj_c^*}+y_{c,j})\left(\sum_{k=1}^{V} W_{ki}\, x_k\right) \qquad (21)$$



$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{V}\sum_{c=1}^{C}(-\delta_{kk_c^*}+y_{c,k})\, W'_{jk}\, x_i. \qquad (22)$$


In vector form, Eq. (21) reads:


$$\frac{\partial L}{\partial W'} = (W^T x) \otimes \sum_{c=1}^{C} e_c \qquad (23)$$


and Eq. (22):


$$\frac{\partial L}{\partial W} = x \otimes \left(W' \sum_{c=1}^{C} e_c\right) \qquad (24)$$
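In code the only difference from CBOW is that the prediction errors of the C outputs are summed before forming the outer products; a sketch under the same assumptions as before:

```python
import numpy as np

def skipgram_gradients(W, W_prime, x, context_indices):
    """Gradients (23)-(24) for skip-gram; context_indices holds the positions j*_c of the true context words."""
    h = W.T @ x
    u = W_prime.T @ h                     # every output shares the same scores
    y = np.exp(u) / np.exp(u).sum()

    e_sum = len(context_indices) * y      # sum over c of the errors e_c = y - t_c
    for j_star in context_indices:
        e_sum[j_star] -= 1.0

    dW_prime = np.outer(h, e_sum)         # Eq. (23)
    dW = np.outer(x, W_prime @ e_sum)     # Eq. (24)
    return dW, dW_prime
```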


6. Conclusions


This concludes the derivation of backpropagation for the word2vec models. I have left out the optimizations described in [2] (such as hierarchical softmax and negative sampling) that make training on large corpora practical; for those, and for a more detailed treatment of the equations derived here, I recommend the excellent paper [1].


I hope this post has helped you understand how backpropagation works in word2vec.


The next step is to implement these equations in your favorite programming language. If you like Python, I have already implemented them in my next post.


Hope to see you there!


References


[1] X. Rong, word2vec Parameter Learning Explained, arXiv:1411.2738 (2014).
[2] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in Vector Space, arXiv:1301.3781 (2013).

