Spartan neural network training

One of the problems of training neural networks is overfitting: the network learns to perform well on the data it has seen, but copes worse with data it has not. In this article we describe how we tried to tackle this problem by combining gradient-descent training with an evolutionary approach.




If you want a better grounding in what is discussed below, you can read these articles on Habr: article1 and article2




The general scheme of the approach looks like this:


  1. Train the network for N epochs with SGD
  2. Create K mutated copies of the network
  3. Evaluate each copy on the validation set
  4. Keep the best network, discard the rest
  5. goto 1

The experimental setup:


— dataset: CIFAR10
— model: resnet18
— optimizer: SGD
— loss: CrossEntropyLoss
— metric: accuracy
— 5 runs per experiment
— 50 epochs of training,
— mutations applied on epochs 40-50
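In PyTorch this setup corresponds to roughly the following. A minimal sketch: the batch size, learning rate, and momentum here are illustrative assumptions, not the exact values from our experiments:

import torch
import torchvision
import torchvision.transforms as transforms

# CIFAR10 with the basic tensor transform
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                         transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# resnet18 trained from scratch, CrossEntropyLoss, SGD
model = torchvision.models.resnet18(num_classes=10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)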




Approach №1. Mutations with selection.




Related papers: Adding Gradient Noise Improves Learning for Very Deep Networks and Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks.


For i in range(N):
    train the network for one epoch with SGD
    # evolutionary phase
    For k in range(K):
        create a mutated copy of the network
        if the copy is better on the validation set, it replaces the original.

By a mutation we mean adding random noise, sampled from a Gaussian distribution G, to all weights of the network.
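A minimal sketch of one mutate-and-select step under that definition; the σ value and the validation_accuracy helper are our assumptions, for illustration only:

import copy
import torch

def mutate(model, sigma=0.01):
    # clone the network and add Gaussian noise to every weight
    child = copy.deepcopy(model)
    with torch.no_grad():
        for param in child.parameters():
            param.add_(torch.randn_like(param) * sigma)
    return child

# Spartan selection: a mutant survives only if it beats the current network
best_acc = validation_accuracy(model)   # assumed helper returning accuracy
for k in range(K):
    child = mutate(model)
    acc = validation_accuracy(child)
    if acc > best_acc:
        model, best_acc = child, acc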




image
Fig. 1. Accuracy.



The first results:
Accuracy 47.81% — training with mutations.
Accuracy 47.72% — plain SGD.


For the first 40 epochs both runs are identical SGD; the curves only diverge once the mutations kick in.


image
Chart 1. Accuracy of resnet18 on CIFAR10 versus plain SGD (mutations from epoch 40, 5 runs).
image
Chart 2. Accuracy of resnet18 on CIFAR10 versus plain SGD (mutations from epoch 40, 5 runs).






Approach №2. OpenAI evolution strategies.


Related work: OpenAI's Evolution Strategies as a Scalable Alternative to Reinforcement Learning and the PyTorch implementation at https://github.com/staturecrane/PyTorch-ES


For i in range(N):
    train the network for one epoch with SGD
    For k in range(K):
        create a mutated copy of the network
        evaluate it on the validation set
        remember its reward
    update the weights with the reward-weighted sum of all the mutations


A mutation here is noise sampled from -1 to 1, multiplied by σ and added to every weight of the network.
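Putting that together, the population could be generated and scored roughly like this (a sketch; the evaluate helper and the constants are our assumptions, and only the raw noise is stored because the weight update below needs it):

import copy
import numpy as np
import torch

population, rewards = [], []
for k in range(POPULATION_SIZE):
    # one noise array per weight tensor, uniform in [-1, 1]
    noise = [np.random.uniform(-1, 1, size=tuple(p.shape)) for p in model.parameters()]
    child = copy.deepcopy(model)
    with torch.no_grad():
        for p, eps in zip(child.parameters(), noise):
            p.add_(SIGMA * torch.from_numpy(eps).float())
    population.append(noise)         # keep the raw noise for the weight update
    rewards.append(evaluate(child))  # e.g. validation accuracy
rewards = np.array(rewards)

The rewards are then normalized and folded back into the weights, as in this snippet from the implementation: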



import numpy as np
import torch

# rewards → zero mean, unit variance; self.weights is the list of parameter tensors
normalized_rewards = (rewards - np.mean(rewards)) / np.std(rewards)
for index, param in enumerate(self.weights):
    A = np.array([p[index] for p in population])  # this parameter's noise, one row per member
    rewards_pop = torch.from_numpy(np.dot(A.T, normalized_rewards).T).float()  # reward-weighted noise
    param.data = param.data + LEARNING_RATE / (POPULATION_SIZE * SIGMA) * rewards_pop  # ES step
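The update above is the standard evolution-strategies gradient estimate from the OpenAI paper. With rewards r_k and noise ε_k it moves the weights along

\theta \leftarrow \theta + \frac{\alpha}{n \sigma} \sum_{k=1}^{n} r_k \, \varepsilon_k

where α is LEARNING_RATE, n is POPULATION_SIZE, σ is SIGMA, and the reward-weighted sum of noise is exactly the np.dot(A.T, normalized_rewards) in the snippet.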




On the charts: one curve — SGD + mutations, the other — plain SGD.


With a pretrained model:

Loss — the value of the loss function during training.

Validation — accuracy on the validation set.

Final score — the final accuracy on the test set.
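For completeness, the validation_accuracy / evaluate helper assumed in the sketches above could be as simple as this (val_loader is assumed to be a DataLoader over the held-out split):

def validation_accuracy(model, loader=val_loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x).argmax(dim=1)   # predicted class per sample
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total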




Final accuracy:


Method            From scratch    Pretrained
SGD               47.72%          68.56%
SGD + mutations   47.81%          68.61%
SGD + OpenAI      49.82%          69.45%

In conclusion:


  1. It would be worth also trying Adam as the optimizer.
  2. It was possible to make mutations part of the optimizer, rather than writing a separate wrapper for them (a sketch of this idea follows the list).
  3. It took several times more time than we had planned.
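For the second point, a minimal sketch of what folding mutations into the optimizer might look like; MutatingSGD and its mutate_every logic are our illustration, not code we actually ran:

import torch

class MutatingSGD(torch.optim.SGD):
    """SGD that perturbs the weights with Gaussian noise every `mutate_every` steps."""

    def __init__(self, params, lr, sigma=0.01, mutate_every=100, **kwargs):
        super().__init__(params, lr=lr, **kwargs)
        self.sigma = sigma
        self.mutate_every = mutate_every
        self.steps = 0

    def step(self, closure=None):
        loss = super().step(closure)   # the usual SGD update
        self.steps += 1
        if self.steps % self.mutate_every == 0:
            with torch.no_grad():
                for group in self.param_groups:
                    for p in group['params']:
                        p.add_(torch.randn_like(p) * self.sigma)
        return loss

The selection step would still have to live outside the optimizer, since it needs a pass over the validation set.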

We would be glad to receive feedback, not only on the content but on the article as a whole. If you are interested in this topic, or have worked on it, write as well; it would be great to talk, since we may have missed something.


Useful links


  1. Adding Gradient Noise Improves Learning for Very Deep Networks
  2. Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
  3. Evolution Strategies as a Scalable Alternative to Reinforcement Learning
