Spartan neural network training

One of the problems of training neural networks is overfitting: the network learns to perform well on the data it has seen, but copes worse with data it has not. In this article we describe how we tried to tackle this problem by combining gradient-descent training with an evolutionary approach.




If you want a better grounding in what is discussed below, you can read these articles on Habr: article1 and article2




The general scheme of the approach looks like this:


  1. Train the network for N epochs with SGD
  2. Create K mutated copies of the network
  3. Evaluate each copy on the validation set
  4. Keep the best network, discard the rest
  5. goto 1

The experimental setup:


— dataset: CIFAR10
— model: resnet18
— optimizer: SGD
— loss: CrossEntropyLoss
— metric: accuracy
— 5 runs per experiment
— 50 epochs of training,
— mutations applied on epochs 40-50
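In PyTorch this setup corresponds to roughly the following. A minimal sketch: the batch size, learning rate, and momentum here are illustrative assumptions, not the exact values from our experiments:

import torch
import torchvision
import torchvision.transforms as transforms

# CIFAR10 with the basic tensor transform
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                         transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# resnet18 trained from scratch, CrossEntropyLoss, SGD
model = torchvision.models.resnet18(num_classes=10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)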




Approach №1. Mutations with selection.




Related papers: Adding Gradient Noise Improves Learning for Very Deep Networks and Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks.


For i in range(N):
    train the network for one epoch with SGD
    # evolutionary phase
    For k in range(K):
        create a mutated copy of the network
        if the copy is better on the validation set, it replaces the original.

By a mutation we mean adding random noise, sampled from a Gaussian distribution G, to all weights of the network.
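A minimal sketch of one mutate-and-select step under that definition; the σ value and the validation_accuracy helper are our assumptions, for illustration only:

import copy
import torch

def mutate(model, sigma=0.01):
    # clone the network and add Gaussian noise to every weight
    child = copy.deepcopy(model)
    with torch.no_grad():
        for param in child.parameters():
            param.add_(torch.randn_like(param) * sigma)
    return child

# Spartan selection: a mutant survives only if it beats the current network
best_acc = validation_accuracy(model)   # assumed helper returning accuracy
for k in range(K):
    child = mutate(model)
    acc = validation_accuracy(child)
    if acc > best_acc:
        model, best_acc = child, acc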




image
Fig. 1. Accuracy.



The first results:
Accuracy 47.81% — training with mutations.
Accuracy 47.72% — plain SGD.


For the first 40 epochs both runs are identical SGD; the curves only diverge once the mutations kick in.


image
Chart 1. Accuracy of resnet18 on CIFAR10 versus plain SGD (mutations from epoch 40, 5 runs).
image
Chart 2. Accuracy of resnet18 on CIFAR10 versus plain SGD (mutations from epoch 40, 5 runs).






Approach №2. OpenAI evolution strategies.


Related work: OpenAI's Evolution Strategies as a Scalable Alternative to Reinforcement Learning and the PyTorch implementation at https://github.com/staturecrane/PyTorch-ES


For i in range(N):
    train the network for one epoch with SGD
    For k in range(K):
        create a mutated copy of the network
        evaluate it on the validation set
        remember its reward
    update the weights with the reward-weighted sum of all the mutations


A mutation here is noise sampled from -1 to 1, multiplied by σ and added to every weight of the network.
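Putting that together, the population could be generated and scored roughly like this (a sketch; the evaluate helper and the constants are our assumptions, and only the raw noise is stored because the weight update below needs it):

import copy
import numpy as np
import torch

population, rewards = [], []
for k in range(POPULATION_SIZE):
    # one noise array per weight tensor, uniform in [-1, 1]
    noise = [np.random.uniform(-1, 1, size=tuple(p.shape)) for p in model.parameters()]
    child = copy.deepcopy(model)
    with torch.no_grad():
        for p, eps in zip(child.parameters(), noise):
            p.add_(SIGMA * torch.from_numpy(eps).float())
    population.append(noise)         # keep the raw noise for the weight update
    rewards.append(evaluate(child))  # e.g. validation accuracy
rewards = np.array(rewards)

The rewards are then normalized and folded back into the weights, as in this snippet from the implementation: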



import numpy as np
import torch

# rewards → zero mean, unit variance; self.weights is the list of parameter tensors
normalized_rewards = (rewards - np.mean(rewards)) / np.std(rewards)
for index, param in enumerate(self.weights):
    A = np.array([p[index] for p in population])  # this parameter's noise, one row per member
    rewards_pop = torch.from_numpy(np.dot(A.T, normalized_rewards).T).float()  # reward-weighted noise
    param.data = param.data + LEARNING_RATE / (POPULATION_SIZE * SIGMA) * rewards_pop  # ES step
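The update above is the standard evolution-strategies gradient estimate from the OpenAI paper. With rewards r_k and noise ε_k it moves the weights along

\theta \leftarrow \theta + \frac{\alpha}{n \sigma} \sum_{k=1}^{n} r_k \, \varepsilon_k

where α is LEARNING_RATE, n is POPULATION_SIZE, σ is SIGMA, and the reward-weighted sum of noise is exactly the np.dot(A.T, normalized_rewards) in the snippet.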




On the charts: one curve — SGD + mutations, the other — plain SGD.


With a pretrained model:

Loss — the value of the loss function during training.

Validation — accuracy on the validation set.

Final score — the final accuracy on the test set.
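For completeness, the validation_accuracy / evaluate helper assumed in the sketches above could be as simple as this (val_loader is assumed to be a DataLoader over the held-out split):

def validation_accuracy(model, loader=val_loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x).argmax(dim=1)   # predicted class per sample
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total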




Final accuracy:


Method            From scratch    Pretrained
SGD               47.72%          68.56%
SGD + mutations   47.81%          68.61%
SGD + OpenAI      49.82%          69.45%

In conclusion:


  1. It would be worth also trying Adam as the optimizer.
  2. It was possible to make mutations part of the optimizer, rather than writing a separate wrapper for them (a sketch of this idea follows the list).
  3. It took several times more time than we had planned.
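For the second point, a minimal sketch of what folding mutations into the optimizer might look like; MutatingSGD and its mutate_every logic are our illustration, not code we actually ran:

import torch

class MutatingSGD(torch.optim.SGD):
    """SGD that perturbs the weights with Gaussian noise every `mutate_every` steps."""

    def __init__(self, params, lr, sigma=0.01, mutate_every=100, **kwargs):
        super().__init__(params, lr=lr, **kwargs)
        self.sigma = sigma
        self.mutate_every = mutate_every
        self.steps = 0

    def step(self, closure=None):
        loss = super().step(closure)   # the usual SGD update
        self.steps += 1
        if self.steps % self.mutate_every == 0:
            with torch.no_grad():
                for group in self.param_groups:
                    for p in group['params']:
                        p.add_(torch.randn_like(p) * self.sigma)
        return loss

The selection step would still have to live outside the optimizer, since it needs a pass over the validation set.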

We would be glad to receive feedback, not only on the content but on the article as a whole. If you are interested in this topic, or have worked on it, write as well; it would be great to talk, since we may have missed something.


Useful links


  1. Adding Gradient Noise Improves Learning for Very Deep Networks
  2. Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
  3. Evolution Strategies as a Scalable Alternative to Reinforcement Learning
