A convolutional neural network and its integration into iOS (part 1)


Means of implementation.

Machine learning algorithms are available in many high-level programming languages. The most popular and fastest-growing of them is Python.

TensorFlow is an open-source machine learning library developed by Google for building and training neural networks, with the goal of automatically finding and classifying images at a quality approaching human perception. The main API for working with the library is implemented for Python.

Keras is an open-source neural network library that runs on top of the TensorFlow and Theano frameworks. It is aimed at fast experimentation with deep learning networks, while being designed to be compact, modular, and extensible.

Python 3.7.2, the latest version at the time of writing, is selected as the interpreter. The development environment is PyCharm Community.

Data preparation.

In the modern world, for most AI tasks you do not need to collect a dataset manually: there are many resources where, after registration, you can download ready-made datasets. A dataset with the ten digits in sign language was selected (Fig. 1).

Fig. 1. - Numbers in sign language.

The algorithm will need to detect the presence of a gesture in a photo and classify it as a digit from 0 to 9. Before building the neural network architecture, the data must be prepared for training. First, the Sign Language Digits Dataset is downloaded. The first file contains 2063 grayscale images of digits in sign language, each 64x64 pixels in size. The second file contains the label (a vector) corresponding to each image.
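Loading the two files might look like the following sketch. The actual file names in the dataset are not given in the text, so synthetic arrays stand in here; the key point is reshaping the images to include a channel axis, which Keras convolutional layers expect.

```python
import numpy as np

# Synthetic stand-ins for the dataset's two .npy files (images and
# one-hot labels); the real file names and np.load calls are assumed.
X = np.random.rand(100, 64, 64).astype("float32")    # 64x64 grayscale images
y = np.eye(10)[np.random.randint(0, 10, size=100)]   # one-hot labels, 10 classes

# Conv2D layers expect a channel axis: (N, height, width, channels).
X = X.reshape(-1, 64, 64, 1)

print(X.shape, y.shape)  # (100, 64, 64, 1) (100, 10)
```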

The pixel values in the images are normalized to the interval (0, 1). They could be fed to the neural network as is, but a better result is achieved by first applying a standardization procedure. After it, each matrix of pixels satisfies two conditions:

The mean value of the matrix is zero.
The variance of the matrix is one.

The photos are standardized and added to a new array for subsequent combination with the previous data (Fig. 2).

Fig. 2. - The procedure for loading and processing images.

Neural network architecture.

Network_sign_lang.py is the main project file, in which the neural network is described. At the beginning of the script, several global parameters of the neural network are defined (Fig. 3).

Fig. 3. - Global options.

batch_size is the number of images fed into the neural network on each iteration. num_classes is the number of classes the model will predict. img_size is the size of the images supplied to the input layer.
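The global options might look like the following fragment (the exact values the author used are not all stated in the text, so the batch size here is an assumption):

```python
# Global training options; batch_size value is an assumption,
# the other two follow directly from the dataset description.
batch_size = 32      # images fed to the network per iteration
num_classes = 10     # digits 0 through 9
img_size = 64        # input images are 64x64 pixels
```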

The train_test_split() function from the sklearn library is imported into the project; it splits the X and y arrays into training and test samples in an 80:20 ratio, and also randomly shuffles the data so that they are not sorted by class.
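The split described above can be sketched as follows (data is synthetic; the random seed is an assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the images X and one-hot labels y.
X = np.random.rand(100, 64, 64, 1)
y = np.eye(10)[np.random.randint(0, 10, size=100)]

# 80:20 split; shuffling is on by default, so classes end up mixed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))  # 80 20
```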

Then the model is initialized (Fig. 4) and the first Conv2D layer is added. It takes images from the Xtrain array as input; the dimension of the output space (64), the 4x4 convolution kernel, the convolution stride, and the ReLU activation function are specified.

Fig. 4. - Neural network

Since neural networks typically require a much larger amount of training data, the constructed model is prone to overfitting. To avoid this, an effective regularization method, Dropout, is used. This layer excludes the specified fraction of randomly chosen neurons so that the layers do not become overly co-adapted. The value 0.5 means that on each batch the algorithm excludes half of the neurons at random. The second main layer is similar to the first: also convolutional (Conv2D).

Next comes the pooling layer (MaxPooling2D). It serves as another filter on the model's feature maps. Since shifting a digit horizontally or vertically does not change the meaning of the image, the neural network should classify such images identically.

The Flatten layer serves as a link between the feature maps produced by the convolutional part and the output vector with the prediction. The last layer of the network is a Dense layer with the softmax activation function. This function produces a normalized vector of probabilities over the classes, where the probabilities sum to one.
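The layer sequence described above might be assembled as in the following sketch. Only the first Conv2D's parameters are stated in the text; the second Conv2D's filter count and the pool size are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Architecture per the description above; unstated parameters
# (second Conv2D filters, pool size) are assumptions.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    layers.Conv2D(64, (4, 4), strides=1, activation="relu"),
    layers.Dropout(0.5),                     # drop half the neurons per batch
    layers.Conv2D(64, (4, 4), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),   # downsample feature maps
    layers.Flatten(),                        # 2D feature maps -> 1D vector
    layers.Dense(10, activation="softmax"),  # class probabilities, sum to 1
])

model.summary()
```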

Next, the created model is compiled, specifying the following parameters: metric, loss function, and optimizer. Accuracy, the percentage of correctly classified examples, is chosen as the metric. The loss function is categorical_crossentropy, and the optimization algorithm is Adam. Before starting training, several callbacks are added. EarlyStopping stops training the neural network when its accuracy stops increasing over the epochs. ModelCheckpoint saves the best model weights to a file for later use.

Training of the neural network is started, with data about the process saved into the history variable. validation_split takes ten percent of the training data for validation, which is another form of regularization.
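The compilation, callbacks, and training launch might look like the following sketch. A tiny stand-in model and synthetic data are used so it runs on its own; the patience value and checkpoint file name are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-in model; in the article the Fig. 4 model is used.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Stop when validation accuracy stops improving (patience assumed).
    keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=3),
    # Keep only the best weights on disk for later use.
    keras.callbacks.ModelCheckpoint("best_model.h5",
                                    monitor="val_accuracy",
                                    save_best_only=True),
]

# Synthetic training data in place of the real dataset.
X = np.random.rand(64, 64, 64, 1).astype("float32")
y = np.eye(10)[np.random.randint(0, 10, size=64)]

history = model.fit(X, y,
                    batch_size=32,
                    epochs=2,
                    validation_split=0.1,  # 10% held out for validation
                    callbacks=callbacks,
                    verbose=0)
```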

The size of the training sample is 1965 examples, the test sample is 547, and the validation sample is 219. After the training process completes, a graph of the accuracy obtained on the training and test data is plotted (Fig. 5).
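A plot like the one in Fig. 5 can be produced from the history object with matplotlib; the accuracy values below are synthetic stand-ins for history.history.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Synthetic curves standing in for history.history["accuracy"]
# and history.history["val_accuracy"] from training.
train_acc = [0.40, 0.60, 0.75, 0.85, 0.90]
val_acc = [0.38, 0.58, 0.70, 0.78, 0.80]

plt.plot(train_acc, label="train accuracy")
plt.plot(val_acc, label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("training_plot.png")
```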

Fig. 5. - Training accuracy plot.

The graph shows that the model was saved at the 15th epoch (with the highest accuracy and the smallest gap between the training and test curves).

The next step is to load the resulting neural networks into another script to verify that they work. The values of the metrics on the test data are displayed (Fig. 6).
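Loading a saved model and checking its metrics might look like the following sketch. A tiny stand-in model is saved and reloaded here so the example is self-contained; in the article, the checkpointed weights from training are loaded instead.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-in model, saved and reloaded to show the round trip.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.save("model.h5")

restored = keras.models.load_model("model.h5")

# Evaluate on synthetic test data; evaluate returns [loss, accuracy].
X_test = np.random.rand(32, 64, 64, 1).astype("float32")
y_test = np.eye(10)[np.random.randint(0, 10, size=32)]
loss, acc = restored.evaluate(X_test, y_test, verbose=0)
print(loss, acc)
```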

Fig. 6. - Checking the performance of the model.

Based on the metric values, we can conclude that the models do not differ much. The lower the loss value, the more confident the algorithm's predictions. The accuracy of the model shows the percentage of correctly classified photos, which means the higher it is, the better the neural network. Consequently, it is more logical to use the first model, obtained as a result of early stopping of training.

Data preprocessing and the integration of a neural network in the required format into an iOS application will be described in the next part.
