Convolutional neural network and its integration in iOS (part 2)



In the previous part ( link ), we prepared the data and examined the implementation tools and the architecture of the neural network. Let's start the next stage of development with data preprocessing.

Improvements

Keras provides a wide range of tools for data preprocessing, especially for images. The ImageDataGenerator class (Fig. 7) makes it possible to expand the dataset with artificial transformations.


Fig. 7. - Data generator.

The parameters in the figure have the following meaning:

- rotation_range - the degree range within which images are randomly rotated.
- width_shift_range - the fraction of the total width by which images may be randomly shifted horizontally.
- height_shift_range - the fraction of the total height by which images may be randomly shifted vertically.
- shear_range - the shear intensity: the shear angle, counter-clockwise, in degrees.
- zoom_range - the range for random zooming.
- horizontal_flip - randomly flips images horizontally.
- vertical_flip - randomly flips images vertically.
- fill_mode - points outside the input boundaries are filled according to the specified mode.
- data_format - the image data format.

Thus, the neural network will generalize better: the original dataset mostly contains clean images with the digits in the center, while in real photographs other situations are possible, when the gesture is in a corner or the image is blurred. Training is then rerun on the augmented data.
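The article shows the generator only as a screenshot; a minimal sketch of the equivalent Keras code, with illustrative parameter values rather than the exact ones from Fig. 7, might look like this:

```python
from keras.preprocessing.image import ImageDataGenerator

# Augmentation generator (sketch; parameter values are illustrative).
datagen = ImageDataGenerator(
    rotation_range=10,            # random rotations within +/- 10 degrees
    width_shift_range=0.1,        # horizontal shift, fraction of total width
    height_shift_range=0.1,       # vertical shift, fraction of total height
    shear_range=0.1,              # shear angle in degrees, counter-clockwise
    zoom_range=0.1,               # range for random zoom
    horizontal_flip=False,        # set True to randomly flip horizontally
    vertical_flip=False,          # set True to randomly flip vertically
    fill_mode='nearest',          # fill points outside the input boundaries
    data_format='channels_last')  # image data format

# x_train, y_train are the training arrays prepared in part 1 (assumed names).
train_flow = datagen.flow(x_train, y_train, batch_size=64)
```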

In Fig. 8, the validation curve started to decrease, so training was stopped. If the val_acc metric were not used for stopping, the algorithm would keep running and produce an overfitted neural network.
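The exact stopping code is not shown in the article; in Keras this is typically done with the EarlyStopping callback. A sketch, assuming val_acc is monitored as described above (patience, batch size, and epoch count are assumptions):

```python
from keras.callbacks import EarlyStopping

# Stop when validation accuracy stops improving (sketch).
early_stop = EarlyStopping(monitor='val_acc',  # the metric named in the text
                           patience=5,         # assumed patience
                           restore_best_weights=True)

# model, x_test, y_test come from part 1 (assumed names);
# train_flow is the augmentation generator shown above.
history = model.fit_generator(train_flow,
                              steps_per_epoch=len(x_train) // 64,
                              epochs=100,
                              validation_data=(x_test, y_test),
                              callbacks=[early_stop])
```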

Compared to the previous results, the value of the loss function decreased and, therefore, the algorithm's degree of confidence in its predictions increased. Accuracy increased by slightly more than 2% (Fig. 8, Fig. 9).


Fig. 8. - Training plot on the new data.


Fig. 9. - Model metrics.

Conversion

To integrate the neural network into an iOS application, it must be obtained in a specific format. The Core ML framework makes this possible: it allows you to run trained machine learning models in the .mlmodel format inside applications. For working with images, the Vision framework operates on top of Core ML and helps with tracking and with recognition of faces, text, objects, and barcodes. Horizon detection and obtaining matrices for image alignment are also available.

To convert the model from .h5 to .mlmodel, the Python library coremltools is used, which supports conversion to the .mlmodel format directly.

For the model to produce correct output, a dictionary is specified whose entries correspond to the 10 digits. The next line declares the model: Model_check.h5 is the name of the saved model file, and it is then indicated that the input layer of the neural network will receive images. The image_scale field standardizes the matrix of pixel values (Fig. 10).


Fig. 10. - Conversion.
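Fig. 10 is a screenshot; an approximate equivalent using the legacy Keras converter from coremltools 3.x, with arguments mirroring the description above rather than the exact figure, is:

```python
import coremltools

# Labels for the 10 digit classes, specified during conversion.
class_labels = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

coreml_model = coremltools.converters.keras.convert(
    'Model_check.h5',           # name of the trained Keras model file
    input_names='image',        # the input layer receives images
    image_input_names='image',  # treat the input as an image, not an array
    class_labels=class_labels,  # output labels for the classifier
    image_scale=1 / 255.0)      # standardize the pixel matrix to [0, 1]

coreml_model.save('sign_lang.mlmodel')
```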

Mobile app. Working with the prepared model.

An example from the official Apple developer website is used as the basis for the project. The development environment is the latest version of Xcode; the programming language is Swift 5.

Consider the files containing the project's code (Fig. 11). AppDelegate.swift launches the application's start window and initializes all files connected to the project.


Fig. 11. - The structure of the application.

After conversion, the prepared model sign_lang.mlmodel is added to the project (Fig. 12).


Fig. 12. - Model in Xcode.

The Type line indicates that the Machine Learning Model added to the project is indeed a classifying neural network. For prediction, the model takes grayscale 64x64 images as input; the program converts incoming photographs to this format using Vision. The output is a dictionary whose pairs consist of the label string specified during conversion and the confidence (probability) with which the neural network predicts it.
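The article does not list the request code itself; a condensed sketch of the usual Vision + Core ML pipeline, modeled on Apple's sample (the generated model class is assumed to be named sign_lang, after the .mlmodel file), looks like this:

```swift
import UIKit
import CoreML
import Vision

// Classify a UIImage with the converted model (sketch, not the exact project code).
func classify(_ image: UIImage) {
    // Wrap the Core ML model for use with Vision.
    guard let model = try? VNCoreMLModel(for: sign_lang().model) else { return }

    let request = VNCoreMLRequest(model: model) { request, _ in
        guard let results = request.results as? [VNClassificationObservation] else { return }
        // The two most likely outcomes, as displayed in the app (Fig. 14).
        let top2 = results.prefix(2)
            .map { "(\($0.confidence)) \($0.identifier)" }
            .joined(separator: "\n")
        print("Classification:\n\(top2)")
    }
    // Vision resizes and converts the image to the model's 64x64 input itself.
    request.imageCropAndScaleOption = .centerCrop

    guard let ciImage = CIImage(image: image) else { return }
    let handler = VNImageRequestHandler(ciImage: ciImage)
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}
```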

The main application files.

Consider the main file, ImageClassificationViewController.swift.

ImageView - the view that displays on the phone screen the current image being sent to the model's input.

CameraButton - a button that, when tapped, brings up a menu offering the user two actions: Take Photo and Choose Photo. The first opens the smartphone's camera in photo mode (Fig. 13).

When choosing the alternative action, Choose Photo, the internal gallery opens. Pressing the Use Photo button returns the user to the main screen, where the neural network analyzes the image and gives its prediction, as shown in Fig. 14.
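This flow is not reproduced in the article's text; modeled on Apple's sample, a minimal sketch (class and method names are illustrative assumptions, not the project's code) might be:

```swift
import UIKit

// Sketch of the CameraButton flow (names are assumptions).
class PhotoPickerViewController: UIViewController,
        UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    // Shows the Take Photo / Choose Photo menu.
    @IBAction func takePicture(_ sender: Any) {
        let menu = UIAlertController(title: nil, message: nil, preferredStyle: .actionSheet)
        menu.addAction(UIAlertAction(title: "Take Photo", style: .default) { _ in
            self.presentPhotoPicker(sourceType: .camera)
        })
        menu.addAction(UIAlertAction(title: "Choose Photo", style: .default) { _ in
            self.presentPhotoPicker(sourceType: .photoLibrary)
        })
        menu.addAction(UIAlertAction(title: "Cancel", style: .cancel))
        present(menu, animated: true)
    }

    private func presentPhotoPicker(sourceType: UIImagePickerController.SourceType) {
        let picker = UIImagePickerController()
        picker.delegate = self
        picker.sourceType = sourceType
        present(picker, animated: true)
    }

    // Called after Use Photo; the image is handed to the classifier.
    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        picker.dismiss(animated: true)
        guard let image = info[.originalImage] as? UIImage else { return }
        // classify(image)  // the Vision request sketched earlier
    }
}
```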

After the word Classification, we see the output of the two most likely outcomes predicted by the model. The number in parentheses is the neural network's confidence that its prediction is correct. Fig. 14 shows that the algorithm says, with a confidence of 0.99, that the digit 0 is shown, and this matches reality. No machine learning algorithm can give 100% accuracy on real data; the constructed neural network is no exception, and false predictions are possible, as can be seen in Fig. 15.


Fig. 13. - Work with the camera.


Fig. 14. - Prediction of the neural network.


Fig. 15. - False prediction.

As a result, we designed the architecture of a convolutional neural network for recognizing the digits of sign language. The network was trained, and thanks to the expansion of the training data and the selection of hyperparameters, an accuracy of 81% was obtained on the test dataset. The model was successfully converted for porting to a smartphone, and a mobile application was developed with the neural network integrated into it.

Code Link
