A selection of articles on machine learning: case studies, guides, and research for February 2020



Following the January post, here is the second issue of the digest. It lists English-language materials from February that are written without excessive academic jargon. The publications include code examples and links to non-empty repositories. The technologies mentioned are publicly available, and many of them do not require powerful hardware to try out.

The articles are divided into four types:
Announcements of open source tools and datasets.
Practical guides for PyTorch and TensorFlow.
Case studies of machine learning.
ML research.


Announcements of open source tools


ClearGrasp

The algorithm is designed to recognize transparent objects, which reflect and refract light unevenly. Any standard RGB-D camera is suitable.

PyTorch3D

Facebook has announced a highly modular and optimized library for PyTorch that simplifies deep learning on three-dimensional data.

Hydra

A new framework from the PyTorch ecosystem, designed to tame the complexity of large projects. It provides project management capabilities through the command line and configuration files.

TensorFlow.js for React Native

The tool does not use a WebView for rendering and does not depend on the web platform APIs available in a browser. Instead, it is a new platform integration with a backend suited to this environment.

Matrix Compression Operator

The operator lets you take any matrix compression function defined as a factorization and expose it through a TensorFlow API that applies the compression dynamically while any TensorFlow variable is being trained.
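To make the idea of compression defined as a factorization concrete, here is a minimal sketch using truncated SVD in NumPy. It is only an illustration of low-rank factorization, not the operator's actual API; the function name `low_rank_compress` and the example matrix are assumptions made up for this sketch.

```python
import numpy as np


def low_rank_compress(weights, rank):
    """Factor an (m x n) matrix into two thin matrices via truncated SVD.

    Hypothetical helper for illustration only; not part of any library.
    """
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    left = u[:, :rank] * s[:rank]  # shape (m, rank), singular values folded in
    right = vt[:rank, :]           # shape (rank, n)
    return left, right


rng = np.random.default_rng(0)
# Assumed example: a weight matrix that is exactly rank 4,
# so a rank-4 factorization reproduces it up to rounding error.
weights = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))

left, right = low_rank_compress(weights, rank=4)

stored_before = weights.size           # number of values in the full matrix
stored_after = left.size + right.size  # number of values in the two factors
max_error = np.max(np.abs(weights - left @ right))
```

The two factors hold far fewer values than the original matrix. For real weights that are not exactly low rank, the choice of `rank` trades reconstruction error against size.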

Torchmeta Meta-Learning

The library provides a unified interface to various datasets to simplify the development of new meta-learning algorithms.

AutoFlip

Often you want to change the orientation of a video from horizontal (16:9 or 4:3) to vertical. A framework has finally appeared that dynamically crops frames with minimal loss. The tool identifies frame boundaries and moving objects and keeps only the most important content on screen.



Constrained Optimization Library

A tool for TensorFlow that helps reduce unfair outcomes in real-world problems where many additional criteria must be taken into account (for example, when issuing bank loans). The tool algorithmically converts constraints over data samples into a two-player zero-sum game.
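The two-player view of a constrained problem can be sketched with a toy example: one player lowers the objective, the other raises a penalty multiplier whenever the constraint is violated. The code below is a plain-Python illustration on an assumed problem (minimize x² subject to x ≥ 1) and does not reproduce the library's own algorithm or API; `solve_constrained` is a hypothetical name.

```python
def solve_constrained(steps=2000, lr=0.05):
    """Gradient descent-ascent on the Lagrangian L(x, lam) = x**2 + lam * (1 - x).

    Toy problem (an assumption for illustration): minimize x**2 subject to x >= 1.
    The "x player" minimizes L, the "lam player" maximizes it.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * x - lam  # dL/dx
        grad_lam = 1.0 - x      # dL/dlam, positive while the constraint is violated
        x -= lr * grad_x                     # descent step for the first player
        lam = max(0.0, lam + lr * grad_lam)  # ascent step, multiplier kept non-negative
    return x, lam


x_opt, lam_opt = solve_constrained()
```

With these settings the iterates settle near x = 1 with multiplier 2, which is the constrained optimum of this toy problem.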

Poincare Maps

Using hyperbolic geometry, the tool reveals hierarchical relationships from pairwise similarities between cells. This makes it possible to use machine learning to map and analyze how the cells of organisms develop.

PyTorch Lightning + Torchbearer

The creators of the high-level abstraction Torchbearer have joined forces with the increasingly popular PyTorch Lightning and now work as part of its team. The abstraction automates development and makes code standardized, maintainable, and scalable, so that researchers can focus more on science than on the code base.

Open Images V6

The sixth version of the Open Images dataset has been released, with significantly expanded types of image annotations. The captions for photographs are so detailed that they should also advance interdisciplinary research that combines computer vision with natural language processing.

CCMatrix: a dataset for training translation models

The dataset consists of 4.5 billion parallel (bitext) sentences in 576 language pairs and will help in creating more advanced NMT models.

Guides


Distributed principal component analysis with TFX

How TensorFlow Transform lets you apply principal component analysis (PCA) at scale using the resources of a compute cluster, and how to include the transformation in a TFX pipeline.
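One reason PCA scales across a cluster is that the covariance matrix can be assembled from per-partition sums. The NumPy sketch below illustrates that idea on assumed random data; it is not the TensorFlow Transform API, and the helpers `chunk_stats` and `pca_from_chunks` are hypothetical names for this example.

```python
import numpy as np


def chunk_stats(chunk):
    """Per-partition statistics: row count, column sums, sum of outer products."""
    return len(chunk), chunk.sum(axis=0), chunk.T @ chunk


def pca_from_chunks(chunks, output_dim):
    """Combine per-partition statistics, then take the top principal components."""
    stats = [chunk_stats(c) for c in chunks]  # could run on separate workers
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    outer = sum(s[2] for s in stats)

    mean = total / n
    cov = outer / n - np.outer(mean, mean)  # population covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:output_dim]
    return mean, eigvecs[:, order]


rng = np.random.default_rng(0)
# Assumed example data: 5 features with very different variances.
data = rng.normal(size=(1000, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
chunks = np.array_split(data, 4)  # stand-in for data partitions

mean, components = pca_from_chunks(chunks, output_dim=2)
projected = (data - mean) @ components  # data reduced to 2 dimensions
```

Only the small per-partition statistics need to be gathered in one place, which is why the final eigendecomposition stays cheap even when the data itself is large.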

Speeding up neural networks using TensorNetwork in Keras

An article on how to use the TensorNetwork library to work with tensor networks in a machine learning context.

TensorFlow Lattice: Flexible, Controlled, and Interpretable Machine Learning

An introductory overview of the library's capabilities for training constrained and interpretable lattice-based models.

Cases


AR masks with TensorFlow.js

ModiFace, a startup acquired by L'Oréal, shares its experience of using machine learning for AR masks. The beauty brand's example shows how machine learning can be applied in e-commerce.

Real-time license plate recognition

A step-by-step case study shows that machine learning is now accessible to everyone. The author explains how to assemble a budget device at home, create and train a model, deploy it on AWS, and develop the client side.

Determining air pollution with a phone

A case study on building an application that estimates the level of air pollution from a photo taken with a phone camera. The challenge was to crowdsource data from different users for further model training while keeping user data secure.

Adding a volume effect to two-dimensional images

Facebook shares its experience developing a system based on a convolutional neural network that creates a three-dimensional effect for two-dimensional images. Along the way, the team had to solve many problems, both in training the model and in optimizing the system to run on mobile phones.



How not to go broke when your user base grows rapidly

How the creators of AI Dungeon scaled to support 1 million users and, with the help of Cortex, built a microservice around a machine learning model.

Research


Using “Radioactive Data”

The “Radioactive Data” method makes it possible to determine whether a machine learning model has been trained on a specific dataset. This can help researchers and engineers keep track of which dataset was used to train a model, so they can better understand how different datasets affect the performance of different neural networks.

TyDi QA: a dataset of questions and answers in different languages

Google has published a study and a dataset of 200,000 question-answer pairs in 11 languages that represent a wide range of linguistic phenomena. Participants were shown a passage of text and asked to write a related question whose answer was not contained in that passage; the answer was then looked up in a Wikipedia article. The resulting data make up the dataset.

Artificial creation of data sets for clinical trials

Due to various limitations, it is very difficult to create datasets with photographs of skin lesions. Now there is a tool that generates the necessary data for further training. DermGAN takes as input a real image and a corresponding pre-generated semantic map with the main characteristics of that image (skin type, skin condition, location of the lesion), from which it generates a new synthetic example with the requested characteristics.

Accelerated MRI Scan

The goal of the project is to make MRI scans of patients 10 times faster using AI. Images are reconstructed from raw data by a DNN, and artifacts often appear in the process. The study describes how adversarial machine learning helped reduce the number of artifacts.

Infrastructure optimization for recommendations based on DNN

The study analyzes various infrastructures used to serve personalized recommendations for products, videos, and so on using DNNs. Tools are also provided for evaluating how well DNN-based recommendations perform at production scale. For example, Intel servers used in data centers (Broadwell, Haswell, Skylake) are benchmarked.

Txt2π

An overview of a new reinforcement learning approach. It targets a difficult task in which the agent must take several steps based on a goal and on knowledge of an environment that can change. The model must learn to play a game in which monsters have to be defeated according to given rules (Read to Fight Monsters).

CNN training on ultra-high resolution images

Existing data and model parallelism methods make it possible to train neural networks with billions of parameters, but training on data consisting of high-resolution images, such as CT scans, remains a problem. The paper examines the applicability of convolutional neural networks to ultra-high-resolution images (project code is available).

Street View Map Orienteering Training

Google is accepting applications from researchers who are ready to help create a dataset for training neural networks in spatial navigation.

T5: A New Tool for Transfer Learning

As a result of a large-scale study, the researchers identified the best transfer learning techniques and applied them to create the pre-trained T5 model, along with the dataset on which it was trained.

In the March selection, expect articles on the use of ML in the fight against COVID-19: real-time measurement of people's temperature using infrared radiation, diagnosing the virus, tracking outbreaks, and more. That's all for now. Thank you for your attention!
