Flexibility and automation in machine learning

In this article I want to discuss the main difficulties of automating machine learning, their nature, and the advantages of automation, and also to consider a more flexible approach that avoids some of its shortcomings.


According to Mikell P. Groover's definition, automation is a technology by which a process or procedure is performed with minimal human involvement. Automation has long been a means of increasing productivity, which often lowers the cost per unit of product. Automation methods and their areas of application are improving rapidly: over the past centuries they have evolved from simple mechanisms to industrial robots. Automation now affects not only physical but also intellectual labor, reaching relatively new areas, including machine learning itself: automated machine learning (AutoML, AML). Machine learning automation is already used in a number of commercial products (for example, Google AutoML, SAP AutoML and others).


Disclaimer
This article does not claim to be authoritative in the field; it reflects the author's own view.

Automated Machine Learning


Data processing and machine learning tasks are affected by many factors that stem from the complexity of the system and make these tasks harder to solve. According to Charles Sutton, these include:

  • Uncertainty and the unknown, which result in a lack of a priori knowledge about the data and the dependencies being sought. As a result, an element of research is always present.
  • "Death from a thousand cuts." In practice, building a pipeline for data processing, analysis and subsequent modeling requires many large and small decisions. For example: should the data be normalized? If so, with which method, and with which parameters? And so on.
  • Feedback loops resulting from uncertainty. The deeper you immerse yourself in the task and the data, the more you learn about them. This often forces you to take a step back and change the existing processing and analysis mechanisms.
  • Finally, the models obtained by machine learning algorithms are only approximations of reality, i.e., they are inherently inexact.
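To illustrate the "death from a thousand cuts" point, here is a minimal sketch that counts how many distinct pipelines a handful of modest decisions produces. The decision names and options in `DECISIONS` are hypothetical, chosen only for illustration; they are not taken from any particular tool.

```python
from itertools import product
from math import prod

# Hypothetical decision space for a small preprocessing + modeling pipeline.
# Every name and option here is illustrative only.
DECISIONS = {
    "impute_missing": ["none", "mean", "median"],
    "normalize": ["none", "standard", "minmax", "robust"],
    "encode_categorical": ["one_hot", "ordinal"],
    "model": ["linear", "tree", "knn"],
    "regularization_strength": [0.01, 0.1, 1.0, 10.0],
}


def count_pipelines(decisions):
    """Number of distinct pipelines = product of the option counts."""
    return prod(len(options) for options in decisions.values())


def enumerate_pipelines(decisions):
    """Yield every pipeline configuration as a dict {decision: choice}."""
    names = list(decisions)
    for combo in product(*(decisions[name] for name in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    print(count_pipelines(DECISIONS))  # 3 * 4 * 2 * 3 * 4 = 288
    print(next(enumerate_pipelines(DECISIONS)))
```

Five small decisions already yield 288 combinations, and each extra decision multiplies the count again, which is why exhaustive manual exploration quickly becomes impractical.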


Thus, the process of building a complete data processing and analysis pipeline can be regarded as a complex system.

Complex system (definition by Peter Sloot)

On the one hand, these factors complicate both solving machine learning and deep learning problems and automating their solution. On the other hand, ever-growing and increasingly accessible computing power lets us devote more resources to the task.

According to the widely used CRISP-DM standard, the life cycle of a data analysis project consists of six main stages, performed iteratively: business understanding, data understanding, data preparation, modeling, evaluation (quality assessment) and deployment (practical application). In practice, not all of these stages can be effectively automated today.

Most research and existing libraries (h2o, auto-sklearn, autokeras) focus on automating modeling and, partly, evaluation. Extending automation to data preparation, however, allows more stages to be covered (as was done, for example, in the Google AutoML service).
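The kind of automation described above (modeling plus evaluation) can be sketched in a few lines: fit every candidate from a search space on a training split and keep the one with the lowest hold-out error. This is a minimal sketch under stated assumptions, not how h2o, auto-sklearn or autokeras work internally: the synthetic data, the candidate list `SEARCH_SPACE`, the 70/30 split and the helper names are all hypothetical, and real tools search far larger spaces with more sophisticated strategies.

```python
import random


def make_data(n=200, seed=0):
    """Synthetic 1-D regression data, y = 3x + 2 + Gaussian noise (illustrative)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit(kind, params, xs, ys):
    """Fit one candidate model and return its prediction function."""
    if kind == "mean":
        m = sum(ys) / len(ys)
        return lambda x: m
    if kind == "linear":
        # Ordinary least squares for y = a*x + b.
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        b = my - a * mx
        return lambda x: a * x + b
    if kind == "knn":
        k = params["k"]
        pairs = list(zip(xs, ys))

        def predict(x):
            # Average the targets of the k training points closest to x.
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / k

        return predict
    raise ValueError(f"unknown model kind: {kind}")


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical search space: model family plus its hyperparameters.
SEARCH_SPACE = [("mean", {}), ("linear", {})] + [("knn", {"k": k}) for k in (1, 5, 15)]


def auto_select(xs, ys, space=SEARCH_SPACE, val_fraction=0.3):
    """Automate modeling and evaluation only: train each candidate,
    score it on a hold-out split, and return the best configuration."""
    n_val = int(len(xs) * val_fraction)
    x_tr, y_tr = xs[n_val:], ys[n_val:]
    x_val, y_val = xs[:n_val], ys[:n_val]
    scores = {}
    for kind, params in space:
        model = fit(kind, params, x_tr, y_tr)
        scores[(kind, tuple(sorted(params.items())))] = mse(model, x_val, y_val)
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    xs, ys = make_data()
    best, scores = auto_select(xs, ys)
    for key, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(key, round(score, 3))
    print("selected:", best)
```

Note that every decision upstream of `fit` (how the data was collected, cleaned and encoded) is still made by hand here; widening the search space to those steps is exactly the extension toward data preparation mentioned above.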

Formulation of the problem


Supervised machine learning problems can be solved by various methods, most of which reduce to minimizing a loss function $J$ or maximizing a likelihood function $L$ in order to obtain an estimate of the parameters $\hat{\theta}_m$ from the available sample, the training dataset $y_t$:

$$\hat{\theta}_m = \arg\min_{\theta_m} J(\theta_m;\, y_t) \quad \text{or} \quad \hat{\theta}_m = \arg\max_{\theta_m} L(\theta_m;\, y_t)$$
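To make the two routes to $\hat{\theta}_m$ concrete, here is a minimal sketch under assumptions that are not in the text: the model is $y_i = \theta + \varepsilon_i$ with Gaussian noise of known $\sigma = 1$, $J$ is the sum of squared residuals, and a generic golden-section search (the hypothetical helper `golden_section_minimize`) stands in for the optimizer. Under these assumptions, minimizing $J$ and maximizing $L$ give the same estimate, the sample mean.

```python
import math
import random


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


def squared_loss(theta, y):
    """J(theta; y_t): sum of squared residuals for the model y_i = theta + noise."""
    return sum((yi - theta) ** 2 for yi in y)


def gaussian_log_likelihood(theta, y, sigma=1.0):
    """log L(theta; y_t) assuming y_i ~ N(theta, sigma^2) with sigma known."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((yi - theta) ** 2 for yi in y) / (2 * sigma ** 2))


rng = random.Random(42)
y_t = [rng.gauss(5.0, 1.0) for _ in range(100)]  # synthetic training sample

lo, hi = min(y_t), max(y_t)  # the optimum must lie within the data range
theta_hat_J = golden_section_minimize(lambda t: squared_loss(t, y_t), lo, hi)
# Maximizing L is the same as minimizing -log L.
theta_hat_L = golden_section_minimize(lambda t: -gaussian_log_likelihood(t, y_t), lo, hi)

print(theta_hat_J, theta_hat_L, sum(y_t) / len(y_t))
```

With a different noise model the two criteria generally diverge (for example, Laplace noise makes the likelihood estimate the median rather than the mean), so the choice between $J$ and $L$ is itself one of the many modeling decisions discussed above.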
