High-Level Reasoning About Deep Learning

Hi friends! Today's material is dedicated to the launch of the next enrollment groups for our basic and advanced "Mathematics for Data Science" courses.




Today we will share some thoughts on deep learning methods. We start with a review of template methods for applying deep learning in scientific settings, then walk through the end-to-end design process, and finally touch briefly on alternative machine learning methods that may turn out to be more promising for specific problems.

Deep Learning Templates in Science


How are deep learning methods typically used in the scientific community? At a high level, several templates can be formulated for the tasks to which deep learning is applied:

  1. Predictions. Formulate the scientific question of interest as a prediction problem: given labeled examples, train a model that maps input data (for example, measurements or images) to the target quantity of interest (see the sketch after this list).
  2. From predictions to understanding. A trained predictive model is often only an intermediate step: by applying interpretability techniques and analyzing the learned representations, one can identify which attributes of the data drive the predictions and thereby shed light on the underlying mechanisms.
  3. Complex transformations of input data. Deep learning can also be used to perform complex transformations of raw data (for example, denoising or increasing the resolution of measurements in fields such as medical imaging, microscopy, or astronomy), making downstream analysis easier.
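
To make the first template concrete, here is a minimal sketch (with synthetic data and a hypothetical setup, not code from the original article) of framing a question as a supervised prediction problem, using a small feed-forward network:

```python
# Minimal sketch of the "predictions" template: synthetic stand-ins for
# measured features (X) and experimentally determined targets (y).
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small feed-forward network serves as the predictive model.
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```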

Workflow


Using the aforementioned deep learning application templates, we will look at the workflow of designing a deep learning system from start to finish. In Figure 1, you can see what a typical deep learning workflow looks like.


Figure 1: Diagram of a typical deep learning workflow.

A typical deep learning application development process can be seen as consisting of three main stages: (i) data processing, (ii) training, and (iii) validation and analysis. Each of these stages includes several steps and associated methods, as shown in the figure. In this review, we cover most of the training-stage methods, as well as several validation and data analysis techniques. Please note that while the natural sequence is to process the data first, then train, and finally validate, the standard development process usually involves several iterations over these stages: a method or choice made at one stage is revisited based on the results of a later stage.

Having chosen a prediction problem of interest, you can think of designing and using a deep learning system in three stages: (i) the data processing stage, for example collecting, labeling, preprocessing, and visualizing the data; (ii) the training stage, for example choosing a neural network model and defining the task and the methods for training the model; (iii) the validation and analysis stage, where performance is evaluated on held-out data, the learned representations are analyzed and interpreted, and ablation studies of the overall method are carried out.

Naturally, these three stages follow one another. However, a first attempt at building a deep learning system is very often unsuccessful. To troubleshoot it, it is important to keep in mind the iterative nature of the design process, in which the results of the various stages inform revisions of earlier choices and the re-execution of other stages.

Figure 1 shows examples of common iterations with bidirectional connecting arrows:

  1. The Iterate (1) arrow corresponds to iterations within the data collection process: it happens that after visualizing the data, the labeling of the raw data needs to be adjusted because the result was too noisy or did not capture the desired target.
  2. The Iterate (2) arrow corresponds to iterations within the training process, for example when a different objective or method turns out to be more suitable, or when training needs to be split into several stages, performing self-supervised pre-training first and supervised training afterwards.
  3. The Iterate (3) arrow is responsible for changing the data processing steps based on the results of the training stage.
  4. The Iterate (4) arrow is responsible for changing the training setup based on the results obtained at the validation stage, for example in order to reduce training time or to use a simpler model.
  5. The Iterate (5) arrow adapts the data processing steps based on the validation/analysis results, for example when the model relies on spurious attributes of the data and the data must be recollected to avoid this.
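
As a toy illustration of this feedback loop (an assumed setup with synthetic data, not the article's code), the sketch below runs the three stages and uses the validation result to revise a training choice, in the spirit of the Iterate (4) arrow:

```python
# Toy illustration of the iterative workflow: process data, train,
# validate, and revise a training choice when validation is unsatisfactory.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

target = 0.85
for C in (0.001, 0.1, 10.0):  # Iterate (4): revise the regularization each round
    scaler = StandardScaler().fit(X_train)                # data processing stage
    model = LogisticRegression(C=C).fit(scaler.transform(X_train), y_train)  # training stage
    score = model.score(scaler.transform(X_val), y_val)   # validation stage
    print(f"C={C}: validation accuracy {score:.3f}")
    if score >= target:       # results look sound: stop iterating
        break
```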

Research Focus and Nomenclature


In this section, we will talk about the many methods used at the training stage, along with some methods that are characteristic of the data processing and validation stages (for example, augmentation, interpretability and analysis of representations).

At the training stage, we consider popular models, tasks, and methods. By models (also sometimes called architectures) we mean the structure of the deep neural network: the number of layers, their types, the number of neurons, and so on. By tasks we mean the prediction problem being solved: for example, in image classification, images are the input, and a probability distribution over a (discrete) set of categories (or classes) is the output. By methods we mean the type of training process used to train the system. For example, supervised learning is a popular training process in which the neural network receives labeled data, where the labels indicate the desired output for each observation.

Unlike models and tasks, methods can be subsets of other methods. For example, self-supervised learning is a method in which a neural network is trained on data instances and labels, where the labels are created automatically from the data instances themselves; it can therefore also be regarded as a form of supervised learning. That sounds a bit confusing! However, at this stage, it is enough to have a general understanding of models, tasks, and methods.
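
As a toy example (an illustrative assumption, not from the original text), the sketch below builds a self-supervised pretext task: the labels record whether a raw signal was reversed, so they are generated from the data itself with no human annotation:

```python
# Toy self-supervision sketch: labels are derived automatically from the
# data by asking the model to detect whether a signal has been reversed.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
signals = np.cumsum(rng.normal(size=(1000, 32)), axis=1)  # unlabeled "measurements"

# Pretext task: reverse half the signals; the label records which ones.
flip = rng.integers(0, 2, size=len(signals)).astype(bool)
inputs = np.where(flip[:, None], signals[:, ::-1], signals)
labels = flip.astype(int)  # created from the data, no human annotation

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
model.fit(inputs, labels)
print("pretext-task accuracy:", model.score(inputs, labels))
```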

Use deep learning or not?


Before diving into the various deep learning methods, it is important to formulate the problem and understand whether deep learning provides the right tools for solving it. Powerful neural network models offer a lot of complex functionality, such as complex image transformations. However, in many cases deep learning may not be the best first step or may not be suitable for the problem at all. Below we briefly review the most common machine learning methods, especially in scientific contexts.

Dimensionality Reduction and Clustering. In the scientific community, the ultimate goal of data analysis is often to understand the underlying mechanisms that generate the patterns in the data. When this is the goal, dimensionality reduction and clustering are simple but extremely effective methods for revealing hidden properties of the data. They often prove useful in the data exploration and visualization step (even if more complex methods are applied later).

Dimensionality reduction. Dimensionality reduction methods are either linear, that is, based on linear transformations of the data to a lower dimension, or nonlinear, that is, reducing the dimensionality while approximately preserving the nonlinear structure of the data. Popular linear methods are Principal Component Analysis (PCA) and non-negative matrix factorization, while popular nonlinear methods are t-distributed Stochastic Neighbor Embedding (t-SNE) and UMAP. Many dimensionality reduction methods already have quality implementations in packages like scikit-learn or on GitHub (e.g. github.com/oreillymedia/t-SNE-tutorial or github.com/lmcinnes/umap).
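
A short sketch of both flavors on the same data set, using scikit-learn's bundled digits data as a stand-in for scientific measurements:

```python
# Linear (PCA) and non-linear (t-SNE) dimensionality reduction side by side.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)           # 64-dimensional digit images

X_pca = PCA(n_components=2).fit_transform(X)  # linear projection
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)  # non-linear embedding

print(X_pca.shape, X_tsne.shape)              # both (1797, 2)
```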

Clustering. Clustering techniques, often used in conjunction with dimensionality reduction, provide a powerful way to identify similarities and differences in a data set. Commonly used methods include k-means (often a modified variant of k-means), Gaussian mixture models, hierarchical clustering, and spectral clustering. Like dimensionality reduction methods, clustering methods have good implementations in packages like scikit-learn.
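
A brief sketch of the reduce-then-cluster pattern with two of the methods mentioned above:

```python
# Reduce dimensionality first, then cluster with k-means and a Gaussian mixture.
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

X, _ = load_digits(return_X_y=True)
X_2d = PCA(n_components=2).fit_transform(X)   # reduce first, then cluster

kmeans_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_2d)
gmm_labels = GaussianMixture(n_components=10, random_state=0).fit_predict(X_2d)
print(kmeans_labels[:10], gmm_labels[:10])
```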

Linear regression, logistic regression (and variations). Perhaps the most fundamental methods for supervised learning problems such as classification and regression, linear and logistic regression and their variations (for example, Lasso and ridge regression) can be especially useful when data is limited and there is a clear set of (possibly pre-processed) features (for example, in the form of tabular data). These methods also make it possible to assess whether the problem is well posed and can be a good starting point for testing a simplified version of the problem being solved. Due to their simplicity, linear and logistic regression are highly interpretable and provide straightforward ways to perform feature attribution.
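
The sketch below shows an L1-regularized (Lasso-style) logistic regression on a bundled tabular data set; the surviving non-zero coefficients act as a simple form of feature attribution:

```python
# L1-regularized logistic regression: sparse coefficients double as
# a simple feature-attribution mechanism.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # put features on a common scale

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, y)

# Non-zero coefficients indicate which features drive the prediction.
for name, coef in zip(load_breast_cancer().feature_names, model.coef_[0]):
    if coef != 0:
        print(f"{name}: {coef:+.3f}")
```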

Decision Trees, Random Forests, and Gradient Boosting. Another popular class of methods comprises decision trees, random forests, and gradient boosting. These methods are also applicable to regression/classification problems and are well suited for modeling nonlinear relationships between input and output. Random forests, which are ensembles of decision trees, can often be preferable to deep learning methods when the data has a low signal-to-noise ratio. These methods may be less interpretable than linear/logistic regression; however, software libraries under active development address this problem.
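
A short sketch comparing the two ensemble methods on synthetic tabular data with a deliberately low signal-to-noise ratio (an illustrative setup, not an experiment from the text):

```python
# Random forest vs. gradient boosting on noisy synthetic tabular data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# flip_y=0.3 corrupts 30% of the labels, simulating a low signal-to-noise ratio.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.3, random_state=0)

for model in (RandomForestClassifier(n_estimators=200, random_state=0),
              GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean().round(3))
```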

Other methods and resources. All of the methods above, as well as many other popular methods such as graphical models, Gaussian processes, and Bayesian optimization, are discussed in detail in the University of Toronto's machine learning course, in Stanford's CS229, in detailed articles on towardsdatascience.com, and in interactive tutorials such as d2l.ai/index.html (Dive into Deep Learning) and github.com/rasbt/python-machine-learning-book2nd-edition.



What tasks does a data scientist solve? Which areas of mathematics do you need to know, and for which tasks? What are the requirements for data scientists? What mathematical knowledge do you need to stand out from the crowd and secure career progress? You can get answers to all these questions and more at our free webinar, which will be held on May 6. Sign up now!


