49. Pros and cons of end-to-end learning

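We continue with the speech recognition pipeline, in which audio passes through MFCC feature extraction and a phoneme recognizer before a transcript is produced.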

Most of the components of this pipeline are hand-engineered, that is, designed by people rather than learned from data:

MFCCs are a set of audio features computed by fixed mathematical operations on the signal's frequencies, with no learning algorithm involved. They give a compact summary of the incoming signal, but they also discard some information.
Phonemes are an invention of linguists: a simplified model of the sounds of live speech. Like any model of a complex phenomenon, phonemes are imperfect, and to the extent that they reflect reality poorly, they limit the quality of any system built on them.

On the one hand, hand-engineered components limit the potential performance of the speech system. On the other hand, using them has certain advantages:

MFCC features are robust to some properties of speech that do not affect the meaning of what was said, such as the pitch of the speaker's voice. Using them simplifies the task for the learning algorithm.
Phonemes, to the extent that they correctly reflect the sounds of real speech, help the learning algorithm pick up the basic sound elements and so improve its performance.
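To make the structure of a hand-engineered pipeline concrete, here is a minimal sketch in Python. All three stages are hypothetical toy stand-ins, not real speech processing: `compute_features` is a crude substitute for MFCC (mean absolute amplitude per frame), and the "phonemes" are just two symbols. The only point is to show how each hand-designed stage passes a simplified representation to the next, and how a fixed feature can ignore a property of the input (here, the sign of the waveform) at the cost of discarding information.

```python
from typing import List, Sequence


def compute_features(audio: Sequence[float], frame: int = 4) -> List[float]:
    """Hand-engineered step (toy stand-in for MFCC, an assumption for
    illustration): mean absolute amplitude of each non-overlapping frame."""
    return [
        sum(abs(x) for x in audio[i:i + frame]) / frame
        for i in range(0, len(audio) - frame + 1, frame)
    ]


def recognize_phonemes(features: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Toy phoneme recognizer: each frame becomes 'a' (sound) or '_' (silence).
    In a real system this stage would be learned."""
    return ["a" if f >= threshold else "_" for f in features]


def decode_transcript(phonemes: Sequence[str]) -> str:
    """Toy final recognizer: collapse repeated phonemes and drop silence."""
    out = []
    prev = None
    for p in phonemes:
        if p != prev and p != "_":
            out.append(p)
        prev = p
    return "".join(out)


def pipeline(audio: Sequence[float]) -> str:
    """The non-end-to-end system: three stages chained together."""
    return decode_transcript(recognize_phonemes(compute_features(audio)))


audio = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
features = compute_features(audio)          # 12 samples reduced to 3 numbers
transcript = pipeline(audio)

# The toy feature ignores the sign of the waveform: a flipped signal
# produces exactly the same features, so later stages never see the difference.
flipped_features = compute_features([-x for x in audio])
```

An end-to-end system would replace `pipeline` with a single learned function from audio to transcript, with no fixed intermediate representation such as `features` or the phoneme list.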

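Now consider the end-to-end system, which maps audio directly to a transcript:

This system lacks the hand-engineered knowledge. So when the training set is small, it may perform worse than the hand-engineered pipeline.

However, when the training set is large, the system is not held back by the limitations of MFCC or of a phoneme-based representation. If the learning algorithm is a large enough neural network, and it is trained on enough data, it has the potential to perform very well, perhaps even approaching the optimal error rate.

End-to-end systems tend to do well when there is a lot of labeled data for "both ends": the input end and the output end. In this example, that means a large dataset of (audio, transcript) pairs. When such data is not available, approach end-to-end learning with great caution.

If you are working on a problem where the training set is very small, most of your algorithm's knowledge will have to come from human insight, that is, from hand engineering.

If you choose not to use an end-to-end system, you will have to decide which steps your pipeline contains and how they fit together. The next few chapters give some suggestions for designing such pipelines.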

50. Choosing pipeline components: data availability

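When building a non-end-to-end pipeline system, what are good candidates for its components? How you design the pipeline will greatly affect the overall system's performance. One important factor is whether you can easily collect data to train each component.
For example, consider an autonomous driving pipeline that first detects cars and pedestrians in the camera image and then plans a path:

You can use machine learning to detect cars and pedestrians. Moreover, data for these tasks is not hard to obtain: there are numerous computer vision datasets with large numbers of labeled cars and pedestrians. You can also use crowdsourcing (for example, Amazon Mechanical Turk) to obtain even larger datasets. So it is relatively easy to get training data for a car detector and a pedestrian detector.

In contrast, consider a pure end-to-end approach that maps the image directly to a steering direction:

To train this system, we would need a large dataset of pairs (input: image, output: steering direction). Having people drive cars around while recording their steering is very time-consuming and expensive. You need a fleet of specially instrumented cars, and a huge amount of driving to cover a wide range of possible scenarios. This makes an end-to-end system difficult to train. It is much easier to obtain a large dataset of labeled images of cars or pedestrians.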

More generally, if a lot of data is available for training the "intermediate modules" of a pipeline (such as a car detector or a pedestrian detector), then you might consider using a pipeline with multiple stages. Such a non-end-to-end structure can be preferable because it lets you use all of the available data.
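The data-availability argument can be sketched as a simple check over candidate components. This is only an illustration: the `Component` class, the `trainable` helper, the example counts, and the `min_examples` threshold are all invented here and do not come from the book or from any real dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    labeled_examples: int  # labeled training pairs you can realistically obtain


def trainable(components: List[Component], min_examples: int) -> List[str]:
    """Return the names of components that have enough labeled data to train.
    The threshold is a hypothetical knob, not a rule from the text."""
    return [c.name for c in components if c.labeled_examples >= min_examples]


# Made-up counts, chosen only to mirror the situation described above:
# plenty of labeled car and pedestrian images, few (image, steering) pairs.
candidates = [
    Component("car detector (image -> car locations)", 1_000_000),
    Component("pedestrian detector (image -> pedestrian locations)", 500_000),
    Component("end-to-end model (image -> steering direction)", 10_000),
]

ready = trainable(candidates, min_examples=100_000)
```

With these illustrative numbers, only the two detectors pass the check, which is the situation in which a multi-stage pipeline makes better use of the available data than a single end-to-end model.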

I believe that until more data becomes available for training end-to-end systems, the non-end-to-end (pipeline) approach is significantly more promising for autonomous driving: its architecture better matches the available data.


Translation of Andrew Ng's book Machine Learning Yearning, chapters 49 and 50
