Translation of Andrew Ng's book Machine Learning Yearning, Chapters 53 and 54


Error analysis by parts


53. Error analysis by parts


Suppose your system is a complex pipeline made up of machine learning modules. Which component of this system should be improved first? By attributing the system's errors to specific parts of the pipeline, you can decide how to prioritize the work.


Let's return to our Siamese cat classifier example:



The first element of the system, the cat detector, detects a cat and crops the fragment containing it out of the image. The second element, the breed classifier, decides whether the fragment shows a Siamese cat. You could spend years improving either of these two components. How do you decide which one to focus on?


Error analysis by parts means that for each error we try to determine which module of the pipeline (or sometimes several) produced it. For example, the system incorrectly decides that there is no Siamese cat in the image (y = 0), even though one is shown in it and the correct label is y = 1.


image


Let's manually examine what each module of the system did. Suppose the cat detector detected a cat as follows:


image


This means that the breed classifier receives the following fragment as input:


image


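The breed classifier then correctly decides that this fragment does not contain a Siamese cat. The breed classifier is therefore not to blame: it was given a fragment without a cat and returned a very reasonable label, y = 0. Indeed, a person classifying the same fragment would also predict y = 0. So this error can clearly be attributed to the cat detector. If, on the other hand, the cat detector had produced the following bounding box: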


image


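In that case, we would conclude that the cat detector did its job and that the “breed classifier” is at fault. Suppose we go through 100 misclassified validation-set images and find that 90 of the errors are attributable to the cat detector and only 10 to the “breed classifier”. We can then safely conclude that the “cat detector” deserves more of our attention.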
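As a minimal sketch of this kind of tally, assume each of the 100 misclassified validation images has already been reviewed by hand and labeled with the component judged responsible, 90 for the cat detector and 10 for the breed classifier. The list `attributions` and the component names are illustrative assumptions, not something the book prescribes.

```python
from collections import Counter

# Hypothetical manual-review results: one entry per misclassified
# validation image, naming the component judged responsible.
attributions = ["cat_detector"] * 90 + ["breed_classifier"] * 10

counts = Counter(attributions)
total = sum(counts.values())

# Share of errors per component, largest first.
shares = {name: n / total for name, n in counts.most_common()}

for name, share in shares.items():
    print(f"{name}: {share:.0%}")

# The component with the largest share is the first candidate for improvement.
priority = counts.most_common(1)[0][0]
```

With these counts, `priority` is the cat detector, which matches the conclusion drawn in the text.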


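In addition, we now have 90 examples on which the cat detector produced incorrect bounding boxes. These examples can be used for a deeper error analysis of the cat detector itself. So far, our way of attributing an error to one part of the pipeline has been informal; the next chapter describes a more formal method.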


54. Attributing an error to one part


Let's continue with the same example:


image


Suppose the cat detector produced this bounding box:


image


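The “breed classifier” receives this cropped fragment as input, after which it incorrectly returns y = 0 (no Siamese cat in the image).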


image


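Here the “cat detector” did its job poorly, yet a highly skilled person could arguably still recognize the Siamese cat in the badly cropped fragment. So should we attribute this error to the “cat detector”, to the “breed classifier”, or to both? It is ambiguous.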


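If such ambiguous cases are few, either decision leads to a similar result. However, there is a more formal test that attributes each error to exactly one part of the pipeline: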


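1. Replace the output of the cat detector with a hand-labeled bounding box.


2. Run the corresponding fragment through the “breed classifier”. If the “breed classifier” still misclassifies it, attribute the error to the “breed classifier”. Otherwise, attribute the error to the “cat detector”.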


In other words, run an experiment in which the “breed classifier” is given a “perfect” input.


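There are two cases:


  • Case 1: even with a “perfect” bounding box, the “breed classifier” still incorrectly returns y = 0. In this case, clearly, the “breed classifier” is at fault.
  • Case 2: given a “perfect” bounding box, the “breed classifier” correctly returns y = 1. So if the cat detector had produced a better bounding box, the system's overall output would have been correct. In this case, we attribute the error to the “cat detector”.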

By analyzing the misclassified validation-set examples in this way, we can unambiguously attribute each error to one component of the pipeline. This lets us estimate the fraction of errors due to each component and, therefore, decide which one to concentrate on.
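The attribution test from this chapter can be sketched roughly as follows. This is only an illustration under stated assumptions: `ErrorCase`, `attribute_error`, `error_shares`, `crop` and `toy_breed_classifier` are hypothetical names, and the toy "images" are plain strings so that the example runs without any image library.

```python
from collections import Counter
from typing import NamedTuple


class ErrorCase(NamedTuple):
    """One validation example that the full pipeline misclassified."""
    image: object    # whatever the pipeline takes as input
    hand_box: tuple  # bounding box drawn by a human labeler
    true_label: int  # 1 = Siamese cat, 0 = no Siamese cat


def attribute_error(case, crop, breed_classifier):
    """Attribute one error to exactly one component."""
    # Replace the detector output with the hand-labeled box.
    perfect_fragment = crop(case.image, case.hand_box)
    if breed_classifier(perfect_fragment) != case.true_label:
        return "breed_classifier"  # Case 1: still wrong on a perfect input
    return "cat_detector"          # Case 2: correct once the crop is fixed


def error_shares(cases, crop, breed_classifier):
    """Fraction of errors attributed to each component."""
    counts = Counter(attribute_error(c, crop, breed_classifier) for c in cases)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Toy stand-ins (assumptions): an "image" is a string, a box is a
# (start, end) slice, and the classifier looks for the word "siamese".
def crop(image, box):
    start, end = box
    return image[start:end]


def toy_breed_classifier(fragment):
    return 1 if "siamese" in fragment else 0


cases = [
    ErrorCase("grass siamese grass", (6, 13), 1),  # classifier is right on the perfect crop
    ErrorCase("grass SIAMESE grass", (6, 13), 1),  # classifier is wrong even on the perfect crop
]
labels = [attribute_error(c, crop, toy_breed_classifier) for c in cases]
shares = error_shares(cases, crop, toy_breed_classifier)
```

In a real system, `crop` and the classifier would be the pipeline's own components, and `cases` would hold only the examples that the whole system got wrong.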

