“Sorry, I recognized...”, or recognizing raspberries and controllers with the TensorFlow Object Detection API

At the end of last year, I wrote an article about how I was intrigued by the ability to recognize objects in images using neural networks. In that article we used PyTorch to classify either raspberries or an Arduino-like controller on video. And even though I liked PyTorch, I only turned to it because I couldn’t deal with TensorFlow right away. But I promised that I would return to the problem of recognizing objects in video. It seems the time has come to keep that promise.

In this article, we will try, on a local machine, to retrain a pre-trained model in TensorFlow 1.13 with the Object Detection API on our own set of images, and then use it to recognize berries and controllers in the video stream of a webcam using OpenCV.

Want to improve your berry recognition skills by summer? Then you are welcome under the cut.



Contents:

Part I: introduction
Part II: training the model in TensorFlow
Part III: applying the model in OpenCV
Part IV: conclusion

Part I: Introduction


Those who have read the previous article about PyTorch already know that I am an amateur in matters of neural networks, so do not take this article as the ultimate truth. But I still hope that I can help someone deal with the basics of video recognition using the TensorFlow Object Detection API.

This time I did not try to make a tutorial, so the article will be shorter than usual.
To begin with, the official tutorial on using the Object Detection API on a local machine is, to put it mildly, hardly exhaustive. For a novice like me it was completely inadequate, and I had to rely on blog articles instead.

To be honest, I would have liked to try TensorFlow 2.0, but in most publications available at the time of writing, the migration issues had not been fully resolved. Therefore, in the end, I settled on TF 1.13.2.

Part II: training the model in TensorFlow


I took the instructions for training the model from this article, or rather from its first half, up to the point where JavaScript comes into play (if you do not read English, there is an article on the same topic on Habr).

True, in my case there are several differences (a quick check of the resulting environment is sketched right after this list):

  1. I used Linux, because Anaconda for Linux already ships with protobuf and pycocoapi built, so I did not have to build them myself.
  2. TensorFlow 1.13.2 with the Object Detection API release for TF 1.13, since the master branch of the Object Detection API targets TF 1.15 and does not work with 1.13.
  3. numpy 1.17.5, since version 1.18 caused problems.
  4. Instead of faster_rcnn_inception_v2_coco I took ssd_mobilenet_v2_coco as the base model.
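
The promised check of what actually ended up in the environment takes just a few lines of Python; this is my own sketch rather than part of the original instructions, and it assumes the Object Detection API is already installed and on PYTHONPATH:

# minimal environment sanity check (the expected versions are the ones listed above)
import tensorflow as tf
import numpy as np

print("TensorFlow:", tf.__version__)  # expected here: 1.13.2
print("numpy:", np.__version__)       # expected here: 1.17.5

# this import fails if the Object Detection API is not installed or not on PYTHONPATH
from object_detection.utils import label_map_util
print("Object Detection API import: OK")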

Just in case, I will note that I did not use a graphics accelerator; training was carried out on the CPU only.

The set of images, the configuration file, the saved graph, as well as the script for image recognition with OpenCV can, as always, be downloaded from GitHub.

A long 23 hours of model training passed, all the tea in the house had been drunk, every episode of “What? Where? When?” had been rewatched, and my patience finally came to an end.

We stop training and save the model.
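
For reference, in the TF 1.x Object Detection API the frozen graph is produced by the export_inference_graph.py script; a command along these lines should do it (the config path and the checkpoint step number here are placeholders, substitute whatever your training run produced):

python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/your_pipeline.config --trained_checkpoint_prefix training/model.ckpt-XXXX --output_directory inference_graph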

Install OpenCV into the same Anaconda environment with the following command:

conda install -c conda-forge opencv

I eventually installed version 4.2.

From here on, we will no longer need the instructions from that article.

After saving the model, I made one mistake that was not obvious to me: I immediately tried to pass the graph.pbtxt file from the training/ folder, which had been used earlier, to the function:

cv2.dnn.readNetFromTensorflow()

Unfortunately, this does not work that way, and we will have to do one more manipulation to get a graph.pbtxt suitable for OpenCV.

Most likely, what I am about to suggest is not the best way, but it works for me.

Download tf_text_graph_ssd.py and tf_text_graph_common.py and put them in the folder where our saved graph is located (in my case this is the inference_graph folder).
Then open a console in this folder and run a command roughly like the following:

python tf_text_graph_ssd.py --input frozen_inference_graph.pb --config pipeline.config --output graph.pbtxt

And that’s it: all that remains is to load our model into OpenCV.
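
Before wiring up the webcam, it is convenient to make sure that OpenCV really accepts the converted graph. This little check is my own addition, not part of the original instructions: it loads the two files and pushes one blank 300x300 frame through the network.

# quick sanity check that OpenCV loads the converted model
import cv2
import numpy as np

net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")

# run one blank 300x300 frame through the network
dummy = np.zeros((300, 300, 3), dtype=np.uint8)
net.setInput(cv2.dnn.blobFromImage(dummy, size=(300, 300), swapRB=True))
detections = net.forward()
print(detections.shape)  # an SSD-style model typically returns an array of shape (1, 1, N, 7)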


Part III: applying the model in OpenCV


As in the article about PyTorch, for working with OpenCV I took the code from this publication as a basis.

I made small changes to simplify it a little further, but since I do not fully understand the code, I will not comment on it in detail. It works, and that is nice. Clearly, the code could be better, but I have not yet found the time to sit down with the OpenCV tutorials.

OpenCV code

# USAGE
# based on this code https://proglib.io/p/real-time-object-detection/
# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import imutils
import time
import cv2

prototxt="graph.pbtxt"
model="frozen_inference_graph.pb"
min_confidence = 0.5

# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "duino","raspb"]
COLORS = [(40,50,60),((140,55,130)),(240,150,25)]

# load our serialized model from disk
print("[INFO] loading model...")

net =cv2.dnn.readNetFromTensorflow(model,prototxt)

# initialize the video stream, allow the camera sensor to warm up,
# and initialize the FPS counter
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(0.5)
fps = FPS().start()

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream and resize it
	# to a width of 300 pixels
	frame = vs.read()
	frame = imutils.resize(frame, width=300)

	# grab the frame dimensions and convert it to a blob
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)

	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()

	# dump the raw detections array once per frame (debug output)
	print(detections)

	# loop over the detections
	for i in np.arange(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with
		# the prediction
		confidence = detections[0, 0, i, 2]

		if confidence > min_confidence:
			# extract the index of the class label from the
			# `detections`, then compute the (x, y)-coordinates of
			# the bounding box for the object
			idx = int(detections[0, 0, i, 1])
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")

			# draw the prediction on the frame
			label = "{}: {:.2f}%".format(CLASSES[idx],
				confidence * 100)
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				COLORS[idx], 2)
			y = startY - 15 if startY - 15 > 15 else startY + 15
			cv2.putText(frame, label, (startX, y+3),
				cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 1)

	# show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

	# update the FPS counter
	fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()


So, everything is ready. We launch the model, point the lens at my old CraftDuino and enjoy the result:



At first glance it does not look bad at all, but only at first glance.
It seems that over the 23 hours the model overfitted, which is why it makes serious errors when detecting objects.

Here is a visual demonstration:



As you can see, this model identifies not only a knife, but even just a black background, as an Arduino-like controller. Perhaps this is because the training data contained dark pictures with the Arduino and its analogues, to which the model managed to overfit over those 23 hours.

As a result, I had to load my computer for another 8 hours and train a new model.

Things are much better with it.

Here is an example with CraftDuino:



I did not have live raspberries at hand, so I had to print out pictures. Recognition also works from a phone or monitor screen, but paper turned out to be more convenient.



Let's check how the model recognizes the Arduino Nano, which Drzugrik once soldered for me into my mega device with sensors:



As you can see, it recognizes it quite well, although at a very bad angle and in warm lighting it can mistake some fragments for raspberries. In fact, though, a frame with this error was hard to catch with the lens.

Now let's check how it classifies objects it was not trained on.

Again, an example with a knife and a black background:



This time everything works as it should.

Now let's ask our model to recognize the Canny 3 tiny controller, which I wrote about in a previous article.



Since our model does not know anything except raspberries and arduino-like controllers, we can say that the model recognized the Canny controller quite successfully.

True, as in the case of the Arduino nano, a lot depends on the angle and lighting.

In the warm light of an incandescent lamp and at an unfortunate angle, the controller may not only go unrecognized, but even be identified as a raspberry. True, as in the previous case, such angles still had to be deliberately caught with the lens.



Well, the last case is a kind of curtsy to the article about image classification in PyTorch. As last time, the Raspberry Pi 2 single-board computer and its logo appear together in one frame. Unlike the previous article, in which we solved a classification problem and picked the single most probable object for the image, here both the logo and the Raspberry itself are recognized.




Part IV: Conclusion


In conclusion, I want to say that even though this small and inexpert exercise with the TensorFlow Object Detection API ate up both days of the weekend and part of Monday, I have no regrets. Once you understand even a little of how to use it all, it becomes insanely interesting: in the process of training you start to treat the model like a living thing and follow its successes and failures.
Therefore, I recommend that everyone who is not familiar with this try, one day, to recognize something of their own.

Moreover, as it turned out along the way, you do not even need to buy a real webcam. The fact is that while preparing this article I managed to break my webcam (the focus mechanism failed) and already thought I would have to drop the whole thing. But it turned out that with the help of Droidcam you can use a smartphone instead of a webcam (please do not take this as advertising). Moreover, the shooting quality turned out to be much better than that of the broken camera, which noticeably improved the quality of object recognition in the image.
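
If Droidcam exposes the phone as an extra video device (which is how it usually works on Linux), then in the script above it should be enough to change the source index passed to VideoStream; the index 1 here is only an assumption, the actual device number may differ on your machine:

# use the phone camera instead of the built-in webcam (the device index is an assumption)
vs = VideoStream(src=1).start()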

By the way, since I only found a normal pycocotools build in Anaconda for Linux, and I was too lazy to switch between operating systems, I prepared this entire article using nothing but open source software. There were analogues of Word and Photoshop, and even a driver for the printer; it was the first time in my life that happened. It turns out that modern Linux distributions and application programs can be very convenient, even for someone who has used Microsoft operating systems for more than 25 years.

P.S. If someone knows how to properly run the Object Detection API with TensorFlow version 2 or higher, please write to me in a private message or in a comment.

Have a nice day and good health!
