Computer vision on Intel OWT WebRTC server with hardware acceleration


WebRTC has (for the most part) simplified receiving and sending video streams in real time, which means you can have some fun with them using machine learning. Last month I showed how to run Computer Vision (CV) locally in a browser. As I mentioned, running locally is good, but sometimes higher performance is required, and for that you need a remote server. In this post I will cover how to run OpenCV models server-side with hardware acceleration on Intel chipsets using the open source Open WebRTC Toolkit (OWT).

I have wanted to play with the OWT server since Intel demonstrated its computer vision features at Kranky Geek, and I was fortunate enough to work with their development team to explore the server's capabilities. Below I will show how to install OWT locally for quick testing and demonstrate some of the models.

Open WebRTC Toolkit (OWT)


Intel released its Intel Collaboration Suite for WebRTC around 2014. This package consisted of server and client SDKs designed to take advantage of Intel hardware. The company kept expanding the suite, adding new features and improving its capabilities. Then, in 2018, Intel open-sourced the entire project under the Open WebRTC Toolkit (OWT) brand. Intel still offers the Collaboration Suite for WebRTC; according to them, the only difference is that it comes bundled with additional Intel QA (not uncommon for open source projects backed by commercial companies). In this post we will focus on the open source OWT.


You can go to the OWT homepage at: 01.org/open-webrtc-toolkit

What does a media server do


An OWT media server can act as a Multipoint Control Unit (MCU), where media streams are decoded, processed, and re-encoded before being sent back to clients, in addition to the more typical Selective Forwarding Unit (SFU) mode. Intel positions OWT as a real-time media processor with features for the following applications:

  • Multipoint conferencing – SFUs have become the predominant architecture for WebRTC conferencing, but MCUs are still needed in scenarios where client-side processing is limited (for example, on an IoT device), or in combination with one of the items below.
  • Transcoding – an MCU can convert between codecs, resolutions, and bitrates for clients that cannot handle the original stream.
  • Streaming – rebroadcasting WebRTC streams to non-WebRTC protocols such as RTSP, RTMP, HLS, and MPEG-DASH.
  • Recording – saving streams to files.
  • SIP gateway – bridging WebRTC traffic to SIP/VoIP networks.
  • Analytics – real-time media processing (the subject of this post).

The server is built on node.js, with MongoDB as the database and RabbitMQ as the message broker. The features above, as well as others, are implemented as various Agents plugged into the OWT architecture.

In addition, OWT has a client SDK for interacting with a media server. It can also be used in P2P mode.

Acceleration


The architecture was designed to make use of Intel hardware. That includes most modern Intel CPUs, with even greater acceleration available from processors with integrated graphics, FPGAs, and Intel's specialized vision processors (Vision Processing Units, VPUs). (Here is a project I built using one of their Movidius chips with the Google Vision Kit.)


Analytics and Computer Vision (CV)


Anyone who has worked seriously with computer vision has come across OpenCV. OpenCV started as an Intel project and still remains one. Intel also has a toolkit called OpenVINO (Open Visual Inference and Neural network Optimization) for optimizing deep learning models on its hardware; it is part of the OpenCV repository. OpenCV includes dozens of pre-trained models, from basic text recognition to applications for self-driving cars.

The OWT Analytics Agent is the module for running real-time inference on OpenVINO models. The Analytics Agent can send the resulting metadata to the cloud, or you can send it back to the media server to, for example, annotate the video in real time (I will show this a bit later). The well-known GStreamer library is used to manage the media pipeline.

Architecture



The diagram above is taken from the server analytics guide. It looks complicated, but the key thing to remember is that the Analytics Agent acts as just another conference participant that can subscribe to a video stream in the conference. Once it receives the video stream, you can route the stream through various processing stages using a GStreamer pipeline. In most cases you will want to run detection and classification before returning the stream to the MCU, but you can also send the stream and/or the output data somewhere else.
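Conceptually, the stages the Agent wires up can be written out as a gst-launch-1.0 pipeline description. Here is a hedged sketch: the gva* elements come from OpenVINO's GStreamer Video Analytics plugins, and the model path is a placeholder of my own, not an OWT default.

```shell
# Sketch of the decode -> detect -> annotate -> encode stages as a gst-launch
# pipeline description. MODEL is a hypothetical path to a downloaded
# OpenVINO detection model; nothing here is an OWT default.
MODEL=/home/models/face-detection-retail-0004.xml
PIPELINE="filesrc location=input.h264 ! h264parse ! avdec_h264 ! videoconvert \
 ! gvadetect model=$MODEL device=CPU ! gvawatermark ! videoconvert \
 ! x264enc ! filesink location=annotated.h264"
# Print the full command for review; run it where the GVA plugins are installed.
echo "gst-launch-1.0 $PIPELINE"
```

The element names mirror the ones used in the Agent's sample pipeline code shown later; only the source and sink differ, since gst-launch here reads from a file rather than from the MCU.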

Docker installation


Installation takes a little while, since you need to install both the OWT server and the Analytics Agent. Fortunately, there are Docker build instructions to ease the process. If you want, you can run OWT + Analytics Agent as 4 separate containers for distributed environments. I decided to keep everything local in a single container to simplify evaluation.

In fact, Intel initially gave me a gst-owt-all:run image to work with, since they were updating the Analytics Agent installation documentation at the time I was writing this article. The new instructions are much easier to follow. I still recommend first familiarizing yourself with the standard OWT installation to understand its components and options.

In addition, a lot of the build requires gcc. On macOS, make sure you have an up-to-date version by running: brew install gcc

At first nothing would compile, but after running this command everything worked.
In the end, I built everything myself. To build an OWT server with Analytics, run the following commands:

git clone https://github.com/open-webrtc-toolkit/owt-server.git
cd owt-server
git checkout gst-analytics
cd docker/gst
curl -o l_openvino_toolkit_p_2019.3.334.tgz http://registrationcenter-download.intel.com/akdlm/irc_nas/15944/l_openvino_toolkit_p_2019.3.334.tgz
docker build --target owt-run-all -t gst-owt-all:run \
  --build-arg http_proxy=${HTTP_PROXY} \
  --build-arg https_proxy=${HTTPS_PROXY} \
  .

After setting up the main OWT server and the Analytics Agent, you will need to download the models you want from the OpenCV Open Model Zoo and build an analytics pipeline to use them. For the included samples, this amounts to running the sample build script in bash and copying a few files.

Health Check on macOS


Configure docker ports


The docker option --net=host does not work on macOS, so for a local launch you need to publish the corresponding ports:

  • 8080 – WebSocket signaling port for WebRTC
  • 3004 – Web server that serves the demo page
  • 30000-30050 – UDP ports for WebRTC

Launch docker


I installed my container like this:

docker run -p 8080:8080 -p 3004:3004  -p 30000-30050:30000-30050/udp --name owtwebrtchacks --privileged -tid gst-owt-all:run bash

Editing default OWT settings for running locally on macOS


Here you need to edit the file webrtc_agent/agent.toml so that it recognizes these ports.

docker start owtwebrtchacks
docker exec -it owtwebrtchacks bash
vi /home/owt/webrtc_agent/agent.toml

Then replace 0acf7c0560d8 with your container name or id, and change the following:
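In essence, the agent needs to use the UDP range the container publishes. A sketch of the relevant agent.toml fragment (field names should be verified against your OWT version):

```toml
# webrtc_agent/agent.toml (sketch; verify the field names in your version)
[webrtc]
minport = 30000  # match the UDP range published by docker run
maxport = 30050
```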


Next, you need to configure the portal so that the browser is pointed at “localhost” instead of docker's internal bridge IP (172.17.0.2):

vi /home/owt/portal/portal.toml
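The change here is to advertise a hostname the browser can actually reach. A sketch of the portal.toml fragment (the exact key name should be checked against your version):

```toml
# portal/portal.toml (sketch; verify the key name in your version)
[portal]
ip_address = "localhost"  # advertised to browsers instead of 172.17.0.2
```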


Again, on other platforms you can use the default configuration if you run your container with the --net=host option.

Start server


Now you can start the server:

./home/start.sh

You may receive these errors:

2020-03-31T01:47:20.814+0000 E QUERY    [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:251:13
@(connect):1:21
exception: connect failed

This is normal while the server is still coming up. You will know everything works when you see something like:

starting app, stdout -> /home/owt/logs/app.stdout
0 rooms in this service.
Created room: 5e82a13b402df00313685e3a
sampleRoom Id: 5e82a13b402df00313685e3a

Test in browser


Open https://localhost:3004/ in a browser on the local machine. You will need to accept the certificate warning, since the browser will complain about the self-signed certificate.


Besides this, you need to accept the websocket server's certificate on localhost:8080. You can do that by clicking the “Click this for testing certificate and refresh” link. Alternatively, you can set the #allow-insecure-localhost flag in chrome://flags to avoid the certificate issues in Chrome.


Once you do this, go back to https://localhost:3004/ and grant camera permissions. Then select your video stream id in the “video from” drop-down list and click “startAnalytics”.


Finally, go to the “subscribe video” drop-down list, select the long pipeline + video id line, and click “subscribe”:


In the image received from the server, you should see that the face is recognized.


Adding OpenCV Models


The Analytics Agent is built on the architecture of OpenCV's GStreamer Video Analytics (GVA) plugin. GVA includes a variety of modules that support different inference schemes, such as detection, classification, and identification, as well as input and output modules for sending video to users (in this case, back to OWT), overlaying annotations on the image, or streaming data over MQTT.

Pipelining


In practice, these pipelines are implemented by modifying some C++ code. For example, if we look at /home/owt/analytics_agent/plugins/samples/cpu_pipeline/mypipeline.cc, we can see the various pipeline elements:

 source = gst_element_factory_make("appsrc", "appsource");
 h264parse = gst_element_factory_make("h264parse","parse");
 decodebin = gst_element_factory_make("avdec_h264","decode");
 postproc = gst_element_factory_make("videoconvert","postproc");
 detect = gst_element_factory_make("gvadetect","detect");
 classify = gst_element_factory_make("gvaclassify","classify");
 watermark = gst_element_factory_make("gvawatermark","rate");
 converter = gst_element_factory_make("videoconvert","convert");
 encoder = gst_element_factory_make("x264enc","encoder");
 outsink = gst_element_factory_make("appsink","appsink");

These elements are then linked in a specific sequence:

gst_bin_add_many(GST_BIN (pipeline), source,decodebin,watermark,postproc,h264parse,detect,classify,converter, encoder,outsink, NULL);

If you want to change any of these elements, you will need to recompile the pipeline with:

./home/owt/analytics_agent/plugins/samples/build_samples.sh

Then just copy the compiled library over the current one in /home/owt/analytics_agent/lib/.

Getting other models


There is a huge set of models hosted in the OpenCV Open Model Zoo on GitHub. In addition to the popular public CV models such as mobilenet, resnet, squeezenet, vgg, and many others, Intel also maintains a suite covering a wide range of tasks: object detection, autonomous driving, human action recognition, and more:
action-recognition, age-gender-recognition, asl-recognition, driver-action-recognition, emotions-recognition, face-detection, face-reidentification, facial-landmarks-35, gaze-estimation, handwritten-score-recognition, head-pose-estimation, human-pose-estimation, image-retrieval, instance-segmentation-security, landmarks-regression, license-plate-recognition-barrier, pedestrian-and-vehicle-detector, pedestrian-detection, person-attributes-recognition-crossroad, person-detection, person-detection-action-recognition, person-detection-action-recognition-teacher, person-detection-asl, person-detection-raisinghand-recognition, person-reidentification, person-vehicle-bike-detection-crossroad, product-detection, resnet18-xnor-binary-onnx, resnet50-binary, road-segmentation, semantic-segmentation, single-image-super-resolution, text-detection, text-image-super-resolution, text-recognition, text-spotting, vehicle-attributes-recognition-barrier, vehicle-detection, vehicle-detection-binary, vehicle-license-plate-detection-barrier

Intel has more information about these models here.

Adding Models to OWT Analytics Agent


To add models, clone the repository and then download the ones you want with the Open Model Zoo downloader tool. After that, make sure your pipeline contains the appropriate elements (classification, detection, identification) and configure /home/owt/analytics_agent/plugin.cfg with the appropriate parameters.
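As a concrete sketch, fetching one model with the downloader might look like this. The clone location and output directory are my own placeholders, not OWT defaults:

```shell
# Hypothetical sketch: fetch a model with the Open Model Zoo downloader.
# OMZ and OUT are placeholder paths I chose for illustration.
OMZ=/home/open_model_zoo
OUT=/home/models
MODEL=emotions-recognition-retail-0003
# Print the command for review; run it inside the container once the repo is cloned.
echo "python3 $OMZ/tools/downloader/downloader.py --name $MODEL -o $OUT"
```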

Plugin testing


I tried several of the face and emotion recognition models.

Facial landmarks


Since I had already played with face touch detection, I decided to check out the facial-landmarks-35-adas-0002 model. It detects 35 facial landmark points.


In my face touch monitoring application, I could add MQTT streaming to the pipeline using the GStreamer generic metadata publisher to capture and process the landmark points. For example, I could check whether the points around the eyes, nose, and mouth become obscured, or even combine this with a human pose estimation model.
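I did not build this, but the GVA plugin set includes a gvametapublish element for exactly this kind of output; a hedged fragment (the broker address and topic are made up, and property names should be checked against the GVA documentation):

```shell
# Hypothetical pipeline tail: convert detection metadata to JSON and publish
# it over MQTT instead of drawing it on the video. The broker address and
# topic are placeholders I invented for illustration.
TAIL="gvametaconvert format=json ! gvametapublish method=mqtt \
 address=127.0.0.1:1883 topic=owt/landmarks ! fakesink"
# Print the fragment for review; splice it after gvadetect in a real pipeline.
echo "... ! gvadetect model=\$MODEL ! $TAIL"
```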

Emotion Recognition


This is another cool one. The emotions-recognition-retail-0003 model uses a convolutional network to recognize neutral, happy, sad, surprised, and angry expressions.


It seems my facial expression reads not as neutral but as sad; perhaps this long stretch of isolation is starting to get to me :(

Acceleration Optimization


To take advantage of OWT's hardware acceleration capabilities, be sure to set the appropriate device in /home/owt/analytics_agent/plugin.cfg, for example:

device = "MULTI:HDDL,GPU,CPU"
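Other device strings follow the OpenVINO plugin naming; a sketch (which strings actually work depends on your OpenVINO build and installed hardware):

```ini
# plugin.cfg device examples (a sketch based on OpenVINO device naming)
device = "CPU"                   # run inference on the host CPU
device = "GPU"                   # Intel integrated graphics
device = "MULTI:HDDL,GPU,CPU"    # prefer HDDL VPUs, fall back to GPU, then CPU
```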

Unfortunately, I did not have time to test this, but in addition to CPU and GPU acceleration you can also use various Vision Processing Unit (VPU) hardware: specialized chips for running neural networks efficiently. I bought an Intel Neural Compute Stick (NCS) a couple of years ago to run more advanced CV models on a Raspberry Pi 3.

Of course, you can always trade off processing power against frame rate and resolution.

Recommendations


OpenCV has a long history and a huge developer community; it ranked 4th among all open source machine learning projects in my popularity analysis from mid-2018. GStreamer is another project that has been around for ages. Intel's OWT Analytics Agent is ideally placed to help these communities add real-time streaming analytics to their projects via WebRTC: they should be able to take existing GStreamer pipelines and models and run them against real-time streams using OWT.

If you are just starting to experiment with computer vision and want to run models on an OWT server, I recommend starting with the more basic OpenCV tutorials before moving on to the GVA plugins. The GVA plugins take some effort if you are new to OpenCV, but after that you should have little trouble modifying the Analytics Agent to run them. You can then optimize the stack for your target platform and use Intel's various hardware acceleration options to improve performance.
