Communication revolution? A new approach can cut the bandwidth needed for audio and video calls by a factor of 100 or more



Many people remember that the Silicon Valley series tells the story of a programmer, Richard
Hendricks, who accidentally came up with a revolutionary data compression algorithm and decided to
build his own startup.

The series' consultants even proposed a metric for evaluating
such algorithms: the fictitious Weissman Score.

Later in the plot, the startup built a video chat on top of this solution.

The esteemed community is invited to discuss another, quite unusual
principle of data compression for audio and video calls, one that approaches the problem from a new,
unexpected side.

If you want to join the discussion of this solution, and to find out what it has in common
with Jonathan Swift and the works of Leo Tolstoy, read on.

A bit of theory


Let's describe in general terms how modern voice communication works. The principle is the same for
calls over a GSM network, instant messengers, and VoIP networks.

Sound vibrations reach the smartphone's microphone and then go to an analog-to-digital
converter (ADC).



Next, the signal is encoded by one of various codecs (G.711, G.729, Opus, GSM, etc.),
encryption is optionally applied (SRTP, ZRTP, etc.), and the result is sent to the
data transmission medium.

For example, almost all instant messengers (WhatsApp, Viber, etc.) use the same codecs (lately this is usually Opus) and almost the same, slightly
modified protocols (based on SIP, WebRTC).

The data transmission network can be the public Internet, a GSM network, or
an intranet.



Encryption is an optional element in this scheme. For example, in most cases
encryption is not used for SIP telephony.

Messengers, on the contrary, usually use their own proprietary
protocols to encrypt voice and video.

Then the reverse process takes place: the recipient decodes the received data, the signal goes to a digital-to-analog converter (DAC), and from there to the audio amplifier connected to the speaker.



Characteristics of modern codecs (bitrates):

G.711: 64 kbit/s
G.726: 16, 24, 32, or 40 kbit/s
G.729A: 8 kbit/s
GSM: 13 kbit/s
iLBC: 13.3 kbit/s (30 ms frame); 15.2 kbit/s (20 ms frame)
Speex: from 2.15 to 22.4 kbit/s
G.722: 64 kbit/s

Thus, for example, a 7-minute conversation on WhatsApp or Skype
uses about 1 MB of traffic.

Remember these numbers, 1 MB for 7 minutes of conversation; we will need them soon.
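
To put these numbers side by side, here is a small arithmetic sketch. It converts a few bitrates from the list above into the traffic of a 7-minute call and, going the other way, shows which average bitrate "1 MB in 7 minutes" implies. It assumes 1 MB = 10^6 bytes and ignores packet headers and silence suppression, so the results are rough estimates rather than measurements.

```python
# Payload bitrates in kbit/s, taken from the codec list above
# (packet headers and silence suppression are ignored).
CODEC_KBITS = {
    "G.711": 64.0,
    "G.729A": 8.0,
    "GSM": 13.0,
    "iLBC (30 ms)": 13.3,
}

CALL_SECONDS = 7 * 60  # the 7-minute call from the text


def call_size_mb(kbit_per_s: float, seconds: int = CALL_SECONDS) -> float:
    """Audio payload size in megabytes (assuming 1 MB = 10**6 bytes)."""
    bits = kbit_per_s * 1000 * seconds
    return bits / 8 / 1_000_000


def implied_kbits(megabytes: float, seconds: int = CALL_SECONDS) -> float:
    """Average bitrate in kbit/s implied by a given traffic volume."""
    return megabytes * 1_000_000 * 8 / seconds / 1000


for name, rate in CODEC_KBITS.items():
    print(f"{name:>12}: {call_size_mb(rate):.2f} MB per 7 minutes")

print(f"1 MB in 7 minutes is about {implied_kbits(1.0):.1f} kbit/s")
```

So the 1 MB figure corresponds to an average of roughly 19 kbit/s, which is within the range of the low-bitrate codecs listed above, while uncompressed-style G.711 would need more than 3 MB for the same call.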

“Leo Tolstoy as a mirror ... of revolution ...”


Let us recall the most famous novel of this great Russian writer.

“War and Peace” is an epic novel by Leo Tolstoy that describes Russian
society in the era of the wars against Napoleon in 1805-1812. The epilogue of the novel brings the
story to 1820.

Leo Tolstoy devoted seven years of intense, persistent work to “War and Peace”. The manuscripts of
“War and Peace” testify to how one of the world's largest literary works was created: more than 5,200 densely written sheets have been preserved in the writer's archive.


If you want to read this novel now, it is easy to download.

And the file weighs only ... 1 MB.



The fb2 and epub formats, just like zip and rar, can in essence be regarded as a kind of
codec.

Think about it: 7 minutes of our conversation on WhatsApp equal, in traffic volume, a
great work that took 7 years to write!

The 7-minute conversation was encoded with the Opus codec, the novel with ePub; the volume is the same,
1 MB, but what a huge difference!

Gulliver's Travels


Everyone has known this work by Jonathan Swift since childhood, but in fact the book is not for
children.

Gulliver's Travels is a political satire for adults, set of course in the context of the 18th
century.

Surprisingly, Swift, an ardent opponent of his contemporary
Newton, not only predicted in Gulliver's Travels the discovery of the satellites of
Mars (with a fairly accurate description of their characteristics), but also described a rather interesting
way for people to communicate:

“... the project required the complete abolition of all words;
the author of this project pointed mainly to its benefits for health and
its saving of time.

After all, it is obvious that every word we pronounce is associated with some wear [...]

[...] as a universal language understood by all civilized nations, for furniture and household
utensils are the same or very similar everywhere, so their use can be easily understood.
Thus, envoys can easily speak with foreign kings or
ministers whose language is completely unknown to them ... ”


So, you probably already see where I'm going with this :)

Why transmit vibrations of the air (sounds) over hundreds and thousands of kilometers,
bother with encoding (in order to deliver these vibrations to the recipient as accurately and efficiently as possible), and reserve the necessary bandwidth, if the semantic
load of this transmission is minimal, or even tends to zero?

After all, people communicate with each other not with sounds, but with meaning, content, semantics, thoughts ...

The concept of the new communication system is quite simple: on the side of source A, the sound
vibrations are digitized as before, but instead of being transmitted straight to the other side they are
converted to text (Speech To Text). What is then transmitted is the meaningful text from
subscriber A, which:

  • can be transmitted with the minimum required data bandwidth (even HF radio communication, etc. is possible)
  • can be encrypted with any strong encryption algorithm

On the B side, the received messages are decrypted and played back as a voice from
subscriber A (Text To Speech).

Side B can also download a so-called voice avatar of subscriber A, which would
reproduce subscriber A's manner of speech exactly.

A separate channel can transmit background noises and emotions.
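
The audio path described above can be sketched roughly as follows. This is only an illustration of the data flow, not the project's implementation: the class name `TextCallPipeline` and the stand-in functions `fake_stt`, `fake_tts` and `no_crypto` are hypothetical, and a real system would plug in an actual speech recognizer, a speech synthesizer (possibly driven by the voice avatar) and a strong cipher.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TextCallPipeline:
    """Hypothetical sketch: audio -> text -> network -> text -> audio."""
    speech_to_text: Callable[[bytes], str]
    text_to_speech: Callable[[str], bytes]
    encrypt: Callable[[bytes], bytes]
    decrypt: Callable[[bytes], bytes]

    def send(self, audio: bytes) -> bytes:
        """Side A: digitized audio -> text -> encrypted payload."""
        text = self.speech_to_text(audio)
        return self.encrypt(text.encode("utf-8"))

    def receive(self, payload: bytes) -> bytes:
        """Side B: payload -> decrypted text -> synthesized audio."""
        text = self.decrypt(payload).decode("utf-8")
        return self.text_to_speech(text)


# Stand-ins for demonstration only: they perform no real recognition,
# synthesis or encryption.
def fake_stt(audio: bytes) -> str:
    return "hello from subscriber A"


def fake_tts(text: str) -> bytes:
    return b"<audio for: " + text.encode("utf-8") + b">"


def no_crypto(data: bytes) -> bytes:
    return data  # placeholder, NOT encryption


pipeline = TextCallPipeline(fake_stt, fake_tts, no_crypto, no_crypto)
captured = bytes(16_000)  # pretend this is a chunk of digitized microphone audio
payload = pipeline.send(captured)
played = pipeline.receive(payload)
print(len(captured), "bytes of audio ->", len(payload), "bytes on the wire")
```

Keeping the four stages as pluggable callables reflects the point made in the text: only the short text payload crosses the network, while recognition and synthesis stay on the two endpoints.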



The same is true for video communication; moreover, individual elements have long
existed in applications (various masks, backgrounds in Zoom, etc.).

Yes, there are technical issues that are not fully solved yet.
For example, Speech To Text conversion speed will be critical, but
predictive AI conversion algorithms can increase this speed significantly.

The most important advantage is that only minimal bandwidth is required in the data transmission medium.
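
A back-of-the-envelope estimate shows where a saving of this order could come from. The speaking rate (150 words per minute) and the average size of a word (6 bytes including the trailing space, which fits English text in UTF-8) are assumptions made for this sketch and are not taken from the article; the 1 MB figure is the one quoted earlier. Encryption overhead, packet headers and the separate channel for noises and emotions are not counted.

```python
# Assumptions (not from the article): about 150 spoken words per minute,
# about 6 bytes per word including the trailing space (English, UTF-8).
WORDS_PER_MINUTE = 150
BYTES_PER_WORD = 6

CALL_MINUTES = 7
AUDIO_CALL_BYTES = 1_000_000  # "1 MB for 7 minutes" from the text


def text_call_bytes(minutes: int,
                    wpm: int = WORDS_PER_MINUTE,
                    bytes_per_word: int = BYTES_PER_WORD) -> int:
    """Bytes of plain text produced by one speaker talking continuously."""
    return minutes * wpm * bytes_per_word


text_bytes = text_call_bytes(CALL_MINUTES)
text_bitrate = text_bytes * 8 / (CALL_MINUTES * 60)  # bit/s
ratio = AUDIO_CALL_BYTES / text_bytes

print(f"text for a {CALL_MINUTES}-minute call: {text_bytes} bytes")
print(f"average text bitrate: {text_bitrate:.0f} bit/s")
print(f"audio / text traffic ratio: about {ratio:.0f}x")
```

Under these assumptions the transcript of a 7-minute call takes a few kilobytes, which is consistent with the "100 or more times" figure in the title. Languages whose letters take two bytes in UTF-8, such as Russian, would roughly halve the ratio.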

That is, this principle can be used not only for ordinary everyday
communication, but also for military use and for long-distance links with long delays
(space communications, interplanetary: the Moon, Mars, etc. :))

Although this is a description of a concept, a prototype based on this principle has in fact been in use in our
project for several months.

But more about that next time ...
