Music generator, Web Audio API: a beginner's experience

Context and Background

I'm 62. Three years ago I decided to try to write a fairly complex system. Before that, my only programming had been about 20 lines of BASIC in 1981, when a 3-month computer science course was squeezed into our 5th year at the radio engineering faculty (NSTU, formerly NETI). A professional is usually considered to be someone who has mastered a subject and earns a living from it, so in programming I am in fact a beginner.

On the other hand, I have been dealing with music-related algorithms for most of my life. I witnessed the development of both hardware and software and took part in it as far as I could. In 1978 I designed and built a Mini-Moog synthesizer (I believe it was the first in the Urals); developed and produced the first sound card with a synthesizer in the USSR, for the Agat7 (9), the Soviet analog of the Apple II; designed an FM synthesis module for one of the factories; collaborated with Cakewalk (USA), PGMusic (Canada) and PowerFX (Sweden), most often under the scheme “the ideas and implementation are ours, the budget is theirs”; and took part in the international music trade shows MusikMesse (Germany), NAMM (USA) and others. All this is a small part of what I did, and of course not alone but with the team I had assembled at the time.

I also took an active part in organizing the Department of Informatics (the first in the USSR) at the Novosibirsk Conservatory. From 1983 to 2004 I taught Musical Acoustics, Informatics and Sound Engineering there, and prepared a dissertation on computer modeling of musical performance (ISBN 5-9294-0023-7) ...

Yes, it all started with black-and-white monitors, floppy disks, 256 kB of RAM, a personal computer “of musical assembly” ... Everything had to be explained to the musicians in the simplest possible terms, because for the most part these wonderful people are far from anything strict and orderly. If a trombonist managed to type a page of text, he was cool, and a cellist who programmed “Chizhik-Pyzhik” was simply a star. By the way, some teachers actively resisted this whole subject, but now just try to take their computers away!

This experience turned out to be very useful for my own development, learning and deeper understanding of the subject. Computer science, like physical education, was mandatory for all faculties, and the constant flow of students allowed us to run many tests and experiments in psychoacoustics and pattern recognition, to talk with music theorists and ... to understand that they have practically no exact data or formulas. At least not the kind needed by a developer of musical equipment and software. Musicologists, unfortunately, are essentially historians who describe the past. If they had a real science, then ... getting a consultation with them would probably be harder than getting an appointment with the mayor of a city of over a million people. How much would a famous musician pay for a “hit formula” and a “sales forecast”? ..

So, suppose we need workable answers for algorithms such as these:

a) A melody has been played (let's even say it has already been recorded as “notes” in MIDI). How do we determine where the tonic is? A musician will do this easily but will not give you a clear answer (an algorithm);

b) There is a melody. How do we choose the right chords for it? .. Yes, a choir conductor will probably do it better than a vocalist, but even if they give instructions, neither will be very clear. From their point of view, of course, the instructions will be rigorous. Only when an engineer begins to translate them into an algorithm do several hard-to-resolve ambiguities appear ...

c) A drummer plays something moderately complicated, or a saxophonist improvises. How do we extract the time signature, the strong and weak beats, the beginning and end of a musical phrase, the expressive slowing down and speeding up? This is not about EDM (Electronic Dance Music). A performing musician feels and follows “rules” passed on by a teacher in a “do as I do” manner, but an engineer needs numbers, graphs and proportions, and these are practically impossible to obtain from either the performer or the musicologist-theorist.
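To make question a) concrete, here is a minimal sketch of one well-known textbook approach, the Krumhansl-Schmuckler key-finding method: build a duration-weighted histogram of the 12 pitch classes and correlate it with a reference profile for each of the 24 keys. This is only an illustration, not the algorithm developed in the work described here (its details are not given in the text). The names `estimateKey`, `pitch` and `duration` are assumptions made for the example; the profile values are the published Krumhansl-Kessler ratings.

```javascript
// Krumhansl-Kessler key profiles; index 0 is the tonic of the key.
const MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88];
const MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17];
const NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// Pearson correlation of two equal-length numeric arrays.
function correlation(a, b) {
  const mean = (v) => v.reduce((sum, x) => sum + x, 0) / v.length;
  const meanA = mean(a);
  const meanB = mean(b);
  let num = 0, devA = 0, devB = 0;
  for (let i = 0; i < a.length; i++) {
    num += (a[i] - meanA) * (b[i] - meanB);
    devA += (a[i] - meanA) ** 2;
    devB += (b[i] - meanB) ** 2;
  }
  return num / Math.sqrt(devA * devB);
}

// notes: [{ pitch: MIDI note number, duration: any consistent unit }]
// Returns the best-matching key as { tonic, mode, score }.
function estimateKey(notes) {
  const histogram = new Array(12).fill(0);
  for (const { pitch, duration } of notes) {
    histogram[pitch % 12] += duration;
  }
  let best = { tonic: null, mode: null, score: -Infinity };
  for (let tonic = 0; tonic < 12; tonic++) {
    // Rotate the histogram so that index 0 is the candidate tonic.
    const rotated = histogram.map((_, i) => histogram[(i + tonic) % 12]);
    for (const [mode, profile] of [["major", MAJOR], ["minor", MINOR]]) {
      const score = correlation(rotated, profile);
      if (score > best.score) {
        best = { tonic: NAMES[tonic], mode, score };
      }
    }
  }
  return best;
}
```

A call such as `estimateKey([{ pitch: 60, duration: 2 }, { pitch: 64, duration: 1 }, { pitch: 67, duration: 1 }])` returns the best-correlated key together with its score. On short or chromatic melodies the answer can easily be wrong, which is exactly the kind of uncertainty the questions above are about.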

But these were exactly the questions I wanted to answer. After all, a piece of music as it actually sounds can have a rough model: a composition in MIDI format, which is in essence an “electronic score”. This is exact data, but how do we work with it? In the end, what interested me most was the analysis and transformation of MIDI data, and algorithms * such as:

Performance modeling ¹⁾,
Morphing ²⁾,
Generation of music ³⁾

For all of these, key detection, auto-harmonization and auto-phrasing are just separate subtasks ... Over time I found acceptable answers on all these topics, and algorithms and programs were created, some of which even made it into Cakewalk / Sonar.

* A brief explanation:

1) Analysis of quantized MIDI data, recognition of “musical objects” (phrases, figures) and modification of their Velocity, NoteOn and Duration values, plus the overlaying of Tempo, PitchWheel, Expression and Modulation curves, in order to make fuller use of the synthesizer's resources and achieve greater expressiveness.

2) Analysis of both the whole piece and its individual parts with the goal of transforming them from one time signature to another (for example, from 4/4 to 6/8 or 7/4), or from one mode / harmony to another. The result must also be “edible”: music that a musician would consider “correct”, without obvious violations.

3) Generation of an “electronic score”: a system that, using 1) and 2) and playing back MIDI data, would produce output similar in sound to what a musician or arranger produces with a computer or a live instrument. That is, not something abstract generated by AI and fit only for demonstration to a narrow circle of specialists, but something entirely “human” and suitable for practical use.
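As a toy illustration of the idea in 1), the sketch below changes the Velocity of quantized notes according to their position in the bar: the downbeat is emphasized, other beats are left alone, and off-beat notes are softened. It is an assumption-laden simplification, not the performance-modeling algorithm described here: the function name `applyMetricAccents`, the note fields and the scaling factors (1.15, 1.0, 0.85) are all invented for the example, and real phrase recognition is not attempted.

```javascript
// notes: [{ tick, pitch, velocity, duration }], quantized, with ticks counted
// from the start of the piece. Returns new note objects; the input is not mutated.
function applyMetricAccents(notes, ticksPerBeat = 480, beatsPerBar = 4) {
  // Keep velocities inside the valid MIDI note-on range 1..127.
  const clamp = (v) => Math.max(1, Math.min(127, Math.round(v)));
  const ticksPerBar = ticksPerBeat * beatsPerBar;
  return notes.map((note) => {
    const tickInBar = note.tick % ticksPerBar;
    let scale = 0.85;                      // off-beat notes: softer
    if (tickInBar === 0) {
      scale = 1.15;                        // downbeat: strongest accent
    } else if (tickInBar % ticksPerBeat === 0) {
      scale = 1.0;                         // other beats: unchanged
    }
    return { ...note, velocity: clamp(note.velocity * scale) };
  });
}
```

The same pattern (classify a note by its musical context, then adjust one parameter) extends to Duration or NoteOn timing, but choosing the context rules and the numbers is the hard part that the text refers to.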

In this long story there were quite a few moments that look funny from today's perspective. For example, for my first 386 PC I had to give away 2 new Moskvich “Chignon” vans (IZh 2715), 290 thousand rubles each. Or how, on behalf of the conservatory, I almost acquired a used Minsk 32 from the Siberian Branch of the Academy of Sciences. I believe this monster, weaker than a smartphone, occupied 60-80 m² and consumed about as much electricity as a bar plus a sauna ...

So, my employees programmed (we worked with the Z80, the MOS 6502, the “Elektronika-60”, the first Win PCs from the GDR, TMS and Analog Devices signal processors ...), every day I saw “krakozyabry” (gibberish) on their screens, and I thought that all of them (the programmers) were “not of this world”. Although a musical score is no simpler! After all, some musicians hear the music just by looking at the notes, and even get aesthetic pleasure if they like it! In general, it seemed to me that everyone who writes code is a genius and that this was not for me at all ... But, as the future showed, I was mistaken. I can't say that a lot of time was lost, but the fate of many projects would probably have been completely different ...

First practical experience

From previous experience I knew that the two most “difficult” and “unpleasant” things for a programmer are GUIs and interfaces (that is, crossing from one environment to another, for example receiving MIDI data from an external device), especially when there are no suitable ready-made solutions.

First I took AutoPlay Media Studio (Indigo Rose Software), a kind of construction kit for dummies. It was apparently designed originally for quickly creating CD and DVD menus and an autorun.exe. But it turned out to have a sea of functions, and inside (in Lua) you can do almost anything: animation, playing ogg files. So most of the “unpleasant” work (which, I think, is exactly where newcomers give up) did not have to be done: everything was ready.

As a result, in 2 months of evenings at home I wrote an analogue of an educational music program that we had developed and officially shipped. Moreover, it had a different GUI and more interesting functionality; fortunately, making audio and graphic content and testing are not difficult for me. I showed it at the office and asked one of the lead programmers how long he would need to make such a version. The answer was 6 months! This was apparently the first time I thought: maybe I should stop just playing around, start studying something and try to do something serious, and ... had I been organizing the work correctly? In fact, his answer was honest: “they” simply would not have used a construction kit but would have written everything themselves.

Since 2009 I have devoted part of my time to working in a recording studio and creating video ads. Once a friend asked whether it was possible to quickly (and cheaply) make a hundred 1-minute clips (apparently for promotion on YouTube). I absolutely love such tasks! (By the way, I have long known that if a programmer is forced to do something not very creative 3 times, he will write a script.) I asked the programmers whether there was a program that could “press” the buttons on the screen by itself. Wow, there was: AutoIt!

In general, for fun, I wrote a script that:

launched Opera and opened the site of a company (Dutch, I think) specializing in text-to-speech;
scrolled the page to RU (Russian voices);
opened a *.txt file in Notepad containing 20-30 jokes copied from the web in advance, one per paragraph;
copied one joke and pasted it into the field on the site;
started recording in SoundForge and turned on playback on the site;
saved the recording as an mp3 in a folder, under a name made of the serial number plus the first 15 characters of the joke's text.
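The naming rule in the last step can be sketched as follows. The original script was written in AutoIt and is not shown here, so this JavaScript version is only an illustration: the function name `clipFileName`, the 3-digit zero padding of the serial number and the replacement of characters that Windows forbids in file names are all assumptions.

```javascript
// Build a file name from a serial number and the first 15 characters of the text.
function clipFileName(serialNumber, jokeText, extension = "mp3") {
  // Assumption: pad the serial number to 3 digits so that files sort correctly.
  const serial = String(serialNumber).padStart(3, "0");
  const prefix = jokeText
    .trim()
    .slice(0, 15)
    .replace(/[\\/:*?"<>|]/g, "_") // characters not allowed in Windows file names
    .trim();
  return `${serial} ${prefix}.${extension}`;
}

// Example: clipFileName(7, "Why did the chicken cross the road?")
// returns "007 Why did the chi.mp3"
```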

It worked, and I was as happy as a child. After that I told my friend: there is no problem making 100 inexpensive clips (of good quality). Just pick the pictures, music, headlines, subtitles, descriptions and so on. I will put everything in order (by size, by color), sort it into folders, make a template and define its variations (random!), and ... I can make you not 100 but 200 of these clips (with Vegas Pro and AutoIt it is easy). You watch them selectively and pick as many as you need ... In the end none of this had to be done, but I thank her for the question!

Later I gradually began to fix things on the website, read things, and try things out of sporting interest. In particular, I wrote JavaScript / CSS scripts for the simple animations needed for video production. That is how I picked up some general ideas and a little experience in PHP, HTML, jQuery, JavaScript, CSS and MySQL. The “krakozyabry” (gibberish on the screen) almost disappeared, and I stopped being afraid of this whole subject of “programming”. The final turning point in my attitude to it and to my own abilities came when I understood 2 things:

this whole field is an endless world, and even a strong programmer cannot always answer a question outside his own area, just as in any other field. On general matters, yes; for the nuances, look for a narrow specialist, or better, dig in yourself. Fortunately, there is the Internet.
I solved 99% of the questions that came up during development myself, and sometimes more efficiently than in the examples I found on the Internet. At the same time, I understand that in terms of coding style my code may be awful.

Music generator

I don't remember how I stumbled upon the Web Audio API. By that time I had no LLC, no sole proprietorship and no team, but after 2-3 experiments, when “the sound came out”, I took on what was for me the most serious project: a music generator (which I now call AlexAr). Web Audio really did have everything necessary: generators, filters, envelopes, mixers, a processor (in C++), while jQuery, JavaScript and CSS solved every GUI issue. In fact, it seems to me that in Web Audio you can do both processing and synthesis of sound of any complexity.
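To show what “generators, filters, envelopes, mixers” look like in practice, here is a minimal sketch of a single synthesized note: an oscillator feeds a low-pass filter, and a gain node shapes the amplitude with a simple attack/release envelope. It uses only standard Web Audio API calls, but it is not code from AlexAr; the function names and all numeric values (waveform, cutoff, envelope times, peak level) are illustrative assumptions.

```javascript
// Convert a MIDI note number to frequency in Hz (note 69 = A4 = 440 Hz).
function midiToFrequency(note) {
  return 440 * Math.pow(2, (note - 69) / 12);
}

// Play one note: oscillator -> low-pass filter -> gain (envelope) -> output.
// ctx is an AudioContext; times are in seconds on the ctx.currentTime clock.
// Assumes duration is at least as long as the attack time.
function playNote(ctx, midiNote, startTime, duration) {
  const osc = ctx.createOscillator();
  const filter = ctx.createBiquadFilter();
  const amp = ctx.createGain();

  osc.type = "sawtooth";
  osc.frequency.value = midiToFrequency(midiNote);
  filter.type = "lowpass";
  filter.frequency.value = 2000; // illustrative cutoff in Hz

  // Simple envelope: quick attack, hold, short release (illustrative values).
  const attack = 0.01, release = 0.1, peak = 0.3;
  amp.gain.setValueAtTime(0, startTime);
  amp.gain.linearRampToValueAtTime(peak, startTime + attack);
  amp.gain.setValueAtTime(peak, startTime + duration);
  amp.gain.linearRampToValueAtTime(0, startTime + duration + release);

  osc.connect(filter);
  filter.connect(amp);
  amp.connect(ctx.destination);

  osc.start(startTime);
  osc.stop(startTime + duration + release);
  return osc;
}

// Usage in a browser, after a user gesture such as a button click:
//   const ctx = new AudioContext();
//   playNote(ctx, 60, ctx.currentTime, 0.5); // middle C for half a second
```

Passing the context in as a parameter keeps the function independent of where the `AudioContext` is created, which also makes it possible to render offline with an `OfflineAudioContext` instead of playing in real time.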

Here the real work began, almost like in my youth, at 25+. 3-4 times I started over, discarding one version of the system and building a new one, sometimes radically changing the approach to the design of a particular module. Along the way I ran a lot of experiments and tested new algorithms that I had not used before. I made several discoveries for myself that would not have happened without a running system. In particular, in psychoacoustics, in the perception of the “similarity” of music tracks: which parameters give “different” music, and which, for all their formal differences, sound “the same” to the ear. Some very useful things turned up in the use of harmony, in imitating plucked playing, in phrasing across chord changes, and in a better understanding of “style accuracy”. It is possible that all this has already been discovered and described. But in life it often turns out to be easier (and more useful) to work it out yourself.

Yes, it would be great if the Web Audio API allowed binding to VSTi for synthesis, or to something else ready-made. On the other hand, it was an occasion to recall my youth and write all the necessary synthesizers myself, inventing and optimizing their structure, algorithms and content. Almost like in the old days, when an artist not only painted but also made his own paints ...

As a result, after several rollbacks, reworkings and upgrades (which were a good school and brain training), the system became stable and fully operational. Not, of course, in a form that could be handed over to a third party. It is rather like a home-made car that “obeys” only its creator, and if it breaks down there is no one else to fix it ...

The system creates a finished 4-5 minute composition in 40-50 seconds (on an Intel Core i5, 2.8 GHz, 12 GB RAM). I would set a task, for example “make 100 Dance tracks”, and go to lunch or for a walk. I came back in an hour and, if the system had not crashed, listened and estimated the percentage of usable results. Then I changed some parameters and started again. If I behaved “reasonably”, i.e. set parameters that did not blur the style (which happens, for example, if jazz harmony is applied to EDM), then 90-95% of the tracks sounded quite decent; if not, 60-70% could be thrown away.

The system worked and I could rejoice, but then funny yet real problems began: “How can I listen to so much music?” And then another realization: “And who is the author?” A good friend who knows the subject said that “the author is the computer, so it doesn't belong to you” ... If so, it is a shame: I put tunes, phrases, harmonies and forms into it and tuned a couple of hundred parameters, where sometimes even 5 ms shows in the result, and the system merely generated the output, again according to the algorithm that I had built into it ... I searched the web but could not work out how copyright applies in such a case ...

As a result, I released about 3,500 songs with a total playing time of more than 200 hours and stopped, or rather switched: I plan to launch SongModeler, an online arrangement generator based on the approaches developed in AlexAr. Examples of automatic music generation in AlexAr can be found here.

To overly picky listeners I want to say that I spent most of my time on programming, harmony, melody and form (i.e. the notes); the synthesis and mixing could, of course, be much better. In the end I tested the ideas and the approach itself, and it holds up. Projects like this need a team; loners rarely tackle interdisciplinary topics. On the other hand, work becomes much more comfortable when there is no one “either above you or below you”. But the real irony is that when I had a team and resources, I never aimed at anything like this, because I thought it was too complicated.

Somewhere in the middle of working on AlexAr I realized that, had I started programming earlier, it is quite possible that 10 programmers (sometimes I had many more on staff) could have been replaced by three people:

myself (design, programming, testing, content);
someone very smart, for bottlenecks and for scouting approaches, new libraries and services (fresh out of university or a 3rd-4th year student);
a productive workaholic for routine tasks, without much imagination, whose hands do not itch to apply today what appeared yesterday.

I am sure the task would have been solved faster, and how much nerves and money would have been saved! Where does the saving come from? Say, before lunch I sit down to test something: “hmm, this button is inconvenient, and the indicator should be bigger and further to the right.” OK, I sit down and do it in half an hour, and my mood is 5+. And how would this have been handled earlier, in my office with the team, when I did not write code myself? First I would ask how long it would take, then draw it in Photoshop, count the pixels, write a spec, put it in the plan, wait a couple of days, or ... drop it altogether: it works, and that is good enough. And so on, constantly.

Conclusions and motivation

I don't know how neatly I can formulate conclusions in a field that is not yet my own, but I will try.


In general, it seems to me that adults, especially those with an engineering background who have never programmed, should try. If you know your subject, then 50% is already done. You need logic and a clear understanding of the “physics”, the nature of your field, and you most likely have both. There is nothing scary about the “gibberish” on the screen, and JavaScript is much simpler than Russian or English. In the end you may not become a true software engineer, but these skills, even modest ones, will add another option to your engineering qualifications. You will also immediately stop seeing IT people as people with halos over their heads, and you will be able to set tasks for them at a more competent level. After all, they are coming into your area, where you are the pro. Or maybe you will implement, on your own data, the project you have long dreamed about but for which the conditions never came together. I don't know; I think I was lucky because I was not afraid to try. And in any case: for an engineer, programming IS FREEDOM!

Best wishes to all.

PS: If anyone is interested in a more detailed history of my work over the years, it is here.
