How to bypass the minefields of information technology

This article formulates some problems of information technology (IT) and considers an approach to solving them that may interest developers of computer-system architectures and programming languages, as well as businesses in the IT field. Most of them, with few exceptions, hardly believe such problems exist, at least as described in this article, especially since the industry keeps developing rapidly. Yet even unrecognized problems end up being solved “creepingly”, slowly and piecemeal, and manpower and money could be saved if they were solved consciously, in full and at once.

Neither the economy nor social communication is possible any longer without advanced IT. So let us look at why the technologies currently in use are no longer adequate and what they should be replaced with. The author will be grateful for constructive, qualified discussion and hopes to learn about modern solutions to the “problems” raised.

The ideas that define computer architecture have not changed much since von Neumann's time. The architecture is built around an algorithm: data is processed in the course of executing a sequence of commands. The main actors are therefore processes (control flow), which are allotted computing resources (according to priorities and hierarchy) under the control of the operating system (OS).

The sequence and interdependence of processing all the data as a whole are described in a central (leading) program. When a new type of data is created, the leading program has to provide for launching an algorithm that generates it in time and to arrange the moments and manner of its use (the “rendezvous”) with other programs. For that, the data structures still have to be coordinated and obtained from the developer (not without the latter's commercial interest). And if data sets previously processed by independent leading processes begin to intersect, as the logic of integration dictates, a new leading process has to be developed that integrates the previously independent programs.
All this happens continuously as digital technology progresses. Accordingly, more and more effort and money is needed just to keep systems running, while the systems themselves become more monopolistic and less transparent. Even at the enterprise level, the number of different classes of data (tables or structures holding data) reaches hundreds and thousands. It is unlikely that any specialist there has an overall picture of what is actually stored. As processing systems evolve, “garbage” data accumulates in databases (DB) in old structures: it no longer participates in the new processing algorithms, yet it can still be “picked up” when a data request is generated.

With existing object-oriented programming technologies it is often easier to build a new add-on over the old ones for each new application feature than to understand the existing algorithms and data structures. And so on, many times over. It is easy to see that this is a dead end.

Of course, ways out of the impasse are sought and found: systems for modifying databases “on the fly”, messaging protocols, cross-platform backbones (buses) for data exchange, and so on. All these technologies and software platforms are sometimes updated several times a month. But if this continues, the gain from the next round of IT development will become smaller than the cost of that development. IT is heading for a dead end.

The author does not claim comprehensive knowledge of modern system platforms and programming languages, which began to develop especially rapidly after he reached retirement age. My programming experience began with developing drivers for sampling data from telemetry recorders used in testing aircraft and rocket engines (including engines for a lunar rocket).

In those days, all the software for the IVC (computing center) complex (two machine rooms of almost 200 square meters each) amounted to a hard-wired program for the initial input of a single punch card and launching the code it contained, plus a minimum of routines in the form of a thin pack of punch cards. Beyond that, everyone managed as best they could. And for the special telemetry recording devices there was no software at all.

I had to program in machine code and absolute addresses, developing along the way a heap of drivers for various input/output devices and subroutines, starting with conversion of decimal numbers to binary codes and formatting back. Had I pleaded “we never covered that”, I would have been packed off for six months to Baikonur, where a similar system was being deployed; in those days, even there, even on business trips, they issued quite decent sheepskin coats. By the time I finally got there, that was no longer the case. Some women programmers had trained there earlier, but since they were from a special trust of another department, they were not entitled to them, especially in summer. By the way, they said that at the time installation work was still going on, with ceiling light fixtures being fastened by a powder-actuated gun. And so, the very first time one of the girls pressed the “Initial Input” key, the first shot of that assembly gun boomed at the same instant. Both the girl and her chair ended up a few meters away from the console.

Besides, I could not afford to be distracted from designing the architecture of the whole software complex and the stages of telemetry processing, although I was not any kind of boss then. So I had to develop, personally and in overtime, an assembler, then debuggers (for two different types of computers, one punch-card based and the other punch-tape based) running in real time with interception of system interrupts and loop detection. As a graduate of the Moscow Institute of Physics and Technology (MIPT), I had to take on all this binary nonsense, leaving the clear computational algorithms to others. I had been hired by that office because in a neighboring design bureau (created after the war, where at first half of the employees were engineers and designers from the German Junkers and Messerschmitt factories, brought to the USSR along with the equipment, staff and their families) I had been modeling turbojet engine systems on the MPT-9 analog computing complex (vacuum-tube; the photo below is the best available — cabinets the size of a person, and the little white rectangles are the scales of 100-volt voltmeters) to debug engine control systems.



An analog computer (AVM) or a digital one — what is the difference? For a physics-faculty graduate of those times there really was almost none. At my faculty, though, none of this was taught and, as it turned out, it was not in demand afterwards. The operating principles of both analog and digital computers (half-adders, shift registers and all that) were taught to us at the military department, along with skills such as computing the launch parameters of Earth-to-Earth class missiles to the fifth digit on a meter-long slide rule. I believe nothing of the kind exists now. But when it was decided to introduce a programming course for our year, almost all (!) the students declared that, as future “pure scientists”, they would never need it — and boycotted the lectures. Of course, this did not apply to the students of the Department of Computer Engineering: for them, rumor had it, every microsecond saved in the routines for the BESM-6 brought a bonus of almost 20 rubles — and that when the senior-year stipend was 55 rubles. For us strikers the programming term papers were cancelled, yet later I, like many of my fellow students, ended up programming one way or another.

Over time, a translator from Algol-60 appeared for our computer (which was still not a BESM-6, but simpler and far less well known), though without subroutine libraries; somehow there was no need for them. In assembly language, with a debugger, it was easy to program anything. Next came the development of interchangeable magnetic-tape and disk operating systems (transparent to the application software and the operator interface — a possible way out in case the disks failed) with control scripts that today would be called bat files. Finally, a task supervisor was developed that ran such script files to retrieve the data requested by the operator. I even wanted to develop a full-fledged operating system — Unix-like, as it would turn out later — but with the transition to the Ryad series of computers this was no longer relevant.

In fairness, I should say that I was engaged not only in system software. When a fast Fourier transform program was needed for spectral analysis, I had to produce it fast, especially since I could not find an intelligible description of the FFT algorithm and had to reinvent it. And other things of that sort.

I note that none of the developed algorithms and processing systems were published anywhere, because of the secrecy surrounding everything in that agency; besides, they were not part of the enterprise's core profile — it was just a service department.

Shortly before perestroika, I was invited to head the system-software laboratory at a branch of a specialized Moscow research institute. The task was to develop a distributed computing system, both hardware and software, inside a multiprocessor control controller. Then that institute, like many others, ceased to exist. I considered it necessary to include this “nostalgia” in the article only to show that the author has some notion of automation systems, possibly a slightly dated one.

So, if we are to build an evolving, vibrant and attractive social organism and an economy adequate to it, then, the author believes, it is advisable to change the principles on which information technology is organized. Namely:

  • All data should be stored in a single, global, distributed database.

A global distributed database should be a conglomeration of hierarchically organized databases built on a single principle.

If humanity ever agrees to create an international scientific language in which relations such as “who (what), whom (what), contains (includes), belongs to ..., is absent from ..., when, where, was, will be, before, after, now, always, from ... to, if ... then, by what means, how, why”, etc. are represented explicitly and unambiguously by linguistic constructions and/or relation symbols (which can reflect the relations of data structures described in metadata), then scientific articles could be loaded directly into this knowledge base, with the possibility of using their semantic content.
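As a purely illustrative sketch (not a proposed standard), such relations could be stored as explicit subject-relation-object triples that a knowledge base can query directly; all names and relation labels below are my own assumptions.

```python
from dataclasses import dataclass

# Illustrative only: statements are kept as explicit (subject, relation, object)
# triples so that their semantic content can be queried directly.
@dataclass(frozen=True)
class Statement:
    subject: str
    relation: str   # e.g. "contains", "belongs_to", "before", "if_then"
    obj: str

knowledge_base = [
    Statement("water", "contains", "hydrogen"),
    Statement("hydrogen", "belongs_to", "chemical elements"),
]

def query(kb, relation, subject=None):
    """Select statements by relation and, optionally, by subject."""
    return [s for s in kb
            if s.relation == relation and (subject is None or s.subject == subject)]

print(query(knowledge_base, "contains", "water"))
```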

The author has developed the architecture and principles of such a database. Some of its variants were deployed and used for several years, without complaints, in the document-management system of a city hall serving almost a million people.

  • For each type of data, its purpose (with a sufficiently detailed text description), its relationships with other data, and the algorithm for obtaining it from previously received (or calculated) data must be specified. Likewise, the form of its presentation in a typical user interface should be described and the associated tools indicated. These characteristics and tools, called metadata, are themselves ordinary data and therefore must be stored in a database — at least in the database where they are needed, if they are not already present in a higher-level database.

Metadata serves to indicate the potential existence of data and to enable the selection of existing data by semantic meaning. Local metadata should, where possible, be mapped onto the metadata in the classifier of a higher-level database. In his time the author used an analog of metadata both in the task supervisor and in the city system for paying pensions and benefits, whose database architecture he developed while heading the automation department, at a time when the government changed the pension payment and indexation algorithms as often as three times a month.
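A minimal sketch of what such a metadata record might look like, written as a Python structure; the field names are illustrative assumptions, not the author's actual (unpublished) format.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a metadata record: every data type in the database is
# described by its purpose, its links to other data, the rule for deriving it,
# and the form in which it is presented to the user.
@dataclass
class Metadata:
    name: str                                         # data type identifier
    purpose: str                                      # detailed textual description
    related_to: list = field(default_factory=list)    # names of related data types
    derived_from: list = field(default_factory=list)  # source data types
    derivation: str = ""                              # derivation/aggregation rule
    presentation: str = "table"                       # default user-interface form

# Metadata is itself ordinary data, so it lives in the same database (here a dict).
metadata_catalog = {
    "monthly_revenue": Metadata(
        name="monthly_revenue",
        purpose="Total revenue of a branch for a calendar month",
        related_to=["branch", "month"],
        derived_from=["sale"],
        derivation="sum(sale.amount) grouped by branch, month",
    )
}
print(metadata_catalog["monthly_revenue"].derived_from)
```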

This is not to say that no one is working on the problem. XML standards, for instance, allow data to be characterized with tags, albeit in linear files. There are also more global approaches: google, for example, “OWL ontology description language”. The author, however, proposes a far more radical solution, in which data is stored in the database without reference to any of the original structures, and the structures users require are assembled according to their description in the metadata.

  • Stream calculations should be performed using Dataflow technology, that is, computation should be driven by the data. New data should be calculated according to the algorithm specified for it as soon as the necessary source data appears. Computations must be performed decentrally and in parallel across the network.

Data processing should consist of writing to the database new data, calculated by the algorithm assigned to it from a sample that satisfies the specified conditions on previously calculated or entered source data. The new data will be obtained automatically as soon as the necessary sample is formed — and so on throughout the whole network of distributed databases. The sequence of actions does not need to be specified (i.e., no control-program code has to be written), because under data-driven management the next actions are performed simply upon the availability of the necessary source data.

(Roughly the same computing technique is used in an Excel spreadsheet, where dependent cells are recalculated as the cells with their source data change, and there, too, no sequence of commands needs to be described.)

All “programming” comes down to describing the new data in metadata: its attributes (including access rights), relationships, display characteristics in the user interface (where needed), the conditions on the attributes of the source data whose values determine inclusion in the sample, and the processing algorithm. In more than 99% of situations the algorithm amounts to stating what should be done with the data in a sample: add it up, compute the average, find the maximum or minimum, compute statistical deviations, determine a regression function, and so on over the selected array. In the general case, calculations (sums of products, etc.) are possible over several samples, for example over A_N from sample {N} and B_K from sample {K}, where K may in turn be a sample of the parameter K_N from sample {N}, and so on. Just as formulas entered into cells compute new data in the Excel example, a description of the algorithm in a software module composes new data from source data in the Dataflow technology. And for this, as in Excel, professional programmers will usually not be needed — only occasionally, and only for a specific task.
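Below is a minimal sketch, under my own assumptions about names and structure, of such a declarative rule: the new value is described by a selection condition plus an aggregation, and nothing is computed until the required sample exists.

```python
import statistics

# Minimal sketch of a data-driven rule: new data is described declaratively
# (source, selection condition, aggregation) and is recomputed as soon as the
# required sample of source records exists. All names are illustrative.
AGGREGATES = {
    "sum": sum,
    "avg": statistics.mean,
    "max": max,
    "min": min,
}

def evaluate_rule(db, rule):
    """Select source records matching the rule's condition and aggregate them."""
    sample = [rec[rule["field"]] for rec in db[rule["source"]]
              if rule["condition"](rec)]
    if not sample:                      # sample not yet formed: nothing to compute
        return None
    return AGGREGATES[rule["aggregate"]](sample)

db = {"sale": [{"branch": "A", "amount": 10},
               {"branch": "A", "amount": 15},
               {"branch": "B", "amount": 7}]}

rule = {"source": "sale",
        "field": "amount",
        "condition": lambda rec: rec["branch"] == "A",
        "aggregate": "sum"}

db["monthly_revenue_A"] = [evaluate_rule(db, rule)]
print(db["monthly_revenue_A"])   # -> [25]
```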

Thus, with few exceptions, the whole range of informatization tasks can be handled by specialists in the applied domains without involving professional programmers. The data they deal with can be described in metadata by those same specialists themselves (if no analog already exists), and in content, purpose and appearance it can copy the paper documents familiar to them. A simple constructor tool will suffice for this. Any document (a report or a scientific article as well) will be not just a text file but a combination of information from the database, supplied with the tools for presenting it and working with it that are indicated in the metadata.

Today, to provide such capabilities, browsers constantly expand their functionality by adding Flash players, JavaScript options, new tags, web services, and so on. Metadata would make it possible to organize and localize these processes.
And it is always possible to arrange for a new, arbitrary document (data set) to be produced without editing the existing algorithms by which the whole system operates, since the same data in a distributed database can in fact participate in many different algorithms that derive new documents from the desired samples. With data-driven management and a unified distributed database, integrated accounting — of trade, for instance — does not even require sending reports anywhere: just start operating, and the data that appears will be picked up and accounted for automatically.

Developers who automate business processes may point out that all these features have long been implemented in BPM systems. Indeed, the example of BPM systems shows how the ideas of data-driven management seep into practice covertly, without the essence of the phenomenon being recognized — so far, of course, under the control of a central host program. But, alas, for a BPM system such as the “coolest” one, ELMA, to work, the company must have a programmer with a good command of C#. The author has had occasion to take part in administering this system. Without a full-time programmer you have to adjust your structures and business procedures to the supplied templates — an approach no different from the usual practice of buying application packages, with all the problems of adapting and integrating them.

The ideas of data-driven management, formalized purely mathematically as directed graphs with data moving as “tokens”, proved difficult to implement in practice, not least because they require expensive and energy-hungry associative memory. The author therefore proposes an implementation as a “coarse-grained” model, in which each module completely processes its source data within a local database. Similar modules work in other local databases, and the results are combined in a higher-level database. When all registered data sources have been processed, the new data receives the status “ready”.
If the data is not in the local database, the request is sent to a higher-level database, and from there it is replicated to the subordinate local databases. The results are likewise either consolidated in the higher-level database and then sent to the local database that initiated the request, or passed further up the hierarchy if the request came from above. Thus there is no need to know the addresses of all sources and recipients: each local database only needs to know the address of one higher-level database. This principle of organizing transactions makes the system easily extensible and scalable.
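
A rough sketch of this routing, with hypothetical class and method names, assuming only that each database knows its single parent:

```python
# Sketch of hierarchical request routing: a local database answers a query if
# it can; otherwise it asks its children (except the sender) and, failing that,
# escalates to its parent, which repeats the same procedure.
class Db:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children, self.data = name, parent, [], {}
        if parent:
            parent.children.append(self)

    def query(self, key, origin=None):
        if key in self.data:                          # answer locally if possible
            return list(self.data[key])
        results = []                                  # replicate downwards
        for child in self.children:
            if child is not origin:
                results += child.query(key, origin=self)
        if results:
            return results
        if self.parent and self.parent is not origin: # escalate to the parent
            return self.parent.query(key, origin=self)
        return results

root = Db("country")
city = Db("city", parent=root)
office = Db("office", parent=city)
office.data["pensions"] = [{"id": 1, "amount": 100}]

other = Db("other_office", parent=city)
print(other.query("pensions"))   # found without knowing the source's address
```

In this sketch a query never needs a global address — only the link to the parent — which is exactly the property described above.
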
A calculation algorithm is displayed most simply and clearly as a flowchart, which shows what data each software module receives as input and where the output computed in it is sent. The author has developed the DFCS language for programming parallel computation in a data-flow-controlled system; all the connections of such flowcharts can be described in it.

In the example block diagram below, colored parallelograms (large and small) denote data, while the white blocks denote program modules with data processing algorithms. Dotted lines indicate actions performed by electromechanical devices.



The block diagram determines precisely which modules are connected to which, so no associative memory is needed, but some data-synchronization measures have to be provided, especially if a “narrow” section of an algorithm branch is parallelized. Software modules (PM) are loaded, in some optimal combination, into a computing unit (WB). Data is passed from module to module through ports, which can be represented virtually or by the registers of physical devices. Under data-driven management a module, a device or port, a file, or a database query are functionally equivalent and interchangeable. Physical ports are used for exchanging data between WBs over data channels and possibly also between PMs within one WB. Data is stored temporarily only in ports (registers) and sometimes, perhaps, in a queue. The main data arrays should be stored in the database, which should be implemented as separate specialized devices, since the data-access interface must be the same regardless of the structure and relationships of the specific data.
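A small illustrative sketch (the names are mine) of the port-and-module picture: a port is a short register-like buffer, and a module fires only when all of its input ports hold data.

```python
from collections import deque

# Port = a small buffer (register-like, keeps at most `capacity` values).
class Port:
    def __init__(self, capacity=1):
        self.buf = deque(maxlen=capacity)
    def put(self, value): self.buf.append(value)
    def ready(self): return bool(self.buf)
    def take(self): return self.buf.popleft()

# Module = a function wired to input and output ports; it fires only when
# every input port is ready, and writes its result to the output ports.
class Module:
    def __init__(self, func, inputs, outputs):
        self.func, self.inputs, self.outputs = func, inputs, outputs
    def try_fire(self):
        if not all(p.ready() for p in self.inputs):
            return False
        result = self.func(*(p.take() for p in self.inputs))
        for p in self.outputs:
            p.put(result)
        return True

a, b, s = Port(), Port(), Port()
adder = Module(lambda x, y: x + y, inputs=[a, b], outputs=[s])
a.put(2); b.put(3)
adder.try_fire()
print(s.take())   # -> 5
```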

Data is transmitted between devices over a data bus, along which many devices and modules using it can be attached. Before exchanging data, the communicating devices must “capture” the bus so that no one else interferes. Capture usually follows an algorithm that weighs the devices' bus addresses and takes at least as many clock cycles as the addresses have bits.
However, there exists — and was even implemented in hardware at the aforementioned research institute — a technique for capturing the bus in 1-2 cycles regardless of the number of devices. Given the progress of technology, tens or hundreds of data buses can be used for exchange, with a free one chosen each time. The architecture of the computing complex is shown in the figure below. Complexes can be networked by connecting their buses through data-transfer adapters.
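For the conventional scheme, here is a simulation sketch of address-weighted arbitration, one bus cycle per address bit; the 1-2 cycle technique mentioned above is not described in the article, so it is not modelled here, and the bit width and addresses are arbitrary.

```python
# Sketch of classic address-weighted bus arbitration: each cycle the competing
# devices drive one bit of their address (MSB first) onto a wired-OR line; a
# device that drove 0 while the line reads 1 withdraws. After as many cycles as
# there are address bits, only the highest address still holds the bus.
def arbitrate(addresses, bits=8):
    contenders = set(addresses)
    for bit in reversed(range(bits)):                    # one bus cycle per bit
        line = any((a >> bit) & 1 for a in contenders)   # wired-OR of driven bits
        if line:
            contenders = {a for a in contenders if (a >> bit) & 1}
    winner, = contenders
    return winner

print(hex(arbitrate([0x2A, 0x07, 0x6C])))   # -> 0x6c, decided after 8 cycles
```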

The operation of the modules is controlled not by an operating system (OS) but by the data movement protocol — the “traffic rules” (SDA) — together with a transport program. The traffic-control program is very compact and resides in every computing unit.

It is the SDA that launches a module when input data is present in its port. The resulting output data is placed in the output port, from where the transport program (or driver) transfers it over the data bus to the input port of the next, connected module, where that unit's own SDA program starts it in turn. If the modules reside in the same computing unit, the transport program is not used, and so on. If a module has run but there is no new input, the SDA suspends its execution; the module resumes when data appears in the port again. In practice, however, such seemingly obvious SDA rules are not enough to ensure the stable operation of a Dataflow system, so the “right” traffic rules are much more interesting — and, I have reason to believe, realizable.
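A toy sketch of these “obvious” rules (not the author's actual SDA): fire any module whose input port holds data, then let a transport step carry outputs along declared connections. All names are assumptions of mine.

```python
from collections import deque

# Ports are small queues; modules read one value from "in" and write to "out".
ports = {name: deque() for name in
         ("double.in", "double.out", "inc.in", "inc.out")}

modules = {
    "double": {"in": "double.in", "out": "double.out", "fn": lambda x: 2 * x},
    "inc":    {"in": "inc.in",    "out": "inc.out",    "fn": lambda x: x + 1},
}

# "Wiring": where the transport program must carry each output.
connections = {"double.out": "inc.in"}

def step():
    """One pass of the SDA: fire ready modules, then let transport move data."""
    fired = False
    for m in modules.values():
        if ports[m["in"]]:                        # input present -> start module
            ports[m["out"]].append(m["fn"](ports[m["in"]].popleft()))
            fired = True
    for src, dst in connections.items():          # transport between ports
        while ports[src]:
            ports[dst].append(ports[src].popleft())
    return fired

ports["double.in"].append(5)
while step():
    pass
print(ports["inc.out"].popleft())   # -> 11
```

Even this toy version hints at why more is needed: nothing here prevents a fast producer from flooding a port, which is one reason the “obvious” rules alone do not guarantee stable operation.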

Thanks to these traffic rules a data-driven system is fundamentally decentralized. An OS, as a system that controls task execution and sharing, queues and so forth, is not needed at all, not even for resource management. All such requests are made by accessing services (as library modules) through their ports, by placing data containing the requirements into them. If resources are unavailable (busy), the answer simply has to be waited for, possibly with an exit via a timeout port. Owing to the complete decentralization of all devices and functions, the computing complex is easily scalable, and sets of task blocks can be loaded and duplicated as needed whenever free resources are available. In principle, the traffic rules can be supplemented with a service that parallelizes flows and loads extra copies of blocks in front of which a data queue builds up. With a successful distribution of modules among the autonomous computing resources of a network, a computation pipeline can be implemented, in which the result (for earlier data) appears at the output at the same time as the next batch of (new) source data arrives.

So, what steps should be taken to implement advanced IT?

  1. Develop a unified database structure suitable for storing any data, with their interconnections, including metadata descriptions.
    In principle, this has been done, but has not been published anywhere (although tested).
  2. Develop a system of hierarchical database organization and transaction technology (up and down) based on metadata to exclude specific addressing to data sources and consumers.
  3. Develop, and finally deploy somewhere, an imitation of Dataflow technology within existing Web technologies on Web servers, using the unified-structure database model implemented with relational database technology (a minimal sketch of such a structure follows this list). At the moment this would be the most effective investment.
  4. Produce databases as universal specialized devices (“in hardware”).
  5. Develop a programming language for parallel computing in a data-flow-controlled system.
    A draft of such a language, DFCS, already exists.
  6. Develop data warehouses, built on the unified structure of item 1 and equipped with their own computing units, abandoning file systems.
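Since the author's unified structure (item 1) is unpublished, the following is only one possible illustration of items 1 and 3: an entity-attribute-value layout in an ordinary relational database, with the metadata catalogue stored as ordinary rows. The table and column names are my own assumptions.

```python
import sqlite3

# Purely illustrative reading of items 1 and 3: any data of any type lives in a
# handful of generic tables, and the metadata catalogue is just more rows.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE metadata (                 -- descriptions of data types
    type_name    TEXT PRIMARY KEY,
    purpose      TEXT,
    derived_from TEXT,                  -- source type, if the data is computed
    derivation   TEXT                   -- aggregation rule, e.g. 'sum'
);
CREATE TABLE item (                     -- every stored object, of any type
    item_id   INTEGER PRIMARY KEY,
    type_name TEXT REFERENCES metadata(type_name)
);
CREATE TABLE attr_value (               -- attribute values, not tied to fixed tables
    item_id   INTEGER REFERENCES item(item_id),
    attribute TEXT,
    val       TEXT
);
""")
db.execute("INSERT INTO metadata VALUES ('sale', 'a single sale', NULL, NULL)")
db.execute("INSERT INTO item VALUES (1, 'sale')")
db.execute("INSERT INTO attr_value VALUES (1, 'amount', '10')")
db.execute("INSERT INTO attr_value VALUES (1, 'branch', 'A')")

# The structure a user needs is assembled from metadata at query time.
rows = db.execute("""
    SELECT i.item_id, v.attribute, v.val
    FROM item i JOIN attr_value v USING (item_id)
    WHERE i.type_name = 'sale'
""").fetchall()
print(rows)
```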

Clearly, an automated process control system (APCS) cannot be built on a Dataflow simulation within existing Web technologies (see item 3). For that, Dataflow has to be implemented “in hardware”, which is what the above-mentioned research institute was working on, up until perestroika, for use in multiprocessor controllers. But the simulation is quite enough to realize all the possibilities for building enterprise management systems, developing independent social networks and managing business processes.

I believe item 1 should be carried out first of all, especially since a solution definitely exists, and then items 2 and 3, which can be done with standard Web technologies. That alone would be enough for anyone to independently create a full-fledged system for managing and accounting for the resources, products and clientele of a distributed enterprise without resorting to professional programmers. By almost the same means a “social network” can be organized within a department, an enterprise and, eventually, everywhere.

But this does not mean that programmers face unemployment. On the contrary, thanks to a unified data-exchange interface and metadata classifiers, the services they develop (Software as a Service) will gain the widest possible field of application, and the developers will be paid accordingly. Something similar is, of course, already being done, but by special means, through intermediate transformations of the data's representation.

And those who provide system services for integration into Dataflow technologies will be able to derive the maximum benefit from the project. This is not advertising, especially since there is as yet neither a developer nor distributors. But it is obvious that there are far more users who could develop their applied tasks themselves, cheaply, in the image of familiar paper documents and within an understandable interface (simpler than Excel), than there are users willing to pay for expensive professional software that usually does not cover every aspect of their activity. Moreover, professional developers of applied software will most likely use the offered service as well, because it would solve their data-integration problems once and for all as their projects evolve.

If the simulation of Dataflow in Web technologies succeeds, there will be grounds for implementing Dataflow technology in hardware. Most likely, technical development should begin with items 4 and 6, i.e. producing databases as universal devices and, accordingly, abandoning file systems. Gigabytes of memory should sit in the database interface (where they belong), holding the arrays assembled for data requests. In the modules, main memory is needed only for commands (in read-only mode); for data, only, say, several hundred (or thousand) registers (ports) are needed — for example, with an interrupt on a change of state. Something like the latest developments of IBM Research, which reportedly “allow calculations to be performed in memory cells”, suggests itself here. Plus a cache for queues.

The programming language mentioned in item 5 may also be needed to program the computing units used in the data warehouses (see item 6). DFCS has the following characteristics. In every section of the network of modules (and inside any module), data appears only at inputs and outputs, called ports; it is therefore enough to declare how data is represented at the modules' ports. Since the order of execution of the modules is determined by when their data becomes ready, no particular execution sequence needs to be prescribed — only the modules' connections to one another need to be described, in any order. That is, the language is declarative. And since everything reduces to instructions with parameters, no parsing of syntactic constructions is required: the program can be loaded directly into memory in the course of “compilation”.
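The DFCS syntax itself is only available in the note linked at the end, so the following is merely a hypothetical illustration of a “flat list of instructions with parameters” that can be loaded without parsing; it is not DFCS.

```python
# Hypothetical illustration (not actual DFCS syntax): the whole "program" is a
# flat list of instructions with parameters -- port declarations and module
# connections -- whose order does not matter, so no parsing beyond splitting
# fields is needed and the description can be loaded into tables directly.
program = [
    ("port",    "double.in",  "int"),
    ("port",    "double.out", "int"),
    ("port",    "inc.in",     "int"),
    ("port",    "inc.out",    "int"),
    ("connect", "double.out", "inc.in"),
    ("module",  "double",     "double.in",  "double.out"),
    ("module",  "inc",        "inc.in",     "inc.out"),
]

ports, connections, modules = {}, {}, {}
for instr in program:                       # "compilation" is a single pass
    kind, *args = instr
    if kind == "port":
        ports[args[0]] = {"type": args[1], "data": []}
    elif kind == "connect":
        connections[args[0]] = args[1]
    elif kind == "module":
        modules[args[0]] = {"in": args[1], "out": args[2]}

print(modules["inc"], connections["double.out"])
```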

The modular structure of the flowcharts matches the top-down programming concept perfectly, and interaction between modules only through ports ensures that encapsulation principles are respected. In addition, the modular principle and a natural data interface create the best conditions for organizing collective program development.

In the software part of the DFCS language, labels and jump commands are intended to be used, which seems to contradict the principles of structured programming. However, based on my own programming experience, I would argue that a program with labels and jump instructions is usually more compact and more understandable than one with duplicated copies of blocks and a set of “flags” introduced to avoid jump commands. Some professionals share this opinion.

A brief description of the language can be downloaded from Yandex Disk.
