Parallel computing model

1. Introduction. Concurrent coroutinism


My previous articles on automata-based programming were just the "flowers". The "berry" of automata-based programming, i.e. what it is all needed for, is a model of parallel computing built on the finite state machine model. So, let's go...

The C++ standard now includes the long-awaited support for multithreading [1]. But we will neither admire nor criticize this fact, because working with threads is burdened by so many conditions, caveats, and peculiarities that, without real examples exposing the problems of multithreading, any discussion of multithreaded programming would be not only hasty but also rather biased. Therefore, what follows is mostly about automata rather than threads, while, of course, keeping threads in mind.

The C++ language is far from the first to be extended with parallelism constructs. Back in the 1960s, N. Wirth proposed a parallel extension of the ALGOL language [2]. However, the sixty years since then have not clarified what should be considered a parallel algorithm and what a model of parallel computation should look like. Apparently, the belated extension of C++ is also connected with this.

Both the long-standing constructs of ALGOL and their more modern analogues in C++ are merely structural methods of parallelization that introduce no parallel algorithmic model. In their defense it can be said that the attempts made over the intervening years to create such a formal model of computation have failed. Suffice it to say that Petri nets, for one, did not live up to the high hopes placed on them.

As a result, the "spiral" of parallelism development seems to have returned to its starting point, having undergone only "terminological development". The formerly humble coroutines suddenly became trendy "korutiny" (a direct calque of the English "coroutine" in Russian texts), and the confusion between the concepts "parallel" and "concurrent" in the English-language literature on parallel programming sometimes leads to paradoxical things. For example, the first edition of the book [1] differs from the second in that the term "parallel" was replaced by "concurrent" and "multithreaded" by "parallel". Go and figure out "who is who" after that.

2. A parallel automaton model of computation


Probably no one will dispute that the next qualitative step in the development of programming is tied to the transition to a parallel computing model. But whether this will happen through evolutionary development of the existing computational model, or whether it will be a fundamentally different model, is still a matter of debate. And while the theorists argue, the practically minded part of the programming community is already using structural methods of parallelizing programs.

Separation of concerns and performance are usually considered the only reasons for using concurrency: all other reasons are ultimately reduced, or at least people try to reduce them, to these two or their combinations [1]. But there is a reason that is rarely talked about, and yet it is the reason why parallel programming is worth doing at all. After all, speed can be increased by purely hardware means, and separation of concerns has about as much to do with parallelism as the daily work of bank employees has to do with the list of their job duties. Only parallel algorithms are a strategy that lets us defeat the complexity of problems and increase the reliability of programs. And all this runs contrary to the prevailing view of multithreaded programming, which turns any parallel program into a complex and unreliable software product.

A parallel system consisting of many concurrently functioning and actively interacting components (objects, agents, etc.) implements an algorithm that is determined not so much by the algorithms of the individual components (although by them too, of course) as by the number of components and by the number and kind of connections between them. To keep this complexity under control and to understand the algorithm of a parallel system, we need not just a parallel computing model, but a model backed, among other things and above all, by an appropriate theory.

The thesis that "a parallel program is often harder to understand ... and, consequently, the number of errors grows" is, to put it mildly, debatable. Yes, the algorithm of a parallel program can be quite hard to understand, but given a theory it can be "computed" formally from the component algorithms. And from the point of view of design, implementation, and maintenance, the component algorithms are much simpler than the algorithm of the system as a whole. Designing simpler components, we will obviously make fewer mistakes than designing the system as a single piece. Moreover, debugged components can become parts of other systems, reducing their complexity, increasing their reliability, and minimizing their design costs.

3. Serial concurrency


The article [3] described the parallelism of the model of a single finite state machine. At the level of transition execution, its channels define the parallel execution of the functions/methods associated with a transition: predicates and actions. There are no restrictions on the parallelism of predicates: they do not conflict with each other, because they do not modify the contents of memory. Actions working in parallel, however, can share input and output data and change them independently of each other, and all this can be a source of nondeterminism in the values of the output data.

Correct operation of actions in the situations described above is ensured by shadow memory. By storing new values in it, the same data can be used, even within a single action, both as input and as output. An example is the model of a square-wave generator described as y = !y, where y is the generator output. Its C++ code in the VKPa environment is shown in Listing 1, and the results of running the program are shown in Fig. 1.
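To make the shadow-memory discipline concrete before turning to the listing, here is a minimal sketch of the idea (the class ShadowVar and its methods are illustrative assumptions, not the CVar API of VKPa): reads return the value fixed at the start of the discrete tick, writes are deferred, and the kernel commits them between ticks.

#include <iostream>

// A minimal sketch of shadow memory, not the VKPa implementation:
// Get() sees the value the variable had at the start of the tick,
// Set() writes to a shadow copy, Commit() is what a kernel would
// call between ticks to publish the new value.
class ShadowVar {
    bool current = false;    // value visible during the current tick
    bool shadow  = false;    // value written during the current tick
public:
    bool Get() const  { return current; }   // input: the tick-start value
    void Set(bool v)  { shadow = v; }       // output: deferred write
    void Commit()     { current = shadow; } // publish between ticks
};

int main() {
    ShadowVar y;
    for (int tick = 0; tick < 4; ++tick) {
        y.Set(!y.Get());                    // the action y = !y: y is both input and output
        y.Commit();                         // end of tick
        std::cout << y.Get();               // prints: 1010
    }
    std::cout << "\n";
}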

Listing 1. Rectangular pulse generator
#include "lfsaappl.h"

class FSWGenerator :
    public LFsaAppl
{
public:
    LFsaAppl* Create(CVarFSA *pCVF) { Q_UNUSED(pCVF); return new FSWGenerator(pTAppCore, nameFsa, pCVarFsaLibrary); }
    bool FCreationOfLinksForVariables();
    FSWGenerator(TAppCore *pInfo, string strNam, CVarFsaLibrary *pCVFL);
    virtual ~FSWGenerator(void) {};
    CVar *pVarY;				// generator output (y)
protected:
    void y1();
};

#include "stdafx.h"
#include "FSWGenerator.h"
// state machine transition table
static LArc TBL_SWGenerator[] = {
    LArc("s1",		"s1","--",	"y1"),			//
    LArc()
};

FSWGenerator::FSWGenerator(TAppCore *pInfo, string strNam, CVarFsaLibrary *pCVFL):
    LFsaAppl(TBL_SWGenerator, strNam, nullptr, pCVFL)
{
    pTAppCore = pInfo;
}
// creating local variables and initialization of pointers
bool FSWGenerator::FCreationOfLinksForVariables() {
// creating local variables
    pVarY = CreateLocVar("y", CLocVar::vtBool, "generator output");
    return true;
}
// setting output signals
void FSWGenerator::y1() {
    pVarY->SetDataSrc(nullptr, !bool(pVarY->GetDataSrc()));
}


Fig. 1. Simulation of the operation of the rectangular pulse generator in VKPa

Here the automaton has a single state with an unconditional transition (a transition with a dash in place of the input condition) in the form of a loop marked by the action y1, which inverts the output variable and thereby produces a rectangular pulse train in the dynamics. Within the automaton model, the frequency of the pulse signal can be controlled by setting the discrete-time tick of the automaton space into which the automaton is loaded.
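To give a feel for the numbers (an illustrative calculation, the 10 ms tick is an assumed value): since the unconditional loop fires on every tick, y is inverted every tick, so with a 10 ms tick the output period is two ticks, i.e. 20 ms, and the generator frequency is 50 Hz; halving the tick doubles the frequency.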



The ability to control the discrete time of an automaton and the availability of multiple automaton spaces are not the only, but certainly important, distinctive properties of the VKPa environment. Using them, you can tune the performance of a parallel program. For example, automata implementing data visualization and user dialogs should be placed in slow automaton spaces, while application processes should be distributed among automaton spaces according to their priorities and desired speed, and so on.

Within the automaton model, the value of the generator output is easily tied to the current state of the model. The code of the generator model, now with two states, each reflecting the state of the generator output, is shown in Listing 2.

Listing 2. Square-wave generator based on states
#include "lfsaappl.h"

extern LArc TBL_SWGenState[];
class FSWGenState :
    public LFsaAppl
{
public:
    FSWGenState(TAppCore *pInfo, string strNam, CVarFsaLibrary *pCVFL):
    LFsaAppl(TBL_SWGenState, strNam, nullptr, pCVFL) {};
};

#include "stdafx.h"
#include "FSWGenState.h"
// state machine transition table
LArc TBL_SWGenState[] = {
    LArc("s0",		"s1","--",	"--"),			//
    LArc("s1",		"s0","--",	"--"),			//
    LArc()
};


In the new model, states replace the output variable, and this, as can be seen, dramatically simplifies the generator model. The result is a "bare" automaton represented by nothing but its transition table. To monitor its current state in VKPa, a variable of type fsa(state) named SWGenState.(s1) was created for the automaton named SWGenState. It takes the value true in state s1 and false when the automaton is in any other state. This variable is then used by the data-display facilities of the VKPa environment (see the signal trend in Fig. 2).

Fig. 2. Simulation of the state-based generator

4. Parallel computing control model


Moving further toward a model of parallel processes, it is logical to use a set of simultaneously functioning and interacting finite state machines, i.e. a network of automata. Here the problem of choosing the network time model arises: time can be common to all the machines or, in the limit, individual for each one. In VKPa the choice was made in favor of a single time (for more on synchronous networks of automata, see [5]).

The choice of a single time makes it possible to build an algebra of automata with operations of composition and decomposition. Using the former, one can find the resulting automaton, which gives an exact picture of the operation of a parallel system. Here it is worth recalling the thesis above about the "difficulty of understanding" parallel programs: it is the existence of the composition operation that solves the "problem of understanding" a parallel program.
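The essence of the composition operation can be conveyed by a toy sketch (illustrative C++ code, not the VKPa algorithm from [6]): for two autonomous, input-free automata the resulting automaton is their synchronous product, whose states are pairs of the component states. The state names w0/w1 for the second copy are my notation.

#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>

// A toy illustration of composition: two autonomous automata, given by
// their transition maps, are combined into the resulting synchronous
// product automaton, whose states are pairs of the component states.
using Fsm = std::map<std::string, std::string>;   // state -> next state

int main() {
    // two copies of the two-state generator from Listing 2
    Fsm a = {{"s0", "s1"}, {"s1", "s0"}};
    Fsm b = {{"w0", "w1"}, {"w1", "w0"}};

    // walk the product automaton from the initial pair and print its arcs
    std::set<std::pair<std::string, std::string>> visited;
    std::pair<std::string, std::string> s = {"s0", "w0"};
    while (visited.insert(s).second) {
        auto next = std::make_pair(a[s.first], b[s.second]);
        std::cout << s.first << s.second << " -> "
                  << next.first << next.second << "\n";
        s = next;
    }   // prints: s0w0 -> s1w1, then s1w1 -> s0w0
}

The printed pair-states of the form s1w1, s0w0 are exactly the kind of resulting states that appear below in the analysis of the RS flip-flop network.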

Of course, the resulting automaton of a network with many components can be very large. Fortunately, what is needed more often is an understanding of the operation of subsystems, i.e. networks with a small number of components, for which finding the resulting automaton poses no great problems. The RS flip-flop model below demonstrates this.

The RS flip-flop model is an example of a simple parallel system. It is especially interesting because of its cross feedbacks. Feedbacks, otherwise known as cyclic chains, loops, algebraic loops, etc., are currently a serious problem for structural models of parallel systems. In the general case the problem is solved by breaking the loops with memory elements. This is the standard solution proposed by automata theory [4], and MATLAB recommends the same way out. The VKPa environment differs in that it does not require such additional elements to implement loops. Note, and this is very important, that real circuits do not need them either (see the RS flip-flop circuit).

Fig. 3 presents the simplest model of the NAND (AND-NOT) element of which the RS flip-flop circuit is composed. It takes into account neither the element delays nor their type (transport or inertial). Nevertheless, it still contains at least one tick of delay: the time of the transition from one state to another. Listing 3 shows the model code.

Fig. 3. Model of the NAND element

Listing 3. NAND element model
#include "lfsaappl.h"

class FIne :
    public LFsaAppl
{
public:
    LFsaAppl* Create(CVarFSA *pCVF) { Q_UNUSED(pCVF); return new FIne(pTAppCore, nameFsa, pCVarFsaLibrary); }
    bool FCreationOfLinksForVariables();
    FIne(TAppCore *pInfo, string strNam, CVarFsaLibrary *pCVFL);
    virtual ~FIne(void) {};
    CVar *pVarX1;				// input signal x1
    CVar *pVarX2;				// input signal x2
    CVar *pVarY;				// output signal y
    CVar *pVarStrNameX1;		// name of the external variable for input x1
    CVar *pVarStrNameX2;		// name of the external variable for input x2
    CVar *pVarStrNameY;         // name of the external variable for output y
protected:
    int x1(); int x2(); int x12();
    void y1(); void y2(); void y12();
    bool bX1, bX2, bY;
};


#include "stdafx.h"
#include "FIne.h"
// state machine transition table
static LArc TBL_Ine[] = {
    LArc("st",		"st","^x12","y12"), 		//
    LArc("st",		"s1","x12^x1",	"y1"),		//
    LArc("st",		"s1","x12^x2",	"y1"),		//
    LArc("st",		"s0","x12x1x2",	"y2"),		//
    LArc("s1",		"s0","x1x2",   "y2"),		//
    LArc("s0",		"s1","^x1",    "y1"),		//
    LArc("s0",		"s1","^x2",    "y1"),		//
    LArc()
};

FIne::FIne(TAppCore *pInfo, string strNam, CVarFsaLibrary *pCVFL):
    LFsaAppl(TBL_Ine, strNam, nullptr, pCVFL)
{ }

// creating local variables and initialization of pointers
bool FIne::FCreationOfLinksForVariables() {
// creating local variables
    pVarX1 = CreateLocVar("x1", CLocVar::vtBool, "input signal x1");
    pVarX2 = CreateLocVar("x2", CLocVar::vtBool, "input signal x2");
    pVarY = CreateLocVar("y", CLocVar::vtBool, "output signal y");
    pVarStrNameX1 = CreateLocVar("strNameX1", CLocVar::vtString, "name of external input variable(x1)");
    pVarStrNameX2 = CreateLocVar("strNameX2", CLocVar::vtString, "name of external input variable(x2)");
    pVarStrNameY = CreateLocVar("strNameY", CLocVar::vtString, "name of external output variable(y)");
// initialization of pointers
    string str;
    if (pVarStrNameX1) {
        str = pVarStrNameX1->strGetDataSrc();
        if (str != "") { pVarX1 = pTAppCore->GetAddressVar(str.c_str(), this);	}
    }
    if (pVarStrNameX2) {
        str = pVarStrNameX2->strGetDataSrc();
        if (str != "") { pVarX2 = pTAppCore->GetAddressVar(str.c_str(), this); }
    }
    if (pVarStrNameY) {
        str = pVarStrNameY->strGetDataSrc();
        if (str != ""){pVarY = pTAppCore->GetAddressVar(str.c_str(), this);}
    }
    return true;
}

int FIne::x1() { return bool(pVarX1->GetDataSrc()); }
int FIne::x2() { return bool(pVarX2->GetDataSrc()); }
int FIne::x12() { return pVarX1 != nullptr && pVarX2 && pVarY; }	// are all variable pointers initialized?

void FIne::y1() { pVarY->SetDataSrc(nullptr, 1); }		// output := 1
void FIne::y2() { pVarY->SetDataSrc(nullptr, 0.0); }	// output := 0
void FIne::y12() { FInit(); }							// initialization of pointers


Fig. 4 shows the circuit of an RS flip-flop and its model in the form of a network of finite state machines. The arrows in the model indicate the connections between the automata of the network. Here, on the one hand, the states of the models reflect the states of the element outputs and, on the other hand, they also serve as the signals that organize the information links between the parallel processes. It is precisely this form of the model (with synchronization through states) that makes it quite easy to find the resulting automaton of the network. It is shown in Fig. 5 (for the procedure of finding the resulting automaton, see [6]).

Compare the [resulting] algorithm of the parallel RS flip-flop program with the algorithm of a single NAND element. The difference is striking. The component algorithms are created by hand, while the algorithm of the parallel system arises implicitly, through the "artificial intelligence" of the network. This is the qualitative difference between parallel programs and sequential ones: by changing nothing but the connections (even a single one), we obtain a completely different algorithm. It will certainly no longer be an RS flip-flop and, by the way, it will have a different resulting automaton.

Fig. 4. Circuit of the RS flip-flop and its network model

Fig. 5. The resulting automaton of the RS flip-flop network model

Analysis of the resulting automaton in Fig. 5 gives the following "understanding" of the parallel program (and of the real flip-flop, of course). First, when switching from one state to another, the flip-flop necessarily passes through the "forbidden" state of the outputs (and what do the textbooks say about that?). Second, if the flip-flop is driven into the state where both outputs equal one (state "s1w1") and then ones are applied to both inputs, it enters the generation mode, i.e. cyclic switching between the states "s1w1" and "s0w0" (and have you ever heard of flip-flop generation?).
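Both effects are easy to reproduce with a simple simulation of two cross-coupled NAND gates updated simultaneously, as the synchronous network of automata prescribes (an illustrative sketch, not VKPa code; the active-low inputs nS and nR are my notation):

#include <iostream>

// Two cross-coupled NAND gates with outputs q and w; both gates are
// updated simultaneously, i.e. from the values fixed at the start of
// the tick, as in the shadow-memory discipline.
struct Latch { bool q, w; };

static Latch step(Latch s, bool nS, bool nR) {
    // both gates read the tick-start output values
    return { !(nS && s.w), !(nR && s.q) };
}

int main() {
    // 1) Setting the flip-flop from (0,1): on the way to the stable
    //    pair (1,0) it passes through the forbidden output pair (1,1).
    Latch s = {false, true};
    for (int i = 0; i < 3; ++i) {
        std::cout << s.q << s.w << " ";     // prints: 01 11 10
        s = step(s, false, true);
    }
    std::cout << "\n";
    // 2) Both outputs driven high, then both inputs raised to one:
    //    the generation mode, cycling between (1,1) and (0,0).
    s = {true, true};
    for (int i = 0; i < 5; ++i) {
        std::cout << s.q << s.w << " ";     // prints: 11 00 11 00 11
        s = step(s, true, true);
    }
    std::cout << "\n";
}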

A transition through the forbidden state also occurs in a real flip-flop, but there the generation mode is impossible because of the differing delays of real elements. Fig. 6 shows the generation mode of the flip-flop model, which persists for as long as the ones at the inputs are held.

Remark 2. In the overwhelming majority of cases, the typical description of the operation of an RS flip-flop is given in the form of a truth table. But to do so, knowing full well that a flip-flop is a sequential circuit, is, in effect, to deliberately mislead those who study the topic. No flip-flop can have "forbidden states"! Yet for some reason only a few venture to reveal this truth and, even more so, to discuss the problem of its generation (see, for example, [7]).


Fig. 7 shows the switching of the flip-flop model between its stable states. Here, ones on both inputs preserve the current state of the flip-flop outputs, and setting one or the other input to zero switches it to the opposite state. At the moment of switching, for one discrete tick, both outputs simultaneously take on the single (forbidden by whom?) state.

Fig. 6. RS flip-flop generation mode

Fig. 7. Switching of the RS flip-flop between states

Consider another RS flip-flop model, consisting of one state and one action, i.e. similar to the model in Listing 1. Its code is shown in Listing 4. Like the generator model, it has no predicates, and the signal values are fed into the action y1 without any intermediate transformations. Is this good or bad? On the one hand it seems good, because the code has become simpler; on the other hand ... not quite. Let us now look into the reasons.

Listing 4. NAND element model with a single action
#include "lfsaappl.h"

class FTwoOperators :
    public LFsaAppl
{
public:
    LFsaAppl* Create(CVarFSA *pCVF) { Q_UNUSED(pCVF); return new FTwoOperators(pTAppCore, nameFsa, pCVarFsaLibrary); }
    bool FCreationOfLinksForVariables();
    FTwoOperators(TAppCore *pInfo, string strNam, CVarFsaLibrary *pCVFL);
    virtual ~FTwoOperators(void) {};
    CVar *pVarX1;				// input signal x1
    CVar *pVarX2;				// input signal x2
    CVar *pVarY;				// output signal y
    CVar *pVarStrNameX1;		// name of the external variable for input x1
    CVar *pVarStrNameX2;		// name of the external variable for input x2
    CVar *pVarStrNameY;         // name of the external variable for output y
protected:
    int x12();
    void y1(); void y12();
    bool bX1, bX2, bY;
};

#include "stdafx.h"
#include "FTwoOperators.h"
// state machine transition table
static LArc TBL_TwoOperators[] = {
    LArc("st",		"st","^x12","y12"), 		//
    LArc("st",		"s1","x12", "--"),		//
    LArc("s1",		"s1","--",  "y1"),		//
    LArc()
};

FTwoOperators::FTwoOperators(TAppCore *pInfo, string strNam, CVarFsaLibrary *pCVFL):
    LFsaAppl(TBL_TwoOperators, strNam, nullptr, pCVFL)
{ }
// creating local variables and initialization of pointers
bool FTwoOperators::FCreationOfLinksForVariables() {
// creating local variables
    pVarX1 = CreateLocVar("x1", CLocVar::vtBool, "input signal x1");
    pVarX2 = CreateLocVar("x2", CLocVar::vtBool, "input signal x2");
    pVarY = CreateLocVar("y", CLocVar::vtBool, "output signal y");
    pVarStrNameX1 = CreateLocVar("strNameX1", CLocVar::vtString, "name of external input variable(x1)");
    pVarStrNameX2 = CreateLocVar("strNameX2", CLocVar::vtString, "name of external input variable(x2)");
    pVarStrNameY = CreateLocVar("strNameY", CLocVar::vtString, "name of external output variable(y)");
// initialization of pointers
    string str;
    if (pVarStrNameX1) {
        str = pVarStrNameX1->strGetDataSrc();
        if (str != "") { pVarX1 = pTAppCore->GetAddressVar(str.c_str(), this);	}
    }
    if (pVarStrNameX2) {
        str = pVarStrNameX2->strGetDataSrc();
        if (str != "") { pVarX2 = pTAppCore->GetAddressVar(str.c_str(), this); }
    }
    if (pVarStrNameY) {
        str = pVarStrNameY->strGetDataSrc();
        if (str != "") { pVarY = pTAppCore->GetAddressVar(str.c_str(), this);	}
    }
    return true;
}

int FTwoOperators::x12() { return pVarX1 != nullptr && pVarX2 && pVarY; }	// are all variable pointers initialized?

void FTwoOperators::y1() {
// reading input signals
    bX1 = bool(pVarX1->GetDataSrc());
    bX2 = bool(pVarX2->GetDataSrc());
// setting output signals
    bY = !(bX1&&bX2);
    pVarY->SetDataSrc(nullptr, bY);
}
// initialization of pointers
void FTwoOperators::y12() { FInit(); }


If we test the new model in the "shadow memory" mode, we will see no differences in its operation from the previous one: when switching, it passes through the forbidden states and duly enters the generation mode. If we set up work with data in the normal mode, we get the results shown in Fig. 8 and Fig. 9.

Fig. 8. Failure of the generation mode of the RS flip-flop model

Fig. 9. The RS flip-flop model skipping the forbidden states

Why does the first model show stable results regardless of the memory mode, while the second changes its behavior? The reason lies in the predicates. The second model has no predicates, and this is critical for its behavior. But how and why does the presence or absence of predicates affect the algorithm of a parallel program?

The software model of a NAND element, like the automaton program, has two input channels and one output channel. They must be matched by two predicates and one action. The first program fully conforms to this. The VKPa kernel interpreting the automaton description first executes all the predicates, not just of a particular automaton but of the entire automaton space, and only then launches all the actions. Thus, whatever the order in which the actions are executed while simulating parallelism, and whatever memory mode they work in, the results of the predicates on the current automaton tick do not depend on them (the actions). That is why the first program always produces the same result.
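Schematically, the tick of such a kernel can be pictured as follows (an illustrative sketch of the two-phase discipline; the types and names are assumptions, not the real VKPa kernel):

#include <functional>
#include <vector>

// Phase 1 evaluates and latches ALL predicates of the automaton space,
// phase 2 launches all due actions; actions therefore cannot influence
// the predicate results of the current tick.
struct Automaton {
    std::vector<std::function<bool()>> predicates;   // transition conditions
    std::vector<bool> latched;                       // predicate results for this tick
    std::function<void()> fireTransitions;           // executes the due actions
};

void Tick(std::vector<Automaton>& space) {
    for (auto& a : space) {                          // phase 1: predicates only,
        a.latched.clear();                           // memory is not modified here
        for (auto& p : a.predicates) a.latched.push_back(p());
    }
    for (auto& a : space)                            // phase 2: actions fire using
        a.fireTransitions();                         // the latched predicate values
}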

The second program, although it works directly with the input channels of the automaton, reads the input signals inside the action. Actions working with input data in the shadow memory mode write new values into the shadow memory and thereby operate on the data valid at the beginning of the discrete tick. In the normal mode, they "grab" the instantaneous values established at the moment of change, and the algorithm thereby becomes dependent on the moments when memory changes. It is exactly this dependence that the second program demonstrates. And even if predicate methods were introduced into the second model, it would have no effect on the results: what matters is not the mere existence of predicate methods, but the way they work within the automaton programming model.
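For comparison, here is the same pair of gates in the normal memory mode (an illustrative sketch, not VKPa code): the actions run one after another and write their results immediately, so the second gate already sees the fresh value of the first one, and the generation mode of Fig. 6 disappears, as in Fig. 8.

#include <iostream>

// Sequential in-place update of two cross-coupled NAND gates with both
// inputs held at one: instead of oscillating, the pair settles down.
int main() {
    bool q = true, w = true;        // outputs driven high, inputs nS = nR = 1
    for (int i = 0; i < 4; ++i) {
        std::cout << q << w << " "; // prints: 11 01 01 01
        q = !(true && w);           // gate 1 fires first and writes at once
        w = !(true && q);           // gate 2 reads the value just written
    }
    std::cout << "\n";              // the generation mode has failed
}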

5. Conclusions


Using the parallel RS flip-flop program as an example, we have examined some properties inherent in any parallel program. We will continue to consider general aspects of the functioning of parallel programs using logical (digital) circuits as examples. The choice of digital circuit modeling here is no accident: in "distilled" form they represent the operation of parallel processes. This makes the analysis of the fine points of concurrency (races, synchronization, deadlocks, etc.) transparent, clear, and simple.

At the same time, no matter whether you call such programming "concurrent" or parallel, and whether you use coroutines, threads, or automata, the result of a [parallel] program must be the same in all implementations. The automaton model of parallel programs within VKPa pursues this and only this goal.
Whatever assumptions might be made about the implementation of the kernel interpreting the automata of the VKPa environment, they would all be "speculations", because the result of automata-based programs must not depend on the implementation of the computational model. It can be software (as now) or hardware (as, I hope, in the future), implemented on a single core or on many, in a single-threaded or multithreaded version, etc.; none of this should in any way affect the results of parallel automata-based programs.

And, it seems, the goal has been achieved. The RS flip-flop model, as one of the possible tests for parallel systems [8], convinces us of this... As experience has shown, once an environment has successfully passed the "RS flip-flop test" of its parallelism implementation, all other parallel programs work just as correctly, reliably, and stably. By the way, MATLAB failed that very "RS flip-flop test", and that already says a lot...

Literature
1. Williams A. C++ Concurrency in Action: Practical Multithreading (Russian translation). Moscow: DMK Press, 2012. 672 pp.
2. Wirth N. (translation from English). Moscow, 1981. 360 pp.
3. habr.com/ru/post/484588 (accessed 07.01.2020).
4. Moscow, 1962.
5. Kuznetsov O.P., Adelson-Velsky G.M. Discrete Mathematics for the Engineer. 2nd ed. Moscow: Energoatomizdat, 1988. 480 pp.
6. cloud.mail.ru/public/HwsK/T95PMM8Ed (accessed 01.02.2020).
7. 2nd ed. Moscow, 2004. 432 pp.
8. Mir PK, No. 10/97, pp. 116-119. www.osp.ru/pcworld/1997/10/158015 (accessed 01.02.2020).
