Genetic Code Analysis II


Illustration melmagazine.com ( Source )

Currently, public networks whose channels are not protected from intruders are widely used for information exchange. How protection is organized can be read here.

The sender protects the integrity, confidentiality, and availability of a message, drawing on results from the theories of codology, cryptology, and steganology.

In the present work, we continue to consider only one particular issue - the analysis of message codes.

There is a surge of interest in the study and use of the genetic code (GC) in connection with the development of nanotechnology. However, the limitations of the GC model leave many researchers unsatisfied, and even those who accept the model are dissatisfied with certain details.

The fact is that the existing GC model cannot explain many phenomena and experimentally established facts. This is not surprising; it is most likely natural. The field is relatively new and quite complex, little time has passed since its discovery, and the number of people who have devoted their time to it is very limited. The efforts of individual researchers are aimed at improving the GC model. For this purpose, the properties of proteinogenic amino acids are used (see Table 1). The modern rational classification of amino acids is based on the polarity of their radicals (R-groups), i.e., their ability to interact with water at physiological pH values (close to pH = 7.0).

Table 1 - Properties of proteinogenic amino acids


Refined Description of the Genetic Code


Reading the descriptions of the genetic code in a variety of sources leaves an impression of confusion in the texts, definitions, and reasoning. If science has established that a system of information transfer operates in a living organism, and this is how molecular biology interprets the achievements of the pioneering researchers, then for clarity it would be useful to draw an analogy between this system and similar systems in technology.

Readers and followers apparently do not trouble themselves to think through the content of material published by other authors. This reflects the inertia of human thinking and the pressure that authoritative names exert on the mind.

There is no clear and transparent description of either individual concepts or the code itself. We give brief schematic descriptions of such a system in technology and in a living organism.


Below in the text are some simplified schemes for transmitting information in technology using a coding system and in living organisms using a coding system created by nature itself. At the same time, all the mandatory elements of the systems and the process of their functioning are named.

In the general scheme of information exchange of subscribers in a communication system using block codes, the following concepts and the corresponding elements of a communication system can be distinguished:

  • Source of messages (information) - texts, archival documents, images, audio, video, etc.;
  • Sender of the message - formulates the message in some alphabet;
  • Message - a set of digitized information words;
  • Encoder - a device or computer program that converts the sender's message into codewords;
  • , ;
  • , ;
  • , , , ;
  • , ;
  • () .

Some elements of the system can be merged into one, with the same or modified functions. The alphabet can be the same (binary) on the transmitting and receiving sides; the source and the sender of the message, as well as the receiver and the user, can be one person; the decoder's functions can be limited to detecting errors without correcting them, with the distorted codewords discarded; and so on.

What follows from the existing description of the genetic code and the functioning of a living organism?

We consider a cell whose nucleus contains a set of chromosomes, represented by DNA molecules written as a sequence of genes separated by "commas". Each gene is formed by 3-letter codons (triplets) over a 4-letter alphabet.

There are no separators (commas) between codons (triplets) within the gene; triplets (codons, words) are written in a continuous, unbranched stream. Chromosomes in general and individual genes have an information load called hereditary information, which is transmitted to the cells of a new generation as a result of the process of division of the parent cells.

The semantic, informational content of the genes inherited from parents is the physical characteristics of an organism (individual) of a certain type, which are not recorded explicitly. The transmission of a trait (for example, hair color) is multi-stage: triplet - amino acid - enzyme - protein - organ or body tissue. The traits are recorded indirectly, through the synthesized proteins. The proteins, amino acids, and triplets involved in the synthesis differ for blondes and brunettes. The proteins of blond parents will be used in different tissues and organs, giving descendants the inherited traits, including hair color.

It is assumed that the sets of enzymes that are synthesized in the cell, and that provide for the further formation of the whole variety of proteins necessary for the growth and development of the body, guarantee the emergence of the genotype determined by heredity. The complete list of codons (triplets) is limited to 4^3 = 64, but the number of possible compositions and sequences of codons forming a gene is very large. Each amino acid (enzyme, protein) requires a separate set of codons, or a gene, for its synthesis.

All proteins of a particular organism are unique. A foreign protein that enters the body, or one of the body's own proteins that has been distorted and is recognized as foreign, is rejected by the body. This is the work of the immune system. It is this system that checks the correctness of protein coding against the genome. In other words, the role of codewords is played by the proteins synthesized in the body, and the immune system acts as a decoder.

The recipient of a message processed by the decoder should be taken to be the organs and tissues of the living organism, which use specific proteins for growth and vital activity. The user of the message is the organism itself.

It can be assumed that the original chromosome and genes arose from the required trait: the trait is formed by a list of proteins, the proteins by the amino acids that make them up, and the amino acids, finally, by the codons that synthesize them. Thus, information about a trait of an organism could have been recorded initially in genes and chromosomes, where it is stored and from which it is transmitted, during cell division, to new generations of cells and organisms. A trait desirable for the organism was fixed and preserved for many generations. Although this contradicts the central dogma of molecular biology, the chain listed above can be traced mentally in both directions.

So, what do we arrive at when comparing the two information transfer systems (living and technical):

  • The source of messages (information) is the cell and in it the DNA source and carrier.
  • , – , ;
  • – () , ;
  • , , , , Β« Β» ;
  • , β€” ;
  • , β€” , , ;
  • – , ;
  • , , – ;
  • () – , .


A distinction is made between cellular immunity and humoral immunity (mediated by the protein products of the cells' own activity). The system acts as a whole. It includes approximately 10^12 lymphocytes and 10^20 immunoglobulin molecules, whose task is to identify antigens.

Antigens (Ag) are molecules and cells from animals of the same species (allogeneic) or of another species (xenogeneic), as well as artificial or synthetic ones. Allogeneic antigens produced by the body itself, but then modified, are called autologous.

After identifying an antigen, the immune system neutralizes and removes it using special T cells or antibodies (Ab), which are produced by B cells. The humoral factors called complement and the properdin system perform the same functions. Phagocytosis and intracellular destruction of Ag are performed by macrophages.

All of these components of the immune system form the body's immunological network.
Such a network sometimes exhibits hypersensitivity, and sometimes immune tolerance or immunodeficiency; both are deviations from the norm.

In the first case, an excessive immune response takes place, and in the second, it is manifested by the absence of a selective immune response. The most difficult case is when allogeneic antigens turn into autologous and the body's immune system begins to work against itself. This completes the mapping of systems.

Another approach to developing the GC model is to represent its elements as algebraic (Galois field) and spatial structures (see papers). According to the available descriptions of the GC, its vocabulary contains 64 triplets, each of which can be associated with a vertex of a unit cube.

Figure 2 shows such a unit six-dimensional cube with 2^6 = 64 vertices, according to Yablonsky.

Genetic code (continued)

In our three-dimensional (n = 3) world, both animate and inanimate nature exhibit remarkable phenomena called self-organization and self-assembly of elements; in inanimate nature, an example is the nucleation and growth of crystals. This phenomenon manifests the crystallographic laws of nature. Over time, man discovered these laws, explained them, and put them to his service. In 1848, Auguste Bravais geometrically derived the 14 types of spatial (translational) lattices formed by cells of identical shape.

In 1890, E. S. Fedorov established the existence of 17 planar and 230 spatial crystallographic groups. This discovery determines, in particular, the possibilities and limitations that nature has in building crystals. Being crystalline is a fairly rare property of substances. Most substances, even in solution, tend to remain in a disordered (amorphous) form, as emulsions, suspensions, or colloids, and do not crystallize.

From the point of view of mathematics, crystallographic lattices realize simple and complex types of symmetry. Escher's pictures illustrate many of them. Crystals in spaces of two and three dimensions do not have 5-fold rotational symmetry: this is the crystallographic restriction of our world with its 3-dimensional geometry. In a 4-dimensional world, this restriction is lifted. Within this diversity, mathematics makes it possible to single out a narrower class of symmetries: regular polygons in the plane and regular polyhedra in n-dimensional space (Rosenfeld B., Karasev V.).

Table 2 - Regular polyhedra and their characteristics (case n = 3)

p * - the number of vertices in the face; q * is the number of faces adjacent to the vertex.
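The characteristics of the regular polyhedra for n = 3 can be recomputed from p and q alone. The sketch below is only an illustration, not a reproduction of the source table: it assumes the standard relations V = 4p/d, E = 2pq/d, F = 4q/d with d = 2p + 2q - pq, which follow from Euler's formula V - E + F = 2. The function name and the search limit are hypothetical.

```python
def regular_polyhedra(limit=10):
    """Enumerate regular 3-polyhedra {p, q} with 3 <= p, q <= limit.

    p is the number of vertices of a face, q the number of faces
    meeting at a vertex. A solution exists only when d > 0.
    """
    rows = []
    for p in range(3, limit + 1):
        for q in range(3, limit + 1):
            d = 2 * p + 2 * q - p * q
            if d <= 0:
                continue  # flat or hyperbolic tiling, not a polyhedron
            rows.append({
                "p": p,
                "q": q,
                "vertices": 4 * p // d,
                "edges": 2 * p * q // d,
                "faces": 4 * q // d,
            })
    return rows


for row in regular_polyhedra():
    print(row)
```

Under these assumptions the pair p = 5, q = 3 (the dodecahedron) yields 20 vertices, 30 edges, and 12 faces.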

Table 3 - Regular polyhedra and their characteristics (case n = 4)


At each vertex of the polyhedron q identical p-gons converge.
The values (p, q, r) for a regular 4-polyhedron are determined by the integer solutions of the inequality sin(π/p) · sin(π/r) > cos(π/q). There are only 6 such integer solutions; all of them are listed in Table 3.
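The count of six solutions can be checked by brute force. The sketch below assumes that p, q, r are integers of at least 3 and searches a finite range (the `limit` parameter is a hypothetical choice). A small tolerance `eps` is used so that boundary cases where both sides are equal, such as (4, 3, 4), are not admitted through floating-point rounding.

```python
import math


def regular_4_polytopes(limit=10, eps=1e-9):
    """Integer triples (p, q, r), 3 <= p, q, r <= limit, satisfying
    sin(pi/p) * sin(pi/r) > cos(pi/q) with a safety margin eps."""
    solutions = []
    for p in range(3, limit + 1):
        for q in range(3, limit + 1):
            for r in range(3, limit + 1):
                lhs = math.sin(math.pi / p) * math.sin(math.pi / r)
                rhs = math.cos(math.pi / q)
                if lhs - rhs > eps:
                    solutions.append((p, q, r))
    return solutions


print(regular_4_polytopes())
```

Within this range the search finds exactly the triples (3, 3, 3), (3, 3, 4), (3, 3, 5), (3, 4, 3), (4, 3, 3), and (5, 3, 3).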

Mathematics, as usual, provides far more possibilities than nature or man can realize, although it is possible that our knowledge of nature is simply very limited. A known case is the hereditary, reflexive behavior of bees, which build hexagonal cells to store their honey.

From the analysis of the GC, taking into account the additional properties of the code elements that follow from the spatial model of their placement, it follows that these elements are arranged in accordance with the various spatial symmetries of the amino acid molecules.

How the 20-vertex dodecahedron is connected (mathematically) with the genetics of living organisms is not completely clear. But the 5-sided faces of the dodecahedron reflect the crystallographic restriction of nature: the groups of five amino acids at the vertices of a face show no rotational symmetry.

Of the 5 possible regular 3-polyhedra, nature did not select the simplest one for modeling, but the one that matches the number (20) of enzymes synthesized by the cell: the dodecahedron has exactly that many vertices. The 20 existing amino acids (cell enzymes) can be mapped to the vertices of the dodecahedron in a specific order. Indeed, it proved possible to place the 20 amino acids in space (n = 3) so that their coordinates correspond to the vertices of the dodecahedron and certain properties of the polyhedron reflect the symmetry relations among the amino acids.



The figure shows I - the plane of inverse antisymmetry; II - the plane dividing the "antipodes". The intersection of the planes is one of the axes of rotation of the dodecahedron.

The letters A and B with indices (upper and lower) and signs (±) denote amino acids that have certain properties (Table 1). Thus, on the left side of Figure 1, all elements above the horizontal plane passing through the center of the polyhedron are marked with ⊕, and those below it with ⊖; these marks characterize the polarity of the amino acids.

In 1968, Rumer Yu. B. proposed and provided a matrix and graph description of conformations (Table 4).

Table 4 - Conformations (64) of a 4-link graph and their descriptions (according to Rumer)

The arrangement of elements and graphs in the table is such that adjacent elements in a block differ from each other in only one value (1 bit of information). Thus, it resembles a Gray code.
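The one-bit-difference property can be illustrated with the standard reflected binary Gray code on 6 bits. This is an assumption chosen for illustration; the actual ordering in Rumer's table is not reproduced here, and the function names are hypothetical.

```python
def gray_code(bits=6):
    """Reflected binary Gray code: the i-th word is i XOR (i >> 1)."""
    return [i ^ (i >> 1) for i in range(2 ** bits)]


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


codes = gray_code(6)

# Every pair of neighbouring words differs in exactly one bit.
neighbour_distances = [
    hamming_distance(codes[i], codes[i + 1]) for i in range(len(codes) - 1)
]

for word in codes[:8]:
    print(format(word, "06b"))
```

The 64 words are all distinct, so such an ordering visits every 6-bit conformation label once while changing a single bit at each step.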

Model of topological coding of chain polymers. The author identifies three components of the model: topological code; chain coding algorithm; a system of physical operators recreating a coded structure. The model uses the Rumer transforms [7].

For example, the triplets AAC, AAU (Asn) and AAG, AAA (Lys) on the left are converted to those on the right by the base replacements C ↔ A and G ↔ U.
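The base replacement can be sketched as a character substitution. Two assumptions are made here and are not confirmed by the text: the replacement is a mutual swap (C with A, G with U), and it is applied to every position of the triplet. The function name is hypothetical.

```python
# Assumed swap table: C <-> A and G <-> U.
RUMER_TABLE = str.maketrans("CAGU", "ACUG")


def rumer_transform(codon):
    """Apply the assumed base swap to each letter of an RNA triplet."""
    return codon.translate(RUMER_TABLE)


for codon in ("AAC", "AAU", "AAG", "AAA"):
    print(codon, "->", rumer_transform(codon))
```

Under these assumptions AAC maps to CCA, and applying the transformation twice returns the original triplet, since a swap is its own inverse.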

Table 5 - Transformation of the matrix of conformations into the triplet GC (according to Karasev V., Luchinin V.)

In the [3 × 3] matrix of the graph, a connecting edge joins the vertices numbered i and i-4 and corresponds to a value of 1.

According to the available GC descriptions, the list of its codons contains 64 triplets, each of which can be associated with a vertex of a unit cube. Figure 2 shows a unit six-dimensional cube with 2^6 = 64 vertices.

On the other hand, the extension Galois field GF(2^6), which has 64 elements, and the unit hypercube (n = 6), which has the same number of vertices, can both be associated with the 64 triplets.



Figure 2 - A unit cube ([11], according to Yablonsky S. V.) with vertices labeled by GC elements ([4, 7], according to Karasev, Rumer).

Since the number of vertices equals the number of triplets, we can establish a one-to-one correspondence between them, a bijection, which can be represented by a permutation of elements. The amino acids of the GC are assigned to the vertices of the unit cube one at a time.
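One possible bijection between triplets and hypercube vertices is sketched below. The two-bit labelling of the bases in `BASE_BITS` is an arbitrary assumption made for illustration; the labelling used in Figure 2 is not recoverable from the text, and any of the 64! permutations would also be a valid bijection.

```python
from itertools import product

# Assumed two-bit labels for the four bases (hypothetical choice).
BASE_BITS = {"C": "00", "U": "01", "A": "10", "G": "11"}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def codon_to_vertex(codon):
    """Map a triplet to a vertex of the 6-cube, written as a 6-bit string."""
    return "".join(BASE_BITS[base] for base in codon)


def vertex_to_codon(vertex):
    """Inverse map: split the 6-bit string into three 2-bit base labels."""
    return "".join(BITS_BASE[vertex[i:i + 2]] for i in range(0, 6, 2))


CODONS = ["".join(triple) for triple in product("CUAG", repeat=3)]
VERTEX_OF = {codon: codon_to_vertex(codon) for codon in CODONS}

print(len(CODONS), len(set(VERTEX_OF.values())))
```

Because every triplet receives a distinct 6-bit string and the inverse map recovers the triplet, the correspondence is one-to-one on all 64 elements.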


Figure 3 - Parts of the hypercube

Topological code. A 4-unit fragment of a chain polymer (4a), which is transformed into a chain graph (4b), is selected as the initial object. Graph edges (kc) - polymer bonds are incident to the vertices (i, i-1, i-2, ..., i-4) of the end points of the links.

The vertices of the graph x1, x2, ..., x6 are variables taking the values 0 or 1.


Figure 4 - Four-unit fragment of the chain polymer (a), its graph (b) and the matrix of the graph (c)

Tables of the Galois field. These are the addition table and the multiplication table of the field, together with a table of field elements that shows various representations of the elements and some of their characteristics.

The left column of the element table gives the power of the primitive element (000010) of the field. These powers run through all the nonzero elements of the field. The following columns give: the polynomial representation of the element, its binary vector, its decimal number, the order of the element, the vector of the multiplicative inverse, the power of the inverse, the inverse in decimal representation, and the codeword weight.
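A table of this kind can be generated in a few lines. The text does not state which primitive polynomial defines the field, so the sketch below assumes x^6 + x + 1, a commonly tabulated primitive polynomial of degree 6; the function and field names are hypothetical.

```python
from math import gcd

M = 6
PRIM_POLY = 0b1000011  # x^6 + x + 1: assumed, not given in the text
ALPHA = 0b000010       # the primitive element named in the text


def gf_mul(a, b):
    """Multiply two elements of GF(2^6) given as 6-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY  # reduce modulo the field polynomial
    return result


def as_polynomial(value):
    """Render a 6-bit integer as a polynomial in x."""
    terms = ["1" if i == 0 else "x" if i == 1 else f"x^{i}"
             for i in range(M - 1, -1, -1) if (value >> i) & 1]
    return " + ".join(terms) or "0"


def element_table():
    """One row per power k of ALPHA, k = 0 .. 62."""
    size = 2 ** M - 1
    powers = [1]
    for _ in range(size - 1):
        powers.append(gf_mul(powers[-1], ALPHA))
    rows = []
    for k, value in enumerate(powers):
        inv_power = (-k) % size
        inverse = powers[inv_power]
        rows.append({
            "power": k,
            "polynomial": as_polynomial(value),
            "vector": format(value, "06b"),
            "decimal": value,
            "order": size // gcd(k, size),
            "inverse_vector": format(inverse, "06b"),
            "inverse_power": inv_power,
            "inverse_decimal": inverse,
            "weight": bin(value).count("1"),
        })
    return rows


for row in element_table()[:8]:
    print(row)
```

The zero element is not a power of the primitive element, so it does not appear among these 63 rows. With a different primitive polynomial the rows would be permuted, but the column structure would stay the same.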

Steganography and information protection [1, 2, 12, 13, 14]


It is known that DNA is formed by a sequence of genes, within which segments called exons and introns are distinguished. Exons encode protein, while introns do not encode anything; they have even been called "silent" genes. Special enzymes cut the introns out of the RNA transcript before protein synthesis begins.

For example, in the human genome introns make up almost ninety percent. For steganographic applications, it is the introns that are of interest. In addition, the degeneracy of the GC makes it possible not only to generate artificial DNA containers but also to modify natural ones.

DNA containers with messages embedded in them must then reach the recipient of the message. This can be done in many ways, for example by introducing them into the genome of the organism to which the DNA molecule used as a model belongs. Ordinary viruses demonstrate a successful DNA distribution mechanism.
Definition. Steganography is the science of methods for embedding/retrieving and transmitting (storing) hidden information, in which a hidden channel is organized on the basis of and within an open channel by exploiting the features of information perception. The following techniques can be used for this purpose:

  • complete concealment of the existence of a hidden communication channel,
  • creating difficulties for detecting, retrieving or modifying transmitted hidden messages inside open container messages,
  • masking hidden information in the protocol.

The general concept of steganography is the creation of a hidden channel for transmitting information between the sender (A) and the receiver (B). Within one message, called a container or cover message and taken from the large flow of messages in the network, which is sent by subscriber A' ≠ A to subscriber B' ≠ B, subscriber A covertly (hidden from A' and B') embeds another message of smaller volume (about the patent can be read here).

Different conditions and possibilities for these inequalities are considered. Either the first pair or the second pair can be one person, or equality can hold for both pairs of subscribers, although the latter is undesirable.

Back in the 50s of the last century, Richard Feynman carried out a theoretical justification for the possibility of using DNA molecules to organize calculations.
Definition. A steganographic algorithm is a pair of mutually inverse transformations: the direct transformation F: M × B × K → B and the inverse transformation F^-1: B × K → M. The direct transformation assigns the result container to a triple (M - message, pB - empty container, K - key); the inverse assigns the original message M to a pair (zB - filled container, K - key). Here F(m, b, k) = b_{m,k} and F^-1(b_{m,k}, k) = m, where m ∈ M; b, b_{m,k} ∈ B; k ∈ K.
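A pair of transformations of this shape can be illustrated with a toy example. The sketch below is not a DNA-based scheme and is not taken from the source: it assumes a byte-string container, hides message bits in the least significant bits of container bytes, and uses the key only to seed the order in which positions are visited. The names `embed` and `extract` stand in for F and F^-1 and are hypothetical, as is the 2-byte length prefix.

```python
import random


def _positions(container_len, key):
    """Key-dependent visiting order of container positions."""
    order = list(range(container_len))
    random.Random(key).shuffle(order)
    return order


def _to_bits(data):
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def _from_bits(bits):
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed(message, container, key):
    """F(m, b, k): return the filled container b_{m,k}."""
    payload = len(message).to_bytes(2, "big") + message  # length prefix
    bits = _to_bits(payload)
    if len(bits) > len(container):
        raise ValueError("container too small for this message")
    filled = bytearray(container)
    for pos, bit in zip(_positions(len(container), key), bits):
        filled[pos] = (filled[pos] & 0xFE) | bit  # overwrite the lowest bit
    return bytes(filled)


def extract(filled, key):
    """F^-1(b_{m,k}, k): recover the message m."""
    order = _positions(len(filled), key)
    length_bits = [filled[p] & 1 for p in order[:16]]
    length = int.from_bytes(_from_bits(length_bits), "big")
    message_bits = [filled[p] & 1 for p in order[16:16 + 8 * length]]
    return _from_bits(message_bits)


empty_container = bytes(range(256))
filled_container = embed(b"DNA", empty_container, key=42)
print(extract(filled_container, key=42))
```

The filled container has the same length as the empty one and differs from it only in the lowest bits, which mirrors the requirement that F maps into the same set B of containers.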

A steganographic system (stegosystem) is the system S = (M, B, K, F, F^-1) formed by the sets of messages, containers, and keys and by the transformations connecting them.

Embedding (concealing) or retrieving a message by means of a stegosystem means taking the result of the forward or inverse steganographic transformation for the corresponding values of the arguments.
Definition. Sequencing is the determination of the sequence of nucleotides in a DNA fragment.

The availability and development of computer technology and microbiological technologies have made it possible to discuss, and to use in practice, the structural elements of living cells (DNA, RNA, etc.) as steganographic containers [3, 4]. The ability of these elements to store huge amounts of information at microscopic size attracts the attention of specialists, even though working with them requires a high level of professional training and specialized, expensive equipment.

List of used literature:
1. .. . . β€” .: , 2003. 152 .
2. . . . . – .: -, 2002. – 272 .
3. . ., . . // . 2002. . 7. . 274 β€” 278.
4. .. / 23.03.2004, β„–470-2004.
5. . . . – .: , 1966. – 648 .
6. . . – .: ,1976. – 224.
7. . . // . 1968. . 183. .225-226
8. – . . – .: ,
9. . . : . – .: , 1999. – 352 .
10. . . -. . / . . . . . . .: , 1964. . 195 – 219.
11. . .– .: , 1979.–272 .
12. Bancroft F. C., Clelland C. DNA-based steganography. United States Patent No. 6,312,911. November 6, 2001. US Patent & Trademark Office.
13. Bancroft F. C., Clelland C. DNA-based steganography. WO0068431. November 16, 2000. World Intellectual Property Organization.
14. Pfitzmann B. Information Hiding Terminology // Information Hiding: First International Workshop. Vol. 1174 of Lecture Notes in Computer Science. Isaac Newton Institute, Cambridge, England, May 1996. - Berlin: Springer-Verlag. - pp. 347-350.
