So what is “protein folding” all about?



The current COVID-19 pandemic has presented many problems that hackers have gladly attacked. From 3D-printed face shields and homemade medical masks to replacements for a full-fledged mechanical ventilator, the stream of ideas has been inspiring and heartwarming. At the same time, there have been attempts to make progress in another area: research aimed at fighting the virus itself.

Apparently, the approach that goes to the very source of the problem has the greatest potential for stopping the current pandemic and getting ahead of future ones. This "know your enemy" approach is the one taken by the distributed computing project Folding@Home. Millions of people have signed up for the project and donate part of the computing power of their CPUs and GPUs, creating the largest distributed supercomputer in history.

But what exactly are all those exaflops used for? Why is it necessary to throw so much computing power at protein folding? What biochemistry is at work here, and why do proteins need to fold at all? Here is a quick overview of protein folding: what it is, how it happens, and why it matters.

First things first: why do we need proteins?


Proteins are vital structures. They not only provide the building material for cells but also serve as the enzymes that catalyze almost all biochemical reactions. Proteins, whether structural or enzymatic, are long chains of amino acids arranged in a specific sequence. A protein's function is determined by which amino acids sit at which positions in the chain. If, for example, a protein needs to bind to a positively charged molecule, the binding site should be lined with negatively charged amino acids.
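To make the charge example concrete, here is a minimal Python sketch that estimates the net charge of a stretch of amino acids written in one-letter codes. It is a deliberately crude approximation: it assumes roughly neutral pH, counts lysine (K) and arginine (R) as +1 and aspartate (D) and glutamate (E) as -1, and ignores histidine, the chain ends, and the local environment. The function name `net_charge` and the example sequences are hypothetical.

```python
# Rough side-chain charges at about neutral pH (an approximation).
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def net_charge(sequence: str) -> int:
    """Return a crude net charge for a one-letter amino acid sequence."""
    sequence = sequence.upper()
    positives = sum(1 for residue in sequence if residue in POSITIVE)
    negatives = sum(1 for residue in sequence if residue in NEGATIVE)
    return positives - negatives


# A made-up binding site rich in acidic residues carries a negative charge,
# which would favor binding a positively charged partner.
print(net_charge("GDEDAEDS"))
```

A negative result suggests a patch that would attract a positively charged molecule; real binding also depends on where those residues end up in three dimensions.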

To understand how proteins acquire the structure that determines their function, we need to go over the basics of molecular biology and the flow of information in the cell.

Protein production, or expression, begins with the process of transcription. During transcription, the DNA double helix, which holds the cell's genetic information, partially unwinds, exposing the nitrogenous bases of the DNA to an enzyme called RNA polymerase. The job of RNA polymerase is to make an RNA copy, or transcript, of a gene. This copy of the gene, called messenger RNA (mRNA), is a single-stranded molecule ideal for directing the cell's protein factories, the ribosomes, which carry out the production, or translation, of proteins.
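The copying step can be sketched in a few lines of Python. This is a toy model, not a simulation of RNA polymerase: it assumes the DNA template strand is written 3' to 5', so that pairing each base with its RNA complement yields the mRNA in the usual 5' to 3' direction. The function name `transcribe` and the example sequence are hypothetical.

```python
# RNA base that pairs with each DNA template base (RNA uses U instead of T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Build an mRNA string from a DNA template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_strand.upper())


# Made-up template strand; the result starts with the AUG start codon.
print(transcribe("TACCGGTTTACT"))  # AUGGCCAAAUGA
```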

Ribosomes act like assembly jigs: they grab the mRNA template and match it against other small pieces of RNA, transfer RNA (tRNA). Each tRNA has two active regions: a three-base section called the anticodon, which must match the complementary codon on the mRNA, and a site that binds the amino acid specific to that codon. During translation, tRNA molecules in the ribosome randomly try to bind to the mRNA via their anticodons. When a match succeeds, the tRNA's amino acid is joined to the previous one, forming the next link in the amino acid chain encoded by the mRNA.
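The codon-by-codon reading described above can be illustrated with a small Python sketch. It uses the standard genetic code, starts at the first AUG codon, and stops at the first stop codon. It skips the random trial-and-error binding of tRNAs and simply looks each codon up in a table, so treat it as an illustration of the mapping rather than of the mechanism. The function name `translate` and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate mRNA into one-letter amino acids, from the first AUG to a stop."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the chain
            break
        chain.append(amino_acid)
    return "".join(chain)


print(translate("AUGGCCAAAUGA"))  # MAK (methionine, alanine, lysine)
```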

This sequence of amino acids is the first level of the protein's structural hierarchy, which is why it is called the primary structure. The entire three-dimensional structure of the protein and its functions derive directly from the primary structure and depend on the properties of each amino acid and on their interactions with one another. Without these chemical properties and interactions, polypeptides would remain linear sequences with no three-dimensional structure. You can see this every time you cook: cooking thermally denatures the three-dimensional structure of proteins.

Long-range bonds between parts of proteins


The next level of three-dimensional structure beyond the primary was given the ingenious name of secondary structure. It involves hydrogen bonds between amino acids that lie relatively close together in the chain. These stabilizing interactions produce two main motifs: the alpha helix and the beta sheet. The alpha helix forms a tightly coiled section of the polypeptide, while the beta sheet forms a smooth, broad region. Both have structural and functional properties that depend on the characteristics of their constituent amino acids. For example, if an alpha helix consists mainly of hydrophilic amino acids such as arginine or lysine, it is likely to take part in aqueous reactions.


Alpha helices and beta sheets in proteins. Hydrogen bonds form during protein expression.

These two structures and their combinations form the next level of protein structure, the tertiary structure. Unlike the simple fragments of secondary structure, tertiary structure is governed mainly by hydrophobicity. The cores of most proteins contain highly hydrophobic amino acids such as alanine or methionine, and water is excluded from them because of the "greasy" nature of the side chains. Such structures often appear in transmembrane proteins embedded in the lipid bilayer that surrounds cells. The hydrophobic sections of the protein remain thermodynamically stable inside the fatty interior of the membrane, while the hydrophilic sections are exposed to the aqueous environment on either side.
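One simple way to look for "greasy" stretches in a sequence is a sliding-window hydropathy average. The Python sketch below uses the Kyte-Doolittle hydropathy scale, a widely used table in which positive values mean hydrophobic and negative values mean hydrophilic. The choice of this scale, the window length, the function name `windowed_hydropathy`, and the example sequence are all assumptions made for illustration; nothing here is taken from the Folding@Home software.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(sequence: str, window: int = 5) -> list[float]:
    """Return the mean hydropathy of every window-sized slice of the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Made-up sequence: a hydrophobic run followed by a charged, hydrophilic run.
profile = windowed_hydropathy("LLLLLKKKKK", window=5)
print([round(value, 2) for value in profile])
```

Runs of strongly positive averages hint at a segment that might sit in a protein core or span a membrane, although a real prediction needs far more than this one number.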

The stability of tertiary structures is also supported by long-range bonds between amino acids. The classic example is the disulfide bridge, which often forms between two cysteine side chains. If you have ever smelled something like rotten eggs at a hair salon while a client was getting a perm, that was the partial denaturation of the tertiary structure of the keratin in hair, brought about by reducing the disulfide bonds with sulfur-containing thiol compounds.


The tertiary structure is stabilized by long-range interactions, such as hydrophobic interactions or disulfide bonds.

Disulfide bonds can form between cysteine residues in the same polypeptide chain or between cysteines in different complete chains. Interactions between different chains, whether covalent or not, form the quaternary level of protein structure. A great example of quaternary structure is the hemoglobin in your blood. Each hemoglobin molecule consists of four globin chains (in adults, two alpha and two beta), each folded into a specific conformation and bound to an iron-containing heme group. In hemoglobin the four globins are held together by non-covalent interactions such as hydrogen bonds, salt bridges, and hydrophobic contacts rather than by disulfide bridges, and the assembled molecule can bind up to four oxygen molecules at once and release them as needed.

Modeling structures in search of a cure


Polypeptide chains begin folding into their final shape during translation, as the growing chain emerges from the ribosome, much as a piece of shape-memory alloy wire can take on complex shapes when heated. However, as always in biology, things are not that simple.

In many cells, transcribed genes undergo extensive editing before translation, significantly changing the primary structure of the protein compared with the raw base sequence of the gene. In addition, the translation machinery often enlists the help of molecular chaperones, proteins that temporarily bind to the nascent polypeptide chain and keep it from adopting an intermediate shape from which it could not proceed to the final one.

All of this means that predicting the final shape of a protein is no trivial task. For decades, the only way to study protein structure was through physical methods such as X-ray crystallography. It was not until the late 1960s that biophysical chemists began building computational models of protein folding, concentrating mainly on modeling secondary structure. These methods and their descendants require huge amounts of input data in addition to the primary structure, such as tables of amino acid bond angles, hydrophobicity lists, charge states, and even the conservation of structure and function over evolutionary timescales, all in order to guess what the final protein will look like.

Today's computational methods for predicting secondary structure, including those running on the Folding@Home network, are about 80% accurate, which is pretty good given the complexity of the task. Data produced by predictive models for proteins such as the SARS-CoV-2 spike protein will be compared with data from physical studies of the virus. As a result, it will be possible to determine the exact structure of the protein and, perhaps, to understand how the virus attaches to the human angiotensin-converting enzyme 2 (ACE2) receptors in the respiratory tract that serve as its way into the body. If we can work out this structure, we may be able to find drugs that block binding and prevent infection.
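To show what an accuracy figure for secondary structure prediction can mean, the Python sketch below computes a per-residue, three-state score (often called Q3): every residue is labeled helix (H), strand (E), or coil (C), and the score is the fraction of residues where the prediction matches the experimentally observed label. Whether the 80% figure above was measured this way is an assumption, and the function name `q3_accuracy` and the label strings are invented for illustration.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted H/E/C label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not predicted:
        raise ValueError("label strings must not be empty")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(predicted)


# Made-up labels for an eight-residue fragment: one residue is mispredicted.
print(q3_accuracy("HHHEECCC", "HHHEEECC"))  # 0.875
```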

Protein folding research is at the very heart of our understanding of so many diseases and infections that even once we figure out how to defeat COVID-19 with the help of the recent explosive growth of the Folding@Home network, the network will not sit idle for long. It is a research tool well suited to studying the protein models that underlie dozens of diseases associated with protein misfolding, such as Alzheimer's disease or variant Creutzfeldt-Jakob disease, which is often incorrectly called mad cow disease. And when another virus inevitably appears, we will be ready to take up the fight again.
