Supercomputers, genome sequencing and the prospects for defeating the coronavirus

Justin Er, Global Communications Manager

One drop of fluid that a person exhales can contain billions of coronavirus particles. Each of them carries, with some variation, a genome of about 30,000 RNA nucleotides. This dense genome encodes every feature and specific trait of the virus, from its virulence to its transmission mechanisms. Researchers from China and other countries have already sequenced a number of genomes of the new SARS-CoV-2 coronavirus, giving scientists and doctors the basic knowledge they need to begin fighting the infection.




But mapping coronavirus genomes is only the beginning. Sequencing the genomes of infected people is exponentially more complex and just as important for understanding the nature of the pandemic. It raises many questions: why is one virus strain more contagious than others? What mechanisms lead to pneumonia in some patients and only a mild cough in others? How will individual patients respond to different treatments or vaccines?

The answers must be sought in the interaction between the human genome and strains of the virus. Understanding this interaction at the genetic level opens the way to diagnosing the disease and developing antiviral vaccines and immunotherapies.



Researchers at BGI Genomics, which developed the first diagnostic test kits for COVID-19, are engaged in painstaking, large-scale work to decipher the genome of the new coronavirus. To develop an effective vaccine, scientists need massive data sets to identify genetic differences and design potential protection. An operation of this scale generates terabytes and even petabytes of data, which only high-performance computing (HPC) systems can process and analyze.

The speed of genome sequencing grows along with computing power. A process that originally took more than ten years and cost billions of dollars can now run in just a few hours on supercomputer clusters built on optimized hardware architectures. Researchers stress that the road to a vaccine is likely to be long, but such an unprecedented set of tools will help shorten it.
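Part of the reason clusters help is that much of genome analysis can be split into independent pieces. The Python sketch below is a generic illustration of that scatter-gather pattern, not BGI's or Lenovo's pipeline: it divides a sequence into intervals, runs a stand-in task on each (counting G and C bases, a hypothetical placeholder for real analysis), and merges the results. A thread pool keeps the example self-contained; on a real cluster each interval would typically be sent to a separate process or node.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_intervals(length, n_chunks):
    """Scatter step: divide positions [0, length) into contiguous, non-overlapping intervals."""
    size = max(1, -(-length // n_chunks))  # ceiling division, at least 1
    return [(start, min(start + size, length)) for start in range(0, length, size)]

def gc_count(genome, interval):
    """Per-interval work (hypothetical stand-in for real analysis): count G and C bases."""
    start, end = interval
    region = genome[start:end]
    return region.count("G") + region.count("C")

def scatter_gather_gc(genome, n_chunks=4, workers=4):
    """Run gc_count on every interval concurrently, then merge the partial results."""
    intervals = split_into_intervals(len(genome), n_chunks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Gather step: counts from independent intervals simply add up.
        return sum(pool.map(lambda iv: gc_count(genome, iv), intervals))

genome = "ATGCGCGTTAACGGCTA" * 1000
print(scatter_gather_gc(genome), genome.count("G") + genome.count("C"))
```

Because the intervals never overlap and the partial results simply add up, the merged count matches a single pass over the whole sequence; the same structure lets a pipeline spread work across as many workers as it has intervals.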


“To analyze and identify the complications of a viral infection, BGI Genomics sequenced hundreds of clinical samples,” said Xiangqin Jin, CTO, BGI Genomics. “Access to the latest technology in high-performance computing and genomic analytics is an important factor in improving analysis efficiency.”


BGI Genomics Researcher Works with T7 Sequencer

Wanting to support BGI and empower COVID-19 researchers, Intel and Lenovo joined forces. They provided the scientists with a supercomputer cluster and offered their software and hardware expertise to help them use it as efficiently as possible.
“We are trying our best to support scientists and doctors who are at the forefront of the fight against the new coronavirus,” explains Milidi Giraldo, Lenovo's leading genomic research and development specialist.


The technology Intel and Lenovo provided to BGI includes an HPC cluster for processing the high-throughput read data produced by the BGI DNBSEQ-T7 sequencer.

Dr. Giraldo spent several years on bioinformatics research at the National Institutes of Health (NIH), contributing to the development of vaccines against infectious diseases. Giraldo now helps foster collaboration between scientists and the engineers who develop hardware and software for the life sciences.
“We donate equipment and offer our expertise, but the real breakthrough will be provided by the results that BGI researchers and other representatives of the biomedical community will achieve.”

The HPC cluster will be used to study virulence, pathogen transmission patterns, and host-virus interactions. Through this work, BGI hopes to optimize its COVID-19 diagnostic kits, learn much more about the coronavirus, and accelerate the development of an effective vaccine or other protective measures, such as immunotherapy.

Genome decoding


The genome of every person on Earth can be thought of as a thousand-page book. Its unusual text uses only four letters: A, G, C and T. They stand for the four nucleotide bases of DNA, and together they carry the instructions for every trait and feature that makes you who you are: your hair color, your height and even your susceptibility to a disease like COVID-19. Most of these instructions are written the same way for all people; the most important variations are hidden on just a few pages.
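Because this “text” uses only four letters, each letter needs just two bits of storage. A minimal Python sketch, assuming an arbitrary mapping of A, C, G and T to the codes 00, 01, 10 and 11 chosen for illustration (real sequencing file formats use their own encodings):

```python
# Illustrative 2-bit code for each nucleotide letter (an assumption, not a standard).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # LETTERS[i] is the letter whose code is i

def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base (first base in the highest bits)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value

def unpack(value: int, length: int) -> str:
    """Recover the original string; the length is needed because leading A's encode as zeros."""
    bases = []
    for _ in range(length):
        bases.append(LETTERS[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))

seq = "GATTACA"
packed = pack(seq)
print(packed, unpack(packed, len(seq)))
```

Seven bases fit in 14 bits here, versus 56 bits as plain 8-bit characters; actual sequencer output also stores quality scores and metadata, so it is much larger than this minimum.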



To understand why some people are more susceptible to infection than others, researchers must identify the exact “pages” (that is, genes) that hold the relevant instructions. This can only be done by comparing the “pages” of as many patients as possible, identifying common characteristics, and then using the data to link these variations to how patients respond to infection.
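A toy Python sketch of the comparison idea: given short, already-aligned sequences from several patients (invented here purely for illustration), it finds the positions where they differ and tabulates which letter each outcome group carries. Real association studies use far larger cohorts and proper statistical tests; this only shows the bookkeeping.

```python
from collections import Counter

# Hypothetical toy data: aligned "pages" from four patients, labelled by severity.
patients = [
    ("ACGTAC", "severe"),
    ("ACGTTC", "severe"),
    ("ACCTAC", "mild"),
    ("ACCTTC", "mild"),
]

def variable_positions(seqs):
    """Return positions where not every sequence carries the same letter."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]

def allele_counts_by_outcome(records, pos):
    """Count which letter each outcome group carries at one position."""
    counts = {}
    for seq, outcome in records:
        counts.setdefault(outcome, Counter())[seq[pos]] += 1
    return counts

seqs = [s for s, _ in patients]
for pos in variable_positions(seqs):
    print(pos, allele_counts_by_outcome(patients, pos))
```

In this toy set, position 2 separates the groups cleanly (G in every severe case, C in every mild one), while position 4 varies without any link to outcome, which is the kind of distinction the comparison is meant to surface.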

When sequencing, one must take into account that 1 ml of a biological sample usually contains millions of different virions, each with a genome of approximately 30,000 RNA nucleotides.

This extremely difficult task of decoding and interpreting genomes lies at the heart of scientists' struggle against the new coronavirus. Understanding the interaction between the relevant human genes and SARS-CoV-2 may reveal ways to contain or completely stop the infection. Scientists will also look for common “pages” in the book of the coronavirus itself, to identify regions of the genome where the pathogen cannot tolerate mutations. Such a region is a kind of Achilles heel that could open the way to a vaccine or a treatment.
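One simple way to look for such common “pages” is to line up genomes from many samples and keep the positions where every sample agrees. The Python sketch below assumes the sequences have already been aligned to equal length (real work relies on dedicated alignment tools and thousands of genomes) and uses invented fragments; conservation across samples is evidence that a region resists mutation, not proof.

```python
def conserved_positions(aligned_genomes):
    """Return positions at which every aligned genome has the same base.

    Assumes equal-length, pre-aligned sequences; gap characters ('-') never count as conserved.
    """
    length = len(aligned_genomes[0])
    if any(len(g) != length for g in aligned_genomes):
        raise ValueError("sequences must be aligned to the same length")
    conserved = []
    for i in range(length):
        column = {g[i] for g in aligned_genomes}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved

def conserved_runs(positions, min_length):
    """Group consecutive conserved positions into (start, end) runs of at least min_length."""
    runs, start, prev = [], None, None
    for p in positions:
        if start is None:
            start = prev = p
        elif p == prev + 1:
            prev = p
        else:
            if prev - start + 1 >= min_length:
                runs.append((start, prev))
            start = prev = p
    if start is not None and prev - start + 1 >= min_length:
        runs.append((start, prev))
    return runs

# Hypothetical aligned fragments from three virus samples.
samples = ["ATGGCTTACG", "ATGGCATACG", "ATGGCTTTCG"]
pos = conserved_positions(samples)
print(pos, conserved_runs(pos, 3))
```

Looking for runs rather than single positions reflects the idea of whole “pages”: a stretch that stays identical across many samples is a more promising target than an isolated matching letter.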

Vaccine development


In films about epidemics there is usually a moment when scientists hunt for the one person who is immune to the disease. This makes some sense: natural immunity can provide key insight into how to develop an effective vaccine.




The coronavirus is ringed by crown-like spikes, which gave it its name.

Consider two patients' responses to the virus: one develops life-threatening pneumonia, while the other has only a mild cough. What explains the difference? Weakened immunity? Genetic predisposition? The after-effects of an earlier illness? Age? Sex? Nutrition? The dominance of a particular strain of the virus? Comparing two patients is not enough to answer these questions, and across thousands of people the task becomes many times more complex.

However, only a huge array of data will make it possible to begin studying the countless variations in genes and environmental factors. The more clinical and genomic data scientists have, the more accurately they will be able to identify features common to different patients.

Population-scale genomics involves billions of pieces of information. To study the new coronavirus, scientists hope to compare the DNA of tens of thousands of diagnosed patients. This is a complex task that requires serious processing power and large storage capacity, available only in HPC environments.
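A back-of-the-envelope estimate shows why population-scale work lands in HPC territory. The Python sketch below assumes a haploid human genome of about 3.1 billion bases, the theoretical minimum of 2 bits per base, and 30x sequencing coverage (a common depth, used here as an assumption); it ignores quality scores and metadata, so real data volumes are larger.

```python
HUMAN_GENOME_BASES = 3.1e9  # approximate length of the haploid human genome
BITS_PER_BASE = 2           # minimum needed to distinguish A, C, G and T

def raw_bytes(n_genomes, coverage=1):
    """Bytes needed to store n genomes at 2 bits per base.

    coverage > 1 models reading each position several times, as sequencers do.
    Quality scores and read metadata are ignored, so real files are much larger.
    """
    return n_genomes * HUMAN_GENOME_BASES * coverage * BITS_PER_BASE / 8

for n in (1, 10_000, 50_000):
    print(f"{n:>6} genomes, 30x coverage: {raw_bytes(n, coverage=30) / 1e12:.2f} TB")
```

Even under these generous assumptions, tens of thousands of genomes range from hundreds of terabytes to more than a petabyte of raw sequence.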

HPC Optimization


Remember the “book” of the genome? The one whose first reading took a whole decade? Researchers can now analyze a genome in about 150 hours. That is a fantastic leap, but such speed is still not enough to deal with the COVID-19 pandemic. Even sequencing and analyzing just the exome, the few “pages” that encode proteins, usually takes at least 4 hours.

BGI researchers now have access to HPC clusters optimized to collect and analyze hundreds of genomes and thousands of exomes.

Building on Intel technology, Lenovo has developed an optimized hardware and system architecture that can drastically reduce the time it takes to analyze a genome. Lenovo's Genomics Optimization and Scalability Tool (GOAST) combines the open-source Genome Analysis Toolkit (GATK) with optimized hardware platforms. Choosing the right hardware and software components to accelerate genomic research required testing hundreds of HPC configurations.

As a result, a whole human genome can be analyzed in five and a half hours and an exome in just four minutes, a speedup of up to 40 times. Supported by a dedicated supercomputer cluster, BGI researchers will soon begin using GOAST to study the new coronavirus and develop a vaccine.

In the short term, predicting virulence from the dominant strains can help hospitals allocate patients more efficiently: they will know who is at greater risk and which treatments will be most effective. In the long run, beyond a vaccine, knowing the genomic history and origin of the virus will help prevent future outbreaks. Altogether, this is an enormous and complex puzzle that needs to be solved.

Modern equipment and technology will speed up the identification of people infected with COVID-19, and studying the characteristics of the virus genome will contribute to accurate diagnosis, successful treatment and the prevention of epidemics.
