"Pandemic" of scientific publications about COVID-19

In the modern information society, any socially significant process that affects the safety and health of citizens is accompanied by a stream of false information. The more participants in the process and the more complex the subject area, the more room there is for manipulation and the spread of misinformation. Such misinformation can be more dangerous than the event that gave rise to the threat.



Information about COVID-19 currently dominates all other news and is accompanied by a great deal of false information. This creates a need for reliable information, which, with some skill, can be obtained from peer-reviewed scientific journals.

Many electronic scientific libraries and journals (such as the National Center for Immunization and Respiratory Diseases, JAMA Network, Elsevier) have set up special sections for publications on the SARS-CoV-2 coronavirus on their websites. However, more than 10 scientific articles on this topic are published per day, and making sense of this flow of information is not easy. The most cited coronavirus publication, which dates from 2003, has been cited by more than 3400 sources over 18 years (according to Google Scholar), whereas the article Clinical features of patients infected with 2019 novel coronavirus in Wuhan has already been cited by more than 900 sources, even though it was published just a month ago! This situation can be called a “pandemic” of scientific articles about COVID-19.

Let’s try to structure the flow of publications and identify interesting patterns in it. Because the author lacks specialized knowledge in medicine, this article presents only the results of a bibliometric analysis, without attempting to interpret the findings in the context of virology.

Characteristics of the source data


The source data consists of information on more than 10,000 academic publications, collected on March 20, 2020 using the Google Scholar search engine. Unfortunately, few Russian publications are indexed by this search engine, because the main Russian bibliometric system, eLibrary, has strong protection against data collection.

In total, three search queries were performed, for the keywords “COVID-19”, “coronavirus” and “SARS-CoV-2” (Figure 1).

Fig. 1 - Search results for scientific publications by keywords
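The article does not describe how the results of the three queries were combined. A minimal sketch of one plausible approach is shown below: records are merged by a normalized title, and each record remembers which queries returned it. The record layout, the function names and the sample data are assumptions made for illustration only.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace so near-identical records match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_query_results(results_by_query):
    """Merge records from several queries, keeping one entry per normalized title.

    Returns a dict: normalized title -> {"title": ..., "queries": set of matching queries}.
    """
    merged = {}
    for query, records in results_by_query.items():
        for record in records:
            key = normalize_title(record["title"])
            entry = merged.setdefault(key, {"title": record["title"], "queries": set()})
            entry["queries"].add(query)
    return merged


# Hypothetical sample; the real data set contained more than 10,000 records.
sample = {
    "COVID-19": [
        {"title": "Clinical features of patients infected with 2019 novel coronavirus in Wuhan"},
    ],
    "coronavirus": [
        {"title": "Clinical Features of Patients Infected with 2019 Novel Coronavirus in Wuhan."},
        {"title": "Identification of a novel coronavirus in patients with severe acute respiratory syndrome"},
    ],
    "SARS-CoV-2": [],
}
merged = merge_query_results(sample)
```

Keeping the set of matching queries per record is what later makes it possible to color the same map separately for each query.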

The name of the disease, COVID-19, is used in scientific publications more often than the name of the virus, SARS-CoV-2. In total, according to Google Scholar, there are more than 150 thousand articles and books on coronaviruses and related topics. Yearly statistics for the collected publications are shown in Figure 2.

Fig. 2 - Distribution of collected information on publications by year

Two peaks can be seen in the diagram, in 2003 and 2012. They correspond to two outbreaks of coronavirus infections: SARS-CoV (severe acute respiratory syndrome, known as SARS) and MERS-CoV (Middle East respiratory syndrome). Although data collection focused on newer publications, it is hard to miss the jump in scientific activity caused by the current COVID-19 pandemic. The same trend can be seen in the citation dynamics of the most popular articles on the topic. The 2003 publication Identification of a novel coronavirus in patients with severe acute respiratory syndrome has, according to Google Scholar, been cited by more than 3400 sources over 18 years. The publication Clinical features of patients infected with 2019 novel coronavirus in Wuhan, about the new coronavirus, has already gathered more than 900 citations in just a month! This situation can be called a “pandemic” of scientific articles about COVID-19, since it has affected scientists around the world. Studying such a large volume of publications requires special analysis methods, which are demonstrated in this article.
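To make the comparison of the two articles concrete, the citation counts can be converted to a common rate. The sketch below uses only the figures quoted above (both are lower bounds, "more than") and assumes, for simplicity, that citations accumulate evenly over time.

```python
def citations_per_month(citations: int, months: float) -> float:
    """Average number of citations gained per month over the given period."""
    return citations / months


# Figures quoted in the text; an even accumulation of citations is a simplifying assumption.
sars_2003_rate = citations_per_month(3400, 18 * 12)  # 2003 article, 18 years
wuhan_2020_rate = citations_per_month(900, 1)        # 2020 article, 1 month
ratio = wuhan_2020_rate / sars_2003_rate

print(f"{sars_2003_rate:.1f} vs {wuhan_2020_rate:.0f} citations per month, ratio ~{ratio:.0f}x")
```

Under these assumptions the 2020 article gathers citations roughly 57 times faster than the most cited article of the previous 18 years.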

Map of scientific publications on the topic "Coronaviruses"


Presenting the analysis graphically, in the form of maps, is convenient and intuitive. Additional information about scientific publications can be obtained by considering the thematic links between them, as reflected in citations. Based on the collected data, a citation graph was constructed; its core is shown, for convenience, as a heat map (Figure 3).

Fig. 3 - Map of scientific publications on the topic "Coronaviruses"
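The article does not specify how the citation graph or its core was computed. As an illustration only, the sketch below builds an undirected graph from (citing, cited) pairs and keeps a k-core, that is, the publications that still have at least `min_links` neighbours after weakly connected ones are removed repeatedly. The choice of a k-core, the threshold and the sample identifiers are all assumptions.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Build an undirected adjacency map from (citing, cited) pairs."""
    graph = defaultdict(set)
    for citing, cited in citations:
        if citing != cited:
            graph[citing].add(cited)
            graph[cited].add(citing)
    return graph


def extract_core(graph, min_links=2):
    """Repeatedly drop publications with fewer than min_links neighbours (a k-core)."""
    core = {node: set(neighbours) for node, neighbours in graph.items()}
    changed = True
    while changed:
        changed = False
        weak = [node for node, neighbours in core.items() if len(neighbours) < min_links]
        for node in weak:
            # Remove the node and erase it from the neighbour sets of the remaining nodes.
            for other in core.pop(node):
                core[other].discard(node)
            changed = True
    return core


# Hypothetical citation pairs: A, B and C cite each other, D is attached by a single link.
pairs = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
core = extract_core(build_citation_graph(pairs), min_links=2)
```

In this toy example the loosely attached publication D is dropped, and the triangle A, B, C remains as the core.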

The map in Figure 3.A forms a semantic space in which each region has a specific thematic focus. The proximity of publications on the map reflects their thematic similarity. The relative position of thematic regions is determined by the links between the corresponding research topics. That is, the closer two regions of the map are to each other, the more similar their research topics.

On the map (Figure 3.A), two large clusters can be seen. The cluster on the left side of the map (sectors 6-11, Figure 3.A) contains the results of coronavirus studies conducted before the advent of COVID-19. This is evidenced by the distribution of the publications found by the search queries “COVID-19” (Figure 3.B) and “Coronavirus (after 2020)” (Figure 3.B). The publications found by the query “SARS-CoV-2” (Figure 4.E) are present in both the left and the right (sector 3, Figure 3.A) clusters.

In addition to the topics of the clusters, it is important to understand when their publications appeared. Figure 4 shows the chronology of scientific articles and books on the subject “Coronavirus”, with the year of publication indicated by color.


Fig. 4 - Illustration of the chronology of the appearance of scientific publications on coronaviruses.

The earliest publications are located in the upper left corner of the map, while publications from 2020 form a separate group on the right.

Information about the chronology allows us to trace the cause-effect relationships between the regions and the development of topics.

Thematic cluster overview


Let us consider in more detail the main areas of the constructed map (Figure 5).


Fig. 5 - Map of scientific publications on the topic “Coronaviruses” with the thematic areas marked on it.

The publications of the main cluster are devoted to the study of viruses. Its upper part contains earlier publications, which pay more attention to the protein structure of viruses. The lower part concentrates the results of studies of specific coronaviruses, including SARS (2003) and MERS (2012).

In late 2002 and early 2003, a disease appeared that the media called “atypical pneumonia”. The virus spread in Asia. Over the entire outbreak, more than 8000 cases of infection were recorded, more than 800 of them fatal. The peak in publications noted earlier is associated with this disease, and the publications themselves are compactly located in the SARS area (Figure 5).

The MERS-CoV area includes publications related to the Middle East respiratory syndrome of 2012, which spread to 23 countries, including Saudi Arabia, Yemen, the United Arab Emirates, France, Germany and Italy.

Three isolated clusters on the left side of the map (zone 3, sector 8, Figure 3.A) relate to the study of viruses in animals (cats, dogs and cattle).

The right side of the map contains publications about COVID-19 and its consequences for society. The COVID-19 cluster has a complex structure and consists of thematic sections related both to the study of the virus itself and to the modeling of its spread. There is also a separate area of publications on detecting the disease with radiological methods.

Between the two large clusters on the left and right sides of the map there is a “bridge” of about 20 publications (sectors 3 and 4 of zone 2, Figure 3.A). These publications are linked by citations to publications in both clusters in approximately equal proportions. Their topics include vaccine development, identification of the origin of the virus, and forecasting of its spread based on the analysis of available data on similar infections.
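The idea of a “bridge” publication can be expressed as a simple check on the citation graph: a publication is a candidate if its citation neighbours are split between the two clusters in comparable proportions. The sketch below is only an illustration of that idea; the cluster labels, the `min_share` threshold and the sample graph are assumptions, and the article does not state how the bridge was actually identified.

```python
def find_bridges(graph, cluster_of, left="left", right="right", min_share=0.3):
    """Return publications whose neighbours are split between two clusters.

    graph: dict mapping a publication to the set of publications linked to it by citation.
    cluster_of: dict mapping a publication to its cluster label (unlabeled ones are ignored).
    min_share: minimum fraction of labeled neighbours required in the smaller group.
    """
    bridges = []
    for node, neighbours in graph.items():
        labels = [cluster_of[n] for n in neighbours if n in cluster_of]
        n_left = labels.count(left)
        n_right = labels.count(right)
        total = n_left + n_right
        if total == 0:
            continue
        if min(n_left, n_right) / total >= min_share:
            bridges.append(node)
    return sorted(bridges)


# Hypothetical graph: "vaccine" is linked to two older (left) and two COVID-19 (right) papers.
graph = {
    "vaccine": {"s1", "s2", "c1", "c2"},
    "s1": {"vaccine", "s2"},
    "s2": {"vaccine", "s1"},
    "c1": {"vaccine", "c2"},
    "c2": {"vaccine", "c1"},
}
cluster_of = {"s1": "left", "s2": "left", "c1": "right", "c2": "right"}
bridges = find_bridges(graph, cluster_of)
```

Here only the hypothetical "vaccine" publication qualifies, since every other publication has labeled neighbours in a single cluster.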

The constructed map makes it possible to see the “natural” relationships between different coronavirus research topics and can be used as an intuitive visual tool for analyzing the thematic focus of author teams, scientific journals and other research objects. This capability is demonstrated in the following sections.

Analysis of authors' activity


More than 3000 authors were identified for the publications under consideration; the 50 with the largest number of publications are shown in the diagram (Figure 6).


Fig. 6 - 50 most published authors on the topic “Coronavirus”.

When compiling statistics on authors, only their surnames and initials were used. This approach has several drawbacks. On the one hand, the same person can be counted as several different people because of differences in the spelling of a surname in the native language and in English. On the other hand, two different authors can be recorded as one person if they have the same surname and initials (this problem is especially relevant for Chinese authors, who make up the majority on the topic of COVID-19). For this reason, the actual number of authors and their publications will differ from the statistics provided.
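The collision problem is easy to reproduce. The sketch below reduces a full name to a "surname plus initials" key, in the spirit of the approach described above; the function name and the exact key format are assumptions, the sketch assumes the surname is written last, and "Yan Li" is a made-up name used only to show a collision.

```python
def author_key(full_name: str) -> str:
    """Reduce 'Given Names Surname' to 'INITIALS Surname', e.g. 'Yun Li' -> 'Y Li'.

    Assumes the surname is the last word of the name.
    """
    parts = full_name.replace(".", " ").split()
    if not parts:
        return ""
    surname = parts[-1].capitalize()
    initials = "".join(part[0].upper() for part in parts[:-1])
    return f"{initials} {surname}".strip()


# Two different people collapse into one profile ...
same_key = author_key("Yun Li") == author_key("Yan Li")
# ... while two spellings of one person's name are correctly unified.
memish_keys = {author_key("Ziad A. Memish"), author_key("Z. A. Memish")}
```

Any statistic grouped by such a key inherits both effects: distinct authors are merged into profiles like Y Li, and the key offers no protection against different transliterations of the same surname.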

Let us consider the thematic focus of the most active authors. Figure 7 shows personal thematic maps of the 7 most published authors. The personal maps were built on top of the map of scientific publications on the topic “Coronaviruses” presented above.


Fig. 7 - Personal thematic maps of the seven most published authors on the topic “Coronavirus”

Professors Patrick C. Y. Woo and Susanna Kar Pui Lau are members of the Department of Microbiology at the University of Hong Kong. The authors have more than 100 publications (of which at least 40 are related to the study of coronaviruses). They have fairly high h-indices (Hirsch indices), but so far no publications on COVID-19 have been recorded for them.

Because the surname Li is so common, the Y Li profile can represent several people at once: Yun Li (a professor at the University of Michigan or a professor at the University of Toronto), Lei Yuan (an employee of Wuhan University) and others. For this reason, it makes no sense to analyze the publication activity of this profile. Similar considerations apply to the profiles W Li, J Chen and Y Yang.

Dr. Ziad A. Memish is currently a senior consultant on infectious diseases and the head of the research department at the Prince Mohammed bin Abdel Aziz Hospital in Riyadh (Ministry of Health of Saudi Arabia). He is also a professor at the College of Medicine at Alfaisal University (Riyadh, Saudi Arabia) and an associate professor at the Hubert Department of Global Health (Rollins School of Public Health, Emory University, Georgia, USA).

Ziad Memish is recognized by the expert community as a specialist in the fight against infectious diseases and is a member of the Executive Board of the International Society of Infectious Diseases. He has received many awards, has a long list of scientific publications and reports at international conferences, and is the editor-in-chief of two journals (including the Journal of Epidemiology and Global Health). Most of his publications on coronaviruses are located in sector 6 of zone 3 (Figure 3.A), which contains publications on the Middle East respiratory syndrome. They were published during the spread of the disease, when Ziad Memish was serving as Deputy Minister of Health of Saudi Arabia.

On the subject of COVID-19, the map of Ziad Memish contains four publications devoted to diagnosis and to countering the mass spread of the virus.

Thus, the analysis of personal activity shows that the surge in publications in 2020 is attributable to Chinese authors, who, because of common surnames and initials, can be mistaken for the same people in bibliometric analysis. Researchers with international authority have shown moderate activity in publishing on coronaviruses and the related disease COVID-19.

Analysis of publisher activity


For more convenient access to information about COVID-19, many information resources (including Habr) have organized special sections on their websites where relevant information is aggregated. Simplifying access to verified information is a good way to combat the spread of false information, which can have negative consequences. Scientific publishers use this approach too. At the same time, such organizations bear additional responsibility for the reliability and quality of the information they post. By publishing insufficiently verified information, publishers risk distracting or misleading scientists conducting research, which may reduce the effectiveness of the fight against coronaviruses.

Given the increased workload of reviewing scientific articles, it is interesting to study the activity of publishers on the topic under discussion. Figure 8 shows, for each source, the total number of coronavirus publications found and the number of those on the topic of COVID-19.


Fig. 8 - Number of collected publications by journal and bibliometric platform (light blue indicates the total number of collected publications on the topic “Coronaviruses”, dark blue indicates the number of publications on the topic COVID-19)
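The two bars per source in Figure 8 correspond to a simple tally. The sketch below shows one way such counts could be produced; the record fields ("source", "queries") and the sample records are hypothetical, and the numbers are not the article's data.

```python
from collections import Counter


def tally_by_source(records):
    """Count all publications and COVID-19 publications per source.

    Returns (source, total, covid) tuples sorted by total, largest first.
    """
    total = Counter()
    covid = Counter()
    for record in records:
        total[record["source"]] += 1
        if "COVID-19" in record["queries"]:
            covid[record["source"]] += 1
    return [(source, count, covid[source]) for source, count in total.most_common()]


# Hypothetical records for illustration.
records = [
    {"source": "sciencedirect.com", "queries": {"coronavirus"}},
    {"source": "sciencedirect.com", "queries": {"coronavirus"}},
    {"source": "sciencedirect.com", "queries": {"coronavirus", "COVID-19"}},
    {"source": "medrxiv.org", "queries": {"COVID-19"}},
    {"source": "medrxiv.org", "queries": {"COVID-19", "SARS-CoV-2"}},
    {"source": "jvi.asm.org", "queries": {"coronavirus"}},
]
table = tally_by_source(records)
```

A source whose second number is zero, like the hypothetical jvi.asm.org row here, is exactly the kind of case singled out later in this section.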

It should be noted that a large proportion of publications on the topic of COVID-19 are so-called preprints, i.e. articles made available before their official publication in a peer-reviewed scientific journal (such articles are available from medrxiv.org and arxiv.org). On the one hand, posting a preprint allows scientists to establish priority for their scientific results earlier than others and, on the other, to correct inaccuracies that may be identified before the article is officially published. At the same time, it reduces the possibility of commercial use of the results as intellectual property, since the data become publicly available. The large number of preprints on this topic is not surprising: because of its relevance, researchers seek to publish their results as early as possible, without waiting for the completion of the review procedures required by official scientific publishers.

Another interesting feature is the existence of sources that have no publications on the topic of COVID-19 despite having articles on other coronavirus-related topics. This feature is discussed in more detail below.

We use the constructed map for the analysis of scientific journals in the same way as we used it to analyze the activity of the authors. Figure 9 shows the thematic maps of the reviewed journals and electronic libraries.


Fig. 9 - Thematic maps of scientific journals and electronic libraries publishing information on the Coronavirus topic

ScienceDirect (sciencedirect.com). A system for accessing scientific journals run by Elsevier, one of the world's largest publishing houses (which also maintains the Scopus database of scientific publications). The system provides access (paid and free) to publications from more than 2600 scientific journals. Criticism of this publishing house is mainly directed at the excessive commercialization of scientific activity.

ScienceDirect accounts for 14% of the publications that fall into the core of the collected data. All of the coronavirus topics considered are covered (Figure 9.A), and the publication dynamics correspond to the general statistics. The topics of the 2003 coronavirus and the 2012 Middle East respiratory syndrome are represented proportionally. The topic of modeling and mechanisms of the spread of COVID-19 is represented to a lesser extent than clinical studies of the virus.

Journal of Virology (jvi.asm.org). Journal of Virology is a peer-reviewed journal and has been published since 1967. Currently, articles are published electronically every two weeks. The journal covers the results of studies on the nature of viruses, reports on new discoveries and points out new directions in research. Original research articles cover viruses from animals, archaea, bacteria, fungi, plants, and protozoa. Among the key problems that are being investigated: analysis of the structure of viruses, replication of the viral genome, evolution of viruses, the interaction of viruses and cells, etc.

The thematic map (Figure 9.B) shows that this journal covers practically all coronavirus topics except COVID-19. Only one publication on this topic was collected (Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus). Instead of the term COVID-19, it uses 2019-nCoV; a manual search for that term on the publisher's website found 2 more publications related to COVID-19. Such a small number of publications (compared to other publishers), despite the wide coverage of other viral infections, is probably due to editorial policy, high requirements and careful review of submitted materials (the website states that the average time to an editorial decision on acceptance is 27 days, and the time between a positive decision and publication is 11 days).

It is also interesting to compare the chronology of publications in this journal and in ScienceDirect, considered above. The two sources are similar both in coverage and in the approximate number of publications that fall into the core of the collected data. However, the dynamics of publications in ScienceDirect look similar for the outbreaks of 2003 and 2012, whereas in the Journal of Virology activity fades over time. This may be due to a decline in interest in coronavirus topics or in publishing resources, or to deliberate editorial policy (for example, additional requirements for the scientific novelty of research).

The National Center for Biotechnology Information (ncbi.nlm.nih.gov). The US National Center for Biotechnology Information was established in 1988 to process and store molecular biology data. The NCBI maintains databases of protein domains, DNA and RNA sequences (GenBank), medical and biological scientific articles (PubMed), and the taxonomy of species (TaxBrowser).

This source contains a little more than 4% of the collected publications in the core. Almost all of them were published after 2003 (Figure 9.B), so this source is practically absent from the upper part of the thematic map. Its coverage of topics related to viruses in domestic animals is also low. Its COVID-19 articles are mainly located in the central part of the corresponding cluster and are devoted to clinical studies of the virus and to predicting its spread.

SpringerLink (link.springer.com). An access system for scientific journals from the Springer publishing house, which specializes in the natural sciences. The distribution of publications on the topic “Coronavirus” by topic and by year in SpringerLink is comparable to that of Elsevier, but the volume is about 3 times smaller (Figure 9.G). A notable feature of the publication statistics is the large number of publications dating from 1995, which mainly report the results of studies of coronaviruses in animals (including domestic ones). The main directions of publications on COVID-19 are clinical studies and modeling of consequences.

medRxiv (medrxiv.org). A free online resource for posting complete but not yet published articles and monographs (preprints) in the field of healthcare. This source currently hosts the largest number of publications on COVID-19 (Figures 8, 9.E). No publications on other coronavirus topics were collected from this source.

Wiley Online Library (onlinelibrary.wiley.com). Wiley's system for accessing scientific journals, similar to those of Elsevier and Springer. Wiley has compiled a selection of over 5,000 open research articles related to COVID-19. Most of its publications on COVID-19 report the results of studies of the structure of SARS-CoV-2.

Oxford University Press (academic.oup.com). The source publishes articles from more than 300 journals in the humanities, social sciences, law, science and medicine, two thirds of which are published in collaboration with scientific and professional organizations.
Oxford University Press's publications on coronaviruses mainly deal with specific human coronaviruses. For COVID-19, 16 publications were collected, mainly on the origin and mechanisms of the spread of the SARS-CoV-2 virus.

Nature (nature.com). One of the oldest and most respected scientific journals in the natural sciences, with more than a million readers per month. For this journal (Figure 9.I), a “surge” in publications on coronaviruses can be noted in 2016, which sets its statistics apart from the other sources considered. That year it mainly published the results of studies on the structure of coronaviruses (for example, SARS and MERS: recent insights into emerging coronaviruses). These publications are highly cited, owing to the credibility of the journal.

All the sources considered have convenient search engines and can be used to find the results of relevant coronavirus studies in a timely manner.

Study of publications on the origin of SARS-CoV-2


The developed map can also be used to study coronavirus topics that cause controversy and scientific discussion. One of them is the hypothesis of the artificial origin of the coronavirus, associated with the publication Engineered bat virus stirs debate over risky research. This publication was not found during data collection because of its low ranking, caused by a lack of citation links (which is strange, given that it was published by the reputable publisher Nature). It is also not mentioned in the two-page article No credible evidence supporting claims of the laboratory engineering of SARS-CoV-2, which argues that there is insufficient evidence for the artificial origin of the SARS-CoV-2 virus (Figure 10).


Fig. 10 - Selected publications on a topic related to the origin of SARS-CoV-2

In this regard, the research results published in the previously mentioned Journal of Virology article Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus are of particular interest. However, due to the author's lack of specialized knowledge in genetic engineering, further analysis is not possible here.

Conclusions


To sum up this review, timely access to the results of scientific research is important for countering misinformation. However, the excessive volume of published information, together with the scientific complexity of the topic, reduces the effectiveness of such efforts. A large number of published results increases the burden on both readers and reviewers, who verify the correctness of the results. This situation is characteristic not only of rare events like the coronavirus pandemic but of science as a whole. Analysis requires new approaches to information processing, one of which was demonstrated in this article.

The information on the collected scientific publications that fell into the core may be useful to specialists, so it is provided as a table in a separate xlsx file.

P.S. It would be interesting to hear in the comments the opinion of experts on the editorial policy of the Journal of Virology, as well as on the credibility of the hypothesis of the artificial origin of SARS-CoV-2.
