Strand COVID Mutation Miner
Vamsi Veeramachaneni
Extensive research on COVID and the SARS-CoV-2 genome is taking place across the globe, and publications summarizing the findings are appearing at a rapid rate. When a new variant like Omicron (B.1.1.529) is detected and the mutations characteristic of the variant have been identified, the next step is to check whether the published literature already contains information about those mutations.
The Strand COVID Mutation Miner tool makes this possible by indexing CORD-19 – a free resource of more than 280,000 scholarly articles about the novel coronavirus made available by the Semantic Scholar team at the Allen Institute for AI. The CORD-19 dataset consists of full text articles and associated metadata from multiple literature databases like bioRxiv, medRxiv and PMC.
At Strand, the entire CORD-19 data is downloaded on a regular basis and processed using a proprietary NLP engine customized for SARS-CoV-2. The NLP engine is aware of the various gene, protein, and nsp names used in the description of SARS-CoV-2 mutations. Mutations that are mentioned without an explicit gene or protein name are also intelligently mapped to the correct location. At present, the Strand COVID Mutation Miner tool contains references to 12,156 articles containing mentions of SARS-CoV-2 mutations. It can be accessed freely using the following credentials (user:covidmm.guest, password: guest123).
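To make the idea of a mutation mention concrete, here is a minimal sketch of how amino acid substitutions such as Q498R or S:Q498R could be picked out of a sentence with a regular expression. This is not Strand's proprietary NLP engine, whose internals are not described here. The pattern, the `find_mutations` helper, and the `GENE:` prefix convention are assumptions made purely for illustration, and the sketch does not attempt to map mentions without a gene name to a location.

```python
import re

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical pattern (an assumption, not the engine's actual rule):
# optional "GENE:" prefix, reference residue, position, alternate residue,
# e.g. "S:Q498R", "ORF1a:K856R" or a bare "Y505H".
MUTATION_RE = re.compile(
    rf"\b(?:(?P<gene>[A-Za-z0-9]+):)?"
    rf"(?P<ref>[{AMINO_ACIDS}])(?P<pos>\d{{1,4}})(?P<alt>[{AMINO_ACIDS}])\b"
)


def find_mutations(sentence):
    """Return a (gene, ref, position, alt) tuple for each mention found.

    gene is None when the mention carries no explicit "GENE:" prefix.
    """
    return [
        (m.group("gene"), m.group("ref"), int(m.group("pos")), m.group("alt"))
        for m in MUTATION_RE.finditer(sentence)
    ]


if __name__ == "__main__":
    # Toy sentence written for this example, not taken from any article.
    print(find_mutations("The S:Q498R and Y505H mutations were examined."))
```

A real system would need much more than this, for example handling of deletions, nucleotide-level notation, protein names written out in full, and strings that merely look like mutations.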
The homepage of the tool provides a snapshot of all identified mutations across the genome, with the height at each genomic location indicating the number of papers that mention mutations there. As expected, spike protein mutations are the most frequently mentioned in the literature.
When we query for Q498R – a spike protein mutation present in most Omicron samples – the tool retrieves all articles which mention the mutation and displays the top 10 mutations mentioned in these articles.
It also provides links to the retrieved articles and the exact sentences with the mutation mentions.
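The query behaviour described above can be sketched as a simple co-mention count. The snippet below is only an illustration under stated assumptions: the `index` layout (article ID mapped to the set of mutations it mentions), the `co_mentioned` function, and the choice to exclude the queried mutation from the ranking are all hypothetical, and the article IDs and mutation sets are toy data rather than results from the tool.

```python
from collections import Counter


def co_mentioned(index, query, top_n=10):
    """Find articles mentioning `query` and rank the other mutations in them.

    index: dict mapping article id -> set of mutation strings
           (a hypothetical layout chosen for this sketch).
    Returns (matching article ids, [(mutation, article count), ...]).
    """
    hits = [article for article, muts in index.items() if query in muts]
    counts = Counter()
    for article in hits:
        # Count each co-mentioned mutation once per article,
        # leaving out the queried mutation itself.
        counts.update(index[article] - {query})
    return hits, counts.most_common(top_n)


if __name__ == "__main__":
    # Toy index with made-up article ids.
    toy_index = {
        "article-1": {"S:Q498R", "S:N501Y"},
        "article-2": {"S:Q498R", "S:N501Y", "S:E484K"},
        "article-3": {"S:D614G"},
    }
    articles, top = co_mentioned(toy_index, "S:Q498R")
    print(articles)
    print(top)
```

Counting per article rather than per mention keeps a single paper that repeats a mutation many times from dominating the ranking.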
A preliminary analysis of Omicron samples using the information at outbreak.info suggests that at least 16 mutations are present in an overwhelming fraction of the Omicron samples but absent (or present to a much lesser degree) in other lineages. The number of papers in the Strand COVID Mutation Miner tool for each of these 16 mutations is shown below, along with the corresponding number of hits from PubMed. For every mutation where either source returns results, the tool returns at least as many hits as PubMed, and with fewer false positives (FPs).
Mutation | Publications in Strand COVID Mutation Miner | PubMed hits |
ORF1a:K856R | 2 | 0 |
ORF1a:A2710T | 0 | 0 |
ORF1a:P3395H | 0 | 0 |
ORF1a:I3758V | 0 | 0 |
ORF1b:I1566V | 0 | 0 |
S:G339D | 5 | 3 (FP) |
S:S373P | 6 | 6 (FP) |
S:S375F | 2 | 1 (FP) |
S:Q498R | 10 | 2 |
S:Y505H | 11 | 0 |
S:T547K | 0 | 0 |
S:N856K | 0 | 0 |
S:Q954H | 2 | 1 |
S:N969K | 0 | 0 |
S:L981F | 0 | 0 |
M:Q19E | 0 | 0 |
The total number of publications related to these mutations is small since the Omicron variant has only been recently identified and the novel mutations in this variant are yet to be studied by researchers. These numbers can be expected to rise in the coming weeks.
A comprehensive literature summary of the above publications is beyond the scope of this post. However, a quick look at the publications reveals some interesting findings regarding S:Q498R:
- In Cell Rep. 2021 Oct 5;37(1):109784. doi: 10.1016/j.celrep.2021.109784, Georgiev et al. evaluate a panel of antibodies obtained from a COVID-19 patient for their ability to neutralize SARS-CoV-2 viruses. They identify one antibody, called 54042-4, as a promising lead candidate to counteract current and future SARS-CoV-2 variants of concern. However, they also report experimental evidence that the S:Q498R mutation may prevent neutralization by 54042-4.
- In Int J Mol Sci. 2021 Oct 7;22(19):10836. doi: 10.3390/ijms221910836, Mequita et al. predict that the S:Q498R mutation can result in increased viral fitness through decreased antibody binding and increased RBD affinity.
The ease with which these publications could be retrieved shows that the Strand COVID Mutation Miner tool can be a valuable aid in keeping track of the growing body of knowledge regarding SARS-CoV-2 variants.