Coronavirus in section


Above: a diagrammatic section through a coronavirus (CoV). This illustration was based mainly on ViralZone and, as we shall see, it is a fairly accurate representation according to current knowledge. As illustrated below, the main features are the large spikes, each consisting of a trimer (three copies bonded together) of the S glycoprotein (S = spike protein). Each S protein consists of two functional units, which may be cleaved enzymatically or remain covalently linked depending on the strain: the S1 head (N-terminal) and the S2 tail (C-terminal). These large spikes give the virus its name, since under the electron microscope they resemble a crown of spikes around the virus particle. Note that coronavirus does not have icosahedral symmetry: it has a helical capsid enclosed in a spheroidal envelope. Nevertheless, it is well organised, with the spikes and other proteins arranged in a more-or-less definite architecture (Neuman et al. 2006).

Docking to a target cell

Viruses are 'pirates of the cell', so they must first enter a host cell before they can replicate. The S1 head contains a receptor-binding domain (RBD) which recognises and binds to one or more specific receptor molecules in the target cell membrane. In the case of the SARS coronaviruses, the spikes are known to bind to angiotensin-converting enzyme 2 (ACE2), which is found in the membranes of certain epithelial cells lining the gut and respiratory airways. This enables the virus to bind to and infect these cells, which is why coronaviruses can cause respiratory tract and gastrointestinal infections. Coronaviruses may be responsible for about 10 to 20% of common colds, in which the virus infects only the upper respiratory tract (URT) and lacks the virulence needed to penetrate more deeply into the lower respiratory tract (LRT). However, every so often a strain emerges that can infect the LRT, reaching the lungs and causing a life-threatening pneumonia known as severe acute respiratory syndrome (SARS).

In 2003, SARS CoV-1 emerged in the human population, resulting in 8096 cases and 774 deaths: a 9.6% case fatality rate. It is important to understand that the case fatality rate is only the percentage of diagnosed cases who die; it does not necessarily mean that 9.6% of everyone infected will die, since milder infections may never be diagnosed. Similarly, an outbreak occurred in the Middle East: MERS CoV (MERS = Middle East respiratory syndrome). As of March 2020, there is an outbreak of a new strain, SARS CoV-2, which originated in China and is now sweeping across other parts of Asia, Europe and America. It is not currently known how deadly this strain will be compared to the current outbreak of influenza, which has likely already killed over 20 000 people in the USA alone (see Worldometer). The total potential for devastation will depend on both the fatality rate and the infectivity of the virus. Current estimates suggest that this coronavirus is about 10 times as deadly as a typical flu strain. Computer models have generally assumed an R value of between 2 and 3, based on available data (the R value is the average number of people infected by each infected person). Reported symptoms for this strain typically include headache, cough and difficulty breathing, and transmission is by aerosols and contaminated surfaces.
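The arithmetic behind these figures is simple enough to sketch. The Python below is a minimal illustration, not an epidemiological model: `case_fatality_rate` reproduces the 9.6% figure from the quoted case and death counts; `infection_fatality_rate` takes a hypothetical `fraction_diagnosed` to show why the death rate among all infections can be lower; and `infections_in_generation` assumes, purely for illustration, that every case infects exactly R others with no immunity or intervention. All function names are made up for this example.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Deaths as a percentage of *diagnosed* cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def infection_fatality_rate(deaths: int, cases: int, fraction_diagnosed: float) -> float:
    """Deaths as a percentage of *all* infections, assuming (hypothetically)
    that only `fraction_diagnosed` of infections were ever diagnosed."""
    if not 0 < fraction_diagnosed <= 1:
        raise ValueError("fraction_diagnosed must be in (0, 1]")
    total_infections = cases / fraction_diagnosed
    return 100.0 * deaths / total_infections


def infections_in_generation(r: float, generation: int, seed: int = 1) -> float:
    """New infections in a given generation of spread, under the toy
    assumption that every case infects exactly R others (no immunity,
    no intervention)."""
    return seed * r ** generation


# SARS 2003 figures quoted above: 774 deaths among 8096 diagnosed cases.
print(f"CFR: {case_fatality_rate(774, 8096):.1f}%")  # CFR: 9.6%

# If (hypothetically) only half of all infections had been diagnosed:
print(f"IFR: {infection_fatality_rate(774, 8096, 0.5):.1f}%")  # IFR: 4.8%

# Growth over five generations for the range of R assumed by the models.
for r in (2, 3):
    print(f"R={r}:", [infections_in_generation(r, g) for g in range(6)])
```

Raising R from 2 to 3 turns 32 new cases in the fifth generation into 243, which is why small differences in R matter so much to the models.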

Coronavirus in section
Above: the structure of the coronavirus virion (virus particle). The virion is enveloped and approximately spherical, with a diameter given as about 85 (without spikes?) to 120 nm. The nucleoprotein (N) binds to the viral single-stranded RNA genome, protecting it and packaging it into an (open) helical nucleocapsid. The M (membrane) protein stabilises the structure and determines its geometry (each spike is associated with 4 N and 16 to 25 M proteins in a regular lattice). Each spike consists of a triangular trimer of S glycoprotein and is about 10 nm in diameter at its widest point. There are about 200 to 400 spikes per particle.

Viral RNAs and proteins - technical breakdown

The genomic RNA of coronavirus is a positive-sense single-stranded RNA molecule which acts both as the genome and as mRNA, and so is translated directly into viral proteins. The RNA carries a 5' cap, a short 5' leader sequence and a poly-A 3' tail, enabling it to be translated like host mRNA by host ribosomes (the cell's protein factories). The genome contains at least 14 functional open reading frames (ORFs), but only the ORF nearest the 5' cap is translated from the genomic RNA. This first ORF, ORF1a, is translated into polyprotein R1a (pp1a). However, sometimes a programmed ribosomal frameshift occurs: near the end of ORF1a the ribosome slips back by one nucleotide (a -1 frameshift) and continues reading in the new frame into the second ORF, ORF1b, to produce a longer R1ab polyprotein (pp1ab). The production of polyproteins enables the virus to economise on genetic material. Two viral proteases (PLpro and 3CL) cut the polyproteins into their separate constituent proteins. The arrangement of the RNA in human SARS CoV-1 is as follows:


ORF1a encodes the following proteins:

R1a: ns1-ns2-PLpro-ns4-3CL-ns6-ns7-ns8-ns9-ns10-ns11

ORF1ab encodes the following proteins:

R1ab: ns1-ns2-PLpro-ns4-3CL-ns6-ns7-ns8-ns9-ns10-RdRp-Hel-ns14-ns15-ns16

These are the early proteins which establish host cell take-over and begin viral replication. They are non-structural proteins. Their functions are not entirely understood but are roughly as follows:

1. PLpro (papain-like proteinase, ns3) and 3CL (3C-like protease, comprising the whole of ns5 and also called the main protease) cut up the R1a and R1ab polyproteins into their individual proteins. These enzymes are being investigated as potential drug targets.

2. Non-structural protein nsp3 (which contains PLpro) is found in all coronaviruses and is a large multi-domain protein with domains X, Y and N, and two PL-proteinase domains (PLpro) which act as proteinases to cut the polyprotein. Nsp3 binds ADP-ribose and ADP-ribose-1''-phosphate (in which the phosphate is attached to the first carbon of the terminal ribose ring) via its X domain (Johnson et al. 2010). In SARS-CoV, nsp3 contains an additional segment, the SARS-unique domain (residues 366-322), consisting of three globular subdomains (SUD-N, SUD-M and SUD-C) joined by short linkers (Johnson et al. 2010).

3. RdRp (ns12), the RNA-dependent RNA polymerase, synthesises viral RNAs, assisted by other non-structural proteins, such as a helicase (Hel, ns13), that assemble into a replication complex. The ssRNA(+) is first copied into an ssRNA(-) template, which is then copied into more ssRNA(+) molecules, both whole genomic RNA and shorter subgenomic RNAs (see below).

4. Non-structural protein ns8 (or nsp8, if we wish to distinguish the protein from the gene) is a second RdRp which seems to function as a primase for the first RdRp; ns9 (a single beta-barrel) forms dimers, binds RNA and interacts with ns8; ns14 is an exonuclease; ns15 is an endonuclease; ns3 is an ADP-ribose-1''-phosphatase; ns16 is a ribose 2'-O-methyltransferase; and ns9b binds lipid. Non-structural protein nsp10 (encoded by gene ns10) forms a dodecamer and is thought to act as a transcription factor (binding to genetic elements to regulate their expression; Su et al. 2006), so its function seems to involve binding RNA.

5. Non-structural proteins nsp1 to nsp16 have some role in the assembly of a viral replication-transcription complex involved in viral genome replication, although the exact functions of some of these proteins are poorly understood.

6. Note that all products of polyprotein 1a are common to polyprotein 1ab except for ns11 (nsp11), which is only translated when the programmed ribosomal frameshift does not occur (Masters 2006).
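The relationship between the two polyproteins can be captured in a few lines. This is a simplified Python sketch, assuming the SARS CoV-1 gene order given above; the `frameshift` flag stands in for the programmed ribosomal frameshift. The assignment of cleavage sites (PLpro cutting the first three junctions, 3CL the rest) comes from the general coronavirus literature rather than from this page, and the short overlap between nsp11 and the start of nsp12 is ignored. The function names are hypothetical.

```python
# Simplified model of the SARS CoV-1 replicase polyproteins (names as nspN).
SHARED = [f"nsp{i}" for i in range(1, 11)]        # nsp1-nsp10: in both pp1a and pp1ab
PP1A_ONLY = ["nsp11"]                             # end of ORF1a, no frameshift
PP1AB_ONLY = [f"nsp{i}" for i in range(12, 17)]   # nsp12-nsp16: read from ORF1b after the frameshift

# Familiar names used in the text for some of the products.
ALIASES = {"nsp3": "PLpro", "nsp5": "3CL", "nsp12": "RdRp", "nsp13": "Hel"}


def translate_replicase(frameshift: bool) -> list[str]:
    """Ordered mature products from one round of translation of the genomic RNA.

    Without the frameshift the ribosome stops at the end of ORF1a (pp1a);
    with it, translation continues into ORF1b (pp1ab) and nsp11 is replaced
    by nsp12-nsp16.
    """
    return SHARED + (PP1AB_ONLY if frameshift else PP1A_ONLY)


def cleavage_map(products: list[str]) -> list[tuple[str, str]]:
    """Label each junction between adjacent products with the protease assumed
    to cut it: PLpro for nsp1/2, nsp2/3 and nsp3/4, and 3CL for the rest."""
    junctions = []
    for i, (left, right) in enumerate(zip(products, products[1:])):
        enzyme = "PLpro" if i < 3 else "3CL"
        junctions.append((f"{left}/{right}", enzyme))
    return junctions


pp1a = translate_replicase(frameshift=False)
pp1ab = translate_replicase(frameshift=True)
print(sorted(set(pp1a) - set(pp1ab)))  # ['nsp11']
for junction, enzyme in cleavage_map(pp1ab):
    print(junction, enzyme)
```

Counting the labels for pp1ab gives 3 PLpro sites and 11 3CL sites, one way to see why 3CL is called the main protease.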

There are additional ORFs (for proteins 3a/3b, E, M, 6, 7a/7b, 8a/8b, N, 9b and 14) which cannot be reached by ribosomes on the main genomic RNA (they are not required early in infection) but are translated from shorter subgenomic RNAs, which are synthesised from ssRNA(-) subgenomic templates. These subgenomic negative-strand templates are thought to be synthesised by an unusual process called discontinuous transcription, in which the polymerase copies the genome from its 3' end through the body sequence for the ORF being transcribed and then jumps to copy the 5' leader, so that each subgenomic RNA carries the same leader joined to a different body. In SARS CoV-1 this generates the following subgenomic RNAs (the genomic RNA is mRNA1):

1. mRNA2 encodes S (spike protein)

2. mRNA3 encodes proteins 3a/3b

3. mRNA4 encodes E (envelope protein)

4. mRNA5 encodes M (membrane protein)

5. mRNA6 encodes protein 6

6. mRNA7 encodes proteins 7a/7b

7. mRNA8 encodes proteins 8a/8b

8. mRNA9 encodes proteins N/9b; N is the nucleoprotein

9. mRNA10? encodes protein 14?

Note that these include the structural proteins involved in virion assembly (S, E, M and N) together with a number of accessory proteins, the functions of some of which are not understood. Note also that a single subgenomic RNA can encode two different proteins in an either-or manner; this is the case for mRNAs 3, 7, 8 and 9. For example, mRNA9 can be translated to give either protein N or protein 9b. This occurs by a process called leaky scanning, in which the ribosome, scanning from the 5' end, sometimes skips a weak START codon for the first protein and locates a second START codon (which may be in a different reading frame) for the second protein. This again enables the virus to economise on genetic material.
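Both tricks described in this section, the nested set of subgenomic RNAs and leaky scanning, can be illustrated with toy strings. The Python sketch below is simplified and rests on assumptions not stated in the text: a short marker sequence `TRS` (ACGAAC, modelled on the core transcription-regulating sequence reported for SARS coronaviruses) is assumed to follow the leader and precede each body; every subgenomic RNA is modelled as the 5' leader fused to everything from one marker to the 3' end of the genome; and a simplified Kozak-style rule (a purine three bases before the AUG and a G immediately after it) decides whether a START codon is 'strong' or 'weak'. The sequences and function names are invented.

```python
TRS = "ACGAAC"  # toy marker assumed to follow the leader and precede each body


def subgenomic_rnas(genome: str, leader_len: int) -> list[str]:
    """Model discontinuous transcription products.

    `genome` is assumed to start with a leader of `leader_len` bases followed by
    a TRS; every later TRS marks the start of a body. Each subgenomic RNA is the
    leader fused to the genome from that TRS to the 3' end, so the products form
    a 3'-coterminal nested set, returned longest first.
    """
    leader = genome[:leader_len]
    products = []
    pos = genome.find(TRS, leader_len + len(TRS))  # skip the leader's own TRS
    while pos != -1:
        products.append(leader + genome[pos:])
        pos = genome.find(TRS, pos + 1)
    return products


def is_strong_start(mrna: str, i: int) -> bool:
    """Simplified Kozak-like rule for the AUG starting at index i:
    purine (A/G) at position -3 and G at position +4."""
    return i >= 3 and i + 3 < len(mrna) and mrna[i - 3] in "AG" and mrna[i + 3] == "G"


def candidate_starts(mrna: str) -> list[int]:
    """Leaky scanning: ribosomes scan from the 5' end and may skip AUGs in a
    weak context, so every AUG up to and including the first strong one can
    serve as a START codon."""
    starts = []
    i = mrna.find("AUG")
    while i != -1:
        starts.append(i)
        if is_strong_start(mrna, i):
            break
        i = mrna.find("AUG", i + 1)
    return starts


genome = "GGAU" + TRS + "UUUU" + TRS + "AAAA" + TRS + "CCCC"
for sg in subgenomic_rnas(genome, leader_len=4):
    print(sg)  # each starts with the same leader and ends with ...CCCC

starts = candidate_starts("UUUAUGUCGCCAUGGAA")
print(starts, [s % 3 for s in starts])  # [3, 11] [0, 2]: two different reading frames
```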

SARS CoV-2 (2019), the virus that causes COVID-19, has some differences (see ViralZone). In particular, there is no 8/8b RNA, no 3b, and a possible additional subgenomic RNA for ORF10, which hypothetically encodes an unknown protein.

You can view the genetic sequence of SARS CoV-2 (2019), as isolated in China, on the NCBI database. It will be informative to see whether isolates from other countries carry mutated variants. Phylogenetic analysis reveals a close relationship to the coronaviruses of bats.

Target cell entry

After attaching to the target cell (which may involve additional steps), the virus tricks the cell into endocytosing it together with the surface receptor it is bound to: the virus and the receptor become enclosed in a membranous vesicle inside the cell. The cell will then attempt to digest and recycle the contents of the vesicle, which becomes an endo-lysosomal vacuole. The endo-lysosome can be thought of as the 'stomach' or waste-disposal centre of the cell. The S2 subunit anchors the spike in the viral envelope but, as we shall see, also has another critical function for the virus. The host cell will use a suite of enzymes, high acidity and an arsenal of molecular weapons to attempt to destroy the enclosed virus and recycle its components. However, as conditions change inside the vacuole, the S2 mechanism of the spike is activated. Specifically, part of S2 (called the fusion peptide) is exposed by a conformational change in S1/S2 when S1 binds its target receptor (ACE2), and S2 is subsequently cleaved inside the endo-lysosome, which activates the S2 fusion mechanism.

Antibodies against the S protein have been shown to be effective in blocking invasion by the virion. This points to a route for vaccine development, whereby a vaccine consisting of fragments of S might raise specific antibodies that block the function of the spikes.

(See, for example, He et al. 2015.)

Each S2 contributes a long alpha-helix, the three helices of each trimer forming an anchoring rod. However, S2 also contains the fusion peptide (FP), which rapidly unfolds to insert into the host cell (endo-lysosomal) membrane (it unfolds to a length of about 20 nm to reach the target membrane). The FP then brings about fusion of the viral envelope with the surrounding host cell membrane, allowing the contents of the virus particle (virion) to escape into the host cell's cytosol. The subsequent steps in the mechanism are not fully understood in the case of coronavirus but are better studied in HIV and influenza. In HIV, membrane fusion is brought about by a 'hairpin-loop' mechanism: the fusion peptide unfolds and penetrates the target cell membrane, then the fusion protein bends about a hinge to pull the two membranes close together, overcoming their mutual electrostatic repulsion and allowing the membrane phospholipids to rearrange into an open pore or channel with positive curvature across the two membranes, forming an expanding fusion pore.

Thus, the spikes are important for target cell attachment and viral entry, and therefore largely determine the hosts and range of tissues the virus can infect. Coronaviruses are zoonotic: they infect animals and have a broad host range, which enables them to occasionally jump from one animal species to another, including to humans. It is not in the best interests of a virus to destroy its host or to provoke an immune response, and viruses that are well adapted to their hosts may be able to steal the resources they need without causing significant disease symptoms. Viruses seem to be at their most deadly, however, when they have just jumped to a new host species with which they are not highly compatible. This may help explain the severity of SARS outbreaks.

Commandeering host cell machinery

When a virus enters a host cell it must first avoid destruction by the host cell's defenses, ideally by evading detection. It must also deactivate any alarm systems within the cell, both to prevent the cell mounting its own defenses and to prevent it alerting others, including cells of the immune system. This is its first priority; then it must take control of the cell's machinery, in particular its protein and nucleotide factories, to make more copies of itself.

To achieve these aims, the virus must carry its own 'computer program' in the form of genetic material with which to reprogram the host cell. The genetic material of coronavirus is RNA rather than DNA, and at 27 to 32 kb coronaviruses have among the largest RNA genomes of any known virus. This is notable: larger viruses generally use DNA to store their program, since it is a more stable medium than RNA. Smaller viruses can cope with RNA, but RNA, being more labile, mutates faster than DNA, which gives RNA viruses a very high rate of mutation and hence of evolution.

Packaging such a large RNA genome poses problems. Many RNA viruses are helical, with many copies of a nucleoprotein (N) binding to the RNA to protect, stabilise and package it into a helix of ribonucleoprotein which often forms a definite rod-shaped ribonucleoprotein particle (RNP). In the case of influenza, its modest-sized 13.5 kb genome is packaged into 8 separate ribonucleoprotein particles: the genome consists of 8 molecules of RNA. The larger genome of coronavirus consists of a single RNA molecule which appears to package into a fairly loose and open helix which fills most of the virus particle, but is concentrated in the outer region just below the envelope.

The architecture of SARS CoV, as currently understood, has been well illustrated by David S. Goodsell, who kindly contributed his work to the Protein Data Bank.

Coronavirus in section

Illustration by David S. Goodsell, RCSB Protein Data Bank; doi: 10.2210/rcsb_pdb/goodsell-gallery-019. David Goodsell's biomolecular illustrations are truly fabulous! This illustration of coronavirus (shown embedded in host mucus) represents the structure according to the current state of our knowledge. Note the folding and twisting of the helical RNA-protein particle (ribonucleoprotein particle), shown in magenta inside the virion envelope.

The genetic material of coronavirus consists of ssRNA(+), that is, single-stranded RNA of positive sense. Positive-sense RNA has the same sequence as mRNA (negative-sense RNA is complementary to it) and can be translated directly by host ribosomes into proteins. In this way two polyproteins, R1a and R1ab, are synthesised and cut up by viral enzymes into early proteins involved in the replication of viral RNA. Some of the proteins produced have unknown functions, but the virus must first gain control of the cell, subverting its machinery for its own purposes. This is often accompanied by an attempt to shut down synthesis of host cell proteins and make viral proteins instead, and also involves evasion of host cell defenses. At least one protein, N, enters the cell with the viral RNA, and it may have a role in overriding the host cell's systems, as it can bind DNA.

Once secure, the virus modifies the internal membranes of the cell (the endoplasmic reticulum, Golgi complex and associated membranes) to create a virus factory. Later genes are transcribed and translated (from subgenomic RNAs) to produce structural proteins and assembly of virus particles takes place. The host cell membranes act as a scaffold for virus assembly and are also thought to shield the dsRNA that the virus produces from detection by host cell systems (such as protein kinase R, PKR). The host cell does not generally form dsRNA (double-stranded RNA) but many viruses have to, as an intermediate in genome replication, as when coronavirus copies plus-strands of RNA into minus-strands and full genome-sized negative strands into new plus-stranded genomes.
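The relationship between plus and minus strands described above can be sketched with simple base pairing. In this minimal Python example (function names are illustrative), copying a strand means writing its complement in the reverse direction, because the two strands of a duplex run antiparallel; copying the minus strand again regenerates the original plus strand, and the two together would form the dsRNA intermediate that the host cell can detect.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick pairs in RNA


def copy_strand(rna: str) -> str:
    """Return the complementary RNA strand, written 5'->3'.

    The new strand is antiparallel to its template, so it is the reverse
    complement of the template sequence.
    """
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def forms_duplex(strand_a: str, strand_b: str) -> bool:
    """True if the two strands are fully complementary (a perfect dsRNA)."""
    return strand_b.upper() == copy_strand(strand_a)


plus = "AUGGCUUAA"          # toy positive-sense (mRNA-like) sequence
minus = copy_strand(plus)   # template made by the viral polymerase
print(minus)                # UUAAGCCAU
print(copy_strand(minus) == plus, forms_duplex(plus, minus))  # True True
```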


Assembly, packaging and escape of progeny from the host cell

Once the various viral components, genomic RNA (gRNA) and structural proteins have been manufactured, they must be packaged into new virions. This requires the N protein to oligomerise (that is, for N subunits to bond together into higher-order structures) while binding to gRNA and assembling the ribonucleoprotein helix. The N protein contains two RNA-binding domains: the N-terminal domain (NTD) and the C-terminal domain (CTD). The CTD also contains a dimerisation domain that binds to another N protein molecule. In some betacoronaviruses belonging to lineage A, such as MHV (mouse hepatitis virus, which is not related to the human hepatitis viruses) and BCoV (bovine coronavirus), there is a genome packaging signal (PS) of about 95 nucleotides, located about 20.3 kb from the 5' end of the gRNA. This places the PS within the gene coding for nsp15 (a subunit of the replicase-transcriptase complex needed for viral nucleic acid synthesis). The PS is predicted to fold as a stem-loop in the gRNA secondary structure (through internal base pairing, RNA molecules fold into more complex shapes that can have functional significance). The CTD of N is also joined by a linker to the N3 domain, which binds to M. Thus, N binds gRNA and also binds the viral membrane protein M, allowing the structure of the virion to assemble. In addition, the linker between the NTD and CTD of N binds to nsp3 (a subunit of the viral RNA replicase). Viruses lacking a functional PS also package subgenomic RNA (sgRNA) into the virion; that is, they can no longer specifically recognise the gRNA for packaging (Kuo et al. 2016). SARS-CoV belongs to lineage B of the betacoronaviruses and does not have a PS homologue, but it has been shown to have a functionally equivalent domain in the N CTD (Kuo et al. 2016).

The N protein not only dimerises but also forms tetramers and octamers, at least in the crystal structure, and this positions the two RNA-binding domains of each N monomer along a double groove in which the genomic RNA (gRNA) is thought to sit (on the outside of the ribonucleoprotein) (Chen et al. 2007). The negatively charged phosphate backbone of the gRNA is predicted to sit in the groove, with the bases projecting outwards but protected by aromatic side-chains that intercalate between them. The virus needs to package its large gRNA into a very small volume, and it has been suggested that to do this the gRNA must be supercoiled, so the ribonucleoprotein complex must be a very flexible structure that twists and folds on itself. Single-stranded RNA can adopt a wider range of compact conformations than double-stranded DNA, which is a more rigid molecule. Cryo-electron microscopy reveals material in the centre of the virus, though it is likely (and generally considered) that the genome is most compact near the periphery. Packing nucleotides into the very centre of a virus is problematic, since the RNA would have to be folded very tightly: entropy opposes such conformations (that is, they require more energy to achieve), and there is a limit beyond which the nucleic acid will break if bent too tightly. It is possible that some of the material inside the virion could be additional viral proteins.

The final structure has a well-organised architecture, with the base of each spike (S-trimer) surrounded by four N subunits. The S and M proteins interact, as do the M and N proteins, resulting in a final stoichiometry ranging from 1 S-trimer : 16 M : 4 N to 1 S-trimer : 25 M : 4 N. The viral components thus arrange into fairly precise lattices (Neuman et al. 2006).
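Combining these ratios with the figure of about 200 to 400 spikes per particle quoted earlier gives rough per-virion copy numbers. The Python below is just that multiplication; the outputs are simple products of the quoted values, not measured counts, and `copies_per_virion` is a hypothetical helper.

```python
def copies_per_virion(spikes: int, m_per_spike: int, n_per_spike: int = 4) -> dict[str, int]:
    """Scale the per-spike lattice ratio (1 S-trimer : M : N) up to a whole virion."""
    return {
        "S monomers": 3 * spikes,    # each spike is a trimer of S
        "M": m_per_spike * spikes,
        "N": n_per_spike * spikes,   # only the N associated with spike bases
    }


low = copies_per_virion(spikes=200, m_per_spike=16)
high = copies_per_virion(spikes=400, m_per_spike=25)
for name in low:
    print(f"{name}: {low[name]} to {high[name]}")
# S monomers: 600 to 1200
# M: 3200 to 10000
# N: 800 to 1600
```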

The virus progeny must eventually escape from the infected cell before the cell is destroyed and its resources are consumed, since the virus needs to find new host cells to feed upon. The spikes are embedded in an envelope, a phospholipid bilayer stolen from the internal membrane system of the host cell in which the virus was produced. Coronavirus acquires this envelope by budding into the ER-Golgi compartment of the cell before escaping from it (some other viruses instead acquire their envelope by budding through the plasma membrane as they escape). In the case of coronavirus, escape is by exocytosis of the enveloped virion. Budding in coronavirus does not use the host cell's usual ESCRT machinery; it is ESCRT-independent budding. This is where the E protein likely plays a role. E is a pentamer, five subunits spanning the viral envelope with a central channel between them: it is a cation channel, allowing positive ions to flow across it when open, and it likely acts as a viroporin. In ESCRT-independent budding (see ViralZone), the membrane of the host cell compartment closes around the virion with the help of viral ion channels called viroporins, which allow an influx of positive ions at the point where the membrane must fuse. This influx depolarises that region of the membrane, removing electrostatic repulsion and so lowering the energy barrier for membrane fusion (function of the E protein: Ruch and Machamer 2012; Nieto-Torres et al. 2014; Surya et al. 2018).

Evidence suggests that the E-protein also plays a role in commandeering the host cell by overcoming host defenses, in particular it appears to inhibit apoptosis (the host cell's self-destruct mechanism which is triggered whenever the cell becomes compromised, such as by virus infection). It also appears to play some role in trafficking the virions through the cell's secretory pathway (through the endoplasmic reticulum and Golgi apparatus). (See review by Ruch and Machamer, 2012).

In the diagram of coronavirus above, which is based on the structure as represented in the ViralZone database, E pentamers are shown at one end of the virion. This represents the basal end, where the virion separated from the host cell membranes. Budding also helps the final packing of the genetic material by helping to close the virion into a spheroid. Prior to budding, the virion is assembled as a two-dimensional array, using the host membrane as a scaffold. In particular, the M protein organises this region of the membrane: host membrane proteins are excluded and M interacts with N (this interaction is shown by a conformational change in the model, as the M protein tail contacts the N protein directly beneath the viral envelope). M also interacts with S, and these interactions ensure the correct spacing and stoichiometry of these components: the virion is well ordered. Other proteins, such as 3a and 7a, possibly incorporate into the viral envelope, and some strains of coronavirus also have additional smaller spikes formed of hemagglutinin-esterase (HE), which may assist attachment to host cells but also degrades carbohydrate chains in the extracellular matrix, allowing escaping virus particles to diffuse freely and reach new targets.

Protein 3a is an envelope protein that possibly forms a tetramer that may function as an ion channel and may have a role in budding (Lu et al. 2006). The respective roles of E and 3a in budding are not clear.

The N protein packages the RNA genome. Analysis of the crystal structure of the N protein suggests that N units form N-N dimers, about 4.5 nm in length, which may then interact in pairs to form tetramers and octamers, and thus a helix of stacked tetramers. Spiral grooves lined with basic residues, exposed on the outer surface of these assemblies, are thought to accommodate the viral RNA, which may sit in the grooves with its phosphate backbone innermost, while aromatic amino acid residues intercalate between the RNA bases to protect them.

Below: the coronavirus replication cycle

Coronavirus replication cycle

The virus adheres to the surface of the host cell (step 1). This is likely to be a two-step process, but one key step is specific adhesion of the S spike protein to ACE2 (angiotensin-converting enzyme 2), a protein in the host cell membrane that acts as a receptor. ACE2 has a normal role in lung tissue in helping to regulate blood pressure, but also suppresses the inflammatory response in the lungs as part of a two-component control system with ACE1, which up-regulates inflammation. Binding of the virus to ACE2 down-regulates this receptor and may be responsible, at least in part, for the severe immune reaction that apparently damages lung tissue in the worst-affected patients (Zhang et al. 2020). Binding to ACE2 triggers endocytosis of the attached virion (step 2). The host cell attempts to digest the virion, and a host cell protease within the endocytic vesicle is thought to activate the S-protein membrane fusion mechanism, fusing the viral and endocytic membranes and triggering uncoating of the virus and release of its genetic material into the cell (step 3). The +ssRNA genome is immediately translated by host cell ribosomes (purple) to make early proteins such as the viral replicase (step 4). The viral replicase uses the +ssRNA as a template to make minus-stranded full-length genomic ssRNA and shorter subgenomic -ssRNAs (step 5). These negative-sense RNAs are then used as templates to make more copies of the viral genome and subgenomic viral messenger RNAs (step 6).

Coronavirus replication cycle

The subgenomic +ssRNAs are used to make the other viral proteins (step 6). The virus modifies the internal membrane system of the host cell to make a virus factory (step 7). Viral proteins assemble on the endoplasmic reticulum (ER), which matures into the endoplasmic reticulum-Golgi intermediate compartment (ERGIC), where viral genomic +ssRNA is packaged with the proteins, aided by the nucleoprotein compacting the genetic material (step 8). Completed virions bud off inside Golgi vesicles (step 9), which fuse with the host cell's surface membrane by exocytosis to release the new virions.

The coronavirus family (Coronaviridae)

All the SARS strains of coronavirus are Betacoronaviruses. Alphacoronaviruses may cause less serious infections in humans and may also infect animals. Gammacoronaviruses have been found in birds and beluga whales, and Deltacoronaviruses are also found in birds. Less closely related is the Torovirus subfamily. Torovirus infects animals and humans and may cause gastroenteritis in humans, transmitted by the fecal-oral route.

Torovirus in section

The toroviruses are interesting because of their unusual morphology. Although their genetic sequences differ substantially from those of the coronavirus subfamily, their general genetic architecture, replication and transcription are sufficiently similar for them to be included in the Coronaviridae family. Torovirus packages its 28 kb genome into a more compact and distinct helix than coronavirus does. This helix is rod-shaped inside the host cell but assumes its curved toroidal shape upon budding. This torus-like ribonucleoprotein particle gives the virus its name and also determines its shape. Shape in toroviruses varies, but the envelope generally fits quite tightly around the ribonucleoprotein torus, so that the final virion is generally a biconcave disc or C-shaped (both shown in section above, in plan and side view). At least some strains also have the smaller HE spikes (shown in the diagram above on the right side of the virion only, though distributed all over it when present). The virion is similar in size to a coronavirus (about 120 nm in diameter).

The ssRNA(+) genome of torovirus encodes the following (likely depending on strain):


Again, a ribosomal frameshift allows the synthesis of the 1ab polyprotein (pp1ab) in addition to the 1a polyprotein (pp1a), and these polyproteins are again processed to form non-structural proteins involved in replication, such as the viral polymerase (RdRp). Four subgenomic RNAs encode the following:

1) mRNA-1 encodes S

2) mRNA-2 encodes M

3) mRNA-3 encodes HE

4) mRNA-4 encodes N

There is also a possible 5th subgenomic RNA of unknown function.

Coronavirus evolution - FAQ: Is a vaccine possible?

Coronaviruses are indeed one of the agents responsible for common colds, and no vaccine exists for the common cold, so can one vaccinate against (or develop antivirals to) SARS CoV-2? One might argue that if one cannot vaccinate against the common cold, then one cannot vaccinate against SARS CoV. However, many more strains of virus cause common colds than the few serious SARS strains, so we do not need a vaccine that works against all of them. Coronaviruses do mutate rapidly, but so does influenza, for which vaccines are updated regularly; a coronavirus vaccine might likewise need updating at frequent intervals. Nevertheless, it is potentially possible to develop a vaccine against the more serious strains, particularly by targeting an essential system of the virus that evolves relatively slowly.

An example comparison between two different SARS CoV-2 isolates (one from Wuhan, China, the other from the USA) is available as a text file.
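At its simplest, such a comparison lists the positions at which two aligned genome sequences differ. The Python sketch below assumes the two sequences have already been aligned to the same length (gaps written as '-') and are written in DNA letters, as in sequence databases; it skips gaps and ambiguous 'N' bases, reports percent identity over the remaining positions, and writes each substitution in the common reference-base/position/new-base notation (e.g. C4T). The sequences are invented and far shorter than a real coronavirus genome, and `compare_aligned` is a hypothetical helper.

```python
def compare_aligned(ref: str, query: str) -> tuple[float, list[str]]:
    """Compare two pre-aligned, equal-length nucleotide sequences.

    Returns percent identity over positions where neither sequence has a gap
    ('-') or an ambiguous base ('N'), plus substitutions written as
    <ref base><1-based position><query base>.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    substitutions = []
    for pos, (a, b) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if a in "-N" or b in "-N":
            continue  # skip gaps and unknown bases
        compared += 1
        if a != b:
            substitutions.append(f"{a}{pos}{b}")
    identity = 100.0 * (compared - len(substitutions)) / compared if compared else 0.0
    return identity, substitutions


ref = "ATGCGTAC"    # toy "reference isolate"
query = "ATGTGTAC"  # toy "second isolate"
print(compare_aligned(ref, query))  # (87.5, ['C4T'])
```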

When a protein is absolutely essential to a virus, its key components generally evolve slowly, since most mutations would render the virus ineffective and so there is less 'room for flexible change'. Clearly the immune system can combat SARS CoV, and it is known that antibodies are produced that target the S glycoprotein. Certain sequences in the S protein are essential for its function, such as those involved in binding to the host receptor and in the fusion peptide mechanism. In viruses in general, such essential sequences can change and evolve, but generally more slowly than many other regions of the same protein: they are relatively conserved sequences. A vaccine could potentially consist of the necessary conserved peptides (assuming they fold into the correct shape and can be administered without adverse side-effects). A requirement is that the targeted region is exposed on the viral surface, so that antibodies generated by a vaccine can reach it. For less accessible targets, drugs with an appropriate delivery system could potentially be developed. Another potential antiviral target is the polyprotein-processing enzymes (the equivalent enzymes have been useful targets in HIV therapy).
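One simple way to look for relatively conserved stretches is to align the same protein from several isolates or related viruses and score each column by how many sequences agree. The Python sketch below does that with made-up peptide fragments (not real spike sequences); real analyses would use many more sequences and more refined scores, and the helper names are hypothetical.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """For each alignment column, the fraction of sequences that share the
    most common residue (1.0 = fully conserved)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    scores = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        scores.append(counts.most_common(1)[0][1] / len(alignment))
    return scores


def conserved_windows(alignment: list[str], window: int, threshold: float = 1.0) -> list[int]:
    """0-based start positions of windows in which every column scores at
    least `threshold`: candidate conserved stretches."""
    scores = column_conservation(alignment)
    return [
        start
        for start in range(len(scores) - window + 1)
        if all(score >= threshold for score in scores[start:start + window])
    ]


aligned = ["MKTAYI", "MKTAYV", "MRTAYI"]   # invented peptide fragments
print([round(s, 2) for s in column_conservation(aligned)])  # [1.0, 0.67, 1.0, 1.0, 1.0, 0.67]
print(conserved_windows(aligned, window=3))                 # [2]
```

Surface exposure, which the text names as a second requirement for a vaccine target, is not modelled here.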

The importance of finding sequences that are relatively conserved, critical to viral function and accessible to drug action cannot be overstated. SARS CoV-2 is already mutating rapidly, with the genetic sequences of strains differing between isolates from different countries (Longxian et al. 2020; Phan 2020). The virus is closely related to strain RaTG13 from bats. There are signs that it has undergone both positive (enhancing beneficial traits) and negative (purifying, or removing harmful traits) natural selection since jumping to a human host (Longxian et al. 2020). The receptor-binding domain (RBD) of S1 has diverged considerably from that of RaTG13, possibly as a result of genetic recombination between RaTG13 and another, unidentified coronavirus (Longxian et al. 2020). There is evidence that the virus is already evolving in response to antiviral drugs (see EDITORIAL - Virus against virus: a potential treatment for 2019-nCov (SARS-CoV-2) and other RNA viruses. Cell Research (2020) 30:189-190). Thus, although a vaccine may be possible, developing an effective one is no easy task.

With more research into the molecular systems of SARS CoV, further potential drug targets may emerge, for example the viral polymerases or other viral RNA-processing enzymes, provided the target is sufficiently different from host proteins to limit side-effects. Targeted delivery to infected cells could potentially reduce drug toxicity.

Additional modes of therapy may include using an engineered virus to combat SARS CoV-2, for example by using the CRISPR/Cas13d system (a technique developed from the natural ability of bacteria to develop immunity to their own viruses) to rapidly generate modified adeno-associated virus particles that target and destroy CoV RNA. These modified virus particles could carry manufactured guide RNAs and the RNA-cleaving enzyme Cas13d. The guide RNAs would locate CoV RNA by base-pairing with it, and Cas13d would then destroy it. This approach could potentially keep pace with the rapid evolution of this virus, since new guide RNAs can be designed as the viral sequence changes. Different variants that have evolved different genetic sequences have already arisen.
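
The 'locate CoV RNA' step relies on base-pairing: the guide's spacer sequence is complementary to its target, so a matching site in the viral RNA equals the reverse complement of the spacer. The Python sketch below scans a toy RNA for such sites. The sequences are hypothetical, the short spacer is for readability only, and the simple mismatch count is an assumption that ignores the real Cas13d targeting rules (spacer length, direct repeat, RNA secondary structure).

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A pairs with U, G pairs with C)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_target_sites(guide, target_rna, max_mismatches=0):
    """Return start positions in target_rna where a window base-pairs with the guide.

    A window counts as a site if it differs from the guide's reverse complement
    at no more than max_mismatches positions (a crude, hypothetical scoring rule).
    """
    site = reverse_complement_rna(guide)
    k = len(site)
    hits = []
    for start in range(len(target_rna) - k + 1):
        window = target_rna[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical toy sequences (real Cas13d spacers are considerably longer).
viral_rna = "GGAUCCAUGGCUAGCUAAGG"
guide = "GCUAGCCAU"                    # pairs with AUGGCUAGC in viral_rna
variant_rna = "GGAUCCAUGGCAAGCUAAGG"   # one base changed inside the target site

print(find_target_sites(guide, viral_rna))                      # -> [6]
print(find_target_sites(guide, variant_rna))                    # -> []
print(find_target_sites(guide, variant_rna, max_mismatches=1))  # -> [6]
```

In this toy scoring, allowing one mismatch still finds the mutated site. How many mismatches a real Cas13d guide tolerates is not modelled here.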

Another approach being considered makes use of a bioengineered adenovirus vector. This approach has been successful in immunizing animal models against rabies and the bacterium Mycobacterium tuberculosis (Ronan et al., 2009. PLoS One 4: e8235). One or more genes of the adenovirus are replaced by a cargo gene encoding the antigen against which immunity is to be generated, e.g. the spike glycoprotein of rabies virus or coronavirus. Adenovirus genetic material enters the host cell's nucleus, where there is the possibility of it becoming integrated into the host cell's genome. However, this happens only rarely, since adenovirus lacks the necessary machinery to integrate genetic material into the host genome. Adding retroviral integration machinery allows the cargo gene to be integrated into the host cell's DNA, where the gene can be expressed, resulting in synthesis of the desired pathogen protein.

The host cell processes this protein antigen and presents fragments of it to the immune system. This is particularly effective at activating a key antiviral component of the immune system: cytotoxic T cells (generating cell-mediated immunity, CMI). Potential issues have been flagged, in particular whether such a vaccine could be carcinogenic. Careful consideration of which genes to knock out from the adenovirus may help to minimise this risk. There is also the choice of whether the adenovirus vector should be replication-competent or replication-deficient. Poxviruses have also been researched as an alternative vector in this type of vaccine. Pre-existing immunity to adenovirus may also impair the effectiveness of such vaccines, though this can be minimised by a suitable choice of serovar.

FAQ - Was SARS CoV-2 manufactured?

Not necessarily. It is certainly closely related to a bat strain and has undergone, and is still undergoing, rapid natural evolution. However, it has been suggested that its ability to bind the human receptor, and hence to be an effective parasite in humans, was the result of recombination with an unknown strain. If true, then this could have been a natural event, e.g. if the bat strain infected the same individual as a human CoV strain, or vice versa. However, it could also be the result of laboratory manufacture. In short, the genome of SARS-CoV-2 appears natural and is mutating and evolving as it spreads around the world, but an initial artificial construction, or the deliberate release of a natural strain, cannot be ruled out. Whether a hybridization or recombination between two strains gave rise to SARS-CoV-2 is currently being thrashed out in the scientific literature, but since regions of this virus have evolved very rapidly it may be hard to settle with certainty. To date, there remains no evidence that the virus was manufactured, and ample evidence that its mutations are compatible with natural mutation. The question of what kind of contact between bats and humans was responsible for its jump into humans, however, remains unanswered.

The hypothesis that the virus escaped from a lab in Wuhan was put forward in a Chinese research paper that was later withdrawn and was based upon circumstantial information. This paper hypothesized that the escape was accidental and involved a non-engineered strain isolated from a natural source (as such research was ongoing at the time). This is a plausible hypothesis (media attempts to dismiss the idea on the grounds that 'scientists have shown the virus not to be an engineered strain' are fallacious, since such a finding does not rule out a natural strain escaping from a lab). However, this hypothesis remains unverified. The paper also cast doubt on the notion that the food markets of Wuhan were the epicenter of the first animal-to-human transmission event. The exact origins of the virus remain unknown and cannot be investigated for political reasons: the Chinese government has resisted calls for an independent investigation, an unfortunate if understandable position.

Finding the Origins of SARS-CoV-2

One of the chief characteristics of coronaviruses is their diversity. In large part this comes about by a process of recombination. Coronaviruses replicate their genomes inside the host cell using an RNA-dependent RNA polymerase (RdRp), which uses genomic RNA molecules as templates to make more copies of these molecules. This polymerase has the interesting property of readily jumping from one RNA template to another part-way through copying. This means that a newly synthesized genome can take part of its sequence from one template, another part from a second template, and so on. If a cell is infected by a single strain of the virus this has no consequence, since all the template molecules will be identical. However, if two different strains co-infect the same cell, then recombination may produce a hybrid virus carrying segments of genetic material from each parent. This is one mechanism for generating new strains and species of these viruses. (Incidentally, the fact that viruses contain genetic information that evolves is the key reason I personally class viruses as living systems: the notion that viruses are not living simply makes no sense.)
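
The template-switching behaviour described above can be illustrated with a toy simulation. The Python sketch below assumes equal-length, perfectly aligned templates and a fixed per-base switching probability (both simplifications; the function name is hypothetical). It shows why switching between identical templates leaves no trace, whereas switching between two different parental strains yields a mosaic.

```python
import random


def copy_with_template_switching(templates, switch_prob, rng):
    """Copy a genome base by base, occasionally jumping to a different template.

    templates: equal-length, aligned sequences present in the same cell.
    switch_prob: per-base probability that the polymerase jumps template.
    Returns the new sequence and, for each base, the index of its source template.
    """
    length = len(templates[0])
    current = rng.randrange(len(templates))
    new_seq, source = [], []
    for i in range(length):
        if len(templates) > 1 and rng.random() < switch_prob:
            # Idealized switch: resume copying at the same position on another template.
            current = rng.choice([t for t in range(len(templates)) if t != current])
        new_seq.append(templates[current][i])
        source.append(current)
    return "".join(new_seq), source


rng = random.Random(2020)  # fixed seed so the run is repeatable

# Single strain: every template is identical, so switching leaves no trace.
strain = "AUGGCUAGCUAAGGCU"
product, _ = copy_with_template_switching([strain, strain], 0.3, rng)
print(product == strain)  # -> True

# Co-infection with two toy (hypothetical) strains: the product is a mosaic.
parent_a = "A" * 30
parent_b = "G" * 30
hybrid, source = copy_with_template_switching([parent_a, parent_b], 0.1, rng)
print(hybrid)
print("".join(str(s) for s in source))  # 0 = parent_a, 1 = parent_b
```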

Phylogenetics uses mathematical methods to examine related genetic sequences and construct the most probable evolutionary tree connecting them. It enables scientists to see which organisms are most closely related and hence to infer evolutionary pathways. It has been shown that the evolutionary family tree for SARS-CoV-2 depends on the region of its genome analyzed: due to past recombination events, the genome of SARS-CoV-2 is a mosaic of segments inherited from different ancestors (see review by Sallard et al., 2021).

The bulk of the SARS-CoV-2 genetic sequence is very similar to viruses infecting bats of the genus Rhinolophus, but the key region that enables infection of human beings, the region encoding the S1 spike protein (the 'key' that binds to the ACE2 receptor to gain access to the host cell), is more closely related to that of pangolin coronaviruses. It was the acquisition of this S1 region that made it much easier for the virus to infect humans. However, the relationship to known pangolin viruses is not very close, suggesting that the immediate parent virus is of an unknown type. This suggests that the strain underwent considerable further changes from known strains, either before or after its initial transmission to a human host. Such transmission could have occurred either directly from a bat or via a pangolin (or other?) intermediate host.
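
The observation that the 'closest relative' depends on which genome region is examined can be illustrated with a sliding-window identity scan, a simple technique often used to spot possible recombination breakpoints. In the Python sketch below, the two 'relative' sequences and the mosaic query are invented toy data, labelled bat_like and pangolin_like purely for illustration. Real analyses use full alignments and phylogenetic methods rather than raw percent identity.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two equal-length, aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def closest_relative_by_window(query, relatives, window, step):
    """For each window, report which relative is most identical to the query."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {name: percent_identity(query[start:end], seq[start:end])
                  for name, seq in relatives.items()}
        best = max(scores, key=scores.get)
        results.append((start, end, best, round(scores[best], 1)))
    return results


# Invented toy sequences; the names are illustrative labels only.
bat_like = "ACGUACGUAC" + "GGAUCCGAUC" + "UUAGCUUAGC"
pangolin_like = "ACCUAGGUAC" + "GCAUGCGUUC" + "UAAGGUUCGC"
# Hypothetical mosaic: the middle segment is taken from pangolin_like.
query = bat_like[:10] + pangolin_like[10:20] + bat_like[20:]

relatives = {"bat_like": bat_like, "pangolin_like": pangolin_like}
for name, seq in relatives.items():
    print(name, round(percent_identity(query, seq), 1))
for row in closest_relative_by_window(query, relatives, window=10, step=10):
    print(row)
```

In this toy case the whole query is most similar to bat_like overall (90% versus about 83%), yet the middle window is identical to pangolin_like. This mirrors the pattern described above for the S1 region.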

An alternative hypothesis to natural recombination is that the virus acquired its S1 region under selective pressure in a lab. This could have occurred, for example, during 'gain of function' studies in which the virus may have been grown for many generations on cultured human cells in order to better understand the disease process in humans. Coronaviruses can sometimes infect human cells in culture much more easily than they can infect the human host. This is one way a new strain of a virus can be produced without leaving the tell-tale signs of 'cut-and-paste' genetic engineering (such as restriction endonuclease sites). However, there is no concrete evidence to support the lab-origin theory at present.

Both the natural-origin and lab-leak theories are difficult to prove conclusively. I doubt that science will be able to pinpoint the origin of this virus with reasonable certainty. Initially I was optimistic that science could resolve this issue, but the current data are simply insufficient.

Coronavirus spike protein

The structure of a betacoronavirus spike (S) trimer modeled from a sequence of a hedgehog strain (NCBI: YP_009513010.1, Novel CoV related to MERS from European hedgehogs. PMID: 24131722. Corman, V.M., Drexler, J.F. and Drosten, C. 2014. J. Virol. 88(1): 717-724). The monomer was modeled in Phyre 2 and the trimer constructed in SymmDock. The view above is looking down onto the globular head, showing its 'triangle of triangles' geometry. Below: side-view showing the S2 stem domains and the flaring out of the tails predicted to anchor the spike beneath the virus envelope.

Coronavirus spike protein

Vaccine Development

A number of vaccines are being rapidly developed in the hope that they will at least impart partial immunity, making it hard for the virus to spread within a vaccinated population. The approaches are diverse, but one of them is the approach I would have considered taking: the vaccine being developed by AstraZeneca and the University of Oxford, UK. The stated aim of the vaccine is to generate immunity against the virus spike protein, to prevent the virus gaining entry to host cells. It is a DNA vaccine (though RNA could potentially be used as an alternative) and makes use of an engineered adenovirus vector carrying a DNA plasmid with the virus spike gene (or perhaps part of it) inserted. The adenovirus will infect host cells without causing disease (there is the option of allowing the virus to replicate or not), delivering its DNA cargo into the host cell nucleus. Once there, the spike gene will be transcribed into mRNA, which is translated into spike proteins that should be expressed on the modified host cell's surface. This can not only trigger antibody production by B cells, but can also stimulate a strong antiviral response from T cells, which recognize foreign proteins expressed on the surface of cells (since this is an indication of viral infection).

Such a vaccine of course invites controversy. It has been claimed that the vaccine genetically engineers cells, claims rubbished by the mainstream media. The truth, of course, is more subtle. Yes, new genes are introduced into a certain fraction of host cells of a specific type (such as mucosal epithelium cells), so those cells have been 'genetically modified', but they have not been 'genetically engineered' in the strict sense. Genetic engineering is the attempt to introduce a permanent change into an organism's genetic material. This would require the plasmid to insert into the host's own DNA in the cell nucleus. Should such a cell divide, the inserted DNA would be copied along with the host's own DNA and passed on to each daughter cell. The plasmids in DNA vaccines are not designed to do this (though it remains an option) and only persist within the cell for a certain length of time (plasmid half-life is generally several hours in mammalian cells). Nevertheless, there is always a low probability that the plasmid will accidentally become incorporated into a host cell's DNA, effecting accidental genetic engineering. This is a potential cancer concern, as such cells have a low probability of becoming cancerous. This risk can be mitigated, e.g. by using suicide vectors that cause any cell modified in this way to apoptose (self-destruct).

Plasmids occur naturally in bacteria, where they may persist and replicate along with the cell, and plasmid vectors are generally derived from bacterial plasmids. This raises another issue: bacterial DNA can trigger an immune response, potentially resulting in anti-DNA antibodies being produced by the host and hence autoimmune disease. This can be mitigated by using as little bacterial DNA in the plasmid as possible.

Vaccines may have side-effects, and these side-effects are not always picked up until after vaccine deployment. DNA vaccines have yet to be deployed in human medicine. There is a potential problem with public acceptance of vaccines. This has, in my considered opinion, resulted from the widening gulf between governments and their peoples: a gulf which is clearly evident around the Earth today. Mass surveillance, the abuse of technology by governments, media hype and bad policing decisions have contributed to an air of mistrust, and trust in government authorities is evidently very low across the globe. The current strategy seems to be to discredit and silence critics and doubters, a strategy I expect will backfire, strengthening 'conspiracy theories' and increasing resistance. I think an open approach is needed, in which the risks of vaccination and the steps taken to mitigate them are discussed, and the detailed contents and mechanisms of any vaccine are clearly and completely explained. Secrecy strengthens 'conspiracy theories', and rightly so. People should then be allowed to make an informed decision: I think it would be a violation of medical ethics to make vaccination compulsory. Vaccination should remain optional.

Coronavirus computer model

Above: A 3D model of coronavirus with the matrix protein incorporated into a semi-regular array as reported in the literature (data on the precise geometrical arrangement of the matrix protein are not available). The smaller spheres making up the envelope are the phospholipids, the larger spheres the M protein. The E protein pentamers are shown in yellow. It is difficult from the current literature to ascertain the density of the M protein, and at least one popular model has deliberately underestimated M protein density for clarity. Below is a version of our model with a lower M protein density.

Coronavirus computer model

Koch's Postulates and SARS-CoV-2

What evidence is there that SARS CoV-2 causes the disease known as COVID-19?

In 1890, Robert Koch published his four postulates for verifying a microorganism as a causative agent of disease.

  1. The microorganism should be abundant in all individuals with symptoms of the disease, but not in healthy individuals.
  2. The microorganism should be isolated from diseased individuals and grown in pure culture.
  3. The cultured microorganism should cause disease when introduced into a healthy individual.
  4. The microorganism should be isolated from the experimental subject from (3) and shown to be identical to the original microorganism.

These postulates are not absolute. For example, asymptomatic carriers may appear healthy whilst still carrying the pathogen, requiring postulate (1) to be relaxed. It is said that Koch's postulates cannot be applied to viruses, since viruses cannot be grown in pure culture: they require host cells in which to replicate. Nevertheless, viruses can be co-cultured with cells. Let us look at what has been done with CoV-2. A good example is the study by Imai et al., 2020, but there are a number of others. In this study, two clinical isolates of coronavirus were taken from individuals showing mild symptoms of COVID-19, one in Tokyo, the other in Wisconsin. These were cultured by growing them on VeroE6/TMPRSS2 cells (an immortalized cell line derived from African green monkey kidney epithelial cells, which supports the growth of many viruses well, including CoV-2) and, perhaps more significantly, on cultured human lung cells (from the alveoli). One of the lung cell lines supported rapid growth of the virus. It has been hypothesized that the variability in susceptibility of different lung cell lines to CoV-2 is due to variability in expression of ACE2, the receptor for the virus spike glycoprotein.

Electron microscopy studies showed the virus replicating within the cells and demonstrated that it was indeed a coronavirus. In my view this satisfies the first and second of Koch's postulates convincingly (even if the culture was not 'pure', in the sense that it required a co-culture of host cells).

What about postulates 3 and 4? The same group (Imai et al., 2020) inoculated Syrian hamsters (which other groups have shown to be a potentially suitable animal host for the virus) intranasally with varying doses of the cultured virus. The hamsters developed a disease resembling human COVID-19, with substantial lung lesions (and infection of the brain, which is currently suspected to occur in at least some people). The time course of the disease was similar to that in humans, with the animals largely recovering after 2 to 3 weeks. The virus was isolated from respiratory and brain tissues of the infected hamsters. Furthermore, reinfection of recovered animals showed that the animals had developed immunity. In conclusion, studies such as this provide pretty conclusive evidence that SARS CoV-2 is a causative agent of respiratory disease in humans.

Antibodies to Coronavirus

Why are antibodies against the spike protein effective?


Anand, K., Ziebuhr, J., Wadhwani, P., Mesters, J.R. and Hilgenfeld, R. 2003. Coronavirus Main Proteinase (3CLpro) Structure: Basis for Design of Anti-SARS Drugs. Science 300: 1763-1767.

Cavanagh, D. 2006. Coronaviridae: a review of coronaviruses and toroviruses. Coronaviruses with special emphasis on first insights concerning SARS: 1-54.

Chen, C.-Y., Chang, C.-K., Chang, Y.-W., Sue, S.-C., Bai, H.-I., Riang, L., Hsiao, C.-D. and Huang, T.-H. 2007. Structure of the SARS coronavirus nucleocapsid protein RNA-binding dimerisation domain suggests a mechanism for helical packaging of viral RNA. J. Mol. Biol. 368: 1075-1086. doi:10.1016/j.jmb.2007.02.069

de Groot, R.J. 2006. Structure, function and evolution of the haemagglutinin-esterase proteins of corona- and toroviruses. Glycoconj J. 23: 59-72.

He, Y., Lu, H., Siddiqui, P., Zhou, Y. and Jiang, S. 2005. Receptor-Binding Domain of Severe Acute Respiratory Syndrome Coronavirus Spike Protein Contains Multiple Conformation-Dependent Epitopes that Induce Highly Potent Neutralizing Antibodies. J. Immunol. 174: 4908-4915.

Imai, M., Iwatsuki-Horimoto, K., Hatta, M., et al. 2020. Syrian hamsters as a small animal model for SARS-CoV-2 infection and countermeasure development. PNAS 117(28): 16587-16595.

Johnson, M.A., Chatterjee, A., Neuman, B.W. and Wüthrich, K. 2010. SARS coronavirus unique domain: three-domain molecular architecture in solution and RNA binding. J. Mol. Biol. 400(4): 724-742.

King, A.M.Q., Adams, M.J., Carstens, E.B. and Lefkowitz, E.J. (eds). 2012. Virus taxonomy: classification and nomenclature of viruses, Ninth Report of the International Committee on Taxonomy of Viruses. Elsevier AP (pub). ISBN: 978-0-12-384684-6

Koopmans, M. and Horzinek, M.C. 1994. Toroviruses of animals and humans: a review. Advances in Virus Res. 43: 233-273.

Kuo, L., Koetzner, C.A. and Masters, P.S. 2016. A key role for the carboxy-terminal tail of the murine coronavirus nucleocapsid protein in coordination of genome packaging. Virology 494: 100-107.

Longxian, Lv, Li, G., Chen, J., Liang, X. and Li, Y. 2020. Comparative genomic analysis revealed specific mutation pattern between human coronavirus SARS-CoV-2 and Bat-SARSr-CoV RaTG13.

Lu, W., Zheng, B.-J., Xu, K., Schwartz, W., Du, L., Wong, C.K.L., Chen, J., Duan, S., Deubel, V. and Sun, B. 2006. Severe acute respiratory syndrome-associated coronavirus 3a protein forms an ion channel and modulates virus release. PNAS 103(33): 12540-12545.

Masters, P.S. 2006. The molecular biology of coronaviruses. Advances in Virus Res. 66: 193-292. DOI: 10.1016/S0065-3527(06)66005-3

Masters, P.S. 2019. Coronavirus genomic RNA packaging. Virology 537: 198-207.

Neuman, B.W., Adair, B.D., Yoshioka, C., Quispe, J.D., Orca, G., Kuhn, P., Milligan, R.A., Yeager, M. and Buchmeier, M.J. 2006. Supramolecular Architecture of Severe Acute Respiratory Syndrome Coronavirus Revealed by Electron Cryomicroscopy. J. Virol. 80(16): 7918-7928.

Nieto-Torres, J.L., DeDiego, M.L., Verdia-Baguena, C., Jimenez-Guardeno, J.M., Regla-Nava, J.A., Fernandez-Delgado, R., Castano-Rodriguez, C., Alcaraz, A., Torres, J., Aguilella, V.M. and Enjuanes, L. 2014. Severe acute respiratory syndrome coronavirus envelope protein ion channel activity promotes virus fitness and pathogenesis. PLOS Pathogens 10(5): e1004077.

Phan, T. 2020. Genetic diversity and evolution of SARS-CoV-2. Infection, Genetics and Evol. 81: 104260.

Ruch, T.R. and Machamer, C.E. 2012. The coronavirus E protein: assembly and beyond. Viruses 4: 363-382. doi:10.3390/v4030363 

Sallard, E., Halloy, J., Casane, D., Decroly, E. and van Helden, J. 2021. Tracing the origins of SARS-CoV-2 in coronavirus phylogenies: a review. Environmental Chemistry Letters 19: 769-785.

Sawicki, S.G. and Sawicki, D.L. 1998. A new model for coronavirus transcription. In: Coronaviruses and Arteriviruses, edited by Enjuanes et al. Plenum Press, New York.

Smits, S.L., van Vliet, A.L.W., Segeren, K., el Azzouzi, H., van Essen, M. and de Groot, R.J. 2005. Torovirus non-discontinuous transcription: mutational analysis of a subgenomic mRNA promoter. J. Virol. 79(13): 8275-8281.

Snijder, E.J. and Horzinek, M.C. 1993. Toroviruses: replication, evolution and comparison with other members of the coronavirus-like superfamily. J. Gen. Virol. 74: 2305-2316.

Su, D., Lou, Z., Sun, F., Zhai, Y., Yang, H., Zhang, R., Joachimiak, A., Zhang, X.C., Bartlam, M. and Rao, Z. 2006. Dodecamer structure of severe acute respiratory syndrome coronavirus nonstructural protein nsp10. J. Virology 80(16): 7902-7908. DOI: 10.1128/JVI.00483-06.

Surya, W., Li, Y. and Torres, J. 2018. Structural model of the SARS coronavirus E channel in LMPG micelles. Biomembranes 1860: 1309-1317.

Sutton, G., Fry, E., Carter, L., Sainsbury, S., Walter, T., Nettleship, J., Berrow, N., Owens, R., Gilbert, R., Davidson, A., Siddell, S., Poon, L.L.M., Diprose, J., Alderton, D., Walsh, M., Grimes, J.M. and Stuart, D.I. 2004. Structure 12(2): 341-353.

Thiel, V., Herold, J., Schelle, B. and Siddell, S.G. 2001. Viral replicase gene products suffice for coronavirus discontinuous transcription. J. Virol. 75(14): 6676-6681.

Xu, Y., Lou, Z., Liu, Y., Pang, H., Tien, P., Gao, G.F. and Rao, Z. 2004. Crystal Structure of Severe Acute Respiratory Syndrome Coronavirus Spike Protein Fusion Core. J. Biol. Chem. 279(47): 49414-49419.



Article history

Created on 17 March 2020

Updated: 18 March 2020, 19 March 2020, 21 March 2020, 11 April 2020, 18 April 2020, 22 May 2020, 15 July 2020, 3 June 2021