The Molecular Architecture of Viruses
Above: a podovirus, a virus which infects bacteria (a bacteriophage or phage).

Viruses are superb molecular nanomachines! They are truly minute, many around 100 nm (one
ten-millionth of a metre) in diameter, and yet they have considerable structure. Recall
that they are essentially protein shells called capsids enclosing genetic material (RNA or DNA,
depending on virus type). This genetic material contains a biological computer program which
reprograms the infected cell to make more copies of the virus. The sole function of the virus
particle or virion is to deliver this genetic program to a suitable host.

A study of their form and function is an excellent way to convey many aspects of molecular
biology and biological physics. Such a study conveys a strong sense of the adaptable and
mechanical nature of proteins and how the genetic code links to protein form and function.
Many aspects of virus biology can be, and have been, modeled by the application of physics,
especially thermodynamics, from assembly and DNA packaging to DNA injection and membrane
fusion. These make excellent student projects and for this reason I won't give details here! I will
give but one example in brief: I recently had a group of students calculate the entropy of virus
capsid assembly using exact calculations, which is quite an achievement given the immense
numbers (factorials) involved, but made possible by modern computing technology.

Advances in scientific methods have made possible detailed analyses of virus structure and
function. One example is cryo-EM (EM = electron microscopy), in which samples are embedded
and frozen rapidly, e.g. in liquid nitrogen, and then sectioned and imaged. Freezing avoids the
artifacts introduced by chemical fixation (the bonding of fixatives to such tiny structures may
distort them on a molecular scale and lower the resolution). Similarly, using chemical stains to
better visualise sections also distorts structures, but digital processing of images removes the
need to use a stain in many cases. This allows structures to be visualised almost intact, as
they would appear in life but frozen in time. Many particles are imaged and then a computer
constructs an average image (thus increasing the signal-to-noise ratio). The computer can also
stack imaged sections to reconstruct a 3D representation.
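
The benefit of averaging many particle images can be illustrated with a toy calculation. The sketch below is not how cryo-EM software works in practice (real pipelines must first align and classify the particles); it simply assumes a made-up one-dimensional 'image', independent Gaussian noise and perfectly aligned copies. All names and parameter values are illustrative.

```python
import random
import statistics


def average_noisy_copies(signal, n_copies, noise_sd, rng):
    """Average n_copies of signal, each corrupted by independent Gaussian noise."""
    totals = [0.0] * len(signal)
    for _ in range(n_copies):
        for i, value in enumerate(signal):
            totals[i] += value + rng.gauss(0.0, noise_sd)
    return [total / n_copies for total in totals]


def residual_sd(estimate, signal):
    """Standard deviation of the noise left in an estimate of the signal."""
    return statistics.pstdev([e - s for e, s in zip(estimate, signal)])


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical 1D "particle": a bright band on a dark background.
signal = [1.0 if 400 <= i < 600 else 0.0 for i in range(1000)]

single = average_noisy_copies(signal, 1, noise_sd=2.0, rng=rng)
averaged = average_noisy_copies(signal, 100, noise_sd=2.0, rng=rng)

noise_single = residual_sd(single, signal)
noise_averaged = residual_sd(averaged, signal)
improvement = noise_single / noise_averaged

print(f"noise in one image:       {noise_single:.3f}")
print(f"noise in 100-image mean:  {noise_averaged:.3f}")
print(f"improvement factor:       {improvement:.1f}")
```

With independent noise the improvement factor should come out close to the square root of the number of copies averaged, i.e. about 10 for 100 copies.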
Virus Particle Assembly: packaging genetic material

The above cutaway of the podovirus P22, which infects Salmonella bacteria, was modeled in
Pov-Ray based loosely on data published by Lander et al. (2006) obtained from cryo-EM
studies. The following is a summary of the stages in P22 virion (virus particle) assembly; 'gp'
means 'gene product' and refers to the various viral proteins, each of which has a designated
number, e.g. gp5 (blue) forms the main protein shell or capsid enclosing the DNA (which is
double-stranded DNA in this virus). Some of these proteins are structural, forming the body of
the virion, as shown above, whilst others are functional, assisting in the reproduction of virions
but not forming a part of the infectious virion particle itself. The capsid of P22 is about 60 nm in
diameter.

One corner or vertex of the capsid is open and here is inserted the portal complex (in red),
which is formed from 12 copies of gp1 and has a 12-fold axis of symmetry. This creates a
symmetry mismatch with the 5-fold symmetry of the open capsid vertex in which the portal
complex is inserted. This complex allows DNA (green) to enter and leave the capsid. The DNA
is wound round as if on a spool. The DNA towards the capsid wall forms three well-defined
close-packed layers which are almost crystalline. A simple calculation
predicts a pressure inside the capsid of about 20 atmospheres. DNA is negatively
charged (it has ionised phosphate groups along its backbone) and these charges repel, such
that it takes considerable force to pack naked DNA so closely together. Viruses utilise
molecular motors to package their DNA so tightly. It has been suggested that the portal
complex rotates as DNA passes through it during packaging, but this has not been proven.
The DNA in the central region of the capsid is less tightly packed, possibly because DNA
resists being bent into circles that are too small and too tight.
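
To get a feel for how tightly the DNA is packed, here is a rough back-of-envelope sketch. It is not the pressure calculation mentioned above. It assumes textbook values for B-form DNA (0.34 nm rise per base pair, about 2 nm diameter), treats the DNA as a smooth cylinder and the capsid as a sphere of 60 nm outer diameter, and uses the roughly 43.5 kbp per head quoted for P22 in this article. Since the true interior is smaller than the outer sphere, the volume fraction it gives is a lower bound. The function name is purely illustrative.

```python
import math

BP_RISE_NM = 0.34     # rise per base pair of B-form DNA (textbook value)
DNA_RADIUS_NM = 1.0   # approximate radius of the double helix (textbook value)


def dna_packing_estimate(packaged_bp, capsid_diameter_nm):
    """Return (DNA contour length in nm, fraction of capsid volume filled by DNA).

    Assumes the DNA is a smooth cylinder and the capsid a sphere measured by
    its OUTER diameter, so the fraction returned is a lower bound.
    """
    length_nm = packaged_bp * BP_RISE_NM
    dna_volume = math.pi * DNA_RADIUS_NM ** 2 * length_nm
    capsid_volume = (4.0 / 3.0) * math.pi * (capsid_diameter_nm / 2.0) ** 3
    return length_nm, dna_volume / capsid_volume


length_nm, fraction = dna_packing_estimate(43_500, 60.0)
print(f"DNA contour length: {length_nm / 1000:.1f} micrometres")
print(f"Ratio of DNA length to capsid diameter: {length_nm / 60.0:.0f}")
print(f"Volume fraction occupied by DNA: at least {fraction:.2f}")
```

Under these assumptions nearly 15 micrometres of DNA, some 250 times the capsid diameter, has to be coiled into the head, which gives some idea of why a molecular motor is needed.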

The portal complex has been shown to have at least two distinct conformations. Proteins can
adopt different stable shapes or conformations depending on how they interact with other
molecules. Changes in conformation involve movements of electric charge through the protein
structure, causing parts of the amino acid chain making up the protein to flex or rotate as the
protein changes into another stable form. This likely involves quantum tunneling and
conformational change in a protein is quite possibly a quantum mechanical event. When free,
the portal complex has a different conformation from when it is attached to a packaged virion
(Lander et al., 2006). One possible interpretation of this is an open and a closed state.

Assembly of the P22 virion

Step 1
About 415 copies of the capsid protein gp5 assemble into a procapsid with the help of about 300
copies of the scaffolding protein gp8; gp1 forms the portal complex.

Step 2
The gp3/gp2 DNA packaging/terminase complex assembles on the gp1 portal complex and
loads the procapsid with viral DNA through the gp1 portal; gp2 is the large subunit of the
complex and is an ATP-powered molecular motor.

Step 3
Once the capsid is full, it has been suggested that electrostatic repulsion from coils of DNA
surrounding the portal complex triggers a conformational change (to the high pressure state)
closing the portal. The packaging complex stops loading DNA and gp2 cuts the DNA (the viral
DNA is produced as a concatemer, or several copies joined together in series); slightly more
than one single copy of the 41.7 kbp genome is loaded (1 kbp = 1000 DNA base pairs), with
each capsid head holding about 43.5 kbp. This strategy of packing the head until it is full is
called headful packaging. The procapsid expands into a larger, more icosahedral and
thinner walled mature capsid.

Step 4
The gp3/gp2 complex dissociates and the tail complex proteins gp4 and gp10 attach to the
portal complex, possibly helping to close it. (Lander et al., 2006, identified additional material
blocking the channel for DNA ejection through the tail, which could be a protein. This is shown
in grey in the picture above.)

Step 5
Six trimers of gp9 attach to the tail complex (trimer = group of 3 proteins bound together in a
specific conformation). These tail spikes are the 'legs' of the bacteriophage (these
are not locomotory but are involved in adhesion to a target cell prior to infecting it by injecting the
viral DNA into the target cell through the needle, which is formed of gp26).
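
The consequence of headful packaging in Step 3 can be made concrete with a little arithmetic. The sketch below uses the two figures given above (a 41.7 kbp genome and about 43.5 kbp per head) and assumes, for illustration only, that successive heads are filled one after another from the same concatemer, starting at an arbitrary position labeled 0. The function name is hypothetical.

```python
GENOME_BP = 41_700    # one copy of the P22 genome (figure quoted above)
HEADFUL_BP = 43_500   # approximate DNA held by one capsid head (figure quoted above)


def headful_cuts(n_heads, genome_bp=GENOME_BP, headful_bp=HEADFUL_BP):
    """Return (start, end) genome coordinates of successive headfuls.

    Assumes each head is filled from the point where the previous cut was
    made, so the start point drifts along the genome by the overfill each time.
    """
    cuts = []
    position = 0  # distance travelled along the concatemer so far
    for _ in range(n_heads):
        start = position % genome_bp
        end = (position + headful_bp) % genome_bp
        cuts.append((start, end))
        position += headful_bp
    return cuts


redundancy_bp = HEADFUL_BP - GENOME_BP
redundancy_percent = 100.0 * redundancy_bp / GENOME_BP

print(f"Repeated at both ends of each packaged DNA: {redundancy_bp} bp "
      f"({redundancy_percent:.1f}% of the genome)")
for number, (start, end) in enumerate(headful_cuts(3), start=1):
    print(f"head {number}: starts at genome position {start}, ends at {end}")
```

Each packaged molecule therefore carries the same stretch of sequence at both ends (about 1.8 kbp here), and under the sequential-filling assumption each successive head begins a little further along the genome than the last.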

Ejection Proteins (E Proteins)

Viruses sometimes need to inject several proteins into their host along with their genetic
material. In P22, an estimated 12 copies of gp7, 12 copies of gp16 and 30 copies of gp20 are
incorporated into the virion. These are ejected from the virion along with the DNA during the
infection process (along with a fourth protein: gp26). Of these, gp16 and gp26 are directly
involved in DNA and protein ejection from the capsid. Proteins gp4, gp10 and gp26 plug the
portal in the packaged virion, but release the blockage during DNA injection. The role of gp26
is to penetrate the host cell membrane, allowing the viral DNA and ejection proteins to enter.

In the diagram above, the ejection proteins (in purple) are shown situated in a cylinder just
above the gp1 portal, as suggested by Lander et al. (2006). This is speculative: cryo-EM gives
the basic arrangement of matter (and elemental analysis can be used to identify the make-up of
the atoms giving rise to the EM image) but identifying which protein is which is problematic.
Another research group (Olia et al., 2011) used Lander et al.'s cryo-EM data but carried out
X-ray crystallography on the isolated gp1 complex to determine its shape, then superimposed
this onto the EM data and arrived at a different model, summarised by the diagram below:

Arunmanee, W., M. Pathania, A.S. Solovyova, A.P. Le Brun, H. Ridley, A. Baslé, B. van den
Berg and J.H. Lakey, 2016. Gram-negative trimeric porins have specific LPS binding sites that
are essential for porin biogenesis. PNAS E5034–E5043.

Bartual, S.G., J.M. Otero, C. Garcia-Doval, A.L. Llamas-Saiz, R. Kahn, G.C. Fox and M.J. van
Raaij, 2010. Structure of the bacteriophage T4 long tail fiber receptor-binding tip.
Proc. Natl. Acad. Sci. U.S.A. 107: 20287-20292.

Basle, A., G. Rummel, P. Storici, J.P. Rosenbusch and T. Schirmer, 2006. Crystal structure of
osmoporin OmpC from E. coli at 2.0 Å. J. Mol. Biol. 362: 933-942.

Ceres, P., S.J. Stray and A. Zlotnik, 2004. Hepatitis B virus capsid assembly is enhanced by
naturally occurring mutation F97L. J. Virol. 78: 9538-9543.

Katen, S. and A. Zlotnik, 2009. The thermodynamics of virus capsid assembly.
Methods Enzymol. 455: 395-417.

Kelley, L.A., S. Mezulis, C.M. Yates, M.N. Wass and M.J.E. Sternberg, 2015. The Phyre2 web
portal for protein modeling, prediction and analysis. Nature Protocols 10: 845-858.

Lander, G.C., L. Tang, S.R. Casjens, E.B. Gilcrease, P. Prevelige, A. Poliakov, C.S. Potter, B.
Carragher and J.E. Johnson, 2006. The Structure of an infectious P22 Virion Shows the Signal
for Headful DNA Packaging. Science 312: 1791-1795.

Madej, T., C.J. Lanczycki, D. Zhang, P.A. Thiessen, R.C. Geer, A. Marchler-Bauer and S.H.
Bryant, 2014. MMDB and VAST+: tracking structural similarities between macromolecular
complexes. Nucleic Acids Res. 42 (Database Issue): D297-303.

Olia, A.S., P.E. Prevelige Jr., J.E. Johnson and G. Cingolani, 2011. Three-dimensional structure
of a viral genome-delivery portal vertex. Nat. Struct. Mol. Biol. 18: 597-603.

Wu, W., J.C. Leavitt, N. Cheng, E.B. Gilcrease, T. Motwani, C.M. Teschke, S.R. Casjens and
A.C. Steven, 2016. Localization of the Houdinisome (Ejection Proteins) inside the
Bacteriophage P22 Virion by Bubblegram Imaging. mBio 7(4): e01152-16.

Article created: 2 April 2018
Article updated: 7 April 2018
Article updated: 22 April 2018
Above: the T4 bacteriophage. Note the long and jointed tail fibres and the needle-like 'feet'
which bind to molecular targets (LPS and OmpC) on the target cell. This phage infects the
bacterium Escherichia coli. Below is a molecular model of the foot. The globular collar (blue) is
proximal and is connected to the needle-like domain which ends in the head domain at the
bottom (green). This head domain is thought to fit into a pocket on the OmpC target protein.
OmpC is a protein in the outer membrane of Escherichia coli and forms
channel pores (it is a porin consisting of a trimer of OmpC subunits). The foot is a trimer of
Below: a 'ribbon view' of the same model showing the 7 iron ions (in orange) which occupy the
hollow core of the foot and hold the structure together. Each ion is bonded to 6 histidine
residues which surround it (2 from each chain).
Images taken from the 3D computer model provided by Bartual et al. (2010) and obtained
from the National Library of Medicine (NLM) MMDB database (Madej et al., 2014). One of the
probable receptors for the T4 foot is the outer membrane protein OmpC, a view of which is
shown below. This model, as well as that for the foot, is shown as represented in UCSF
Chimera. The source file for OmpC was downloaded from the NCBI protein databank (PDB,
National Library of Medicine (NLM)), was originally uploaded by Basle et al. (2006) and was
obtained by X-ray diffraction of crystallised OmpC. (The brown 'squiggles' are alkanes which
co-crystallised, presumably from the solvent; attempting to remove the solvent with Chimera's
dock prep tool was unsuccessful.)
The T4 foot model also had some water solvent co-crystallised with it, which was removed in
Chimera before docking using PatchDock (Duhovny et al., 2002; Schneidman-Duhovny et al.,
2005). This was an attempt to verify the findings of Bartual et al. (2010). The distal end of the
foot (residues 932 to 959 on each of the three gp37 polypeptide chains) was used for docking.
The highest scoring binding mode, that is the one with the best geometric shape complementarity
score (i.e. the best fit by matching the shape of the binding region on the foot with that on OmpC),
confirmed their result. This showed that the most favourable model is for the foot to fit into the
depression between the three OmpC subunits on the extracellular side.
OmpC belongs to a class of proteins called porins. Each subunit forms a barrel-like structure
and sits upright in the bacterial outer membrane (which contains phospholipids in its inner
leaflet and LPS in its outer leaflet) with the pore spanning the membrane, allowing molecules
that are small enough (and water soluble enough) to cross the outer membrane freely. Three
such pores fit together to form the porin molecule and the T4 foot tip docked preferentially in
the middle of the three subunits on their outer face.
A divalent cation, such as calcium, has been shown to bind to each porin subunit towards its
outer face, where it acts as a binding site for the LPS lipids of the outer
membrane (Arunmanee et al., 2016). The model of OmpC we have used crystallised with a
magnesium ion in a similar position and this formed three electrostatic bonds with the T4 foot
(to lysine 945, glycine 942 and asparagine 959).
Bacteriophage Tail Fibres: binding to a potential host

Some bacteriophages have much longer tail fibres, such as the T4 bacteriophage.
The crystallographic analysis showed that the gp1 dodecameric ring had an upright tube
accounting for the matter (electron density) attributed by Lander et al. (2006) to the ejection
proteins. Which model is correct? Let us model the proteins in Phyre2 (Kelley et al., 2015), an
online tool which builds theoretical models of proteins based on their known amino acid
sequence. Below is a single subunit of gp1 from a related virus, Salmonella phage ST160 (this
virus is in the podovirus family and the gp1 proteins within this family are all similar):
gp1 subunit
Projecting from the main body or 'hip domain' of the protein is a long barrel domain (top) and a
shorter leg domain (bottom right). The barrel consists of a single long alpha-helix (alpha-helices
are shown in red) whilst the hip consists mainly of alpha-helices with some beta-strands (blue).
gp1 dodecamer from above
gp1 dodecamer
gp1 dodecamer from below
gp4 dodecamer from above
Above, a gp1 dodecamer modelled by docking the monomer prepared in Phyre2 with
SymmDock. The barrel at the top projects into the
Above: the gp1 dodecamer seen from above (looking down along the barrel).
Below: the gp1 dodecamer seen from below.
Below: the gp4 dodecamer (collar) as seen from above. The bottom of the gp1 dodecamer (the
leg domains coloured green in these models) fits into the top of this ring.
In our model, note that the tips of the barrel are splayed outwards. Is this an artifact of
modeling, or is it real? Olia et al. (2011) depicted the barrel as a straight tube along its entire
length and suggested that it makes up for the short tails of podoviruses by acting to smoothly
accelerate the DNA during ejection, rather as a rifle barrel accelerates a bullet which is
under sustained pressure (essentially this would function as a DNA gun). Clearly,
the ejection proteins and gp1 cannot occupy exactly the same space. The problem is that
cryo-EM makes it hard to distinguish proteins from DNA, especially if the proteins are
surrounded by DNA. If we allow the barrel to funnel outwards, however, then the ejection
proteins could still occupy a central protein core above the gp1 funnel, perhaps something
like this:
Wu et al. (2016) were aware of these interpretation problems and so they carried out an
experiment to generate 'bubblegrams'. If a cryo-frozen sample is under the electron
microscope beam for long enough, then the electrons damage the proteins, apparently
knocking off hydrogen atoms which form bubbles of hydrogen gas. DNA, however, is largely
unaffected and proteins wrapped in DNA bubble more quickly since the DNA helps trap the
hydrogen gas. By measuring how long it takes for a bubble of gas to form when precisely
irradiating the virion core, the location of the internal ejection proteins can be determined to
quite a degree of accuracy. Wu et al. (2016) also tested mutant P22 lacking one or all of the
ejection proteins. They concluded that the gp1 barrel does indeed form a funnel-like structure
at its end with a core of ejection proteins above it, similar to our third model. Thus, the barrel
is shorter than Olia et al. (2011) suggested and our DNA gun is more like a pistol than a rifle.
Perhaps the funnel helps guide DNA and proteins into the 'infection conduit', the hollow
channel which carries them through the gp1 portal and out of the virion and into the host cell
during infection.
P22 Model version 3
P22 Model 3 labeled
P22 podovirus
Capsid Architecture

The part of the virion forming a protective shell enclosing the genetic material is the capsid.
This is made up of protein subunits called capsomeres. The exact arrangement of the
capsomeres varies considerably. Viral capsids have variable geometry, but many approximate
an icosahedron which may be angular or expanded so as to approximate a sphere, depending
on virus type. A regular icosahedron consists of 20 equilateral triangular faces and 12
vertices. In icosahedral viruses these subunits typically exist in one of two states: pentamers of
five polypeptide subunits occur at the 12 vertices (sometimes one vertex is modified as a portal
vertex through which genetic material is inserted during packaging when the virion is
assembled). Hexamers of 6 polypeptide subunits occur on the faces and edges of the capsid.
The individual proteins or polypeptides making up the capsomeres (hexamers and pentamers)
are sometimes called protomers. The model below illustrates a T = 16 capsid.
T16 capsid face
T16 capsid vertex
Above: a T = 16 icosahedral capsid centered on a hexamer (left) and a pentamer vertex
(right). It is possible to move from one vertex to an adjacent vertex by moving 4 capsomeres in
a straight line (4 x 4 = 16, hence the triangulation number, T, is 16). An example of such a
virus is herpesvirus (except that the herpesvirus capsid also has skew, see below). This model
is simplified since it ignores the interactions between capsomeres.

The assembly of viral capsids is a remarkable process. In some cases the same protomer will
fit into pentamers and hexamers and a single sufficiently flexible protein subunit is often all that
is needed to assemble the viral capsid. Engineers adopt similar solutions when constructing
geodesic domes which have a similar architecture: many copies of a single structural subunit
can assemble the dome, which is also very strong because of its use of triangles. Other
capsids are, however, more complex and some require temporary scaffolding proteins for their
assembly. Remarkably, some of these complex structures will assemble spontaneously due to
the large entropy increase when ordered water molecules surrounding isolated proteins in
solution become displaced as the subunits 'snap' together, increasing the disorder (and hence
the entropy) sufficiently for the process to be spontaneous. Some viruses, however, require an
extra energy source such as ATP for capsid assembly.

It is possible to model or calculate the Gibbs free energy change for capsid assembly in some
cases. The equations are shown below (e.g. see Katen and Zlotnik, 2009):
For example, hepatitis B virus (HBV) has a T = 4 capsid composed of 120 subunits, each of
which is a dimer (making 240 protomers in total), giving N = 120. The above analysis was carried
out on HBV by Ceres et al. (2004). Since the subunits are dimers they have a two-fold symmetry
axis, giving j = 2. Each dimer makes contact with 4 neighbouring dimers, so C = 4 and CN/2 = 240
(the factor of 1/2 is needed because each contact is shared between two subunits). Almost all
synthesised capsomeres end up incorporated into a capsid, so the final concentration of free dimer
is very low. Using sensible values for this allows the association constant for capsid assembly to be
obtained and the change in Gibbs free energy to be calculated.

An unusually high degree of numerical precision is required for this calculation: standard
spreadsheet packages fail, as do conventional computational methods, due to underflow/overflow.
An approximation method can be used. The Wolfram Language used to be able to carry out
the calculations rapidly and gave the expected answers (though I have no guarantee of its
accuracy, the answers were in the right ball-park and the trends given were sensible), but recent
changes to the Wolfram Language mean that it will no longer carry out these calculations at all,
at least not by default (there may be settings that can be adjusted somewhere but I have not
found them). Java can process large numbers using its BigDecimal class; however, this class
can only raise a BigDecimal to an integer power and has no function to take the natural logarithm
of such a number. Cornell University's BigDecimalMaths class contains such code, and the
power calculation can be carried out, but it still lacks the precision needed to compute the
logarithm! (Maybe the method of computation can be tweaked to make it more accurate?)
The calculation in Wolfram gave a result of around -10 kJ/mol. This is appreciably less than
zero and the capsid is predicted to self-assemble, driven by entropy.
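
One way to sidestep the overflow problem is never to form the enormous association constant at all, but to work with its logarithm throughout. The sketch below is a suggestion only, not the method used for the result quoted above. It assumes the relation K_capsid = (j^(N-1) / N) x K_contact^(CN/2), which is my reading of the form used in this kind of analysis and should be checked against Katen and Zlotnik (2009). The concentrations are hypothetical placeholders and the function name is illustrative.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol K)


def contact_free_energy(capsid_molar, free_subunit_molar, n_subunits,
                        contacts_per_subunit, symmetry_j, temperature_k=298.15):
    """Return (ln K_capsid, Gibbs free energy per contact in kJ/mol).

    ASSUMED model: K_capsid = [capsid] / [subunit]^N
                            = (j^(N-1) / N) * K_contact^(C*N/2).
    Everything is kept as natural logarithms so nothing overflows.
    """
    ln_k_capsid = (math.log(capsid_molar)
                   - n_subunits * math.log(free_subunit_molar))
    # Statistical (degeneracy) factor j^(N-1) / N, also as a logarithm.
    ln_degeneracy = ((n_subunits - 1) * math.log(symmetry_j)
                     - math.log(n_subunits))
    n_contacts = contacts_per_subunit * n_subunits / 2
    ln_k_contact = (ln_k_capsid - ln_degeneracy) / n_contacts
    dg_contact_kj = -R_GAS * temperature_k * ln_k_contact / 1000.0
    return ln_k_capsid, dg_contact_kj


# HBV-like parameters from the text: N = 120, C = 4, j = 2.
# The two concentrations (in mol/L) are hypothetical, chosen only for illustration.
ln_k, dg = contact_free_energy(capsid_molar=1e-8, free_subunit_molar=1e-6,
                               n_subunits=120, contacts_per_subunit=4,
                               symmetry_j=2)

print(f"ln K_capsid = {ln_k:.1f}")
print(f"Free energy per contact = {dg:.1f} kJ/mol")
```

The point of the design is that ln K_capsid is an ordinary-sized number even when K_capsid itself is far beyond the range of double-precision arithmetic, so no special big-number library is needed. The numerical answer depends entirely on the concentrations supplied.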

Each vertex has pentagonal symmetry: it is surrounded by 5 triangular faces. Each face is
made up of units called capsomeres (each capsomere may consist of one or more
protein subunits). Each triangular face is made up of one or more basic triangular units, and each
such basic triangle consists of 3 capsomeres. In the simplest case, each facet consists of a
single basic triangle. In adenovirus, for example, each facet consists of 25 basic triangles. A
capsomere with 5-fold symmetry, called a pentamer, sits at each vertex, whilst capsomeres with
6-fold symmetry (hexamers) sit along the edges and make up the face itself. With 3
pentamers plus 18 hexamers in each face (21 capsomeres in total) we can fit 25 basic
triangles in each. The triangulation number (T) is the number of such basic triangles which
can fit into one face of the icosahedron. For the simplest capsids, T = 1; for adenovirus, T = 25.
Gibbs free energy change for virus capsid assembly
T4 end-cap triangulation T = 13laevo
Adenovirus T = 25 facet
Triangulation number
Above: one facet of adenovirus is made up of 21 capsomeres which sit at the vertices of 25
basic (imaginary) triangles giving T = 25. Note that since some capsomeres occupy the edges
and vertices of the icosahedron, the total number of capsomeres is not simply 20T, but works out
to be 10T + 2 or 252 in this case. (Alternatively, we can take n as the number of capsomeres
along one edge (6) and use the formula given above with

Not all viruses share this theme, since some have a skewed capsid geometry. In general the
triangulation number is obtained on a triangular (or hexagonal) grid with two axes, h and k. We
then place a capsomere in each hex (or at the apex of each triangle, see below) and count how
many capsomeres we need to move along h and then k to move from one vertex (pentamer) to
another vertex (pentamer). We then apply the formula: T = h^2 + hk + k^2 to find T. This is
illustrated for some viruses below:
When the capsid is 'skewed' the hexamers are no longer arranged with their midline along the
edges. An example of this is the T4 bacteriophage head. The T4 head is an elongated (prolate)
angular icosahedron with rounded edges and is about 86 nm wide and 119.5 nm long. There are
5 equilateral faces forming each 'end-cap' and 10 elongated faces forming the mid-section (20
faces in total) and one vertex contains the phage neck which attaches to the tail rather than a
usual pentamer. If we look at one of the end faces we see that it corresponds to T = 13l, where l
means 'laevo' or left-handed. This is because in going from one vertex to the other, we take 3
paces along h (h = 3) and then one pace along k to the left (some viral capsids can be
right-handed or 'dextro' (d)). This gives a triangulation number T = 3^2 + (3 x 1) + 1^2 = 9 + 3 + 1 =
13. This is illustrated below:
In T4, each capsomere is made up of viral proteins: the protein gp23 (gp = gene product) has a
piece cleaved off to form the active gp23*, 6 subunits of which make up the bulk of each hexamer
(shown in cyan), whilst gp24 is similarly modified to form gp24*, 5 copies of which make up each
pentamer (shown in red). The proteins gp23 and gp24 are cleaved during head maturation by a
viral protease. The viral protein Soc stabilises the capsid and forms hexagons around the gp23*
(shown as green dashed line). One copy of the protein Hoc occurs in the centre of each
hexamer (shown in yellow). In total there are 960 copies of gp23* (forming 160 hexamers), 55
copies of gp24* (5 per vertex, with the twelfth vertex occupied instead by a portal protein complex),
840 copies of Soc and 160 copies of Hoc.
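
The triangulation rule for equilateral facets described above is easy to check with a few lines of code. This is a minimal sketch with illustrative function names. The capsomere counts apply only to a closed, fully icosahedral shell (12 pentamers plus 10(T - 1) hexamers, giving the 10T + 2 total mentioned earlier), so they do not describe the elongated T4 head as a whole.

```python
def triangulation_number(h, k):
    """Triangulation number T = h^2 + hk + k^2 for steps h and k on the grid."""
    return h * h + h * k + k * k


def capsomere_counts(t):
    """Capsomere counts for a closed icosahedral shell of triangulation number t."""
    pentamers = 12               # one at each vertex of the icosahedron
    hexamers = 10 * (t - 1)      # the remaining capsomeres
    return {"pentamers": pentamers,
            "hexamers": hexamers,
            "total": pentamers + hexamers}  # equals 10T + 2


examples = {
    "HBV (h=2, k=0)": (2, 0),
    "T4 end-cap (h=3, k=1)": (3, 1),
    "herpesvirus-like (h=4, k=0)": (4, 0),
    "adenovirus (h=5, k=0)": (5, 0),
}

for name, (h, k) in examples.items():
    t = triangulation_number(h, k)
    counts = capsomere_counts(t)
    print(f"{name}: T = {t}, capsomeres in a closed shell = {counts['total']}")
```

Note that the formula alone cannot tell a left-handed (laevo) capsid from a right-handed (dextro) one: swapping h and k gives the same T, so the handedness has to be recorded separately.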

The elongated facets of the mid-section of the T4 head have T = 20. The rule for deriving this
number is different from that for an equilateral facet and is illustrated below:
T4: triangulation of capsid mid-section