A detailed look at the T4 bacteriophage
The T4 bacteriophage is a complex and highly evolved virus which infects the bacterium Escherichia coli.
The T4 phage is quite large for a virus: the head (capsid) is 119.5 nm tall and 86 nm in diameter, whilst
the tail is 100 nm long and 21 nm in diameter. The virion is composed of over 2000 protein subunits,
which are the products of over 50 different genes.

The T4 Genome

The T4 genome is dsDNA which is wound tightly within the head (capsid). The genome consists of
168 903 bp and is thought to code for 289 proteins (not all of which have yet been characterised),
8 tRNAs and at least 2 small RNA molecules of unknown function. The DNA is specially
modified HMC-DNA, meaning that (16%) of the cytosine bases are chemically modified into glucosylated
hydroxymethylcytosine (HMC). This makes the DNA resistant to endonucleases (nucleic acid degrading
enzymes), such as host endonucleases which digest foreign DNA or the T4 endonucleases which digest
host DNA.

The glucose molecules added to the cytosine residues also increase the stability of the DNA since the
-OH and -H groups of the glucose can hydrogen-bond to neighbouring bases. This may be especially
important as the genome is low in G+C base pairs, at 34.5% G+C (G-C pairs stabilise DNA since they are
joined by three hydrogen bonds whereas T-A pairs are joined by only two). In some
regions both strands are transcribed and both may be translated into proteins. The T4 dsDNA
approximates D-form DNA (poly(dA-dT)) which is overwound with only 8 bp per turn, a wider and
shallower major groove and a deeper and narrower minor groove. This form of DNA is possibly
transcribed and replicated faster as it may unzip more easily.
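
The link between G+C content and duplex stability can be illustrated with a minimal sketch. It counts three hydrogen bonds per G-C pair and two per A-T pair; the 20-base fragment is a made-up example, not a real T4 sequence, and the function names are illustrative only.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def watson_crick_h_bonds(seq: str) -> int:
    """Count hydrogen bonds in the duplex formed by seq and its complement.

    Each G-C pair contributes 3 bonds and each A-T pair contributes 2.
    """
    seq = seq.upper()
    return sum(3 if base in "GC" else 2 for base in seq)


# Hypothetical 20-base fragment (an assumption for illustration only).
fragment = "ATGCATTAAAGCTTATAATC"
print(f"G+C fraction: {gc_fraction(fragment):.2f}")
print(f"Hydrogen bonds: {watson_crick_h_bonds(fragment)}")

# Average hydrogen bonds per base pair at the quoted 34.5% G+C.
T4_GC = 0.345
avg_bonds = 3 * T4_GC + 2 * (1 - T4_GC)
print(f"Average bonds per base pair at 34.5% G+C: {avg_bonds:.3f}")
```

At 34.5% G+C the average works out to about 2.35 bonds per base pair, compared with 2.5 for a genome with equal amounts of G+C and A+T.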

Viruses economise on genetic material. In order to find a new host cell, viruses have to produce many
progeny, which means minimising the quantity of protein used in capsid construction, which in turn
reduces the space available for packaging the genome. T4 has several mechanisms to make the most of
its DNA. These are explained below.

1. Nested Genes

Some genes may actually encode several proteins (e.g. genes 16, 17 and 49) by having multiple START
codons. At least five T4 genes have multiple STARTs. Such nested genes may have diverse functions.
For example, in phage lambda one open-reading frame (ORF) has two nested genes whose proteins
differ by only two amino acids. One of these proteins forms pores in the host cell membrane to induce
cell lysis when the progeny are ready to escape. The other delays formation of the pore and so has an
opposite or antagonistic function. Together these two proteins help regulate the time of lysis.
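
As a rough sketch of how one open reading frame can yield nested products, the snippet below lists the coding sequence that starts at each in-frame ATG. The toy ORF is invented for illustration (it is not a T4 or lambda sequence), and `nested_products` is a hypothetical helper name.

```python
STOPS = {"TAA", "TAG", "TGA"}


def nested_products(orf: str) -> list[str]:
    """Return the coding sequence starting at every in-frame ATG of orf.

    orf is assumed to start with ATG, end with a stop codon and have a
    length that is a multiple of three.
    """
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOPS:
        raise ValueError("ORF must start with ATG and end with a stop codon")
    return ["".join(codons[i:]) for i, codon in enumerate(codons) if codon == "ATG"]


# Toy ORF (an assumption): ATG AAA ATG GCT TGG TAA
toy_orf = "ATGAAAATGGCTTGGTAA"
products = nested_products(toy_orf)
for cds in products:
    print(cds, f"({len(cds) // 3 - 1} amino acids)")
```

In this toy case the two products differ by two codons at the N-terminus, mirroring the lambda example in which the two proteins differ by only two amino acids.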

2. Frameshifting

Nested genes may have different reading frames, meaning that a frame-shift can produce an entirely
new coded protein. Sometimes the different frames may be read in the same direction (e.g. 30.3') or in
opposite directions (e.g. repEA, repEB involved in initiation of replication at OriE). Some viruses also
utilise programmed frameshifts in which the nucleotide sequence codes for a frameshift of 1 bp forwards
or backwards, creating a new code from the same base sequence (recoding). This does not occur in T4.
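
To see why a shift of one or two bases produces an entirely different protein, the following sketch translates the same sequence in all three forward reading frames using the standard genetic code. The 12-base sequence is a made-up example and `translate` is an illustrative helper, not part of any library.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq from offset frame (0, 1 or 2); '*' marks a stop codon."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Hypothetical sequence (an assumption for illustration only).
seq = "ATGGCTAAATGA"
for frame in range(3):
    print(f"frame {frame}: {translate(seq, frame)}")
```

Each frame reads the same bases but groups them into different codons, so the three translations share no amino acids in common positions.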

3. Introns

At least three T4 genes have introns: td (codes for thymidylate synthase), nrdB (codes for a
subunit of the aerobic ribonucleotide reductase) and nrdD (codes for anaerobic ribonucleotide
reductase). These introns are group I self-splicing introns and are designated I-TevI (td), I-TevII
(nrdB) and I-TevIII (nrdD). The first two introns themselves code for protein: each codes for an
endonuclease. These endonucleases recognise a 'homing site' and cleave it to form a DSB
(double-strand break) which can recombine with the intron, allowing it to 'infect' any intron-free strains
that may be present in the same host cell. Thus these endonucleases are called homing endonucleases.

These introns may also have a function in regulating deoxyribonucleotide synthesis. In order to splice
themselves efficiently they require efficiently translating ribosomes in the upstream exon. If the host cell
enters stationary phase (meaning that nutrients are limiting and the cells are not dividing - unfavourable
conditions for phage synthesis and for the progeny to infect new cells) then T4 replication pauses until
favourable conditions return. The lack of nutrients may slow translation as the ribosomes have to wait for
amino acids, which prevents intron splicing such that the mRNA of the gene containing the intron is not
translated into functional protein. Since these proteins are used in DNA synthesis this halts DNA replication.

4. Translational Bypassing

This occurs, for example, in T4 gene 60. A 50-nucleotide sequence in the coding region of the mRNA may be
skipped, resulting in an alternative protein product. This is a high-efficiency bypass site, meaning that it
is bypassed often. An additional low-efficiency bypass occurs at the junction of gene 56 with gene 69.

5. High Gene Density

The T4 genome has a high gene density, twice as high as that of Escherichia coli, which also has a
compact genome. Only 9 kb (5.3% of the genome) are non-coding. In contrast, about 2% of bacterial
genomes and about 98% of the human genome are non-coding.
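
The quoted figures can be checked with a few lines of arithmetic. This is only a sketch using the numbers given in this article (168 903 bp, about 9 kb non-coding, 289 predicted proteins); "9 kb" is taken as exactly 9000 bp, which is an assumption.

```python
GENOME_BP = 168_903      # genome length quoted in the text
NONCODING_BP = 9_000     # "9 kb", assumed here to mean exactly 9000 bp
PROTEIN_GENES = 289      # predicted protein-coding genes quoted in the text

noncoding_pct = 100 * NONCODING_BP / GENOME_BP
genes_per_kb = PROTEIN_GENES / (GENOME_BP / 1000)
mean_coding_bp_per_gene = (GENOME_BP - NONCODING_BP) / PROTEIN_GENES

print(f"Non-coding: {noncoding_pct:.1f}% of the genome")
print(f"Gene density: {genes_per_kb:.2f} protein-coding genes per kb")
print(f"Mean coding length: {mean_coding_bp_per_gene:.0f} bp per gene")
```

The mean coding length is an upper-bound estimate, since it ignores the tRNA and small RNA genes and any overlapping or nested genes.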

6. Proteins with Multiple Functions

Additionally a single gene may code for a protein with multiple roles, e.g. the T4 RNA ligase A (coded for
by gene 63, rnlA) also catalyses tail fiber attachment to the baseplate during assembly.

Of all these genes only 62 (occupying about half of the genome) are essential for successful replication
under controlled laboratory conditions. The others, however, probably serve to increase efficiency of
replication in variable and unpredictable natural conditions. These essential genes code for the
replisome, nucleotide-precursor complex, several transcriptional regulatory factors and most structural
and assembly proteins. Amber mutations in these genes (in which a coding codon is mutated into a
STOP codon causing premature termination of translation) prevent successful replication.
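
The effect of an amber mutation can be sketched as follows: a single base change turns a glutamine codon (CAG) into the amber STOP codon (TAG), so translation terminates early. The 18-base coding sequence is invented for illustration and is not a T4 gene; the helper name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_until_stop(cds: str) -> str:
    """Translate cds from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy coding sequence (an assumption): ATG CAG GCT AAA TGG TAA
wild_type = "ATGCAGGCTAAATGGTAA"
# Amber mutation: the second codon CAG (Gln) becomes TAG (stop).
amber = wild_type[:3] + "TAG" + wild_type[6:]

print("wild type:", translate_until_stop(wild_type))
print("amber    :", translate_until_stop(amber))
```

The mutant yields only the first amino acid, which is why an amber mutation in an essential gene abolishes the function of its product.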

Non-essential genes are involved in nucleotide biosynthesis, recombination and DNA repair, or code for
nucleases which degrade host DNA, proteins which prevent superinfection, proteins which inhibit lysis and
progeny escape when the phage/host ratio is high and proteins which inhibit host replication and
transcription. In some cases the missing functions may be provided by host enzymes; in others these
proteins increase the efficiency of phage replication (increase the burst size) and are clearly
advantageous if not critical. Mutations in the primase (gene 61) and topoisomerase (genes 39, 52 and 60)
may be compensated for by other mechanisms which prime DNA synthesis.

Finally, a few genes appear to be identical copies of one another: {58, 61}, {2, 64} and {4, 50, 65}.

Structural Proteins and Virion Assembly

More than 40% of the genome encodes 53 of the 54 proteins required for phage particle synthesis: 24
for head morphogenesis, 22 for tail morphogenesis and 7 for tail fibre synthesis (including one for tail
fibre attachment). Of these 54, 5 are catalysts and not actual structural components. The head, tail, tail
fibres and whiskers are synthesised by separate pathways, then the head and tail join before 6 tail fibers
are attached to form a complete virion.

Of the 24 proteins required for head synthesis, 16 are needed for prohead formation and further
maturation to form a mature head, of which 10 are absolutely essential and only one host-encoded
(GroEL); 5 are for DNA-packaging and 3 complete and stabilise the head.

Head Assembly

The T-even phage head is an icosahedron with triangulation number (T) = 13, but which is elongated
along the fivefold axis of symmetry into a prolate icosahedron by the insertion of a near-equatorial band
of 20 capsomeres in the T2 phage. Prolate icosahedra are defined by both T and Q. There are 3T
protein subunits (protein molecules or protomers) per face in an isometric icosahedron (i.e. an
icosahedron whose height = its width) and 60T in total. For T = 13 this would give 780 protomers. At
each vertex 5 protomers assemble into a pentameric capsomere, whilst the faces consist of structural
units of 6 protomers each, forming hexagonal capsomeres. This would give a total of (12 x 5) 60
protomers at the vertices (12 pentameric capsomeres) and 720 protomers in 120 capsomeres of 6
protomers each, or 132 capsomeres in total. However, in prolate icosahedra additional hexameric
capsomeres are inserted. For such geometries the number of capsomeres is given by 5(T + Q) + 2, where
for T4 Q = 20, giving 5(13 + 20) + 2 = 167 capsomeres. However, one vertex is the portal vertex which
attaches to the tail, so T4 has 166 capsomeres in its capsid. Eleven of the 12 vertices are formed of
pentamers of gp24 (5 x 11 = 55 copies of gp24). This leaves 155 hexameric capsomeres, made up of
(155 x 6) 930 copies of gp23.
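
The capsomere arithmetic above can be reproduced with a short sketch. It simply encodes the formulas quoted in the text (60T protomers for an isometric shell and 5(T + Q) + 2 capsomeres for a prolate one); the function and variable names are illustrative only.

```python
def isometric_counts(t: int) -> tuple[int, int, int]:
    """Return (protomers, pentamers, hexamers) for an isometric icosahedron."""
    protomers = 60 * t
    pentamers = 12                              # one per vertex
    hexamers = (protomers - 5 * pentamers) // 6
    return protomers, pentamers, hexamers


def prolate_capsomeres(t: int, q: int) -> int:
    """Total capsomeres of a prolate icosahedron, 5(T + Q) + 2."""
    return 5 * (t + q) + 2


T, Q = 13, 20

protomers, pentamers, hexamers = isometric_counts(T)
print(protomers, pentamers, hexamers, pentamers + hexamers)

total = prolate_capsomeres(T, Q)    # all 12 vertices counted as capsomeres
shell = total - 1                   # the portal vertex is not a capsomere
gp24_pentamers = 11                 # the remaining vertices
gp23_hexamers = shell - gp24_pentamers
gp24_copies = 5 * gp24_pentamers
gp23_copies = 6 * gp23_hexamers
print(total, shell, gp23_hexamers, gp24_copies, gp23_copies)
```

Setting Q equal to T recovers the isometric case, since 5(13 + 13) + 2 = 132.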

During head assembly, an initiator complex forms first, which is a dodecamer (12-mer) of gp20 (gp =
gene product) - that is a ring of 12 copies of gp20 which attaches to the inside of the inner membrane of
the host. A scaffold of 576 copies of gp22 and 72 copies of gp21 in a complex attaches to the initiator
complex. The initiator complex forms one vertex of the head to be, called the portal vertex (as DNA will
enter the capsid through it). Pentamers of gp24 form the other 11 vertices of the prolate icosahedron.
The capsid structural proteins, gp23 and gp24, assemble around the scaffold. The scaffold is then
removed by the gp21 T4 prohead protease, which also cleaves gp23 and gp24 to increase the space
inside the prohead to accommodate more DNA. The prohead then detaches from the host cell
membrane to become an 'empty small particle' (ESP), which becomes an ISP (initiated small particle)
when DNA begins packing inside it. The prohead then expands by 15% in linear dimension, resulting in a
doubling of internal volume, to form an ILP (initiated large particle) packed with DNA, which later
develops into a mature head.

T4 proteins gp13, gp14 and 6 trimers of gp wac (whisker or fibritin protein which forms the whiskers) bind
the portal vertex to complete the head, which then binds an assembled tail. The phage proteins soc and
hoc are non-essential but are added to the head after expansion and may serve to stabilise it. Indeed, it
is established that soc acts as a protein clamp to stabilise the head: it has been shown to stabilise
the capsid against highly alkaline pH, extreme temperatures and osmotic shock. The function of hoc is
less clear.

DNA Packaging

The replicated phage DNA is formed as a double-stranded concatemer (several copies of the genome
joined along the same DNA molecule). A terminase complex (gp16, gp17, gp17' and gp17'') binds to the
DNA and then binds to the portal vertex to form the packasome. ATP hydrolysis provides the energy
needed by the packasome to package the DNA into the head and then the terminase cuts the DNA when
the head is full. It is thought that the dodecameric gp20 neck rotates as the DNA is packaged. The head
is packaged with one complete copy of the DNA plus about 3%. The genome is circularly permuted so
the precise location of the cut does not matter as long as a complete genome is packaged. The
terminase complex apparently inserts the DNA into the phage head, translocates the DNA and then cuts
the DNA when packaging is complete.
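
A toy model can show why headful packaging from a concatemer gives circularly permuted genomes with repeated ends. The sketch below assumes a 100-'gene' genome represented as a list of integers and uses the 3% excess quoted above; it is a simplified illustration, not a model of the terminase mechanism, and the names are hypothetical.

```python
def package_headfuls(genome, n_heads, extra=0.03):
    """Cut n_heads successive headfuls from a concatemer of genome.

    Each headful holds one genome length plus a fraction `extra`, so its
    two ends carry the same genes (terminal redundancy).
    """
    headful = round(len(genome) * (1 + extra))
    copies = n_heads * headful // len(genome) + 1   # enough DNA for every head
    concatemer = genome * copies
    return [concatemer[i * headful:(i + 1) * headful] for i in range(n_heads)]


# Toy genome of 100 numbered 'genes' (an assumption for illustration).
toy_genome = list(range(100))
heads = package_headfuls(toy_genome, n_heads=3)
for head in heads:
    print(
        f"starts at gene {head[0]}, length {len(head)}, "
        f"complete: {set(head) == set(toy_genome)}, "
        f"ends repeated: {head[:3] == head[-3:]}"
    )

# The same 3% excess applied to the quoted T4 genome length.
t4_headful_bp = round(168_903 * 1.03)
print(f"T4 headful: about {t4_headful_bp} bp")
```

Each successive head starts three 'genes' further along the genome, yet every head contains the complete gene set, which is the sense in which the precise location of the cut does not matter.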

Sometimes errors in assembly occur. Defective virions may be produced with unusually long heads or
unusually short heads. Those with short heads will lack a complete genome and so cannot infect and
complete replication alone, but may do so in a host which is superinfected (infected by more than one
phage) with one of the phages having the missing genes. Those with isometric capsids can only hold
about 70% of the genome, whilst some giant mutants can hold up to 12 copies of the genome!

Tail Assembly

The tail consists of two concentric tubes - the contractile outer tail-sheath tube and the inner tube.
The outer tail sheath consists of 144 copies of gp18 arranged into 24 hexameric rings with each ring
rotated 17 degrees to the one beneath it (in a right-handed helix). This sheath is 98.4 nm long when
non-contracted but contracts to 36 nm (with the twist or angle between adjacent rings increasing to 32
degrees) when the needle is deployed.
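
The sheath figures above imply a rough spacing per ring before and after contraction. The sketch below divides the quoted sheath lengths evenly among the rings, which is a simplifying assumption rather than a structural measurement.

```python
GP18_COPIES = 144
SUBUNITS_PER_RING = 6
EXTENDED_NM = 98.4
CONTRACTED_NM = 36.0

rings = GP18_COPIES // SUBUNITS_PER_RING

# Assumes the sheath length is shared equally between the rings.
rise_extended = EXTENDED_NM / rings
rise_contracted = CONTRACTED_NM / rings
contracted_fraction = CONTRACTED_NM / EXTENDED_NM

print(f"{rings} rings")
print(f"spacing per ring: {rise_extended:.1f} nm extended, "
      f"{rise_contracted:.1f} nm contracted")
print(f"contracted length is {contracted_fraction:.0%} of the extended length")
```

On these numbers the sheath shortens to a little over a third of its extended length while the twist between rings roughly doubles (17 to 32 degrees).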

The baseplate consists of a central hub (possibly utilising genes 5, 27, 29, 26, 28 and 51) to which 6
wedges bind. Hub assembly is initiated by gp29. Each wedge is formed when gp10 and
gp11 bind one another, followed by the sequential binding of gp7, gp8, gp6, gp53 and gp25. Once the
wedges bind the hub, gp9, gp12, gp48 and gp54 also bind (6 trimers of gp9, gp10 and gp11, 2 trimers of
gp3 and single trimers of gp5 and gp27; 12 non-trimeric copies of gp8 bind). The baseplate is formed in
association with the inner surface of the host cell inner membrane, with a 30 nm fibre (apparently part of
gp7) attaching each of the 6 corners to the membrane. When host cell lysis occurs the baseplates
detach from the host membrane.

The short tail fibres, trimers of gp12 with gp11 at the tip, are incorporated into the baseplate. These are
responsible for secondary and irreversible binding to the target cell surface when initiating infection. A
lysozyme, consisting of a gp5-gp27 complex, is apparently inserted in the baseplate as a
(hetero)hexameric structure attached to the needle end of the inner tube. Protein gp5 is cleaved during
assembly, with the new C-terminus covering the active site until penetration, when the C-terminus moves
out of the way to expose the active site.

Above: when at least three of the long tail fibres bind to sugars on the LPS (lipopolysaccharide)
of the target cell outer membrane, the baseplate undergoes a conformational change -
opening out into a star configuration which deploys the six short tail fibres (otherwise hidden
in the baseplate) which form a secondary and more stable adhesion with the target cell
surface. Simultaneously, the tail sheath contracts (below) as a wave of contraction travels
along its length (presumably from bottom to top) and the inner tail-tube with its 'needle-like' tip
penetrates the target cell envelope, with the help of lysozyme in its tip, injecting the DNA into
the host. This mechanism presumably uses stored elastic or strain energy in the various
proteins. DNA entry requires a membrane potential (possibly uses the proton gradient?).
Evidence also suggests that the whiskers bend back out of the way of the contracting tail
sheath (not shown here).

Chaperone-like proteins (gp51, gp38 and gp57A) are needed for complete tail assembly. Chaperones are
proteins which help other proteins to fold correctly. Hub assembly is assisted by gp51, short and long tail-
fiber assembly by gp57A.

Long Tail-Fiber Assembly

The socket of each long tail-fiber consists of gp9 which provides a flexible socket when the tail fibers are
down in the expanded position ready for adhesion to a suitable host. The proximal segment of each long
tail-fiber consists of gp34 and the distal segment of gp37, whilst gp35 and gp36 attach the distal fiber to the
proximal fiber. The gp37 tip has a hypervariable region which differs between different T-even phages
and confers host specificity (recognising specific sugar residues in the LPS of a specific host bacterial
species or strain).


The 6 whiskers are made of the protein wac (fibritin) and are thought to bind to the knee of the tail fiber
during assembly to facilitate attachment of the long tail-fiber to the baseplate. The whiskers also appear
to bind the knee when the phage is in the retracted configuration. When environmental conditions are
not suitable the long tail-fibers retract, being drawn up towards the tail and head. The whiskers act as an
environmental sensor and allow the long tail-fibers to drop down into the expanded configuration when
conditions are suitable. The flexible socket protein gp9 is presumably also involved in these changes of
configuration.

Avoiding Superinfection

Superinfection is the infection of a host by more than one parasite of a given species. This reduces the
resources available to each parasite and, although superinfection does sometimes occur, parasites often
try to avoid it. The T4 imm (immunity) gene is expressed about 4 minutes post-infection and if DNA from
another T4 or a related phage attempts to gain entry to the same host then the product of imm causes
the newly injected DNA to remain in the periplasm where it is degraded by nucleases.

Viral DNA Replication

T4 encodes all the components of its own replisome (unusual amongst viruses). It encodes its own
DNA polymerase (gene 43), sliding clamp loader (genes 44 and 62), sliding clamp (gene 45), helicase
loading protein (gene 59, at least in vitro), helicase (gene 41), primase for lagging strand synthesis (gene
61) and single-stranded DNA binding protein (gene 32). On the lagging strand, T4 RNase H (gene rnh)
removes the RNA primers and DNA ligase (gene 30) seals the Okazaki fragments (though host enzymes
can substitute for these). Otherwise replication is very similar to that of DNA in bacteria such as
Escherichia coli. The number of replisomes is limited and only one of several origins of replication
(Oris, sing. Ori) is used. Although the onset of DNA replication depends on Oris, most T4 replication
forks are initiated at more-or-less random positions along the genome using intermediates of
recombination as DNA primers.

Viral Gene Transcription

Transcription of viral genes occurs in three main phases: early, middle and late. Early transcription
occurs almost immediately after infection and involves at least 39 promoters (Pe, early promoters). These
promoters are stronger than host promoters, with which they compete for the host's sigma-70 transcription
factor (a sigma factor is a detachable component of DNA-dependent RNA polymerase, RNA-P, which
detaches soon after RNA synthesis begins and which determines the specificity of the polymerase for
different promoters). There are about 650 molecules of sigma-70 and 2000 of host core RNA-P, which the
virus utilises as well as the host. The T4 protein gpAlt enters the host along with the T4 DNA. It is a
mono-ADP-ribosyltransferase which ADP-ribosylates one alpha subunit of RNA-P to make transcription of
early T4 genes more favourable. Early gene products include the proteins ModA and ModB, which are also
ADP-ribosyltransferases and which ADP-ribosylate both alpha subunits of host RNA-P to reduce host gene
transcription (they replace a positive charge with a negative one). Thus early transcription focuses on
taking control of the host cell to divert host resources to T4 synthesis. Many early proteins are lethal to
the host.

Middle transcription occurs a few minutes later into infection and involves 30 T4 promoters (Pm, middle
promoters). These depend on T4 MotA (a transcriptional activator protein, not to be confused with the
E. coli protein of that name) and the protein AsiA, which modifies RNA-P for middle gene transcription,
reducing transcription of both host and early T4 genes. Thus, a switch occurs to the transcription of the
middle set of genes.

Late transcription focuses on virion synthesis: head, tail and fiber synthesis and the production of
structural proteins, assembly proteins and recombination proteins. It involves about 50 late T4
promoters (Pl, late promoters) which are activated by gp33 and sigma-55, a T4 sigma factor.

Host Cell Lysis

When about 100-150 progeny T4 phages are assembled, host cell lysis is triggered - the host cell is
ruptured to release the new T4 particles so that they may be carried by advection and diffusion to new
target cells. (We say that the burst size is 100-150 for T4.) Lysis involves a T4 lysozyme (coded for by
gene e) and a T4 holin (coded for by gene t). The T4 holin forms a pore in the inner
membrane to allow the lysozyme to reach the peptidoglycan cell wall and degrade it. If more phages
attack the cell more than five minutes after infection then lysis is delayed, since this indicates a lack of
new hosts for the progeny. This involves the rI protein, which regulates T4 holin assembly, and T4 gprIII,
which extends lysis inhibition further.

Suggested Reading

Miller, E.S., Kutter, E., Mosig, G., Arisaka, F., Kunisawa, T., and W. Ruger, 2003. Bacteriophage T4
genome. Microbiology and Molecular Biology Reviews 67: 86–156.

Mesyanzhinov, V.V., Leiman, P.G., Kostyuchenko, V.A., Kurochkina, L.P., Miroshnikov, K.A., Sykilinda, N.
N., and M.M. Shneider, 2004. Molecular Architecture of Bacteriophage T4.
Biochemistry (Moscow) 69:

Baumann, R.G. and L.W. Black, 2003. Isolation and Characterization of T4 Bacteriophage gp17
Terminase, a Large Subunit Multimer with Enhanced ATPase Activity.
J. Biol. Chem. 278: 4618-4627.

Fokine, A., Zhang, Z., Kanamaru, S., Bowman, V.D., Aksyuk, A.A., Arisaka, F., Rao, V.B. and M.G.
Rossmann, 2013. The Molecular Architecture of the Bacteriophage T4 Neck.
J. Mol. Biol. 425: 1731-1744.

Iwasaki, K., Trus, B.L., Wingfield, P.T., Cheng, N., Campusano, G., Rao, V.B., and A.C. Steven, 2000.
Molecular Architecture of Bacteriophage T4 Capsid: Vertex Structure and Bimodal Binding of the
Stabilizing Accessory Protein, Soc.
Virology 271: 321-333.

Article last updated: 31/1/2015  
Figures: Pov-Ray models of the T4 bacteriophage, including bound, contracted and retracted states.
Above: T4 in its retracted configuration. When environmental
conditions are unfavourable, the adhesive tail fibers are
retracted - the 'knee' pulls up to the whisker. In this 'dormant'
phase T4 can neither bind to a target cell nor initiate infection.
Above: a more detailed model of T4, showing the structure of the tail fibers.