Proteins are a principal ingredient of living things. They serve both biochemical functions (e.g. enzymes are
biological catalysts) and biomechanical functions (e.g. actin and myosin enable muscles to contract). Individual
proteins are of the order of 10 nanometres in diameter, but in some cases protein subunits can assemble into much
larger structures that may be millimetres in length and micrometres in diameter. However, the detailed machinery
of proteins, which typically involves moving parts (such as hinges, springs and rotors), is only several
nanometres in size – thus proteins are nanomachines and form the main working parts of cells. If proteins were
much larger they would probably resemble tough plastic in consistency, and some proteins may have steel-like
strength.

Proteins are large, complex molecules, but if broken down by hydrolysis with hot hydrochloric acid they all yield
the same basic chemicals – about 20 different types of smaller chemical units called amino acids (strictly these
are α-amino acids or 2-aminocarboxylic acids) – molecules containing both an amine group (-NH2) and a carboxylic
acid group (-COOH).

Amino acids have the general formula RCH(NH2)COOH, where R is a hydrogen atom or an organic group – each
of the 20 amino acids has a different R group. This R-group is called the side-chain. Biological amino
acids are alpha-amino acids, which means that both the amine and the carboxylic acid groups are joined to the
same carbon atom (the alpha carbon).

Amino acids are white crystalline solids that dissolve in water (and dissolve only sparingly in organic solvents).
It is the differing chemical properties of the 20 or so amino acids that give the protein its chemical and
mechanical properties.

Above: a generalised amino acid – amine group shown in orange, carboxylic acid group in blue,
alpha carbon in red and R-group in green.
Above left: glycine (aminoethanoic acid) is the simplest alpha-amino acid, with R = H.
Above right: the next simplest amino acid is alanine (2-aminopropanoic acid), with R = CH3.

Amino acids can bond to one another by forming an amide bond (or peptide bond) between the amine group of
one amino acid and the carboxylic acid group of another. This is a condensation reaction, as water is removed in
the process:

H2N-CHR-COOH + H2N-CHR'-COOH → H2N-CHR-C(=O)-NH-CHR'-COOH + H2O

The -C(=O)-NH- group between the two amino acids is called a peptide link and the product of two joined amino
acids is called a dipeptide. Many amino acids may be joined together in this way to form a polypeptide.
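
The general formula RCH(NH2)COOH and the loss of one water molecule per peptide link can be checked with some simple atom bookkeeping. The Python sketch below is only an illustration: the names BACKBONE, SIDE_CHAINS, amino_acid_formula, peptide_formula and as_text are invented here, and only the two side-chains named in the text (glycine, R = H, and alanine, R = CH3) are included.

```python
from collections import Counter

# Atoms common to every alpha-amino acid: the -CH(NH2)COOH part of RCH(NH2)COOH.
BACKBONE = Counter({"C": 2, "H": 4, "N": 1, "O": 2})
WATER = Counter({"H": 2, "O": 1})

# Side-chains (R groups) for the two amino acids named in the text.
SIDE_CHAINS = {
    "Gly": Counter({"H": 1}),          # R = H
    "Ala": Counter({"C": 1, "H": 3}),  # R = CH3
}


def amino_acid_formula(code):
    """Atom counts of a free amino acid: backbone plus side-chain."""
    return BACKBONE + SIDE_CHAINS[code]


def peptide_formula(codes):
    """Atom counts of a linear peptide built from the given amino acids."""
    total = Counter()
    for code in codes:
        total += amino_acid_formula(code)
    # n amino acids are joined by n - 1 peptide links, each releasing one water.
    for _ in range(len(codes) - 1):
        total -= WATER
    return total


def as_text(formula):
    """Render atom counts as a formula string, elements in alphabetical order."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(formula.items()))


print(as_text(amino_acid_formula("Gly")))        # C2H5NO2
print(as_text(amino_acid_formula("Ala")))        # C3H7NO2
print(as_text(peptide_formula(["Gly", "Ala"])))  # C5H10N2O3
```

The dipeptide has two hydrogens and one oxygen fewer than the two free amino acids combined, which is the water removed in the condensation.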

The simplest proteins are single polypeptide chains of at least 40 amino acids in length (shorter chains are simply
referred to as polypeptides rather than proteins) but polypeptide chains may be as long as several hundred amino
acids joined together (amino acid residues). Many proteins consist of several polypeptides bonded together.

Primary structure

The primary structure of a protein is simply the sequence of amino acids in the polypeptide chain, such as:
-Glu-Gly-Phe-Met-Met-Glu- … . Each amino acid has a three-letter shorthand code and a single-letter code for
writing out these long sequences:

  • Glycine: Gly, G
  • Alanine: Ala, A
  • Valine: Val, V
  • Leucine: Leu, L
  • Isoleucine: Ile, I
  • Serine: Ser, S
  • Threonine: Thr, T
  • Aspartic acid: Asp, D
  • Glutamic acid: Glu, E
  • Glutamine: Gln, Q
  • Lysine: Lys, K
  • Arginine: Arg, R
  • Cysteine: Cys (or CySH), C
  •    (Cystine: CySSCy is a cysteine dimer – two Cys joined by a –S-S- disulphide bridge)
  • Methionine: Met, M
  • Phenylalanine: Phe, F
  • Tyrosine: Tyr, Y
  • Tryptophan: Trp, W
  • Histidine: His, H
  • Proline: Pro, P
  • Hydroxyproline: Hyp
  • Asparagine: Asn, N

This list has 21 amino acids (including hydroxyproline, which is a modification of proline). Rarely, in certain
proteins in certain organisms, two other amino acids may occur: selenocysteine (like cysteine but with selenium
replacing the sulphur) and pyrrolysine (in methanogenic archaea). These 20 to 23 amino acids that occur in
proteins are the standard amino acids, but each of these may be chemically modified into other amino acid types
that are NOT components of proteins, but serve other functions.

The primary structure, or amino acid sequence, is critical in determining how the polypeptide will fold, since the
various side-chains of the amino acids will form different types of bonds with one another.
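
Converting a sequence between the two kinds of code is a simple table lookup. The sketch below uses the standard three-letter codes (Ile for isoleucine, Trp for tryptophan) and leaves out hydroxyproline, which has no single-letter code. The names THREE_TO_ONE, ONE_TO_THREE, to_one_letter and to_three_letter are made up for this illustration.

```python
# Three-letter to single-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",
    "Ser": "S", "Thr": "T", "Asp": "D", "Glu": "E", "Gln": "Q",
    "Lys": "K", "Arg": "R", "Cys": "C", "Met": "M", "Phe": "F",
    "Tyr": "Y", "Trp": "W", "His": "H", "Pro": "P", "Asn": "N",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(sequence):
    """Convert '-Glu-Gly-Phe-' style text to a single-letter string."""
    # Leading or trailing hyphens produce empty pieces, which are skipped.
    residues = [r for r in sequence.split("-") if r]
    return "".join(THREE_TO_ONE[r] for r in residues)


def to_three_letter(sequence):
    """Convert a single-letter string back to hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[letter] for letter in sequence)


print(to_one_letter("-Glu-Gly-Phe-Met-Met-Glu-"))  # EGFMME
print(to_three_letter("EGFMME"))                   # Glu-Gly-Phe-Met-Met-Glu
```

An unknown code raises a KeyError rather than being silently dropped, which is usually what you want when checking a hand-typed sequence.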

Secondary Structure

The secondary structure of proteins arises when groups in the polypeptide backbone bind to one another (primarily
by forming hydrogen bonds between backbone N-H and C=O groups, but also through van der Waals forces) in a local
region of the polypeptide chain, forming a local structure called a motif. Polypeptides fold to give one or more
motifs. Motif types include alpha-helices, beta-barrels, beta-pleated sheets, beta-propellers and beta-hairpins.

Alpha-helices: sequences rich in the amino acids Met, Ala, Leu, Glu and Lys (remembered as MALEK) tend to fold
into a right-handed helix, called an alpha-helix. A polypeptide may contain no alpha-helices, one, or several.
There are 3.6 amino acids per turn of the helix and the helix advances 0.54 nanometres along its axis per turn.
Proline (which confers rigidity) and glycine disrupt alpha-helices and may form straight linkers between adjacent
motifs.
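
The helix figures lend themselves to a quick calculation. The sketch below assumes that 0.54 nanometres is the pitch of the helix, i.e. the distance it advances along its axis per turn (the usual textbook value), and that the helix is ideal and unbroken. The function names are invented for this illustration.

```python
RESIDUES_PER_TURN = 3.6  # amino acids per turn, from the text
PITCH_NM = 0.54          # assumed: advance along the helix axis per turn, in nm


def helix_turns(n_residues):
    """Number of turns in an ideal alpha-helix of n_residues amino acids."""
    return n_residues / RESIDUES_PER_TURN


def helix_length_nm(n_residues):
    """Length along the axis, in nanometres, of an ideal alpha-helix."""
    return helix_turns(n_residues) * PITCH_NM


def rise_per_residue_nm():
    """Distance each amino acid adds along the helix axis, in nanometres."""
    return PITCH_NM / RESIDUES_PER_TURN


print(round(helix_turns(36), 2))          # 10.0
print(round(helix_length_nm(36), 2))      # 5.4
print(round(rise_per_residue_nm(), 2))    # 0.15
```

On these assumptions a 36-residue helix makes ten turns and is about 5.4 nanometres long, which fits with the detailed machinery of proteins being only several nanometres in size.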

Beta-pleated sheets are, as the name indicates, pleated sheets (like corrugated iron), built from beta-strands
that are typically 5-10 amino acids long.

Beta-barrels are formed when beta-sheets fold around to form a hollow cylinder. One function of beta-barrels is in
forming porins – protein channels that span the outer membranes of bacteria and allow water and small solutes to
cross the membrane by diffusing through the hollow channel in the middle of the barrel.

Tertiary Structure

The tertiary structure of a protein is determined by longer-range interactions between amino-acid side-chains (such
as hydrogen bonds, salt bridges, hydrophobic interactions forming water-avoiding cores, electrostatic forces between
charged amino acids and water molecules, and disulphide bridges) and by additional chemical changes which occur
after the amino-acid sequence has been assembled. The tertiary structure forms protein domains (sometimes called
folds, though I prefer not to use this general term for such a precise meaning).

Domains are three-dimensional protein structures which may contain several motifs. For example, the zinc finger
domain contains an alpha-helix and two antiparallel beta-strands and can bind to the major groove of DNA, allowing
a protein to grab hold of a DNA molecule.

Quaternary Structure

Quaternary structure occurs when two or more polypeptide chains with tertiary structure (each called a protein
subunit) come together to form a more complex protein. Not all proteins have quaternary structure, since some
proteins consist of a single polypeptide chain. These subunits are held together primarily by salt bridges,
hydrogen bonds, hydrophobic interactions and the strong disulphide bridges (-S-S-) which form between two
sulphur-containing cysteine amino acids (methionine is the other amino acid that contains sulphur, but it cannot
form disulphide bonds).

Some examples of proteins with quaternary structure are given in the introductory picture at the top of the page.
