Protein is an important nutrient that builds muscles and bones and provides energy. Protein can help with weight control because it helps you feel full and satisfied from your meals.

The healthiest proteins are the leanest. This means that they have the least fat and calories. The best protein choices are fish or shellfish, skinless chicken or turkey, low-fat or fat-free dairy (skim milk, low-fat cheese), and egg whites or egg substitute. The best red meats are the leanest cuts (loin and tenderloin). Other healthy options are beans, legumes (lentils and peanut butter), and soy foods such as tofu or soymilk.

Protein is an important part of every diet and is found in many different foods. Lean protein, the best kind, can be found in fish, skinless chicken and turkey, pork tenderloin and certain cuts of beef, like the top round. Low-fat dairy products like milk, yogurt, ricotta and other cheeses supply both protein and calcium.

Protein is crucial for tissue repair, building and preserving muscle, and making important enzymes and hormones.

Lean meats and dairy contribute valuable minerals like calcium, iron, selenium and zinc. These are not only essential for building bones, and forming and maintaining nerve function, but also for fighting cancer, forming blood cells and keeping immune systems robust.

The word protein was first coined in 1838 to emphasize the importance of this class of molecules. The word is derived from the Greek word proteios which means "of the first rank".

This chapter will provide a brief background into the structure of proteins and how this structure can determine the function and activity of proteins. It is not intended to substitute for the more detailed information provided in a biochemistry or cell biology course.

Proteins are the major components of living organisms and perform a wide range of essential functions in cells. While DNA is the information molecule, it is proteins that do the work of all cells - microbial, plant, animal. Proteins regulate metabolic activity, catalyze biochemical reactions and maintain structural integrity of cells and organisms. Proteins can be classified in a variety of ways, including their biological function (Table 2.1).

How does one group of molecules perform such a diverse set of functions? The answer is found in the wide variety of possible structures for proteins.

In the English language, there are an enormous number of words with varied meaning that can be formed using only 26 letters as building blocks. A similar situation exists for proteins where an incredible variety of proteins can be formed using 20 different building blocks called amino acids. Each of these amino acid building blocks has a different chemical structure and different properties.

Each protein has a unique amino acid sequence that is genetically determined by the order of nucleotide bases in the DNA, the genetic code. Since each protein has different numbers and kinds of the twenty available amino acids, each protein has a unique chemical composition and structure. For example, two proteins may each have 37 amino acids, but if the sequence of the amino acids is different, then the proteins will be different. How many different proteins can be formed from the twenty different amino acids? Consider a protein containing 100 amino acids linked into one chain. Since each of the 100 positions of this chain could be filled with any one of the 20 amino acids, there are 20^100 (about 10^130) possible combinations, more than enough to account for the 90-100 million different proteins that may be found in higher organisms.
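The counting argument above (20 choices at each of 100 positions, that is, 20 raised to the power 100) can be checked with a few lines of Python. This is only an illustrative sketch; the function name count_sequences is made up for this example.

```python
def count_sequences(chain_length, building_blocks=20):
    """Return how many distinct chains of the given length can be built
    when each position may hold any one of `building_blocks` residues."""
    if chain_length < 0:
        raise ValueError("chain_length must be non-negative")
    return building_blocks ** chain_length


# Python integers have arbitrary precision, so the exact value is available.
total = count_sequences(100)
print("decimal digits in 20^100:", len(str(total)))
print("dipeptides possible from 20 amino acids:", count_sequences(2))
```

Even a chain of only two residues already allows 400 different sequences, which shows how quickly the number of possibilities grows with chain length.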

A change in just one amino acid can change the structure and function of a protein. For example, sickle cell anemia is a disease that results from an altered structure of the protein hemoglobin, resulting from a change of the sixth amino acid from glutamic acid to valine. (This is the result of a single base pair change at the DNA level.) This single amino acid change is enough to change the conformation of hemoglobin so that this protein clumps at lower oxygen concentrations and causes the characteristic sickle shaped red blood cells of the disease.
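The link between a single base change and a single amino acid change can be sketched in code. The example below assumes the commonly cited coding-strand codons for this position (GAG for glutamic acid, GTG for valine); these codons are not given in the text above, and the table is deliberately limited to the two entries needed here rather than the full genetic code.

```python
# Deliberately tiny codon table: only the two codons used in this illustration.
# Both assignments are part of the standard genetic code.
CODON_TO_AMINO_ACID = {
    "GAG": "glutamic acid",
    "GTG": "valine",
}


def substitute_base(codon, position, new_base):
    """Return a copy of `codon` with the base at `position` (0-based) replaced."""
    if not 0 <= position < len(codon):
        raise IndexError("position is outside the codon")
    return codon[:position] + new_base + codon[position + 1:]


normal_codon = "GAG"
# Assumed change for this sketch: the middle base A is replaced by T.
altered_codon = substitute_base(normal_codon, 1, "T")

print(normal_codon, "->", CODON_TO_AMINO_ACID[normal_codon])
print(altered_codon, "->", CODON_TO_AMINO_ACID[altered_codon])
```

Only one of the three bases differs between the two codons, yet the encoded amino acid changes from a charged residue to a nonpolar one.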

The unique structure and chemical composition of each protein is important for its function; it is also important for separating proteins in a protein purification strategy. Each of these differences in properties can be used as a basis for the separation methods that are used to purify proteins. Because these differences in protein properties originate from differences in the chemical structure of the amino acids that make up the protein, we need to explore the structure of amino acids and their contribution to protein properties in more detail.

Amino acids are composed of carbon, hydrogen, oxygen, and nitrogen. Two amino acids, cysteine and methionine, also contain sulfur. The generic form of an amino acid is shown in Figure 2.1. Atoms of these elements are arranged into 20 kinds of amino acids that are commonly found in proteins. All proteins in all species, from bacteria to humans, are constructed from the same set of twenty amino acids. All amino acids have an amino group (NH2) and a carboxyl group (COOH) bonded to the same carbon atom, known as the alpha carbon. Amino acids differ in the side chain, or R group, that is bonded to the alpha carbon (Figure 2.2). Glycine, the simplest amino acid, has a single hydrogen atom as its R group; alanine has a methyl (-CH3) group.

The chemical composition of the unique R groups is responsible for the important characteristics of amino acids such as chemical reactivity, ionic charge and relative hydrophobicity. In Figure 2.2, the amino acids are grouped according to their polarity and charge. They are divided into four categories, those with polar uncharged R groups, those with apolar (nonpolar) R groups, acidic (charged) and basic (charged) groups.

The polar amino acids are soluble in water because their R groups can form hydrogen bonds with water. For example, serine, threonine and tyrosine all have hydroxyl groups (OH). Amino acids that carry a net negative charge at neutral pH contain a second carboxyl group. These are the acidic amino acids, aspartic acid and glutamic acid, also called aspartate and glutamate, respectively. The basic amino acids have R groups with a net positive charge at pH 7.0. These include lysine, arginine and histidine. There are eight amino acids with nonpolar R groups. As a group, these amino acids are less soluble in water than the polar amino acids. If a protein has a greater percentage of nonpolar R groups, the protein will be more hydrophobic (water hating) in character.

A protein is formed by amino acid subunits linked together in a chain. The bond between two amino acids is formed by the removal of an H2O molecule from two different amino acids, forming a dipeptide (Figure 2.3). The bond between two amino acids is called a peptide bond, and the chain of amino acids is called a peptide (20 amino acids or fewer) or a polypeptide.

Each protein consists of one or more unique polypeptide chains. Most proteins do not remain as linear sequences of amino acids; rather, the polypeptide chain undergoes a folding process. The process of protein folding is driven by thermodynamic considerations. This means that each protein folds into a configuration that is the most stable for its particular chemical structure and its particular environment. The final shape will vary but the majority of proteins assume a globular configuration. Many proteins such as myoglobin consist of a single polypeptide chain; others contain two or more chains. For example, hemoglobin is made up of two chains of one type (amino acid sequence) and two of another type.

Although the primary amino acid sequence determines how the protein folds, this process is not completely understood. Although certain amino acid sequences can be identified as more likely to form a particular conformation, it is still not possible to completely predict how a protein will fold based on its amino acid sequence alone, and this is an active area of biochemical research.

The final folded 3-D arrangement of the protein is referred to as its conformation. In order to maintain their function, proteins must maintain this conformation. To describe this complex conformation, scientists describe four levels of organization: primary, secondary, tertiary, and quaternary (Figure 2.4). The overall conformation of a protein is the combination of its primary, secondary, tertiary and quaternary elements.

Four levels of Organization of Protein Structure:

Primary structure refers to the linear sequence of amino acids that make up the polypeptide chain. This sequence is determined by the genetic code, the sequence of nucleotide bases in the DNA. The bond between two amino acids is a peptide bond. This bond is formed by the removal of an H2O molecule from two different amino acids, forming a dipeptide. The sequence of amino acids determines the positioning of the different R groups relative to each other. This positioning therefore determines the way that the protein folds and the final structure of the molecule.

The secondary structure of protein molecules refers to the formation of a regular pattern of twists or kinks of the polypeptide chain. The regularity is due to hydrogen bonds forming between the atoms of the amino acid backbone of the polypeptide chain. The two most common types of secondary structure are called the alpha helix and the β-pleated sheet (Figure 2.4).

Tertiary structure refers to the three dimensional globular structure formed by bending and twisting of the polypeptide chain. This process often means that the linear sequence of amino acids is folded into a compact globular structure. The folding of the polypeptide chain is stabilized by multiple weak, noncovalent interactions. These interactions include:

o Hydrogen bonds, which form when a hydrogen atom is shared by two other atoms.

o Electrostatic interactions that occur between charged amino acid side chains. Electrostatic interactions are attractions between positive and negative sites on macromolecules.

o Hydrophobic interactions: During folding of the polypeptide chain, amino acids with a polar (water soluble) side chain are often found on the surface of the molecule while amino acids with non polar (water insoluble) side chain are buried in the interior. This means that the folded protein is soluble in water or aqueous solutions.


Covalent bonds may also contribute to tertiary structure. The amino acid cysteine has an SH group as part of its R group, and therefore a disulfide bond (S-S) can form with a nearby cysteine. For example, insulin has two polypeptide chains that are joined by two disulfide bonds.

Quaternary structure refers to the fact that some proteins contain more than one polypeptide chain, adding an additional level of structural organization: the association of the polypeptide chains. Each polypeptide chain in the protein is called a subunit. The subunits can be the same polypeptide chain or different ones. For example, the enzyme β-galactosidase is a tetramer, meaning that it is composed of four subunits, and, in this case, the subunits are identical: each polypeptide chain has the same sequence of amino acids. Hemoglobin, the oxygen-carrying protein in the blood, is also a tetramer, but it is composed of two polypeptide chains of one type (141 amino acids) and two of a different type (146 amino acids). In chemical shorthand, this is referred to as α2β2. For some proteins, quaternary structure is required for full activity (function) of the protein.
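The subunit bookkeeping for hemoglobin can be written out as a small calculation. The sketch below uses the chain lengths given above; the labels "alpha" and "beta" and the function name total_residues are chosen for this illustration only.

```python
def total_residues(subunits):
    """Sum residues over all chains.

    `subunits` maps a chain label to a (copies, residues_per_chain) pair.
    """
    return sum(copies * length for copies, length in subunits.values())


# Two chains of 141 amino acids and two chains of 146 amino acids.
hemoglobin = {"alpha": (2, 141), "beta": (2, 146)}

print("chains in the tetramer:", sum(copies for copies, _ in hemoglobin.values()))
print("amino acids in the tetramer:", total_residues(hemoglobin))
```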

Some proteins combine with other kinds of molecules such as carbohydrates, lipids, iron and other metals, or nucleic acids, to form glycoproteins, lipoproteins, hemoproteins, metalloproteins, and nucleoproteins respectively. The presence of these other biomolecules affects the protein properties. For example, a protein that is conjugated to carbohydrate, called a glycoprotein, would be more hydrophilic in character while a protein conjugated to a lipid would be more hydrophobic in character.

Proteins are typically characterized by their size (molecular weight) and shape, amino acid composition and sequence, isoelectric point (pI), hydrophobicity, and biological affinity. Differences in these properties can be used as the basis for separation methods in a purification strategy (Chapter 4). The chemical composition of the unique R groups is responsible for the important characteristics of amino acids, such as chemical reactivity, ionic charge and relative hydrophobicity. Therefore protein properties relate back to the number and type of amino acids that make up the protein.

Size of proteins is usually measured in molecular weight (mass), although occasionally the length or diameter of a protein is given in Angstroms. The molecular weight of a protein is the mass of one molecule, usually measured in units called daltons; numerically, it equals the mass in grams of one mole of the protein. One dalton is approximately the mass of one proton or neutron. The molecular weight can be estimated by a number of different methods including electrophoresis, gel filtration, and more recently mass spectrometry. The molecular weight of proteins varies over a wide range. For example, insulin is 5,700 daltons while snail hemocyanin is 6,700,000 daltons. The average molecular weight of a protein is between 40,000 and 50,000 daltons. Molecular weights are commonly reported in kilodaltons (kD), a unit of mass equal to 1000 daltons. Most proteins have a mass between 10 and 100 kD. A small protein consists of about 50 amino acids while larger proteins may contain 3,000 amino acids or more. One of the larger amino acid chains is myosin, found in muscles, which has 1,750 amino acids.
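The relationship between chain length and mass can be approximated with a simple rule of thumb. The sketch below assumes an average mass of about 110 daltons per amino acid residue; that figure is a common approximation and is not taken from the text above, so the results are rough estimates only.

```python
# Assumed rule-of-thumb value (not from the text): average mass per residue.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kd(residue_count):
    """Rough protein mass in kilodaltons from the number of residues."""
    return residue_count * AVERAGE_RESIDUE_MASS_DA / 1000.0


def estimate_residue_count(mass_da):
    """Rough number of residues from a mass given in daltons."""
    return round(mass_da / AVERAGE_RESIDUE_MASS_DA)


for residues in (50, 1750, 3000):
    print(residues, "residues is roughly", estimate_mass_kd(residues), "kD")

print("45,000 daltons is roughly", estimate_residue_count(45000), "residues")
```

With this assumption, a 50-residue protein comes out near the 5,700 daltons quoted for insulin, which suggests the approximation is reasonable for a first estimate.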

Separation methods that are based on size and shape include gel filtration chromatography (size exclusion chromatography) and polyacrylamide gel electrophoresis.

The amino acid composition is the percentage of the constituent amino acids in a particular protein while the sequence is the order in which the amino acids are arranged.
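The difference between composition and sequence can be made concrete with a short calculation. In this sketch, a protein is written as a string of one-letter amino acid codes; the example strings are invented for illustration and do not correspond to real proteins.

```python
from collections import Counter


def composition_percent(sequence):
    """Return the percentage of each amino acid in a one-letter-code sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: 100.0 * n / total for residue, n in counts.items()}


first = "GAVG"   # hypothetical four-residue chain
second = "GGAV"  # same residues, different order

print(composition_percent(first))
print("same composition:", composition_percent(first) == composition_percent(second))
print("same sequence:", first == second)
```

The two chains have identical compositions but different sequences, so they would be different molecules.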

Each protein has an amino group at one end and a carboxyl group at the other end as well as numerous amino acid side chains, some of which are charged. Therefore each protein carries a net charge. The net protein charge is strongly influenced by the pH of the solution. To explain this phenomenon, consider the hypothetical protein in Figure 2.5. At pH 6.8, this protein has an equal number of positive and negative charges and so there is no net charge on the protein. As the pH drops, more H+ ions are available in the solution. These hydrogen ions bind to negative sites on the amino acids. Therefore, as the pH drops, the protein as a whole becomes positively charged. Conversely, at a basic pH, the protein becomes negatively charged. pH 6.8 is called the pI, or isoelectric point, for this protein; that is, the pH at which there are an equal number of positive and negative charges. Different proteins have different numbers of each of the amino acid side chains and therefore have different isoelectric points. So, in a buffer solution at a particular pH, some proteins will be positively charged, some proteins will be negatively charged and some will have no charge.
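The dependence of net charge on pH can be sketched numerically. The example below treats each ionizable group independently with the Henderson-Hasselbalch relationship and finds the isoelectric point by bisection. The pKa values are approximate textbook figures chosen for this sketch (published tables differ slightly), cysteine and tyrosine side chains are ignored for simplicity, and the sequences are invented.

```python
# Approximate pKa values assumed for this sketch; sources differ slightly.
POSITIVE_SIDE_CHAIN_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_SIDE_CHAIN_PKA = {"D": 3.9, "E": 4.1}
N_TERMINUS_PKA = 9.0
C_TERMINUS_PKA = 2.3


def net_charge(sequence, ph):
    """Estimate the net charge of a one-letter-code sequence at a given pH."""
    # The free amino end is positive when protonated.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))
    # The free carboxyl end is negative when deprotonated.
    charge -= 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))
    for residue in sequence:
        if residue in POSITIVE_SIDE_CHAIN_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_SIDE_CHAIN_PKA[residue]))
        elif residue in NEGATIVE_SIDE_CHAIN_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_SIDE_CHAIN_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=0.001):
    """Find the pH where the estimated net charge is zero, by bisection."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        middle = (low + high) / 2.0
        # Net charge falls as pH rises, so a positive charge means pI is higher.
        if net_charge(sequence, middle) > 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


example = "GKKDEAH"  # hypothetical sequence
for ph in (2.0, 7.0, 12.0):
    print("pH", ph, "net charge", round(net_charge(example, ph), 2))
print("estimated pI", round(isoelectric_point(example), 2))
```

Real proteins deviate from this model because folding changes the local environment, and therefore the pKa, of individual side chains.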

Separation techniques that are based on charge include ion exchange chromatography, isoelectric focusing and chromatofocusing.


Literally, hydrophobic means fear of water. In aqueous solutions, proteins tend to fold so that areas of the protein with hydrophobic regions are located in internal surfaces next to each other and away from the polar water molecules of the solution. Polar groups on the amino acid are called hydrophilic (water loving) because they will form hydrogen bonds with water molecules. The number, type and distribution of nonpolar amino acid residues within the protein determines its hydrophobic character. (Chart of hydrophobicity or hydropathy)

A separation method that is based on the hydrophobic character of proteins is hydrophobic interaction chromatography.
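One way to put a number on hydrophobic character is to average a per-residue hydropathy value over the whole chain. The sketch below uses the Kyte-Doolittle scale, which is one widely used published scale and not necessarily the chart referred to above; the example sequences are invented.

```python
# Kyte-Doolittle hydropathy values (one-letter codes); positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy of a one-letter-code sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


nonpolar_rich = "LIVALF"  # hypothetical chain rich in nonpolar residues
charged_rich = "KDEKRD"   # hypothetical chain rich in charged residues

print("nonpolar-rich:", round(mean_hydropathy(nonpolar_rich), 2))
print("charged-rich:", round(mean_hydropathy(charged_rich), 2))
```

A whole-chain average ignores where the nonpolar residues sit in the folded protein, so it is only a first indication of how a protein might behave.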


As the name implies, solubility is the amount of a solute that can be dissolved in a solvent. The 3-D structure of a protein affects its solubility properties. Cytoplasmic proteins have mostly hydrophilic (polar) amino acids on their surface and are therefore water soluble, with more hydrophobic groups located on the interior of the protein, sheltered from the aqueous environment. In contrast, proteins that reside in the lipid environment of the cell membrane have mostly hydrophobic amino acids (non polar) on their exterior surface and are not readily soluble in aqueous solutions.

Each protein has a distinct and characteristic solubility in a defined environment, and any changes to those conditions (buffer or solvent type, pH, ionic strength, temperature, etc.) can cause proteins to lose the property of solubility and precipitate out of solution. The environment can be manipulated to bring about a separation of proteins; for example, the ionic strength of the solution can be increased or decreased, which will change the solubility of some proteins.

Biological Affinity (Function):

Proteins often interact with other molecules in vivo in a specific way- in other words, they have a biological affinity for that molecule. These molecular counterparts, termed ligands, can be used as bait to fish out the target protein that you want to purify. For example, one such molecular pair is insulin and the insulin receptor. If you want to purify (or catch) the insulin receptor, you could couple many insulin molecules to a solid support and then run an extract (containing the receptor) over that column. The receptor would be caught by the insulin bait. These specific interactions are often exploited in protein purification procedures. Affinity chromatography is a very common method for purifying recombinant proteins (proteins produced by genetic engineering). Several histidine residues can be engineered at the end of a polypeptide chain. Since repeated histidines have an affinity for metals, a column of the metal can be used as bait to catch the recombinant protein.

Although DNA can be isolated and amplified from thousand year old mummies, most proteins are more fragile biomolecules. Therefore, laboratory reagents and storage solutions must provide suitable conditions so that the normal structure and function of the protein is maintained. To understand how the structure of proteins is protected in laboratory solutions, it is necessary to understand how that structure can be destroyed.

Proteins can denature, or unfold, so that their three-dimensional structure is altered but their primary structure remains intact (Figure 2.7). Many of the interactions that stabilize the 3-D conformation of the protein are relatively weak and are sensitive to various environmental factors including high temperature, low or high pH and high ionic strength. Proteins vary greatly in the degree of their sensitivity to these factors. Sometimes proteins can be renatured, but often the denaturation is irreversible.

Proteins can also be broken apart by enzymes, called proteases, that digest the covalent peptide bonds between amino acids that are responsible for the primary structure. This process is called proteolysis and is irreversible. Cells contain proteases that are found in lysosomes, membrane-bound organelles inside the cell. When cells are disrupted, lysosomes break and release these proteases, which can damage the other proteins in the cell. In the laboratory, it is therefore necessary to minimize the activities of cellular proteases to protect proteins from proteolysis. Methods used to minimize proteolysis include working at lower temperatures (4 °C) and adding chemicals that inhibit protease activity.

Sulfur groups on cysteines may undergo oxidation to form disulfide bonds that are not normally present. Extra disulfide bonds can form when proteins are removed from their normal environment. Reducing agents such as dithiothreitol or β-mercaptoethanol are often added to prevent undesirable disulfide bond formation.

Proteins readily adsorb (stick to) surfaces, thereby reducing their available activity. To prevent significant loss, do not store dilute solutions of proteins for prolonged periods of time. Always dilute them right before use.

The composition of the extraction buffer is important for maintaining structure and function of the target protein. To prevent denaturation, the buffering pH is based on the pH stability range of the protein. Other components such as ionic strength, divalent cations (Ca++ and Mg++), or reducing agents (dithiothreitol or β-mercaptoethanol) may be needed to maintain activity. In making the extract, cells are lysed and proteases (enzymes that degrade proteins) are released from their intracellular compartments. To prevent proteases from digesting the target protein, two strategies are commonly followed: 1) The extract is kept cold. The activity of proteolytic enzymes is greatly reduced by cold temperatures. For this reason, the protein purification process is often conducted in cold rooms. At the very least, an effort is made to keep the extract at 4 °C. 2) Protease inhibitors are sometimes added to the mixture to prevent degradation by proteases. The drawback to this strategy is that the inhibitors must eventually be removed, along with other contaminant proteins.

Peptides and polypeptides

Glycine and alanine can combine together with the elimination of a molecule of water to produce a dipeptide. It is possible for this to happen in one of two different ways - so you might get two different dipeptides.

In each case, the linkage shown in blue in the structure of the dipeptide is known as a peptide link. In chemistry, this would also be known as an amide link, but since we are now in the realms of biochemistry and biology, we'll use their terms.

If you joined three amino acids together, you would get a tripeptide. If you joined lots and lots together (as in a protein chain), you get a polypeptide.

A protein chain will have somewhere in the range of 50 to 2000 amino acid residues. You have to use this term because strictly speaking a peptide chain isn't made up of amino acids. When the amino acids combine together, a water molecule is lost. The peptide chain is made up from what is left after the water is lost - in other words, is made up of amino acid residues.
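The idea that a chain is built from residues rather than whole amino acids shows up directly in the mass of a peptide: one water molecule is lost for every peptide link formed. The sketch below uses approximate average molecular masses for free glycine and alanine (75.07 and 89.09) and for water (18.015); these figures are supplied for this example and are not from the text above.

```python
# Approximate average molecular masses, supplied for this illustration.
FREE_AMINO_ACID_MASS = {"Gly": 75.07, "Ala": 89.09}
WATER_MASS = 18.015


def peptide_mass(amino_acids):
    """Mass of an unbranched peptide built from the listed amino acids.

    One water molecule is removed for each peptide link, and a chain of
    n amino acids contains n - 1 links.
    """
    if not amino_acids:
        raise ValueError("at least one amino acid is required")
    links = len(amino_acids) - 1
    return sum(FREE_AMINO_ACID_MASS[name] for name in amino_acids) - links * WATER_MASS


print("Gly-Ala:", round(peptide_mass(["Gly", "Ala"]), 3))
print("Ala-Gly:", round(peptide_mass(["Ala", "Gly"]), 3))
print("Gly-Gly-Ala:", round(peptide_mass(["Gly", "Gly", "Ala"]), 3))
```

The two possible dipeptides have the same mass because they contain the same residues; they differ only in the order in which the residues are joined.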

By convention, when you are drawing peptide chains, the -NH2 group which hasn't been converted into a peptide link is written at the left-hand end. The unchanged -COOH group is written at the right-hand end.

The end of the peptide chain with the -NH2 group is known as the N-terminal, and the end with the -COOH group is the C-terminal.

A protein chain (with the N-terminal on the left) will therefore look like this:

The "R" groups come from the 20 amino acids which occur in proteins. The peptide chain is known as the backbone, and the "R" groups are known as side chains.

Note: In the case where the "R" group comes from the amino acid proline, the pattern is broken. In this case, the hydrogen on the nitrogen nearest the "R" group is missing, and the "R" group loops around and is attached to that nitrogen as well as to the carbon atom in the chain.

I mention this for the sake of completeness - not because you would be expected to know about it in chemistry at this introductory level.

Now there's a problem! The term "primary structure" is used in two different ways. At its simplest, the term is used to describe the order of the amino acids joined together to make the protein. In other words, if you replaced the "R" groups in the last diagram by real groups you would have the primary structure of a particular protein.

This primary structure is usually shown using abbreviations for the amino acid residues. These abbreviations commonly consist of three letters or one letter.

Using three letter abbreviations, a bit of a protein chain might be represented by, for example:

If you look carefully, you will spot the abbreviations for glycine (Gly) and alanine (Ala) amongst the others.
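The two abbreviation systems are related by a fixed lookup table, so converting between them is mechanical. The sketch below uses the standard three-letter and one-letter codes for the 20 amino acids; the example chain is invented for illustration, and the function name to_one_letter is hypothetical.

```python
# Standard three-letter to one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_chain, separator="-"):
    """Convert a chain such as 'Gly-Ala-Ser' to one-letter codes ('GAS')."""
    names = three_letter_chain.split(separator)
    return "".join(THREE_TO_ONE[name] for name in names)


# Hypothetical fragment, written with the N-terminal on the left.
fragment = "Gly-Ala-Ser-Cys-Lys"
print(fragment, "->", to_one_letter(fragment))
```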

If you followed the protein chain all the way to its left-hand end, you would find an amino acid residue with an unattached -NH2 group. The N-terminal is always written on the left of a diagram for a protein's primary structure - whether you draw it in full or use these abbreviations.

The wider definition of primary structure includes all the features of a protein which are a result of covalent bonds. Obviously, all the peptide links are made of covalent bonds, so that isn't a problem.

But there is an additional feature in proteins which is also covalently bound. It involves the amino acid cysteine.

If two cysteine side chains end up next to each other because of folding in the peptide chain, they can react to form a sulphur bridge. This is another covalent link and so some people count it as a part of the primary structure of the protein.

Because of the way sulphur bridges affect the way the protein folds, other people count this as a part of the tertiary structure (see below). This is obviously a potential source of confusion!

Within the long protein chains there are regions in which the chains are organised into regular structures known as alpha-helices (alpha-helixes) and beta-pleated sheets. These are the secondary structures in proteins.

These secondary structures are held together by hydrogen bonds. These form as shown in the diagram between one of the lone pairs on an oxygen atom and the hydrogen attached to a nitrogen atom:

You must also find out exactly how much detail you need to know about this next bit. It may well be that all you need is to have heard of an alpha-helix and know that it is held together by hydrogen bonds between the C=O and N-H groups. Once again, you need to check your syllabus and past papers - particularly mark schemes for the past papers.


Denaturation occurs because the bonding interactions responsible for the secondary structure (hydrogen bonds to amides) and tertiary structure are disrupted. In tertiary structure there are four types of bonding interactions between side chains that may be disrupted: hydrogen bonding, salt bridges, disulfide bonds, and non-polar hydrophobic interactions. Therefore, a variety of reagents and conditions can cause denaturation. The most common observation in the denaturation process is the precipitation or coagulation of the protein.