GENE STRUCTURE OF PROKARYOTES AND EUKARYOTES. STRUCTURAL AND REGULATORY GENES; GENES OF T-RNA, R-RNA. ORGANIZATION OF INFORMATION FLOW IN THE CELL. REGULATION OF GENE EXPRESSION.
MOLECULAR MECHANISMS OF HUMAN VARIATION
I. Structure of genes of prokaryotes and eukaryotes. Structural genes, regulatory genes, t-RNA, r-RNA. Genetic code and its characteristics.
1. Gene as a unit of genetic function. Structure of genes of prokaryotes and eukaryotes
2. Classification of genes.
3. General principles of organization of genes (operon, promoter, operator, terminator, structural genes, regulator gene).
4. Genetic code and its characteristics.
DNA is the genetic material for all cellular organisms and most viruses, although some viruses use RNA.
THE CHEMICAL NATURE OF NUCLEIC ACIDS
DNA was first discovered only 4 years after the publication of Mendel’s work. In 1869 a Swiss chemist, Friedrich Miescher, extracted a white substance from the cell nuclei of human pus and from fish sperm nuclei. The proportion of nitrogen and phosphorus was very different from any other known constituent of cells, which convinced Miescher he had discovered a new biological substance. He called this substance “nuclein,” since it seemed to be specifically associated with the cell nucleus.
Because Miescher’s nuclein was slightly acidic, it came to be called nucleic acid. For 50 years biologists did little work on it, because nothing was known of its function in cells and there seemed little to recommend it to investigators. In the 1920s the basic chemistry of nucleic acids was determined by the biochemist P. A. Levine. There were two sorts of nucleic acid: ribonucleic acid or RNA, which had a hydroxyl group attached to the 2′ carbon of its sugar, and deoxyribonucleic acid or DNA, which did not. Levine found that DNA contained three basic elements:
1. Phosphate (PO4) groups
2. Five-carbon sugars
3. Four nitrogen-containing bases – adenine and guanine (double-ring compounds called purines) and thymine and cytosine (single-ring compounds called pyrimidines); RNA contained the pyrimidine uracil in place of thymine.
From the roughly equal proportions of the three elements, Levine concluded correctly that DNA and RNA molecules are composed of units containing all three, strung one after another in a long chain. Each unit, a five-carbon sugar to which a phosphate group and a nitrogen-containing base are attached, is called a nucleotide. Within DNA (or within RNA), the identity of the base is the only difference that distinguishes one nucleotide from another.
To identify the various chemical groups in DNA and RNA, it is customary to number the carbon atoms of the organic base and the ribose sugar and then to refer to any chemical group attached to that carbon by that number. In the ribose sugar, four of the carbon atoms together with an oxygen atom form a five-membered ring. We number the different carbon atoms as 1′ to 5′, starting just to the right of the oxygen atom (the “prime” symbol [′] indicates that the number refers to a carbon of the sugar, as opposed to the organic base). The phosphate is attached to the 5′ carbon atom of the ribose sugar, and the organic base to the 1′ carbon atom. There is in addition a free hydroxyl (OH) group attached to each 3′ carbon.
The presence of the 5′-phosphate and the 3′-hydroxyl groups is what allows DNA and RNA to form a long chain of nucleotides: these two groups can chemically react with one another. The chemical reaction between the phosphate group of one unit and the OH group of another causes the elimination of a water molecule and the formation of a covalent bond linking the two groups. The linkage is called a phosphodiester bond because the phosphate group is now linked to the two sugars by means of two ester (—O—) bonds. Further linking can occur in the same way, since the two-unit polymer resulting from the condensation reaction we have just described still has a free 5′-phosphate group at one end and a free 3′-hydroxyl group at the other. In this way, many thousands of nucleotides can be linked together in long chains.
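The bookkeeping of this condensation reaction can be sketched in a few lines of Python. This is a minimal sketch, assuming a strand is represented simply as a string of one-letter base codes; the function name `condense` is hypothetical. It shows that joining n nucleotides forms n − 1 phosphodiester bonds, releases n − 1 water molecules, and leaves a free 5′-phosphate and a free 3′-hydroxyl at the two ends.

```python
def condense(bases: str) -> dict:
    """Tally the products of linking nucleotides into one chain.

    Each phosphodiester bond joins the 3'-OH of the growing chain to the
    5'-phosphate of the next nucleotide and eliminates one water molecule.
    """
    if not bases:
        raise ValueError("need at least one nucleotide")
    bonds = len(bases) - 1  # one bond between each adjacent pair
    return {
        "length": len(bases),
        "phosphodiester_bonds": bonds,
        "water_released": bonds,        # one H2O per bond formed
        "five_prime_end": "phosphate",  # still free at one end
        "three_prime_end": "OH",        # still free at the other end
    }


print(condense("GTCCAT"))
```

Because both reactive ends remain free after every condensation, the same step can be repeated indefinitely, which is why chains of many thousands of nucleotides are possible.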
A single strand of DNA or RNA is a long chain of nucleotide subunits joined together like cars in a train.
Any linear strand of DNA or RNA, no matter how long, will always have a free 5′-phosphate group at one end and a free 3′ OH group at the other. This difference between the two ends of the nucleic acid molecule allows us to refer unambiguously to a particular end of the molecule; we can talk about the 5′ end or the 3′ end with no possibility of confusion. It is important to recognize that a DNA or RNA molecule has this intrinsic directionality. By convention, the sequence of DNA bases is usually expressed in a 5′ to 3′ direction. Thus the base sequence “GTCCAT” refers to the sequence 5′ pGpTpCpCpApT-OH 3′, with phosphodiester bonds indicated by “p.” Note that this is NOT the same molecule as that represented by the reverse sequence, 5′ pTpApCpCpTpG-OH 3′.
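The explicit notation can be generated mechanically from the one-letter sequence. In the sketch below, `to_explicit` is a hypothetical helper, not a standard tool; it writes a 5′ to 3′ base string in the pNpN…-OH form and confirms that reversing the letters describes a different molecule.

```python
def to_explicit(seq: str) -> str:
    """Write a 5'->3' base sequence in the explicit pNpN...-OH notation."""
    return "5′ " + "".join("p" + base for base in seq) + "-OH 3′"


forward = "GTCCAT"
backward = forward[::-1]  # same letters, opposite polarity

print(to_explicit(forward))   # 5′ pGpTpCpCpApT-OH 3′
print(to_explicit(backward))  # 5′ pTpApCpCpTpG-OH 3′
print(forward == backward)    # False: not the same molecule
```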
Levine’s early studies indicated that all four types of DNA nucleotides were present in roughly equal amounts. This result, which later proved to be wrong, led to the mistaken belief that DNA was a simple repeating polymer in which the four nucleotides occurred together in simple repeating units (for instance, …GCAT…GCAT…GCAT…GCAT…). In the absence of sequence variation in such a repeating chain, it was difficult to see how DNA might contain the hereditary information necessary to specify even the simplest organisms. For this reason, Avery’s experiments on the transforming principle, although crystal clear, were not readily accepted at first. It seemed more plausible that DNA was no more than a structural element of the chromosomes, with proteins playing the central genetic role.
The key advance came after World War II, when Levine’s chemical analysis of DNA was repeated using more accurate techniques than had previously been available. Quite a different result was obtained. The four nucleotide bases were NOT present in equal proportions in DNA molecules after all. A careful study carried out by Erwin Chargaff showed that differences did exist in DNA nucleotide composition. Chargaff found that the nucleotide composition of DNA molecules varied in complex ways, depending on the source of the DNA. This strongly suggested that DNA was not a simple repeating polymer and might after all have the information-encoding properties required of genetic material. Despite DNA’s complexity, however, Chargaff observed an important underlying regularity: the amount of adenine present in DNA molecules was always equal to the amount of thymine, and the amount of guanine was always equal to the amount of cytosine. Chargaff’s results are commonly referred to as Chargaff’s rules:
1. The proportion of A always equals that of T, and G is similarly equal to C:
A = T, G = C
2. From the above rule, it follows that there is always an equal proportion of purines (A and G) and pyrimidines (C and T).
Chargaff pointed out that in all natural DNA molecules, the amount of A = T and the amount of G = C.
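Chargaff’s rules follow from base pairing, which a short Python sketch can check. It assumes a duplex made of one hypothetical strand plus its base-by-base complement. The strand alone has unequal amounts of A and T, yet the duplex satisfies both rules, while its A and G contents still differ: the overall composition can vary even though the regularity holds.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_counts(strand: str) -> Counter:
    """Base counts for a duplex made of `strand` plus its complementary strand."""
    partner = "".join(PAIR[base] for base in strand)
    return Counter(strand + partner)


single = Counter("GGGACTGGAC")        # hypothetical strand: A=2, T=1, G=5, C=2
duplex = duplex_counts("GGGACTGGAC")  # the same strand paired with its partner

# Rule 1: A = T and G = C in the duplex (here A = T = 3 and G = C = 7).
assert duplex["A"] == duplex["T"] and duplex["G"] == duplex["C"]
# Rule 2: purines (A + G) equal pyrimidines (C + T).
assert duplex["A"] + duplex["G"] == duplex["C"] + duplex["T"]
# The single strand on its own does not obey rule 1.
print(single["A"] == single["T"])  # False
```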
THE THREE-DIMENSIONAL STRUCTURE OF DNA
As it became clear that the DNA molecule was the molecule in which the hereditary information was stored, investigators began to puzzle over how such a seemingly simple molecule could carry out such a complex function. The significance of the regularities pointed out by Chargaff, although not immediately obvious, soon became clear. A British chemist, Rosalind Franklin, had carried out X-ray crystallographic analysis of fibers of DNA. In this process the DNA molecule is bombarded with an X-ray beam. When individual rays encounter atoms, their path is bent or diffracted; the pattern created by the total of all these diffractions can be captured on a piece of photographic film. Such a pattern resembles the ripples created by tossing a rock into a smooth lake. By carefully analyzing the diffraction pattern, it is possible to develop a three-dimensional image of the molecule.
Franklin’s studies were severely handicapped by the fact that it had not proven possible to obtain true crystals of natural DNA, so she had to work with DNA in the form of fibers. While the DNA molecules in a fiber are all aligned with one another, they do not form the perfectly regular crystalline array required to take full advantage of X-ray diffraction. Franklin worked in the same laboratory as the British biophysicist Maurice Wilkins, who was able to prepare more uniformly oriented DNA fibers than had been possible previously; using these fibers, Franklin was able, by working carefully, to obtain crude diffraction information on natural DNA. The diffraction patterns that Franklin obtained suggested that the DNA molecule was a helical coil, a springlike spiral. It was also possible from her photographs of the diffraction pattern to determine some of the basic structural parameters of the molecule; the pattern indicated that the helix had a diameter of about 2 nanometers and made a complete spiral turn every 3.4 nanometers.
Learning informally of Franklin’s results before they were published in 1953, James Watson and Francis Crick, two young investigators at Cambridge University, quickly worked out a likely structure of the DNA molecule (Figure 14-13), which we now know to be substantially correct. They analyzed the problem deductively: first they built models of the nucleotides, and then they tested how these could be assembled into a molecule that fit what they knew from Chargaff’s and Franklin’s work about the structure of DNA. They tried various possibilities, first assembling molecules with three strands of nucleotides wound around one another to stabilize the helical shape. None of these early efforts proved satisfactory. They finally hit on the idea that the molecule might be a simple double helix, one in which the bases of two strands pointed inward toward one another. Because a purine, which is large, is always paired with a pyrimidine, which is small, the diameter of the duplex stays the same, 2 nanometers. Hydrogen bonds that form between the two strands stabilize the helical form.
It immediately became apparent why Chargaff had obtained the results that he had – because the purine adenine (A) will not form proper hydrogen bonds in this structure with cytosine (C) but will with thymine (T), every A is paired to a T. Similarly the purine guanine (G) will not form proper hydrogen bonds with thymine but will with cytosine, so that every G is paired with C.
The Watson and Crick structure of DNA is a “double helix,” a spiral staircase composed of two polynucleotide chains hydrogen-bonded to each other, wrapped around a central axis.
The Watson-Crick model immediately suggested that the basis for copying the genetic information is complementarity. One chain of the DNA molecule may have any conceivable base sequence, but this sequence completely determines that of its partner in the duplex. If the sequence of one chain is ATTGCAT, the sequence of its partner in the duplex must be TAACGTA. Each chain in the duplex is a complementary mirror image of the other. To copy the DNA molecule, one need only “unzip” it and construct a new complementary chain along each naked single strand.
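Because each base determines its partner, building the complementary chain is a simple lookup. The sketch below assumes a chain is a string of one-letter base codes; the helper names are hypothetical. `complement` reproduces the TAACGTA of the example by writing each partner base directly under its template base. Because the two chains are antiparallel, the same partner chain read in the conventional 5′ to 3′ direction is ATGCAAT, which `reverse_complement` returns.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner chain written base by base, directly under the original."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner chain rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


original = "ATTGCAT"
print(complement(original))          # TAACGTA, as in the text
print(reverse_complement(original))  # ATGCAAT
print(complement(complement(original)) == original)  # True: copying twice restores the original
```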
The form of DNA replication suggested by the Watson-Crick model is called semiconservative because after one round of replication, the original duplex is not conserved; instead, each strand of the duplex becomes part of another duplex. This prediction of the Watson-Crick model was tested in 1958 by Matthew Meselson and Frank Stahl of the California Institute of Technology. These two scientists grew bacteria for several generations in a medium containing the heavy isotope of nitrogen, ¹⁵N, so the DNA of their bacteria was eventually denser than normal. They then transferred the growing cells to a new medium containing the normal, lighter isotope ¹⁴N and harvested the DNA at various intervals.
At first the DNA that the bacteria manufactured was all heavy. But as the newly formed DNA incorporated the lighter nitrogen isotope, DNA density fell. After one round of DNA replication was complete, the density of the bacterial DNA had decreased to a value intermediate between all-light and all-heavy DNA. After another round of replication, two density classes were observed: one intermediate and the other light, the latter corresponding to DNA that included none of the heavy isotope. These results indicate that after one round of replication, each daughter DNA duplex possessed one of the labeled heavy strands of the parent molecule. When this hybrid duplex replicated, it contributed one heavy strand to form another hybrid duplex and one light strand to form a light duplex. Meselson and Stahl’s experiment thus clearly confirmed the prediction of the Watson-Crick model that DNA replicates by complementarity, in a semiconservative manner.
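The bookkeeping behind this result can be simulated by tracking individual strands. In this sketch, a hypothetical model rather than a description of the original analysis, each duplex is a pair of labels, 'H' for a ¹⁵N strand and 'L' for a ¹⁴N strand, and every round of replication pairs each old strand with a new light one.

```python
from collections import Counter


def replicate(duplexes):
    """One round of semiconservative replication in light (14N) medium.

    Each duplex is a pair of strand labels: 'H' for a 15N strand, 'L' for 14N.
    The parental strands separate, and each is paired with a new light strand.
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "L"))
        daughters.append((strand_b, "L"))
    return daughters


def density_class(duplex):
    """Classify a duplex by how many of its two strands are heavy."""
    return {2: "heavy", 1: "intermediate", 0: "light"}[duplex.count("H")]


def simulate(generations):
    """Density classes after each round, starting from one fully heavy duplex."""
    population = [("H", "H")]
    history = []
    for _ in range(generations):
        population = replicate(population)
        history.append(Counter(density_class(d) for d in population))
    return history


for generation, classes in enumerate(simulate(3), start=1):
    print(generation, dict(classes))
```

After one round every duplex is intermediate; after two rounds half are intermediate and half light, matching the two density classes described above.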
The basis for the great accuracy of DNA replication is complementarity. A DNA molecule is a duplex, containing two strands that are complementary mirror images of each other, so either one can be used as a template to reconstruct the other.
HOW DNA REPLICATES
DNA had been proven to be the hereditary material, and Watson and Crick had deciphered its structure. What remained was to determine how DNA copied its information and how that information was expressed in the phenotype. Matthew Meselson and Franklin W. Stahl designed an experiment to determine the method of DNA replication. Three models of replication were considered plausible:
1. Conservative replication would produce an entirely new DNA molecule, with both strands newly made, while the parental double helix remained intact (Figure 27).
2. Semiconservative replication would produce two DNA molecules, each of which was composed of one-half of the parental DNA along with an entirely new complementary strand. In other words the new DNA would consist of one new and one old strand of DNA. The existing strands would serve as complementary templates for the new strand.
3. Dispersive replication would involve the breaking of the parental strands during replication and the reassembly of molecules in which each strand was a mix of old and new fragments.
The Meselson-Stahl experiment involved the growth of E. coli bacteria on a growth medium containing heavy nitrogen (Nitrogen-15, as opposed to the more common, lighter isotope Nitrogen-14). The bacteria were first grown on a medium in which the sole source of nitrogen was Nitrogen-15 and were then transferred to a medium containing light nitrogen (Nitrogen-14). Watson and Crick had predicted that DNA replication was semiconservative. If it was, then the DNA produced after one round of replication on the light medium would be intermediate in density between heavy and light. It was.
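What made the experiment decisive is that the three models predict different band patterns. The sketch below tabulates the predictions for one and two generations, starting from a single fully heavy duplex; the dispersive case assumes, as an idealization, that old DNA is spread evenly over all descendants. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def predicted_bands(model: str, generations: int) -> Counter:
    """Number of duplexes in each density band after growth in 14N medium.

    Starts from one fully heavy duplex. A band is labeled by the fraction of
    the duplex's DNA that is old (15N-labeled): 1 = heavy, 1/2 = intermediate,
    0 = light. The dispersive case assumes old material is spread evenly.
    """
    if generations < 1:
        raise ValueError("need at least one round of replication")
    total = 2 ** generations
    if model == "conservative":
        # The parental duplex stays intact; every other duplex is all new.
        bands = {Fraction(1): 1, Fraction(0): total - 1}
    elif model == "semiconservative":
        # The two old strands end up in two hybrid duplexes.
        bands = {Fraction(1, 2): 2, Fraction(0): total - 2}
    elif model == "dispersive":
        # Old DNA is diluted evenly: each duplex keeps 1/total of it.
        bands = {Fraction(1, total): total}
    else:
        raise ValueError(f"unknown model: {model}")
    return +Counter(bands)  # unary + drops bands with zero duplexes


for model in ("conservative", "semiconservative", "dispersive"):
    for generation in (1, 2):
        bands = predicted_bands(model, generation)
        print(model, generation, {str(f): n for f, n in sorted(bands.items(), reverse=True)})
```

After one generation the conservative model predicts separate heavy and light bands, while the semiconservative and dispersive models both predict a single intermediate band. After two generations the semiconservative model predicts intermediate and light bands, whereas the dispersive model predicts one band at one-quarter heavy. Only the semiconservative pattern matches what Meselson and Stahl observed.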
DNA replication involves a great many building blocks and enzymes and a great deal of ATP energy (recall that the S phase of the cell cycle is followed by a G2 phase in which the cell prepares for division). DNA replication occurs only once per cell generation. It proceeds at a rate of about 50 nucleotides per second at each replication fork in humans and about 500 per second in prokaryotes. Nucleotides have to be assembled and available in the nucleus, along with the energy needed to make bonds between nucleotides. Helicase enzymes unzip the helix by breaking the hydrogen bonds between bases. Once the helix has been opened, an area known as the replication bubble forms (always initiated at a specific sequence of nucleotides, the origin of replication). New nucleotides enter the fork and pair with the complementary nucleotides of the parental template (A with T, C with G), and DNA polymerase links them into the growing strand. Prokaryotes open a single replication bubble, while eukaryotes have multiple bubbles. The entire length of the DNA molecule is replicated as the bubbles meet.
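The rate figures above make it possible to estimate replication time with simple arithmetic. This sketch assumes that the quoted rate applies to each replication fork, that each bubble has two forks moving apart, and a hypothetical circular chromosome of 4.6 million base pairs with a single origin. It ignores initiation and pausing, so it gives an order of magnitude, not a measured value.

```python
def replication_time_s(genome_bp: int, rate_nt_per_s: float, origins: int = 1) -> float:
    """Rough time to copy a genome when every origin fires at once.

    Each origin opens one bubble with two forks moving in opposite
    directions, so 2 * origins forks share the work.
    """
    forks = 2 * origins
    return genome_bp / (forks * rate_nt_per_s)


# Hypothetical bacterial chromosome: 4.6 million bp, one origin, 500 nt/s per fork.
seconds = replication_time_s(4_600_000, 500)
print(f"{seconds:.0f} s = {seconds / 60:.0f} min")
```

With these assumptions the estimate is about 4600 seconds, or roughly 77 minutes.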
Since the DNA strands are antiparallel, and replication proceeds in the 5′ to 3′ direction on EACH strand, one strand will form a continuous copy, while the other will form a series of short Okazaki fragments.
The complementary nature of the DNA duplex provides a ready means of duplicating the molecule. If one were to unzip the molecule, one would need only to assemble the appropriate complementary nucleotides on the exposed single strands in order to form two daughter duplexes of the same sequence. The density label experiments of Meselson and Stahl described above demonstrate that this is indeed what happens. When a DNA molecule replicates, the double-stranded DNA molecule separates at one end, forming a replication fork, and each separated strand serves as a template for the synthesis of a new complementary strand. Indeed, electron micrographs reveal Y-shaped DNA molecules at the point of replication.
To the surprise of those investigating the way in which DNA molecules replicate, it turned out that the two new daughter strands are synthesized on their templates in different ways. One strand is built by simply adding nucleotides to its growing end. This strand grows inward toward the junction of the Y as the duplex unzips. Because this strand ends with an –OH group attached to the third carbon of the deoxyribose sugar (called the “3-prime,” or 3′ carbon), the strand is said to grow from its 3′ end. The enzyme that catalyzes this process is called a DNA polymerase.
However, when investigators searched for a corresponding enzyme that added nucleotides to the other strand (which ends with a –PO4 group attached to the fifth carbon of the deoxyribose sugar and is called the 5′ end), they were unable to find one. Nor has anyone ever found one. DNA polymerases add nucleotides only to the 3′ ends of DNA strands.
How does the polymerase build the 5′ strand? Along this strand, too, the new chain grows only at its 3′ end, with the polymerase jumping ahead and filling in backward. The DNA polymerase starts a burst of synthesis at the point of the replication fork and moves outward, adding nucleotides to the 3′ end of a short new chain until this new segment fills the gap of 1000 to 2000 nucleotides between the replication fork and the end of the previously made segment. These segments are called Okazaki fragments, after their discoverer. Okazaki performed a “pulse-chase” experiment: radioactive nucleotides added briefly appear first in DNA fragments 1000 to 2000 nucleotides long. Sampled a little later, such fragments no longer exhibit radioactivity; instead, the radioactive nucleotides are found in the main DNA molecule. Okazaki concluded that the 5′ strand of DNA is built by first making short fragments and then stitching them to the growing end of the DNA chain with DNA-joining enzymes called ligases. Each short new chain is thus added to the growing chain, and the polymerase jumps ahead again to fill in another gap. In effect, the template strand is copied in segments about 1000 nucleotides long, and each new fragment is stitched to the end of the growing chain. This mode of replication is referred to as discontinuous synthesis. If one looks carefully at electron micrographs showing DNA replication in progress, one arm of the fork behind the polymerase appears single-stranded for about 1000 nucleotides, because that stretch of template has not yet been copied.
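The geometry of discontinuous synthesis can be sketched with position arithmetic. This is a simplified model under stated assumptions: the fork advances in fixed 1000-nucleotide steps (the lower end of the range given above), each fragment is laid down in one piece, and the hypothetical helper `ligate` stands in for the joining done by ligase. The other enzymes at the fork are ignored.

```python
def lagging_fragments(template_len: int, fragment_len: int = 1000) -> list:
    """Okazaki-style fragments made as the fork opens along the template.

    Positions are measured along the template from where the fork started (0),
    in the direction the fork moves. The new strand must grow 5'->3', which on
    this side points back toward 0, so each fragment is listed as
    (start, end) with start > end.
    """
    fragments = []
    fork = 0
    while fork < template_len:
        new_fork = min(fork + fragment_len, template_len)  # fork opens ahead
        fragments.append((new_fork, fork))  # polymerase fills in backward
        fork = new_fork
    return fragments


def ligate(fragments: list) -> tuple:
    """Join abutting fragments, as DNA ligase does, into one continuous span."""
    spans = sorted((min(a, b), max(a, b)) for a, b in fragments)
    for (_, end), (next_start, _) in zip(spans, spans[1:]):
        if end != next_start:
            raise ValueError(f"gap between {end} and {next_start}")
    return spans[0][0], spans[-1][1]


pieces = lagging_fragments(4500)
print(pieces)          # [(1000, 0), (2000, 1000), (3000, 2000), (4000, 3000), (4500, 4000)]
print(ligate(pieces))  # (0, 4500)
```

Every fragment runs opposite to the direction of fork movement, yet once ligated the pieces cover the template without gaps, which is the outcome the pulse-chase experiment revealed.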
A DNA molecule copies (replicates) itself by separating its two strands and using each as a template to assemble a new complementary strand, thus forming two daughter duplexes. The two strands are assembled in different directions.
How Bacteria Replicate Their DNA
The genetic material of bacteria is organized as a single, circular molecule of DNA. Such a structure cannot be compared directly with the complex chromosomes that are characteristic of eukaryotes. The replication of the DNA in bacteria is a complex process involving many enzymes; it is not, however, difficult to visualize the overall process. The entire genome can be replicated simply by opening the duplex at one site, the origin of replication, and separating the strands on one or both sides of that site, creating one or two replication forks. These replication forks then proceed around the circle, creating a new daughter duplex as they go. The mitochondria and chloroplasts of eukaryotic cells contain similar circular molecules of DNA, which replicate in much the same way as the DNA of the bacteria from which these organelles evolved.
How a Eukaryote Replicates its Chromosomes
The DNA within a eukaryotic chromosome is not circular, and when it is examined under the electron microscope, it proves to have numerous replication forks spaced along the chromosome, rather than the single origin of replication found in prokaryotic chromosomes. Each individual zone of the chromosome replicates as a discrete unit, called a replication unit. Replication units vary in length from 10,000 to 1 million base pairs; most are about 100,000 base pairs in length. They have been described for many different eukaryotes. Because each chromosome of a eukaryote possesses so much DNA, the orderly replication of DNA in eukaryotes undoubtedly requires sophisticated controls, which are as yet largely unknown.
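Simple arithmetic shows why many replication units matter for large eukaryotic chromosomes. The sketch assumes the rate of 50 nucleotides per second per fork quoted earlier for human cells, a hypothetical chromosome of 150 million base pairs, units of the typical 100,000 base pairs, and all units firing at the same time.

```python
RATE_NT_PER_S = 50            # per fork, the figure quoted earlier for human cells
CHROMOSOME_BP = 150_000_000   # hypothetical chromosome of 150 million bp
UNIT_BP = 100_000             # typical replication unit size quoted above


def copy_time_s(length_bp: int, rate: float) -> float:
    """Time for the two forks leaving one origin to copy length_bp of DNA."""
    return length_bp / (2 * rate)


one_origin = copy_time_s(CHROMOSOME_BP, RATE_NT_PER_S)
# With one origin per unit and all units firing together, the chromosome
# takes only as long as a single unit.
many_origins = copy_time_s(UNIT_BP, RATE_NT_PER_S)
units = CHROMOSOME_BP // UNIT_BP

print(f"single origin: {one_origin / 86_400:.1f} days")
print(f"{units} units: {many_origins / 60:.1f} min")
```

Under these assumptions a single origin would need about 17 days, whereas 1,500 units working in parallel finish in under 17 minutes.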
The structure of pro- and eukaryotic genes. Structural, regulatory genes, tRNA, rRNA.
Gene Expression
Most genes encode proteins, and proteins dictate cell function. Therefore, the thousands of genes expressed in a particular cell determine what that cell can do. Moreover, each step in the flow of information from DNA to RNA to protein provides the cell with a potential control point for self-regulating its functions by adjusting the amount and type of proteins it manufactures.
At any given time, the amount of a particular protein in a cell reflects the balance between that protein’s synthetic and degradative biochemical pathways. On the synthetic side of this balance, recall that protein production starts at transcription (DNA to RNA) and continues with translation (RNA to protein). Thus, control of these processes plays a critical role in determining what proteins are present in a cell and in what amounts. In addition, the way in which a cell processes its RNA transcripts and newly made proteins also greatly influences protein levels.
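The balance between synthesis and degradation is often summarized by a simple rate equation, dP/dt = k_syn − k_deg·P, where P is the amount of a protein. This equation is an illustrative assumption, not something stated above: it assumes a constant synthesis rate and first-order degradation, and its steady state is P = k_syn / k_deg. The sketch integrates it numerically with hypothetical rate constants.

```python
def protein_level(k_syn: float, k_deg: float, p0: float = 0.0,
                  dt: float = 0.01, t_end: float = 50.0) -> float:
    """Integrate dP/dt = k_syn - k_deg * P with forward Euler steps.

    k_syn: molecules made per unit time (assumed constant).
    k_deg: fraction of existing molecules degraded per unit time.
    """
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += (k_syn - k_deg * p) * dt
    return p


k_syn, k_deg = 100.0, 0.5  # hypothetical rate constants
print(f"{protein_level(k_syn, k_deg):.2f}")      # 200.00, i.e. k_syn / k_deg
print(f"{protein_level(k_syn, 2 * k_deg):.2f}")  # 100.00: faster breakdown, lower level
```

Doubling the degradation rate halves the steady-state level even though synthesis is unchanged, which is why both sides of the balance act as control points.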
How Is Gene Expression Regulated?
The amounts and types of mRNA molecules in a cell reflect the function of that cell. In fact, thousands of transcripts are produced every second in every cell. Given this statistic, it is not surprising that the primary control point for gene expression is usually at the very beginning of the protein production process — the initiation of transcription. RNA transcription makes an efficient control point because many proteins can be made from a single mRNA molecule.
Transcript processing provides an additional level of regulation for eukaryotes, and the presence of a nucleus makes this possible. In prokaryotes, translation of a transcript begins before the transcript is complete, due to the proximity of ribosomes to the new mRNA molecules. In eukaryotes, however, transcripts are modified in the nucleus before they are exported to the cytoplasm for translation.
Eukaryotic transcripts are also more complex than prokaryotic transcripts. For instance, the primary transcripts synthesized by RNA polymerase contain sequences that will not be part of the mature RNA. These intervening sequences are called introns, and they are removed before the mature mRNA leaves the nucleus. The remaining regions of the transcript, which include the protein-coding regions, are called exons, and they are spliced together to produce the mature mRNA. Eukaryotic transcripts are also modified at their ends, which affects their stability and translation.
Of course, there are many cases in which cells must respond quickly to changing environmental conditions. In these situations, the regulatory control point may come well after transcription. For example, early development in most animals relies on translational control because very little transcription occurs during the first few cell divisions after fertilization. Eggs therefore contain many maternally originated mRNA transcripts as a ready reserve for translation after fertilization.
On the degradative side of the balance, cells can rapidly adjust their protein levels through the enzymatic breakdown of RNA transcripts and existing protein molecules. Both of these actions result in decreased amounts of certain proteins. Often, this breakdown is linked to specific events in the cell. The eukaryotic cell cycle provides a good example of how protein breakdown is linked to cellular events. This cycle is divided into several phases, each of which is characterized by distinct cyclin proteins that act as key regulators for that phase. Before a cell can progress from one phase of the cell cycle to the next, it must degrade the cyclin that characterizes that particular phase of the cycle. Failure to degrade a cyclin stops the cycle from continuing.
An overview of the flow of information from DNA to protein in a eukaryote
First, both the coding regions (exons) and the noncoding regions (introns) of a gene are transcribed into a primary RNA transcript. The introns are removed during initial processing, the remaining exons are spliced together, and the spliced mRNA molecule is prepared for export out of the nucleus by the addition of a 5′ cap and a poly(A) tail. Once in the cytoplasm, the mRNA can be used to construct a protein.
How Do Different Cells Express the Genes They Need?
Only a fraction of the genes in a cell are expressed at any one time. The variety of gene expression profiles characteristic of different cell types arises because these cells have distinct sets of transcription regulators. Some of these regulators work to increase transcription, whereas others prevent or suppress it.
Normally, transcription begins when an RNA polymerase binds to a so-called promoter sequence on the DNA molecule. This sequence is almost always located just upstream of the starting point for transcription (on the 5′ side of the gene), though in some genes it lies downstream of the start site (on the 3′ side). Researchers have also discovered that other DNA sequences, known as enhancer sequences, play an important part in transcription by providing binding sites for regulatory proteins that affect RNA polymerase activity. Binding of regulatory proteins to an enhancer sequence causes a shift in chromatin structure that either promotes or inhibits the binding of RNA polymerase and transcription factors. A more open chromatin structure is associated with active gene transcription. In contrast, a more compact chromatin structure is associated with transcriptional inactivity.
Some regulatory proteins affect the transcription of multiple genes. This occurs because copies of the same regulatory-protein binding site are present at many places within the genome of a cell. Moreover, a given regulatory protein can play different roles at different genes, and shared binding sites are one mechanism by which cells can coordinate the regulation of many genes at once.
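Finding the multiple copies of a binding site in a genome is, at its simplest, a string search. This sketch assumes an exact-match, six-base motif and a toy sequence, both hypothetical; real binding sites tolerate some sequence variation. Because a regulatory protein binds double-stranded DNA, the search also looks for the motif on the complementary strand.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """The complementary strand, read 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def find_sites(genome: str, motif: str) -> list:
    """Positions of a binding-site motif on either strand of a DNA sequence.

    Returns (index, strand) pairs: '+' if the motif reads 5'->3' on the given
    strand, '-' if it reads 5'->3' on the complementary strand.
    """
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(genome) - len(motif) + 1):
        window = genome[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Hypothetical motif and toy sequence carrying three copies of the site.
motif = "CACGTT"
genome = "AT" + "CACGTT" + "GG" + "AACGTG" + "CC" + "CACGTT" + "A"
print(find_sites(genome, motif))  # [(2, '+'), (10, '-'), (18, '+')]
```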
Modulation of transcription
An activator protein bound to DNA at an upstream enhancer sequence can attract proteins to the promoter region that activate RNA polymerase and thus transcription. The DNA can loop around on itself to cause this interaction between an activator protein and other proteins that mediate the activity of RNA polymerase.
How Is Gene Expression Increased or Decreased in Response to Environmental Change?
In prokaryotes, regulatory proteins are often controlled by nutrient availability. This allows organisms such as bacteria to rapidly adjust their transcription patterns in response to environmental conditions. In addition, regulatory sites on prokaryotic DNA are typically located close to promoter sites, so proteins bound there can directly help or block the binding of RNA polymerase.
Transcription repression near the promoter region.
Molecules can interfere with RNA polymerase binding. An inactive repressor protein can become activated by another molecule. This active repressor can bind to a region near the promoter called an operator and thus interfere with RNA polymerase binding to the promoter, effectively preventing transcription.
For an example of how this works, imagine a bacterium with a surplus of amino acids, which signal the turning “on” of some genes and the turning “off” of others. In this particular example, the cell might turn “on” genes for proteins that metabolize amino acids and turn “off” genes for proteins that synthesize amino acids. Some of these amino acids would bind to positive regulatory proteins called activators. Activator proteins bind to regulatory sites on DNA near promoter regions; these sites act as on/off switches. Activator binding facilitates RNA polymerase activity and transcription of nearby genes. At the same time, other amino acids would bind to negative regulatory proteins called repressors, which in turn bind to regulatory sites in the DNA and block RNA polymerase binding.
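The on/off logic in this example can be written as a small rule. This is a toy sketch under stated assumptions: the gene names and signal names are hypothetical, each signal is simply present or absent, and an active repressor is assumed to override any activator at the same gene.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gene:
    name: str
    activated_by: Optional[str] = None  # signal that switches on this gene's activator
    repressed_by: Optional[str] = None  # signal that switches on this gene's repressor


def is_transcribed(gene: Gene, signals: set) -> bool:
    """Toy on/off rule for one gene given the signal molecules present."""
    # An active repressor sits on the operator and blocks RNA polymerase.
    if gene.repressed_by is not None and gene.repressed_by in signals:
        return False
    # An activator-dependent gene is on only when its activator is switched on.
    if gene.activated_by is not None:
        return gene.activated_by in signals
    # Otherwise RNA polymerase can bind the promoter freely.
    return True


# Hypothetical genes and signals for the amino-acid surplus example.
breakdown = Gene("breakdown_enzyme", activated_by="amino_acid_X")
synthesis = Gene("synthesis_enzyme", repressed_by="amino_acid_Y")

for signals in (set(), {"amino_acid_X", "amino_acid_Y"}):
    states = {g.name: is_transcribed(g, signals) for g in (breakdown, synthesis)}
    print(sorted(signals), states)
```

Without the surplus, only the synthesis gene is on; with it, the breakdown gene switches on and the synthesis gene switches off.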
The control of gene expression in eukaryotes is more complex than that in prokaryotes. In general, a greater number of regulatory proteins are involved, and regulatory binding sites may be located quite far from transcription promoter sites. Also, eukaryotic gene expression is usually regulated by a combination of several regulatory proteins acting together, which allows for greater flexibility in the control of gene expression.
The complexity of multiple regulators
Transcriptional regulators can each have a different role. Combinations of one, two, or three regulators can affect transcription in different ways by differentially affecting a mediator complex, which is also composed of proteins. The effect is that the same gene can be transcribed in multiple ways, depending on the combination, presence, or absence of various transcriptional regulator proteins.
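One reason combinations of regulators give such flexibility is simple counting: with n regulators, each present or absent, there are 2^n possible combinations. The sketch below enumerates them for three hypothetical regulators; it counts possible regulatory states only and says nothing about what each state does to transcription.

```python
from itertools import product

# Hypothetical regulator names; any three labels would do.
REGULATORS = ("R1", "R2", "R3")


def regulator_states(regulators):
    """List every presence/absence combination of the given regulators."""
    states = []
    for pattern in product((False, True), repeat=len(regulators)):
        present = frozenset(r for r, on in zip(regulators, pattern) if on)
        states.append(present)
    return states


states = regulator_states(REGULATORS)
print(len(states))  # 8, i.e. 2 ** 3
for s in states:
    print(sorted(s))
```

Each added regulator doubles the number of states, so a modest set of regulators can in principle distinguish many cell types.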
As previously mentioned, enhancer sequences are DNA sequences that are bound by an activator protein, and they can be located thousands of base pairs away from a promoter, either upstream or downstream from a gene. Activator protein binding is thought to cause DNA to loop out, bringing the activator protein into physical proximity with RNA polymerase and the other proteins in the complex that promote the initiation of transcription.
Different cell types express characteristic sets of transcriptional regulators. In fact, as multicellular organisms develop, different sets of cells within these organisms turn specific combinations of regulators on and off. Such developmental patterns are responsible for the variety of cell types present in the mature organism.
Transcriptional regulators can determine cell types
The wide variety of cell types in a single organism can depend on different transcription factor activity in each cell type. Different transcription factors can turn on at different times during successive generations of cells. As cells mature and go through different stages, transcription factors can act on gene expression and change the cell in different ways. This change affects the next generation of cells derived from that cell. In subsequent generations, it is the combination of different transcription factors that can ultimately determine cell type.
GENES: THE UNITS OF HEREDITARY INFORMATION
In 1902 a British physician, Archibald Garrod, who was working with one of the early Mendelian geneticists, his countryman William Bateson, noted that certain diseases among his patients were prevalent in particular families. Indeed, if one examined several generations within such families, some of these disorders seemed to behave as if they were controlled by simple recessive alleles. Garrod concluded that these disorders were Mendelian traits and that they had resulted from changes in the hereditary information that had occurred in the past to an ancestor of the affected families.
Garrod examined several of these disorders in detail. In one, alkaptonuria, the patients passed urine that rapidly turned black on exposure to air. Such urine contained homogentisic acid (alkapton), which air oxidized. In normal individuals homogentisic acid is broken down into simpler substances, but the affected patients were unable to carry out that breakdown. With considerable insight, Garrod concluded that the patients suffering from alkaptonuria lacked the enzyme necessary to catalyze this breakdown and, more generally, that many inherited disorders might reflect enzyme deficiencies.
From Garrod’s finding it is but a short leap of intuition to surmise that the information encoded within the DNA of chromosomes is used to specify particular enzymes. This point was not actually established, however, until 1941, when a series of experiments by the Stanford University geneticists George Beadle and Edward Tatum provided definitive evidence. Beadle and Tatum deliberately set out to create Mendelian mutations in the chromosomes and then studied the effects of these mutations on the organism.
One of the reasons that Beadle and Tatum’s experiments produced clear-cut results is that the researchers made an excellent choice of experimental organism. They chose the bread mold Neurospora, a fungus that can readily be grown in the laboratory on a defined medium (a medium that contains only known substances such as glucose and sodium chloride, rather than some uncharacterized cell extract such as ground-up yeasts). Beadle and Tatum induced mutations by exposing Neurospora spores to X-rays. They then allowed the progeny to grow on complete medium (a medium that contained all necessary metabolites). In this way the investigators were able to keep alive strains that, as a result of the earlier irradiation, had experienced damage to their DNA in a region encoding the ability to make one or more of the compounds that the fungus needed for normal growth. Change of this kind in the DNA is called mutation, and strains or organisms that have undergone such change (in this case losing the ability to make one or more compounds) are called mutants.
The next step was to test the progeny of the irradiated spores to see if any mutations leading to metabolic deficiency actually had been created by the X-ray treatment. Beadle and Tatum did this by attempting to grow subdivisions of individual fungal strains on minimal medium, which contained only sugar, ammonia, salts, a few vitamins, and water. A cell that had lost the ability to make a necessary metabolite would not grow on such a medium. Using this approach, Beadle and Tatum succeeded in identifying and isolating many deficient mutants.
To determine the nature of each deficiency, Beadle and Tatum tried adding various chemicals to the minimal medium in an attempt to find one that would make it possible for a given strain to grow. In this way they were able to pinpoint the nature of the biochemical problems that many of their mutants had. Many of the mutants proved unable to synthesize a particular vitamin or amino acid. The addition of arginine, for example, permitted the growth of a group of mutant strains, dubbed arg mutants. When the chromosomal position of each mutant arg gene was mapped, the mutations were found to cluster in three areas.
For each enzyme in the arginine biosynthetic pathway, Beadle and Tatum were able to isolate a mutant strain with a defective form of that enzyme, and the mutation always proved to be located at one of a few specific chromosomal sites, a different site for each enzyme; that is, each of the mutants that Beadle and Tatum examined could be explained in terms of a defect in one (and only one) enzyme, which could be localized at a single site on one chromosome. The geneticists concluded that genes produce their effects by specifying the structure of enzymes. They called this relationship the one gene – one enzyme hypothesis.
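The reasoning Beadle and Tatum used to assign each arg mutant to a single enzyme can be sketched as a small model. This is a minimal illustration, not their actual procedure: it assumes a linear pathway precursor → ornithine → citrulline → arginine (the intermediate names come from the well-known Neurospora arginine pathway, not from this text), one enzyme per step, and that a blocked strain grows on minimal medium only when given a compound made after the block. The functions `grows` and `assign_block` are hypothetical helpers.

```python
# Assumed linear pathway: each arrow between neighbours is one enzyme.
PATHWAY = ["precursor", "ornithine", "citrulline", "arginine"]


def grows(blocked_step, supplement):
    """Return True if a strain grows on minimal medium plus `supplement`.

    blocked_step: index i of the defective enzyme, which normally converts
    PATHWAY[i] into PATHWAY[i + 1]; None means a wild-type strain.
    supplement: name of the compound added to minimal medium, or None.
    """
    if blocked_step is None:
        return True  # wild type makes every compound itself
    if supplement is None:
        return False  # minimal medium alone cannot rescue a blocked pathway
    # Rescue only if the supplement is made downstream of the block.
    return PATHWAY.index(supplement) > blocked_step


def assign_block(growth_pattern):
    """Infer the defective step from which supplements rescue growth.

    growth_pattern maps compound -> True/False (growth observed).
    The block lies just before the earliest compound that rescues growth.
    """
    rescuing = [PATHWAY.index(c) for c, ok in growth_pattern.items() if ok]
    if not rescuing:
        return None  # nothing tested rescues this strain
    return min(rescuing) - 1


for step in range(len(PATHWAY) - 1):
    pattern = {c: grows(step, c) for c in PATHWAY[1:]}
    print(f"enzyme {step + 1} defective:", pattern, "-> block at step", assign_block(pattern))
```

Reading the growth pattern backwards (which supplements rescue the strain) locates one defective step per mutant, which is the logic behind the one gene – one enzyme conclusion.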
Enzymes are responsible for catalyzing the synthesis of all the parts of the cell. They mediate the assembly of nucleic acids, the synthesis of proteins, carbohydrates, fats and lipids.
As we know, cancer is one example of what can happen when a gene is altered. Deliberate changes induced by genetic engineers, by contrast, have proved very beneficial; most of the insulin used today by diabetics, for example, is the product of a human gene introduced into bacterial cells. Our understanding of genes as the units of heredity, the foundations of which you have encountered in this chapter, represents one of the high-water marks of biology as a science. The intellectual path that biologists have followed in their pursuit of this understanding has not always been a straight one, the best questions not always obvious. But however erratic and lurching the experimental journey, our picture of heredity has become progressively clearer, the image more sharply defined.
Every cell in your body contains more information than is stored in this book. This information is the hereditary information, the instructions that specify that you will have arms and not fins, hair and not feathers, two eyes and not one. The color of your eyes, the texture of your fingernails, whether you dream in color – all of the many traits that you receive from your parents are recorded in every cell of your body. As we have seen, biologists learned by experiment that long DNA molecules, which in eukaryotes complex with proteins to form chromosomes, contain this information. The information itself is arrayed in little blocks like entries in a dictionary, each block a gene specifying a particular polypeptide. Some of these polypeptides are entire proteins, while many other proteins are formed of two or more gene products. Proteins are the tools of heredity. Many of them are enzymes that carry out reactions within cells: what you are is a result of what they do. The essence of heredity is the ability of a cell to use information in its DNA to bring about the production of particular polypeptides, and so affect what that cell will be like. In this chapter we will examine how this happens.
THE ARCHITECTURE OF A GENE
Now that we have surveyed in general terms how specific polypeptides are assembled by ribosomes from mRNA copies of genes, and how the production of these mRNA copies is regulated, we will examine in more detail a specific example and trace how the structure of a particular set of genes achieves the precise and timed production of the proteins it encodes. The set of genes we will examine is the lac system (Figure 29), a cluster of genes encoding three proteins that bacteria use to obtain energy from the sugar lactose. These proteins include two enzymes and a membrane-bound transport protein (a permease). Researchers have found this cluster to be typical of how genes are organized in bacteria. Within the cluster there are five different regions:
1. Coding sequence. Three coding sequences specify the three lactose-utilizing proteins. All three sequences are transcribed into the same mRNA molecule and constitute part of an operational unit called an operon. An operon consists of one or more structural genes and the associated regulatory elements, the operator and the promoter, which are discussed below. This particular operon is called the lac operon, because the three genes that it includes are all involved in lactose utilization. Such clustering of coding sequences into single transcription units is common among bacteria but rare in eukaryotes.
2. Ribosome recognition site. Upstream from the three coding sequences is the binding site for the bacterial ribosome. This series of nucleotides is within an initial untranslated portion of the mRNA sometimes called a leader region. Each mRNA molecule transcribed from the cluster is composed of the leader region and the three coding sequences, transcribed in order. The cluster and the leader region are often referred to as a transcription unit.
3. RNA polymerase binding site. Upstream from the transcription unit is a specific DNA nucleotide sequence that the polymerase recognizes and to which it binds. Such polymerase-recognition sites are called promoters because they promote transcription.
4. Regulatory protein binding site. Between the promoter site and the transcription unit is a regulatory site, the operator, where a repressor protein binds to block transcription.
5. CAP site. Upstream from the promoter is another regulatory site, the CAP site, where an activator protein (the catabolite activator protein, CAP) binds. This binding in turn facilitates the unwinding of the DNA duplex and so enables the polymerase to bind to the nearby promoter.
Genes encoding enzymes possess regulatory regions. The segment that is transcribed into mRNA is called a transcription unit and consists of the elements that are involved in the translation of the mRNA: the ribosome-binding site and the coding sequences. In front of the transcription unit on the DNA are the elements involved in regulating its transcription: binding sites for the polymerase and for regulatory proteins.
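The layout above and its two protein switches can be captured in a toy model. This is a sketch under stated assumptions: it takes from general knowledge of the lac system (not from this section) that lactose, through an inducer, pulls the repressor off the operator, and that the activator occupies the CAP site when glucose is scarce. Real expression is graded, so the three levels "off", "low" and "high" are a simplification, and all names are illustrative.

```python
from dataclasses import dataclass

# Upstream-to-downstream order of the five regions described in the text.
LAC_REGIONS = [
    "CAP site",                    # activator binding site
    "promoter",                    # RNA polymerase binding site
    "operator",                    # repressor binding site
    "leader (ribosome site)",      # start of the transcription unit
    "coding sequences (3 genes)",  # two enzymes and a permease
]


@dataclass
class Conditions:
    lactose_present: bool
    glucose_present: bool


def repressor_bound(c: Conditions) -> bool:
    # Assumption: lactose (via an inducer) releases the repressor from the operator.
    return not c.lactose_present


def activator_bound(c: Conditions) -> bool:
    # Assumption: the activator sits on the CAP site only when glucose is scarce.
    return not c.glucose_present


def transcription_level(c: Conditions) -> str:
    if repressor_bound(c):
        return "off"   # repressor on the operator blocks the polymerase
    if activator_bound(c):
        return "high"  # operator free and the activator helps the polymerase bind
    return "low"       # operator free but no help from the activator


for lac in (False, True):
    for glc in (False, True):
        print(f"lactose={lac}, glucose={glc}:", transcription_level(Conditions(lac, glc)))
```

Separating the two switches into their own functions mirrors the separate binding sites: the operator decides whether transcription can happen at all, while the CAP site adjusts how strongly it happens.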
CELLS USE RNA TO MAKE PROTEIN
To find out how a cell uses its DNA to direct the production of particular proteins, perhaps the simplest question you might ask is “Where in the cell are proteins made?” You can answer this question by placing cells for a short time in a medium containing radioactive amino acids; the cells will take up the radioactively labeled amino acids for the short time that they are exposed to them. When investigators looked to see where in the cells radioactive proteins first appeared, they found that proteins were assembled not in the nucleus, where the DNA is, but rather in the cytoplasm, on large protein aggregates called ribosomes. These little polypeptide-making factories proved to be very complex, containing over 50 different proteins. They also contain a very different sort of molecule, RNA. RNA is very similar to DNA, and its presence in ribosomes hints that RNA molecules play an important role in polypeptide synthesis.
A cell contains many kinds of RNA. There are three major classes:
Ribosomal RNA. The class of RNA that is found in ribosomes, together with characteristic proteins, is called ribosomal RNA, or rRNA. During polypeptide synthesis, rRNA molecules provide the site on the ribosome where the polypeptide is assembled.
Transfer RNA. A second class of RNA, called transfer RNA, or tRNA, is much smaller. Human cells contain more than 40 different kinds of tRNA molecules, which float free in the cytoplasm. During polypeptide synthesis, tRNA molecules transport the amino acids to the ribosome for use in building the polypeptide, and position each amino acid at the correct place on the elongating polypeptide chain.
Messenger RNA. A third class of RNA is messenger RNA, or mRNA. Each mRNA molecule is a long single strand of RNA that passes from the nucleus to the cytoplasm. During polypeptide synthesis, mRNA molecules bring information from the chromosomes to the ribosomes to direct which polypeptide is assembled.
These molecules, together with ribosomal proteins and certain enzymes, constitute a system that carries out the task of reading the genetic message and producing the polypeptide that the particular message specifies. They are the principal components of the apparatus that a cell uses to translate its hereditary information. You can think of this information as a message written in the code specified by the sequence of nucleotides in the DNA. The cell’s polypeptide-producing apparatus reads this message one gene after another, translating the genetic code of each gene into a particular polypeptide. As we will see, biologists have also learned to read this code, and in so doing have learned a great deal about what genes are and how they work in dictating what a protein will be like and when it will be made.
THE GENETIC CODE
Working out the genetic code was a great step forward in removing the mystery from the process of gene expression. However, it leaves a great question unanswered. How is the information stored in a sequence of nucleotides like UUU used to identify a specific amino acid such as phenylalanine? How is the genetic code deciphered? To answer this question, we must look more carefully at the first events of the translation process. Translation occurs on the ribosomes. First, the initial portion of the mRNA transcribed from a gene binds to an rRNA molecule interwoven in the ribosome. The mRNA lies on the ribosome in such a way that only one three-nucleotide portion of the mRNA molecule (the codon) is exposed at the polypeptide-making site at any time. As each bit of the mRNA message is exposed in turn, a molecule of tRNA with the complementary three-nucleotide sequence, or anticodon, binds to the mRNA (Figure 30). Because this tRNA molecule carries a particular amino acid, that amino acid and no other is added to the polypeptide in that position. Protein synthesis occurs as a series of tRNA molecules bind one after another to the exposed portion of the mRNA molecule as it moves through the ribosome. Each of these tRNA molecules has an amino acid attached to it, and the amino acids they bring to the ribosome are added, one after another, to the end of a growing polypeptide chain.
The anticodon of a tRNA molecule is three nucleotides long. The base sequences of the tRNA anticodons are complementary to the associated sequences of mRNA. Since there are four different kinds of nucleotides in mRNA (cytosine, guanine, adenine and uracil), there are 4³, or 64, different three-letter code words, or codons, possible. Some amino acids are specified by only one codon; others are specified by two, three, four, or six different mRNA codons. The list of different mRNA codons specific for each of the 20 amino acids is called the genetic code (Figure 31). Each activating enzyme (the enzyme that attaches an amino acid to its tRNA, discussed below) is by convention named for the amino acid that it adds; thus the activating enzyme that adds leucine to its tRNA is designated leucine tRNA synthetase.
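Codon–anticodon recognition uses the same pairing rules as DNA, with U in place of T. A minimal sketch, assuming both sequences are written 5'→3' (the usual convention): because the two strands pair antiparallel, the anticodon is the reverse of the codon's complement. The sketch ignores the looser "wobble" pairing at the third codon position that lets some tRNAs read more than one codon; `anticodon_for` is a hypothetical helper.

```python
# Watson-Crick pairing for RNA (U replaces T).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs with an mRNA codon (5'->3')."""
    if len(codon) != 3 or any(b not in PAIR for b in codon):
        raise ValueError(f"not an RNA codon: {codon!r}")
    # Complement each base, then reverse, because the strands run antiparallel.
    return "".join(PAIR[b] for b in reversed(codon))


print(anticodon_for("UUU"))  # phenylalanine codon -> AAA
print(anticodon_for("AUG"))  # start/methionine codon -> CAU
```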
The genetic code is the same in all organisms, with only a few minor exceptions. A particular codon such as AGA corresponds to the same amino acid (arginine) in bacteria as in humans. Note that three of the 64 codons (UAA, UAG, and UGA) are not recognized by any tRNA. These three codons, called nonsense codons, serve as “stop” signals in the mRNA message, marking the end of a polypeptide. The “start” signal, which marks the beginning of a polypeptide amino acid sequence within an mRNA message, is the codon AUG, a triplet that also encodes the amino acid methionine. In eukaryotes, the ribosome generally uses the first AUG that it encounters in the mRNA message to signal the start of translation; bacterial ribosomes can also start at internal AUGs, as described below.
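The 64-codon table and the start/stop rules can be expressed compactly. This sketch builds the standard code from the conventional UCAG ordering (one-letter amino acid symbols, with * for a stop codon) and translates from the first AUG to the first in-frame stop, following the first-AUG rule for eukaryotes; it does not model bacterial internal start sites. `translate` is an illustrative helper, not a library function.

```python
from itertools import product

BASES = "UCAG"
# Standard code listed in UCAG order (first base varies slowest); "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

STOP_CODONS = {codon for codon, aa in CODE.items() if aa == "*"}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Returns one-letter amino acid codes (M = methionine, W = tryptophan, ...).
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start signal, so no polypeptide
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break  # nonsense codon marks the end of the polypeptide
        protein.append(aa)
    return "".join(protein)


print(sorted(STOP_CODONS))                # ['UAA', 'UAG', 'UGA']
print(CODE["AGA"])                        # R (arginine)
print(translate("GGAUGUUUAGAUGGUAAGC"))   # MFRW
```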
Discovery of a Second Genetic Code
How does a particular tRNA molecule come to possess the amino acid that it does, and not just any amino acid? The correct amino acid is placed on each tRNA molecule by a collection of 20 enzymes called activating enzymes. There is one activating enzyme for each of the 20 common amino acids. An activating enzyme binds the amino acid that it recognizes to a tRNA molecule. If one considers the nucleotide sequence of mRNA to be a coded message, then the 20 activating enzymes are the code books of the cell – the instructions for decoding the message. An activating enzyme recognizes both nucleotide-sequence information (a specific sequence of a tRNA molecule) and protein-sequence information (a particular amino acid).
Ever since the main genetic code was worked out in the 1960s, researchers have searched for this second code: the language of the instructions on the tRNA, which the activating enzyme reads in order to select the correct amino acid to bind to that tRNA molecule. In May of 1988 the first step in breaking this second genetic code was reported. A critical pair of nucleotides was found near the site where alanine is attached to its tRNA; when these nucleotides were replaced with different bases, the activating enzyme could no longer add alanine to the tRNA. This region was the place on the tRNA where the activating enzyme looked for instructions. When the 2-nucleotide bit from alanine tRNA was put into a tRNA normally recognized by the glycine activating enzyme, in place of the 2-nucleotide fragment normally there, the altered tRNA was recognized by the alanine activating enzyme: a tRNA with the anticodon specifying glycine was recognized by the alanine activating enzyme, which added alanine to it instead of glycine! The full decoding of this second genetic code is an area of intense research.
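The alanine/glycine swap can be illustrated with a toy model in which the activating enzyme reads an identity element on the tRNA and ignores the anticodon. Everything here is hypothetical: the identity-pair labels are placeholders rather than real nucleotides, and a real activating enzyme (aminoacyl-tRNA synthetase) may check several features of its tRNA.

```python
from dataclasses import dataclass, replace

# Hypothetical placeholders for the 2-nucleotide identity region.
IDENTITY_TO_AMINO_ACID = {
    "ala-pair": "alanine",  # read by the alanine activating enzyme
    "gly-pair": "glycine",  # read by the glycine activating enzyme
}


@dataclass(frozen=True)
class TRNA:
    anticodon: str      # pairs with the mRNA codon; ignored by the enzyme here
    identity_pair: str  # the region the activating enzyme inspects


def charge(trna: TRNA) -> str:
    """Return the amino acid an activating enzyme would attach, or 'none'."""
    return IDENTITY_TO_AMINO_ACID.get(trna.identity_pair, "none")


glycine_trna = TRNA(anticodon="UCC", identity_pair="gly-pair")  # reads codon GGA
swapped = replace(glycine_trna, identity_pair="ala-pair")       # transplant the alanine pair
alanine_trna = TRNA(anticodon="UGC", identity_pair="ala-pair")  # reads codon GCA
mutated = replace(alanine_trna, identity_pair="altered")        # pair replaced by other bases

print(charge(glycine_trna))  # glycine
print(charge(swapped))       # alanine, despite the glycine anticodon
print(charge(mutated))       # none
```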
The mRNA codons specific for the 20 common amino acids constitute the genetic code. All organisms possess a battery of 20 enzymes, called activating enzymes, each of which recognizes a particular tRNA and a particular amino acid.
Not All Organisms Use the Same Genetic Code
Starting in 1979, investigators began to determine the complete nucleotide sequences of mitochondrial genomes. The sequences for humans, cattle, and mice were all reported within a short period. It came as something of a shock when these investigators learned that the genetic code that these mammalian mitochondria were using was not quite the same as the “universal code” that had by then become so familiar to biologists. Most of the code words were the same, but not all. In mammalian mitochondria, what should have been a “stop” codon, UGA, was instead read as an amino acid, tryptophan; AUA was read as methionine rather than isoleucine; and AGA and AGG were read as “stop” rather than arginine. To make matters worse, when the first chloroplast genomes were sequenced, they too proved to contain minor differences from the universal code. Thus it appears that the genetic code is not quite universal. At some point, presumably after beginning their symbiotic existence, mitochondria and chloroplasts began to read the code differently, particularly that portion of the code associated with “stop” signals. Nor is this phenomenon limited to subcellular organelles. At some point very early in the evolution of ciliates (one of the phyla of the kingdom Protista), they also changed the way in which they read chain-terminating “stop” codons. Under most conditions, any genetic change involving termination signals would be expected to be lethal to an organism. How this change could have arisen in organelles, much less in eukaryotes such as ciliates, is a puzzle to which we as yet have no answer.
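The effect of the reassignments listed above can be seen by reading one message with two codon tables. A minimal sketch, assuming the standard code for everything except the four mammalian-mitochondrial changes named in the text; the message is hypothetical, and reading simply starts at the first nucleotide.

```python
from itertools import product

# Standard code in UCAG order (first base varies slowest); "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}

# Reassignments in mammalian mitochondria, as listed in the text.
MITO_CHANGES = {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}
MAMMALIAN_MITO = {**STANDARD, **MITO_CHANGES}


def read_codons(mrna: str, code: dict) -> str:
    """Translate triplet by triplet from position 0 until the first stop ('*')."""
    out = []
    for i in range(0, len(mrna) - 2, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "*":
            break
        out.append(aa)
    return "".join(out)


message = "AUGAUAUGAAGAUUU"                    # AUG AUA UGA AGA UUU
print(read_codons(message, STANDARD))        # MI  (UGA is a stop)
print(read_codons(message, MAMMALIAN_MITO))  # MMW (AGA is a stop)
```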
AN OVERVIEW OF GENE EXPRESSION
The hereditary apparatus of your body works in much the same way as that of the most primitive bacteria – all organisms utilize the same basic mechanism. An RNA copy of each active gene is made, and at a ribosome the RNA copy directs the sequential assembly of a chain of amino acids. There are many minor differences in the details of gene expression between bacteria and eukaryotes, and a single major difference that we will discuss later. The basic apparatus used in gene expression, however, appears to be the same in all organisms; it apparently has persisted virtually unchanged since very early in the history of life. The process of gene expression occurs in two phases, which are called transcription and translation.
Transcription
The first stage of gene expression is the production of an RNA copy of the gene, called messenger RNA or mRNA. Like all classes of RNA that occur in cells, mRNA is formed on a DNA template. The production of mRNA is called transcription; the messenger RNA molecule is said to have been transcribed from the DNA (Figure 32). Transcription is initiated when a special enzyme, called an RNA polymerase, binds to a particular sequence of nucleotides, the promoter site, located at the edge of a gene. How does the RNA polymerase know which DNA strand to copy? It searches for the promoter sequence. Analysis of over 100 bacterial promoter sites shows that two 6-nucleotide sequences occur, with minor variations, in all of them: TTGACA and TATAAT. RNA polymerase recognizes these two sequences on one strand; the other strand carries the complementary sequences AACTGT and ATATTA, which RNA polymerase does not recognize. Starting at the promoter end of the gene, the RNA polymerase proceeds to assemble a single strand of mRNA with a nucleotide sequence complementary to that of the template DNA strand. Complementarity refers to the way in which the two single strands of DNA that form a double helix relate to one another, with A (adenine) pairing with T (thymine) and G (guanine) pairing with C (cytosine). An RNA strand complementary to a DNA strand has the same relationships, but with U (uracil) in place of thymine.
As the RNA polymerase moves along the strand into the gene, encountering each DNA nucleotide in turn, it adds the corresponding complementary RNA nucleotide to the growing mRNA strand. When the enzyme arrives at a special “stop” signal at the far edge of the gene, it disengages from the DNA and releases the newly assembled mRNA chain. This chain is complementary to the DNA strand from which the polymerase assembled it; thus, it is an RNA transcript (copy), called the primary mRNA transcript, of the DNA nucleotide sequence of the gene.
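Promoter recognition and base-by-base copying can both be sketched in a few lines. Assumptions, all beyond what the text states: the two hexamers must match exactly, they are separated by a spacer of 15 to 19 nucleotides (about 17 is typical in E. coli), and the template strand is written 3'→5' so that the mRNA reads 5'→3' in the same left-to-right order. The example sequences and the helpers `find_promoter` and `transcribe` are hypothetical.

```python
MINUS35, MINUS10 = "TTGACA", "TATAAT"            # consensus hexamers named in the text
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # RNA base pairing with each DNA base


def find_promoter(strand: str, min_gap: int = 15, max_gap: int = 19):
    """Return (start of TTGACA, start of TATAAT) for the first exact match, else None.

    Assumption: the hexamers are separated by min_gap..max_gap nucleotides.
    """
    for i in range(len(strand) - len(MINUS35) + 1):
        if strand[i:i + len(MINUS35)] != MINUS35:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(MINUS35) + gap
            if strand[j:j + len(MINUS10)] == MINUS10:
                return i, j
    return None


def transcribe(template_3to5: str) -> str:
    """Build the mRNA (5'->3') from a template strand written 3'->5'."""
    mrna = []
    for base in template_3to5:          # polymerase meets each DNA nucleotide in turn
        mrna.append(DNA_TO_RNA[base])   # and adds the complementary RNA nucleotide
    return "".join(mrna)


def complement(strand: str) -> str:
    return "".join(DNA_COMPLEMENT[b] for b in strand)


region = "GC" + MINUS35 + "A" * 17 + MINUS10 + "GCGC"  # hypothetical promoter region
print(find_promoter(region))              # (2, 25)
print(find_promoter(complement(region)))  # None: holds AACTGT ... ATATTA instead

template = "TACAAATCT"                    # hypothetical template strand, 3'->5'
mrna = transcribe(template)
print(mrna)                                         # AUGUUUAGA
print(mrna == complement(template).replace("T", "U"))  # True
```

The last line shows a consequence of complementarity: the mRNA has the same sequence as the non-template DNA strand, with U in place of T.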
Translation
The second stage of gene expression is the synthesis of a polypeptide by ribosomes, which use the information contained in an mRNA molecule to direct the choice of amino acids. This process of mRNA-directed polypeptide synthesis by ribosomes is called translation because nucleotide-sequence information is translated into amino acid sequence information. Translation begins when an rRNA molecule within the ribosome binds to one end of an mRNA transcript. Once it has bound to the mRNA molecule, a ribosome proceeds to move down the mRNA molecule in increments of three nucleotides. At each step, it adds an amino acid to a growing polypeptide chain. It continues to do this until it encounters a “stop” signal that indicates the end of the polypeptide. It then disengages from the mRNA and releases the newly assembled polypeptide. An overview of protein synthesis is presented in Figure.
The information encoded in genes is expressed in two stages: transcription, in which an RNA polymerase enzyme assembles an mRNA molecule whose nucleotide sequence is complementary to the gene’s template DNA strand; and translation, in which a ribosome assembles a polypeptide, using the mRNA to specify the amino acids.
HOW ARE GENES ENCODED?
The essential question of gene expression is: How does the order of nucleotides in a DNA molecule encode the information that specifies the order of amino acids in a protein? What, in other words, is the nature of the genetic code? The answer came in 1961, as the result of an experiment led by Francis Crick. Crick’s experiment is so elegant, and the result so critical to our understanding of genes, that we will describe it in detail.
Crick and his colleagues reasoned that the “genetic code” most likely consisted of a series of blocks of information, each block corresponding to an amino acid in the encoded protein. They further hypothesized that the information within one block was probably a sequence of three nucleotides specifying a particular amino acid. They arrived at the number three because a two-nucleotide block would not yield enough different combinations to code for the 20 different kinds of amino acids that commonly occur in proteins. Imagine that you have some fruit: apples, oranges, pears, and grapefruits. How many different pairs of fruit can you construct? Only 4², or 16, different pairs. How many groups of three can you construct? A lot more – 4³, or 64, different combinations, more than enough to code for the 20 amino acids.
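Crick’s counting argument is easy to check by enumeration. This sketch lists every block of one, two, or three nucleotides drawn from the four RNA bases and compares the count with the 20 amino acids to be encoded; `code_words` is an illustrative helper.

```python
from itertools import product

BASES = "ACGU"    # the four RNA nucleotides
AMINO_ACIDS = 20  # common amino acids that must each get a code word


def code_words(block_size: int) -> list[str]:
    """All possible blocks of `block_size` nucleotides (4 ** block_size of them)."""
    return ["".join(p) for p in product(BASES, repeat=block_size)]


for k in (1, 2, 3):
    n = len(code_words(k))
    print(f"blocks of {k}: {n} words, enough for {AMINO_ACIDS} amino acids? {n >= AMINO_ACIDS}")
# Counts are 4, 16 and 64: only blocks of three are enough.
```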
But are these three-nucleotide blocks, called codons, read in simple sequence, one after the other? Or is there punctuation (a silent nucleotide) between each of the three-nucleotide blocks? The sentences of this book, for example, use spaces to punctuate between words. Without the spaces, this sentence would read “withoutthespacesthissentencewouldread.” In principle, either way of reading will work. However, it is important to know which method is employed by cells, since the two ways of reading DNA imply different translating mechanisms. To choose between these two alternative hypotheses, Crick and his colleagues used a chemical that adds or deletes a single nucleotide to a DNA strand. Such an addition or deletion changes the reading frame of a genetic message, whether or not it is punctuated, but the two alternative hypotheses differ in what is required to restore the correct reading frame register.
The deletion (or addition) of three nucleotides restores a three-nucleotide unpunctuated code to the correct reading frame but does not restore a punctuated code. What Crick and his coworkers did was to use genetic recombination to put three deletions together near one another in a viral DNA and then look to see whether the genes downstream were read correctly or as nonsense. When one or two deletions were placed at the beginning of the region, the downstream gene was translated as nonsense, but three deletions (or three additions) restored the correct reading frame so that sequences downstream were translated correctly. Thus the genetic code is read in increments consisting of three nucleotides, and reading occurs without punctuation between the three-nucleotide units.
Within genes that encode proteins, the nucleotide sequence of DNA is read in increments of three consecutive nucleotides, without punctuation between increments. Each block of three nucleotides codes for one amino acid.
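The logic of the frameshift experiment can be replayed on a short hypothetical message read in unpunctuated triplets. Deleting one or two bases near the start shifts every downstream codon; deleting three nearby bases garbles only the stretch between the deletions and restores the original codons downstream. The message and the helpers `codons` and `delete_at` are illustrative.

```python
def codons(seq: str) -> list[str]:
    """Read a sequence in consecutive triplets from its first base, with no punctuation."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def delete_at(seq: str, positions: list[int]) -> str:
    """Delete the bases at the given 0-based positions of the original sequence."""
    drop = set(positions)
    return "".join(base for i, base in enumerate(seq) if i not in drop)


gene = "AUG" + "GCA" * 6            # hypothetical message
one = delete_at(gene, [4])           # one deletion: downstream frame shifted
two = delete_at(gene, [4, 7])        # two deletions: still shifted
three = delete_at(gene, [4, 7, 10])  # three deletions: downstream frame restored

for label, seq in [("original", gene), ("one", one), ("two", two), ("three", three)]:
    print(f"{label:>8}:", " ".join(codons(seq)))
```

With one deletion every downstream codon reads CAG instead of GCA; with three, the codons after the last deletion read GCA again.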
Just what the code words are was soon worked out, the first results coming within a year of Crick’s experiment. Researchers had developed mixtures of RNA and protein isolated from ruptured cells (“cell-free systems”) that would synthesize proteins in a test tube. To determine which three-nucleotide sequences specify which amino acids, researchers added artificial RNA molecules to these cell-free systems and then looked to see what proteins were made. For example, when Marshall Nirenberg, of the National Institutes of Health, added poly-U (an RNA molecule consisting of a string of uracil nucleotides) to such a system, it proceeded to synthesize polyphenylalanine (a protein consisting of a string of phenylalanine amino acids). This result indicated that the three-nucleotide sequence (or triplet) specifying phenylalanine was UUU. In this and other ways all 64 possible triplets were examined, and the full genetic code was determined.
THE MECHANISM OF PROTEIN SYNTHESIS
Polypeptide synthesis begins with the formation of an initiation complex. First, a methionine-carrying tRNA (met-tRNA) molecule binds to the small ribosomal subunit. Special proteins called initiation factors position the met-tRNA on the ribosomal surface. The proper positioning of this first amino acid is critical, since it determines the reading frame (the groups of three) with which the nucleotide sequence will be translated into a polypeptide. This initiation complex, guided by another initiation factor, then binds to mRNA. It is important that the complex bind to the beginning of a gene, so that all of the gene will be translated. In bacteria, the beginning of each gene is marked by a sequence that is complementary to one of the rRNA molecules on the ribosome. This ensures that genes are read from the beginning: each mRNA binds to the ribosomes that read it by base-pairing between the sequence at its beginning and the complementary sequence on the rRNA, which is part of the ribosome. In prokaryotes, many mRNAs encode more than one protein, so protein synthesis must also initiate at internal AUGs. The sequence immediately preceding such an AUG (called a Shine–Dalgarno sequence) is recognized by the ribosome and signals that this AUG is the site at which protein synthesis should begin; a Shine–Dalgarno sequence precedes the AUG start of each protein. In eukaryotes, where mRNAs almost always encode a single protein, initiation usually begins at the first AUG: eukaryotic ribosomes apparently bind to the capped mRNA and scan down the message until the first AUG is reached.
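The two ways of choosing a start codon described above can be contrasted in code. This is a sketch under assumptions not stated in the text: the Shine–Dalgarno sequence is taken to be the consensus AGGAGG, matched exactly, and it must end 4 to 10 nucleotides before the AUG; real sites vary. The eukaryotic function models scanning simply as “first AUG from the 5' end”. The example mRNA and helper names are hypothetical.

```python
SHINE_DALGARNO = "AGGAGG"        # assumed consensus; real sites vary
MIN_SPACER, MAX_SPACER = 4, 10   # assumed distance (nt) from end of site to AUG


def bacterial_starts(mrna: str) -> list[int]:
    """Positions of AUGs with a Shine-Dalgarno site a short distance upstream."""
    starts = []
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] != "AUG":
            continue
        end = i - MIN_SPACER  # the site must end at least MIN_SPACER nt before AUG
        begin = max(0, i - MAX_SPACER - len(SHINE_DALGARNO))
        if end >= len(SHINE_DALGARNO) and SHINE_DALGARNO in mrna[begin:end]:
            starts.append(i)
    return starts


def eukaryotic_start(mrna: str) -> int:
    """Scanning model: the first AUG from the 5' (capped) end, or -1 if none."""
    return mrna.find("AUG")


# Hypothetical two-gene bacterial mRNA; the AUG at position 0 lacks a site.
message = ("AUGCC" + "AGGAGG" + "UUAAAC" + "AUGUUUUAA"
           + "GG" + "AGGAGG" + "CUAGAU" + "AUGGCAUAG")
print(bacterial_starts(message))   # [17, 40]: one start per protein
print(eukaryotic_start(message))   # 0: the first AUG wins
```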
An initiation complex consists of a small ribosomal subunit, mRNA, and a tRNA molecule carrying methionine.
After the initiation complex has been formed, the synthesis of the polypeptide proceeds as follows:
1. The ribosome exposes the codon on the mRNA immediately adjacent to the initiating AUG codon, positioning it for interaction with another incoming tRNA molecule. When a tRNA molecule with the appropriate anticodon appears, this new incoming tRNA briefly binds to the mRNA molecule at its exposed codon position. Special proteins called elongation factors (because they aid in making the polypeptide longer) help to position the incoming tRNA. Binding the incoming tRNA to the ribosome in this fashion places the amino acid at the other end of the incoming tRNA molecule directly adjacent to the initial methionine, which is dangling from the initiating tRNA molecule still bound to the ribosome.
2. The two amino acids undergo a chemical reaction, in which the initial methionine is released from its tRNA and is attached instead by a peptide bond to the adjacent incoming amino acid. The abandoned tRNA falls from its site on the ribosome, leaving that site vacant.
3. In a process called translocation, the ribosome now moves along the mRNA molecule (“translocates”) a distance corresponding to three nucleotides, guided by other elongation factors. This movement repositions the growing chain, at this point containing two amino acids, and exposes the next codon of the mRNA. This is the same situation that existed in step 1. When a tRNA molecule that recognizes this next codon appears, the anticodon of the incoming tRNA binds this codon, placing a new amino acid adjacent to the growing chain. The growing chain transfers to the incoming amino acid, as in step 2, and the elongation process continues.
4. When a chain-terminating nonsense codon is encountered, no tRNA exists to bind to it. Instead, it is recognized by special release factors, proteins that bring about the release of the newly made polypeptide from the ribosome.
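Steps 1 to 4 form a loop that repeats once per codon. The sketch below walks a ribosome along a hypothetical mRNA and logs each step. It uses only a four-entry subset of the codon table (it raises KeyError for any other codon) and represents the transfer of the growing chain onto the incoming amino acid simply as appending to a list; all names are illustrative.

```python
# A tiny subset of the codon table, enough for the example (not the full code).
CODON_TO_AA = {"AUG": "Met", "UUU": "Phe", "GCA": "Ala", "UGG": "Trp"}
STOP = {"UAA", "UAG", "UGA"}


def synthesize(mrna: str, start: int) -> tuple[list[str], list[str]]:
    """Walk a ribosome from the AUG at `start`; return (polypeptide, event log)."""
    chain = [CODON_TO_AA[mrna[start:start + 3]]]  # initiation complex holds met-tRNA
    log = ["initiation: met-tRNA positioned at AUG"]
    pos = start + 3
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        if codon in STOP:
            log.append(f"{codon}: release factor frees the polypeptide")  # step 4
            break
        aa = CODON_TO_AA[codon]
        log.append(f"{codon}: tRNA carrying {aa} binds")  # step 1
        chain.append(aa)                                   # step 2: peptide bond
        log.append("peptide bond; chain is " + "-".join(chain))
        pos += 3                                           # step 3: translocation
        log.append("translocation by three nucleotides")
    return chain, log


chain, log = synthesize("AUGUUUGCAUGGUAA", 0)
print("-".join(chain))  # Met-Phe-Ala-Trp
for event in log:
    print(event)
```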
Protein synthesis is carried out on the ribosomes, which bind to sites at one end of the mRNA and then move down the mRNA in increments of three nucleotides. At each step of the ribosome’s progress, it exposes a three-base sequence to binding by a tRNA molecule with the complementary nucleotide sequence. Ultimately, the amino acid carried by that particular tRNA molecule is added to the end of the growing polypeptide chain.
PROTEIN SYNTHESIS IN EUKARYOTES
Protein synthesis occurs in a similar way in both bacteria and eukaryotes, although there are differences. One difference is of particular importance. Unlike bacterial genes, most eukaryotic genes are much larger than they need to be, containing long stretches of nucleotides that are cut out of the mRNA transcript before it is used in polypeptide synthesis. Because these sequences are removed from the mRNA transcript before it is used, they are not translated and do not correspond to any portion of the polypeptide. The sequences that intervene between the polypeptide-specifying portions of the gene are called introns. The remaining segments of the gene – the nucleotide sequences that encode the amino-acid sequence of the polypeptide – are called exons. Exons are typically much shorter than introns, and are scattered among the larger noncoding sequences. In a typical human gene the nontranslated intron portion of a gene can be 10 to 30 times larger than the coding exon portion. For example, even though only 432 nucleotides are required to encode the 144 amino acids of hemoglobin, there are actually 1356 nucleotides in the primary mRNA transcript of the hemoglobin gene.
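Splicing and the hemoglobin arithmetic can be made concrete. In this sketch the intron coordinates are assumed to be known in advance (in the cell they are found by the splicing machinery), the toy transcript is hypothetical with introns shown in lowercase, and the final lines only check the figures quoted above: 144 × 3 = 432 coding nucleotides, leaving 1356 − 432 = 924 nucleotides of the primary transcript that do not specify amino acids.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) half-open intervals and join the exons."""
    exons, pos = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])  # exon before this intron
        pos = end                          # skip over the intron
    exons.append(pre_mrna[pos:])           # final exon
    return "".join(exons)


# Hypothetical toy transcript: uppercase exons, lowercase introns.
primary = "AUGGCA" + "guaaag" + "UUUUGG" + "guccag" + "UAA"
mature = splice(primary, [(6, 12), (18, 24)])
print(mature)  # AUGGCAUUUUGGUAA

# Arithmetic from the hemoglobin example in the text.
coding_nt = 144 * 3              # 3 nucleotides per amino acid
noncoding_nt = 1356 - coding_nt  # rest of the primary transcript
print(coding_nt, noncoding_nt)   # 432 924
```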
THE DISCOVERY OF INTRONS
Virtually every nucleotide within the transcribed portion of a bacterial gene participates in an amino acid-specifying codon, and the order of amino acids in the protein is the same as the order of the codons in the gene. It was assumed for many years that all organisms would naturally behave in this logical way. In the late 1970s, however, biologists were amazed to discover that this relationship, one with which they had become completely familiar, did not in fact apply to eukaryotes. Instead, eukaryotic genes are encoded in segments that are excised from several locations along the transcribed mRNA and subsequently stitched together to form the mRNA that is eventually translated in the cytoplasm. With the benefit of hindsight, it is not difficult to design an experiment that reveals this unexpected mode of gene organization:
1. Isolate the mRNA corresponding to a particular gene. Much of the mRNA of developing red blood cells, for example, encodes hemoglobin, and much of the mRNA of hen oviduct cells encodes the egg-white protein ovalbumin, making it easy to purify the mRNAs of these genes.
2. Using an enzyme called reverse transcriptase, make a DNA version of the isolated mRNA. Such a version of a gene is called complementary (or “copy”) DNA (cDNA).
3. Using genetic engineering techniques, isolate from the nuclear DNA the portion that corresponds to one of the actual hemoglobin genes. This procedure is referred to as “cloning” of the gene in question.
4. Mix single-strand forms of this hemoglobin cDNA and nuclear DNA and permit them to pair with each other (“hybridize”) and form a duplex.
When this experiment was actually carried out and the resulting duplex DNA molecules were examined with the electron microscope, the hybridized DNA did not appear as a single duplex. Instead, unpaired loops were observed. In a related example, there are seven different sites within the ovalbumin gene at which the nuclear version contains long nucleotide sequences that are not present in the cytoplasmic cDNA version. The conclusion is inescapable: nucleotide sequences are removed from within the gene transcript before the cytoplasmic mRNA is translated into protein. As we noted earlier, these internal noncoding sequences are called introns, and the coding segments are called exons. Because introns are removed from the mRNA transcript before it is translated into protein, they do not affect the structure of the protein that is encoded by the gene in which they occur.
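The logic of the hybridization experiment can be imitated on toy sequences. The sketch below is an illustration, not a real sequence aligner: it uses the longest-match heuristic of Python's standard `difflib.SequenceMatcher` to find stretches of a gene that have no partner in its cDNA, which correspond to the unpaired loops seen in the electron microscope. The sequences and the helper name `find_loops` are hypothetical, and the approach assumes short, distinct exons that appear in the same order in both molecules.

```python
from difflib import SequenceMatcher


def find_loops(gene: str, cdna: str) -> list[tuple[int, int]]:
    """Return (start, end) positions in `gene` that have no counterpart in `cdna`.

    Each such stretch would stay unpaired, and bulge out as a loop, when the
    gene and its cDNA are hybridized. Positions are 0-based and end-exclusive.
    """
    blocks = SequenceMatcher(None, gene, cdna, autojunk=False).get_matching_blocks()
    loops = []
    for left, right in zip(blocks, blocks[1:]):
        gene_gap_start = left.a + left.size
        cdna_gap = right.b - (left.b + left.size)
        # Extra sequence in the gene with nothing missing in the cDNA = a loop.
        if right.a > gene_gap_start and cdna_gap == 0:
            loops.append((gene_gap_start, right.a))
    return loops


exon1, intron, exon2 = "ATGGCCAAA", "GTATGTTTTTTCAG", "CTGTGGTAA"  # hypothetical sequences
gene = exon1 + intron + exon2  # nuclear (genomic) version
cdna = exon1 + exon2           # copy of the mature mRNA

loops = find_loops(gene, cdna)
print(loops)                          # [(9, 23)]
print(gene[loops[0][0]:loops[0][1]])  # GTATGTTTTTTCAG
```

The single loop found is exactly the intron that was left out of the cDNA.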
REGULATING GENE EXPRESSION
A cell must know not only how to make a particular protein, but also when to make it. It is important for an organism to be able to control which of its genes are being transcribed, and when. There is, for example, little point for a cell to produce an enzyme when the enzyme’s substrate, the target of its activity, is not present in the cell. Much energy can be saved if little of the enzyme is produced until the appropriate substrate is encountered and the enzyme’s activity will be of use to the cell.
From a broader perspective, the growth and development of multicellular organisms entails a long series of biochemical reactions, each delicately tuned to achieve a precise effect. Specific enzyme activities are called into play and bring about a particular change. Once this change has occurred, those particular enzyme activities cease, lest they disrupt other activities that follow. During development, genes are transcribed in a carefully prescribed order, each gene for a specified period of time. The hereditary message is played like a piece of music on a grand organ in which particular proteins are the notes and the hereditary information, which regulates their expression, is the score.
Organisms control the expression of their genes largely by controlling when the transcription of individual genes begins. Most genes possess special nucleotide sequences called regulatory sites, which act as points of control. These nucleotide sequences are recognized by specific regulatory proteins within the cell, which bind to the sites.
Negative Control
Regulatory sites often function to shut off transcription, a process called negative control. In these cases, the site at which the regulatory protein binds to the DNA is located between the site at which the polymerase binds and the beginning edge of the gene that the polymerase is to transcribe. When the regulatory protein is bound to its regulatory site, its presence there blocks the movement of the polymerase toward the gene. To understand this more clearly, imagine that you are shooting a cue ball at the eight ball on a pool table and someone places a brick on the table between the cue ball and the eight ball. Functionally, this brick is like the regulatory protein that binds to a regulatory site on the DNA: its placement blocks movement of the cue ball to the eight ball, just as placement of the regulatory protein between the polymerase and the gene blocks movement of the polymerase to the gene. The process of blocking transcription in this way is called repression, and the regulatory protein that is responsible for the blockage is called a repressor protein.
Positive Control
Regulatory sites may also serve to turn on transcription, a process called positive control. In these situations, the binding of a regulatory protein to the DNA is necessary before the transcription of a particular gene may begin. The regulatory protein whose binding turns on transcription is called an activator protein, and the “turning on” of the transcription of specific genes in this way is called activation. Activation can be achieved by a variety of mechanisms. In some cases, the activator protein’s binding promotes the unwinding of the DNA duplex. This facilitates the production of an mRNA transcript of a gene, because the polymerase, while it can bind to a double-stranded DNA duplex, cannot produce an mRNA transcript from such a duplex: mRNA is transcribed from a single strand of the duplex.
How Regulatory Proteins Work
How does the cell use regulatory sites to control which genes are transcribed? It does so by influencing the shape of the regulatory proteins. Regulatory proteins possess binding sites not only for DNA but also for specific small molecules. The binding of one of these small molecules can change the shape of a regulatory protein and thus destroy or enhance its ability to bind DNA. In some cases, the protein in its new shape may no longer recognize the regulatory site on the gene. In other cases, the recontoured regulatory protein may begin to recognize a regulatory site that it had previously ignored.
The cell thus uses the presence of particular “signal” molecules within the cell to incapacitate particular regulatory proteins or to mobilize them for action. These regulatory proteins, in turn, repress or activate the transcription of particular genes. The pattern of metabolites in the cell sets “on/off” protein regulatory switches, and, by doing so, achieves a proper configuration of gene expression.
Organisms control the expression of their hereditary information by selectively inhibiting the transcription of some genes and facilitating the transcription of others. Control over transcription is exercised by modifying the shape of regulatory proteins and thus influencing their tendency to bind to sites on the DNA that influence the initiation of transcription.
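The switching logic of negative and positive control, and of the shape change caused by a signal molecule, can be written as a small truth-table model. In this hedged sketch the class `RegulatoryProtein`, its fields and the two role names are illustrative inventions, not standard terminology or a library API. The point is only that a small signal molecule flips whether the protein sits on its site, and that a bound repressor turns transcription off while a bound activator turns it on.

```python
from dataclasses import dataclass


@dataclass
class RegulatoryProtein:
    """Toy regulatory protein (hypothetical model, not a real biochemical API).

    role: "repressor" blocks transcription when bound (negative control);
          "activator" is required for transcription (positive control).
    binds_dna_with_signal: whether the protein recognizes its DNA site when the
          small signal molecule is bound (True) or only when it is absent (False).
    """
    role: str
    binds_dna_with_signal: bool

    def bound_to_site(self, signal_present: bool) -> bool:
        # Binding of the signal molecule changes the protein's shape, which
        # switches its ability to recognize the regulatory site on or off.
        return signal_present == self.binds_dna_with_signal


def transcribed(protein: RegulatoryProtein, signal_present: bool) -> bool:
    bound = protein.bound_to_site(signal_present)
    if protein.role == "repressor":
        return not bound  # a bound repressor blocks the polymerase
    if protein.role == "activator":
        return bound      # transcription starts only once the activator is bound
    raise ValueError(f"unknown role: {protein.role}")


# A repressor that lets go of the DNA when its signal molecule binds:
repressor = RegulatoryProtein(role="repressor", binds_dna_with_signal=False)
# An activator that recognizes its site only after binding its signal molecule:
activator = RegulatoryProtein(role="activator", binds_dna_with_signal=True)

for signal in (False, True):
    print(signal, transcribed(repressor, signal), transcribed(activator, signal))
```

Setting `binds_dna_with_signal=True` for a repressor gives the opposite behavior: the signal molecule then switches the gene off.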
A VOCABULARY OF GENE EXPRESSION
anticodon The three-nucleotide sequence at the tip of a transfer RNA molecule that is complementary to, and base pairs with, an amino acid-specifying codon in messenger RNA.
codon The basic unit of the genetic code; a sequence of three adjacent nucleotides in DNA or mRNA that code for one amino acid or for polypeptide termination.
exon A segment of DNA that is both transcribed into mRNA and translated into protein; contrasts with introns. In eukaryotic genes exons are typically scattered within much longer stretches of nontranslated intron sequences.
intron A segment of DNA that is transcribed into mRNA but removed before translation. These untranslated regions make up the bulk of most eukaryotic genes.
nonsense codon A chain-terminating codon; a codon for which there is no tRNA with a complementary anticodon. There are three: UAA, UAG, and UGA.
operator A site of negative gene regulation; a sequence of nucleotides that may overlap the promoter, which is recognized by a repressor protein. Binding of the repressor protein to the operator prevents binding of the polymerase to the promoter (just as two people cannot sit in one chair) and so blocks transcription of the structural genes of an operon.
operon A cluster of functionally related genes transcribed into a single mRNA molecule. A common mode of gene regulation in prokaryotes, but it is rare in eukaryotes other than fungi.
promoter An RNA polymerase binding site; the nucleotide sequence at the 5′ end of a gene to which RNA polymerase attaches to initiate transcription of mRNA.
RNA polymerase The enzyme that transcribes RNA from DNA.
repressor A protein that regulates transcription of mRNA from DNA by binding to the operator and so preventing RNA polymerase from attaching to the promoter.
transcription The polymerase-catalyzed assembly of an RNA molecule complementary to a strand of DNA.
translation The assembly of a protein on the ribosomes, using mRNA to direct the order of amino acids.
Genes encoding enzymes possess regulatory regions. The segment that is transcribed into mRNA is called a transcription unit and consists of the elements that are involved in the translation of the mRNA: the ribosome-binding site and the coding sequences. In front of the transcription unit on the DNA are the elements involved in regulating its transcription: binding sites for the polymerase and for regulatory proteins.
In the lac operon of E. coli, this complex system of regulatory sites works to ensure that mRNA is copied from the three structural genes only when the cell can effectively utilize the proteins they encode: when there is lactose present and when the cell requires the energy that would result from lactose breakdown. The regulatory region of the lac operon controls when the lac genes are transcribed in several interacting ways.
Activation
A special activator protein called CAP (catabolite activator protein) stimulates the transcription of the lac operon when the cell is low in energy. When glucose levels in the cell are high, levels of the signaling molecule cyclic AMP (cAMP) are low. Because CAP can bind the lac operon only when complexed with cAMP (which alters its shape so that it recognizes the CAP site), activation by CAP does not occur, and the lac operon is not transcribed. As a result of this CAP activation system, the enzymes needed for the metabolism of lactose are produced only when the cell requires the energy that lactose would provide.
Repression
The sugar lactose is encountered by bacteria only occasionally, so that the enzymes that metabolize lactose are not usually able to function, for lack of a substrate on which to act. Bacteria do not produce the enzymes under these conditions. The lac repressor protein blocks the binding of RNA polymerase to the lac promoter region under most circumstances; cells in this condition are said to be repressed with respect to lac operon transcription.
Like the activator protein, the lac repressor protein is capable of changes in shape. When lactose (strictly, its isomer allolactose) binds to the repressor protein, the protein assumes a different shape, one that does not recognize the operator sequence. If the cell contains much lactose, therefore, the lac repressor proteins become inactive. This removes the block from in front of the polymerase and so permits transcription of the lac genes to begin. It is for this reason that addition of lactose to a growing bacterial culture causes a burst of synthesis of the lactose-utilizing enzymes. Synthesis of the enzymes is said to have been induced by the lactose. This element of the control system ensures that the lac operon is transcribed only in the presence of lactose.
The lac operon is thus controlled at two levels: the lactose-utilizing enzymes are not produced unless the sugar lactose is available; even if lactose is available, these enzymes are not produced unless the cell has need of the energy. Other similarly precise control mechanisms are known in eukaryotes, but the example provided here illustrates the complex nature of cellular control of protein synthesis.
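The two levels of lac operon control can be summarized as a truth table. This is a simplified sketch that treats each condition as simply present or absent; in a real cell the operon is transcribed at a low basal level even when it is "off", and the function name is a hypothetical illustration.

```python
def lac_operon_transcribed(lactose: bool, glucose: bool) -> bool:
    """Simplified two-level control of the lac operon (a sketch, not a kinetic model).

    Repression: without lactose, the lac repressor sits on the operator.
    Activation: CAP needs cAMP, and cAMP is low when glucose is high.
    """
    repressor_on_operator = not lactose  # lactose (allolactose) inactivates the repressor
    camp_high = not glucose              # high glucose -> low cAMP
    cap_bound = camp_high                # CAP binds its site only as a CAP-cAMP complex
    return cap_bound and not repressor_on_operator


for lactose in (False, True):
    for glucose in (False, True):
        state = "ON" if lac_operon_transcribed(lactose, glucose) else "off"
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {state}")
```

Only one of the four combinations, lactose present and glucose absent, switches the operon on.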
In the process of mitosis every cell in the body of a multicellular organism inherits a complete set of chromosomes; the nucleotide sequences of this DNA carry in full the genetic information that is the cellular evolutionary endowment. But though every cell in the body receives the same set of instructions, individual cells may look entirely different from other cells, and may behave in entirely different ways. In fact, only a particular subset of a cell’s genes produces proteins in that cell. Furthermore, only a small percentage of this genetic material is active at any one time. Many minor, minute-by-minute adjustments of cellular chemistry can be made by shifts in enzyme activity; major adjustments involve altering the expression of a cell’s genes as the needs of the cell change with the changing cellular environment. In this chapter we shall examine the logic and mechanisms of gene control, a process we now know involves chemicals that bind directly or indirectly to DNA or mRNA.
Control of gene expression in bacteria. Early investigators of gene expression in bacteria made several assumptions about the process they sought to understand. First, it was logical to assume that proper control of gene expression would require that only those genes whose products were needed at any given moment be expressed. Second, since most genes code for an enzyme that controls only a single step in a biochemical pathway, the genes coding for several enzymes in the same pathway might be expected to be controlled as a group. Furthermore, since the function of a pathway is to turn a reactant into a product, the availability of the reactant in the cell might be expected to turn on transcription, while the availability of the final product might turn it off. These assumptions have proven to be correct in many cases of transcription in bacteria. Because bacteria such as E. coli possess only about 4,000 to 5,000 genes (compared with roughly 20,000 protein-coding genes in humans, for instance) and can be grown rapidly in huge numbers, they have been especially useful organisms for study of the control of gene transcription. Much of our current knowledge of this subject comes from research on the intestinal bacterium Escherichia coli.
The Jacob-Monod model of gene induction. Investigating enzyme synthesis in E. coli, the French biochemists François Jacob and Jacques Monod formulated a powerful model of gene regulation in bacterial cells. They worked mainly with the enzyme β-galactosidase, which catalyzes the breakdown of lactose to glucose and galactose, substances both used and produced by other pathways.
Lactose is not continuously available to E. coli, and so – as would be expected – the gene for β-galactosidase is normally transcribed at a very low rate; in the absence of lactose there are only about ten β-galactosidase molecules per cell. Jacob and Monod found that the further production of this digestive enzyme is triggered by the presence of a so-called inducer, in this instance allolactose, an isomer of lactose automatically produced in the cell when lactose is present. Normally, then, β-galactosidase is an inducible enzyme. But they also found a mutant strain of E. coli in which the same enzyme is a constitutive enzyme – that is, an enzyme whose production is continuous, apparently uninfluenced by control substances such as inducers.
By means of recombination experiments, Jacob and Monod were eventually able to demonstrate the participation of four genes in the production of β-galactosidase and the two other enzymes involved in lactose breakdown: three so-called structural genes, each specifying the amino acid sequence of one of the three enzymes, and a regulator gene, which controls the activity of the structural genes. They proposed that the regulator gene, which is located at some distance from the structural genes, normally directs the synthesis of a repressor protein that inhibits transcription of the structural genes. The allele of the regulator gene present in the mutant constitutive strain, they concluded, cannot direct synthesis of an effective repressor; hence it cannot prevent transcription of the structural genes, which are thus left free to direct continuous protein synthesis. Jacob and Monod also discovered that a special region of DNA contiguous to the structural gene for β-galactosidase determines whether transcription of the structural genes will be initiated; they called this special region the operator, and they called the combination of the operator and its three associated structural genes an operon. Subsequently it was found that the operator, which is not itself a gene since it does not code for a specific product, is located between the two important sequences of the promoter, the region to which RNA polymerase binds. Hence, when the repressor binds to the operator, RNA polymerase cannot physically bind to the promoter, and transcription is blocked.
Figure caption (the lac operon, repressed): The operon consists of a promoter/operator region and three structural genes (Z, Y, and A). For simplicity the structural genes, which are much longer than the promoter/operator region, are shown greatly shortened. Also, the boundaries between the operator and the promoter sequences are drawn to appear quite sharp, though actually the operator sequence overlaps the end of the first promoter sequence and the beginning of the second. The regulator gene codes for mRNA, which is translated on the ribosomes and determines synthesis of repressor protein. When the repressor protein binds to the operator, it blocks the promoter’s binding sites for RNA polymerase and thus prevents transcription of the structural genes.
If inducer is present, it will bind to the repressor, causing a conformational change in the repressor that forces it to dissociate from the operator; in short, the inducer inactivates the repressor. RNA polymerase, now free to bind to the promoter, can initiate transcription of the structural genes and the production of mRNA.
Figure caption (the lac operon, induced): Binding of inducer to the repressor inactivates the repressor, and the RNA polymerase can then bind to the promoter regions and initiate transcription of the structural genes. These, transcribed as a unit, determine production of polycistronic mRNA, that is, mRNA coding for more than one gene product. The mRNA then complexes with ribosomes in the cytoplasm and is translated into three enzymes. Enzyme I is β-galactosidase; enzyme II is a permease that helps transport lactose into the cell; and enzyme III is a transacetylase, whose role in lactose utilization is not understood.
The mRNA carries the instructions of all three structural genes and is therefore said to be polycistronic. This messenger complexes with ribosomes in the cytoplasm, where its information is translated and the three enzymes necessary for lactose metabolism are synthesized. The number of β-galactosidase molecules rises to about 5,000 per cell when the operon is not repressed.
According to the Jacob-Monod model, then, the condition of the operator region is the key to whether or not there will be activation of the so-called lac operon, the operon responsible for the synthesis of enzymes involved in the breakdown of lactose. If repressor protein is bound to the operator, there will be no transcription. If no repressor is bound to the operator (because the repressor has been inactivated by inducer), transcription can proceed freely. Notice that the three jointly controlled structural genes of the lac operon specify enzymes with closely related functions. It is characteristic for the structural genes of an operon to determine the enzymes of a single biochemical pathway; thus the whole pathway can be regulated as a unit. The adaptive advantage of such coordinated control is obvious.
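The Jacob-Monod model, including the constitutive mutant, can be expressed as a small function. This sketch deliberately ignores CAP activation and basal transcription; the names `LAC_GENES` and `lac_products` are hypothetical. It shows that a single polycistronic mRNA yields all three enzymes together, and that a defective regulator-gene allele makes their synthesis independent of the inducer.

```python
LAC_GENES = ("Z: beta-galactosidase", "Y: permease", "A: transacetylase")


def lac_products(inducer_present: bool, repressor_functional: bool = True) -> list[str]:
    """Jacob-Monod model: what the lac structural genes produce (hypothetical helper).

    The regulator gene's repressor binds the operator and blocks transcription,
    unless inducer has inactivated it or a mutant allele makes it nonfunctional.
    """
    repressor_on_operator = repressor_functional and not inducer_present
    if repressor_on_operator:
        return []  # no transcription (real cells keep a small basal level)
    # One polycistronic mRNA carries all three genes, so all three
    # enzymes are made together, in the order of the genes.
    polycistronic_mrna = list(LAC_GENES)
    return polycistronic_mrna


print(lac_products(inducer_present=False))                              # wild type, no inducer
print(lac_products(inducer_present=True))                               # wild type, induced
print(lac_products(inducer_present=False, repressor_functional=False))  # constitutive mutant
```

The constitutive strain makes the enzymes whether or not inducer is present, which is the phenotype Jacob and Monod traced to the regulator gene.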
Gene repression. In the years since the Jacob-Monod model was first proposed, it has become apparent that not all operons are regulated in the same way as the lac operon, which is an inducible operon, that is, one which is inactive until turned on by an inducer substance. Many operons are, instead, continuously active unless turned off by a corepressor substance. One example is the operon whose five structural genes code for the enzymes necessary to synthesize the amino acid tryptophan. This operon is normally turned on, but when E. coli are grown in a medium containing tryptophan, it switches off. Enzymes encoded by genes that are usually active but can be repressed are called repressible enzymes. In their case, the repressor protein encoded by the regulator gene is inactive when first produced. Only if a corepressor substance binds to and activates it can it bind to the operator and block RNA polymerase binding. Unlike inducible enzymes, which are synthesized only if their operon is turned on by an inducer, repressible enzymes are automatically synthesized unless their operon is turned off by a corepressor.
Figure caption (a repressible operon): The repressor protein encoded by the regulator gene is initially inactive. Only if it binds a corepressor molecule (often the end product of a biochemical pathway) can it bind to the operator and block transcription of the structural genes. After the operon has been repressed, the concentration of the corepressor falls as it is used in cellular metabolism and no more is produced. When the corepressor becomes scarce, the repressor tends to lose it to metabolic enzymes. As a result, the repressor can no longer bind to the operator, the RNA polymerase binds to the promoter, and transcription resumes. The operon shown here is responsible for the synthesis of an amino acid, which is incorporated into new proteins.
In tryptophan synthesis, the tryptophan itself activates the repressor protein, enabling it to bind to the operator. An inducer is often either the first substrate in the biochemical pathway being regulated (that is, the first molecule the synthesized enzyme will bind to) or some substance closely related to that substrate. It is not surprising, therefore, to find that a corepressor is usually the end product of the biochemical pathway being regulated, or a closely related substance. In both substrate induction and end-product corepression, then, gene transcription is regulated by the cellular substances most affected by the transcription, a truly elegant functional arrangement.
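End-product corepression forms a feedback loop, which a toy step-by-step simulation can make concrete. All numbers here (the threshold at which tryptophan activates the repressor, and the amounts made and used per step) are arbitrary assumptions in made-up units, and `simulate_trp` is a hypothetical helper; the sketch does not model any other layer of trp regulation.

```python
def simulate_trp(steps: int, trp: int = 0, threshold: int = 5,
                 made_per_step: int = 3, used_per_step: int = 2) -> list[tuple[int, bool]]:
    """Toy feedback loop for a repressible operon (all numbers are arbitrary units).

    While tryptophan (the corepressor) is scarce, the repressor stays inactive,
    the operon is transcribed and the enzymes make tryptophan. Once tryptophan
    reaches `threshold`, it activates the repressor and transcription stops;
    protein synthesis keeps using tryptophan up, so the operon switches back on.
    """
    history = []
    for _ in range(steps):
        operon_on = trp < threshold          # repressor active only with enough corepressor
        if operon_on:
            trp += made_per_step             # enzymes synthesize tryptophan
        trp = max(trp - used_per_step, 0)    # tryptophan is incorporated into proteins
        history.append((trp, operon_on))
    return history


for level, on in simulate_trp(8):
    print(level, "ON" if on else "off")
```

The tryptophan level climbs until it represses the operon, drops as protein synthesis consumes it, and the operon then turns back on.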
Gene Regulation in Eukaryotes. In eukaryotes, the genes that code for the enzymes catalyzing the various steps of a particular metabolic pathway may not be adjacent and may even lie on different chromosomes. These genes are therefore not organized into operons as in bacteria, but the logic of the operon model still applies: both inducible and repressible control operate in eukaryotic cells, through a complex network of regulatory genes. The environment within cells changes as growth and development proceed, and gene expression is regulated accordingly to cope with each new environment. Hormones, vitamins, minerals, chemicals and pathogens formed in or entering the cells may induce or repress certain genes. This starts the production of certain proteins or stops the formation of others, thereby initiating new metabolic pathways or terminating existing ones. This ultimately alters cell function. The change in cell function brought about by the influence of the environment on gene expression forms the molecular basis of growth, development, differentiation and disease.
The following points are notable about gene regulation in eukaryotes:
1. The structural gene has coding and noncoding segments called exons and introns respectively.
2. Each structural gene appears to have its own promoter.
3. There are sensor genes to pick up changes in the intracellular environment such as presence or lack of a substance.
4. There are integrator genes to coordinate functioning of structural genes located in different parts of DNA.
Advantages of the regulation of gene action. In bacteria, an operon is a unit of coordinated transcription. When a protein, or the product of the pathway it serves, is already present in excess, it is no longer economical for the cell to use up its energy and materials in the synthesis of enzymes that are not immediately needed. Hence, synthesis of these proteins is stopped. This is clearly advantageous for the cell.
Genetic code. The uniqueness of every cell, individual or species lies in the uniqueness of its proteins. Cells are enabled to synthesize their specific proteins by the information flowing from the DNA. This information exists as the particular sequences of bases in the DNA strands and is called genetic code. It is sent to the protein-manufacturing machinery in the form of mRNA synthesized on DNA template. The order of bases in the mRNA decides the order of amino acids in the polypeptide to be synthesized.
Nature of Genetic Code. It has been found that a sequence of three consecutive bases in a DNA molecule codes for one specific amino acid. Thus, the genetic code is a triplet code. That a sequence of three nucleotides codes for one amino acid was first suggested by George Gamow in 1954. Crick in 1961 concluded that three consecutive nucleotides in an mRNA strand determine the position of a single amino acid in a polypeptide chain. Nirenberg and Matthaei soon provided experimental evidence to show that the genetic code is a triplet one.
Nirenberg and Matthaei, followed by other groups, determined experimentally which sequence of bases codes for which amino acid. By convention, codons are written in terms of messenger RNA and not of DNA, because the cell reads the code from the messenger RNA molecule, not directly from the DNA of the chromosomes. The mRNA is read from the 5′ end towards the 3′ end.
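The consequences of reading mRNA as consecutive, non-overlapping triplets, the logic behind the 1961 frameshift experiments, can be shown on a made-up message. In this sketch the sequence is hypothetical and `codons` simply splits the string from the 5′ end, dropping any incomplete final triplet.

```python
def codons(mrna: str) -> list[str]:
    """Split an mRNA (written 5'->3') into consecutive, non-overlapping triplets."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - len(mrna) % 3, 3)]


message = "AUGGCAGCAGCAGCAUAA"  # hypothetical message: start, GCA repeats, stop
print(codons(message))

# Inserting one base shifts the reading frame: every downstream triplet changes.
plus_one = message[:4] + "U" + message[4:]
print(codons(plus_one))

# Inserting three bases adds one codon but restores the frame downstream.
plus_three = message[:4] + "UUU" + message[4:]
print(codons(plus_three))
```

A single insertion garbles every codon after it (here the stop codon is lost too), while an insertion of three bases disturbs only the codons near the insertion, as expected for a triplet code read without punctuation.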
Characteristics of Genetic Code. The genetic code has certain well-established fundamental characteristics. These are given below:
1. Triplet Nature. The genetic code is a triplet code. Three adjacent bases, termed a codon, specify one amino acid.
2. No Overlapping. The adjacent codons do not overlap.
3. No Punctuation. The genetic code has no «punctuation marks» (gaps) between the coding triplets.
4. Universality. The genetic code is nearly universal, i.e., a given codon in DNA and mRNA specifies the same amino acid in the protein-synthesizing systems of all organisms, from bacteria to man, and also in viruses; the few known exceptions, such as in mitochondria, are minor variations.
5. Degeneracy. The genetic code is degenerate, i.e., redundant: most amino acids are specified by more than one codon, although each codon specifies only one amino acid.
6. Terminator Codons or «Nonsense» Codons. Three of the 64 codons, namely UAA, UAG and UGA, do not specify an amino acid but signal the end of a message. They are called the nonsense or terminator codons. Any one of these stops synthesis of the polypeptide chain.
7. Initiation or Start Codons. The codon AUG (less often GUG) is called the initiation or start codon, as it begins the synthesis of the polypeptide chain; AUG also codes for methionine.
8. Colinearity. DNA is a linear polynucleotide chain and a protein is a linear polypeptide chain. The sequence of amino acids in a polypeptide chain corresponds to the sequence of nucleotide bases in the gene (DNA) that codes for it. Change in a specific codon in DNA produces a change of amino acid in the corresponding position in the polypeptide. The gene and the polypeptide it codes for are said to be colinear.
9. Gene-Polypeptide Parity. A specific gene transcribes a specific mRNA, which produces a specific polypeptide. On this basis, a cell can have only as many types of polypeptides as it has types of genes. (Alternative splicing, in which the exons of one gene are joined in different combinations, is an important exception in eukaryotes.)
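Several of these characteristics (triplet reading, no overlap, start and stop codons, degeneracy) can be checked directly against the standard codon table. The sketch below assumes the conventional one-letter encoding of the standard genetic code (the same table NCBI lists as translation table 1), with codons ordered U, C, A, G at each position and `*` marking a stop; `translate` is a simplified, hypothetical helper that starts at the first AUG.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order;
# "*" marks a stop (nonsense) codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG, codon by codon without overlap, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stops = sorted(codon for codon, aa in CODE.items() if aa == "*")
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")

print(len(CODE), stops)                                            # 64 ['UAA', 'UAG', 'UGA']
print(len(codons_per_aa))                                          # 20
print(codons_per_aa["L"], codons_per_aa["M"], codons_per_aa["W"])  # 6 1 1
print(translate("GGAUGUUUCUUCUAUGGUAAGC"))                         # MFLLW
```

Leucine has six codons while methionine and tryptophan have one each, which is what degeneracy means in practice; in the last line two different codons (CUU and CUA) both give leucine.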
1. Gene Expression in prokaryotes: transcription, translation.
2. Gene Expression in eukaryotic cells: transcription, processing, translation.
3. Translation stages: initiation, elongation and termination.
4. Central dogma of molecular biology (DNA → DNA → mRNA → protein).
Gene Expression. The process by which a gene produces a product, usually a protein, is called gene expression. DNA not only serves as a template for its own replication; it is also a template for RNA formation. Most often the RNA produced is mRNA. The process by which an mRNA copy is made of a portion of DNA is called transcription. Following transcription, the mRNA has a sequence of bases that is complementary to that of the DNA template strand. The mRNA then moves into the cytoplasm: photographic evidence (autoradiography) shows radioactively labeled RNA moving from the nucleus to the cytoplasm, where protein synthesis occurs. The central dogma of molecular biology also says that mRNA directs the synthesis of a polypeptide. During translation, the sequence of bases in mRNA dictates the sequence of amino acids in a protein. Gene expression therefore requires both transcription and translation. These terms are apt: transcribing a document means making a copy of it, and translating a document means putting it in a different language.
Transcription. Transcription is the first step required for gene expression, the process by which a gene product is made. Most often this gene product is a protein, but we should note that the molecules tRNA and rRNA are also transcribed off DNA templates. These molecules are also gene products. Just now we are interested in the formation of mRNA, which carries genetic information to the ribosomes, where protein synthesis occurs.
Messenger RNA. During transcription, an mRNA molecule is formed that has a sequence of bases complementary to a portion of one DNA strand; wherever A, T, G, or C is present in the DNA template, U, A, C, or G, respectively, is incorporated into the mRNA molecule. A segment of the DNA helix unwinds and unzips, and complementary RNA nucleotides pair with DNA nucleotides of the strand that is to be transcribed. When these RNA nucleotides are joined together by RNA polymerase, an mRNA molecule results. This molecule now carries a sequence of codons that will be used to order the sequence of amino acids in a protein. Transcription begins at a region of DNA called a promoter. A promoter is a special sequence of DNA bases where RNA polymerase attaches and the transcribing process begins; it lies at the start of the gene to be transcribed. Some genes are on one of the DNA strands, and some are on the other strand.
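The base-pairing rule for transcription can be written as a lookup table. This sketch assumes the template strand is written 5′→3′, as sequences conventionally are; because the RNA is antiparallel to its template, the template is read backwards while each base is replaced by its RNA partner. The template sequence and the name `transcribe` are hypothetical.

```python
# Base-pairing rule used by RNA polymerase: DNA template base -> RNA base.
DNA_TO_RNA = str.maketrans("ATGC", "UACG")


def transcribe(template_5to3: str) -> str:
    """Return the mRNA (5'->3') made from a DNA template strand written 5'->3'.

    The RNA is complementary and antiparallel to the template, so the template
    is read backwards (3'->5') while each base is replaced by its RNA partner.
    """
    return template_5to3[::-1].translate(DNA_TO_RNA)


template = "TTACGGCAT"       # hypothetical template strand, 5'->3'
print(transcribe(template))  # AUGCCGUAA
```

The resulting message happens to begin with the start codon AUG and end with the stop codon UAA.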
Elongation of the mRNA molecule continues as long as transcription proceeds. Only the newest portion of an RNA molecule is bound to the DNA, and the rest dangles off to the side. Finally, RNA polymerase comes to a terminator sequence at the other end of the gene being transcribed. The terminator causes RNA polymerase to stop transcribing the DNA and to release the mRNA molecule, now called an RNA transcript. Many RNA polymerase molecules can transcribe the same gene at the same time. This allows the cell to produce many thousands of copies of the same mRNA molecule, and eventually many copies of the same protein, in a shorter time than would otherwise be possible.
RNA Processing. Since the advent of modern molecular techniques, investigators can compare the structure of various eukaryotic RNA transcripts and their corresponding genes. They do this by first isolating the mRNA and its corresponding gene. Then, they separate the DNA molecule into single strands and allow the mRNA to bind to its complementary strand. If the two molecules are indeed colinear, the mRNA should bind along the entire length of its template DNA. Much to their surprise, researchers found that much of human template DNA does not bind to its mRNA. The segments of a gene that do not bind to mRNA and therefore do not code for protein are called intervening sequences, or introns. The segments of a gene that do bind to mRNA and therefore do code for protein are called exons because they are expressed. By comparing the mRNA present in the nucleus with that in the cytoplasm, it can be shown that both exons and introns are present in the primary mRNA transcript, but only exons are present in the mature mRNA transcript that leaves the nucleus and enters the cytoplasm. The introns are removed from the primary mRNA transcript by a process called RNA processing or RNA splicing.
Since the discovery of split (interrupted) genes in eukaryotes, two essential questions have been asked: How is processing carried out? And what is the function of introns in the first place? It was discovered that RNA splicing may be carried out by spliceosomes, complexes that contain several kinds of ribonucleoproteins. A spliceosome cuts the primary mRNA and then rejoins the adjacent exons.
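The net result of splicing can be sketched as string surgery on a primary transcript. Here the intron positions are simply given, whereas in the cell the spliceosome recognizes them by sequence signals (most introns begin with GU and end with AG, as the made-up introns below do). The sequences and the helper `splice` are hypothetical.

```python
def splice(primary_transcript: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as 0-based (start, end) pairs with `end` exclusive,
    and join the remaining exons in their original order."""
    exons, position = [], 0
    for start, end in sorted(introns):
        exons.append(primary_transcript[position:start])  # exon before this intron
        position = end                                    # skip over the intron
    exons.append(primary_transcript[position:])           # last exon
    return "".join(exons)


pre_mrna = "AUGGCC" + "GUAAGUCUUUUAG" + "AAACUG" + "GUGAGUUUCAG" + "UGGUAA"
introns = [(6, 19), (25, 36)]
print(splice(pre_mrna, introns))  # AUGGCCAAACUGUGGUAA
```

Only the three exons, joined end to end, remain in the mature transcript.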
There has been much speculation about the possible role of introns in the eukaryotic genome. It’s possible that introns allow crossing-over within a gene during meiosis. It’s also possible that introns divide a gene up into regions that can be joined in different combinations to give novel genes and protein products, a process that perhaps facilitates evolution.
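The idea of joining gene regions "in different combinations" can be illustrated with a toy enumeration. This is a hypothetical sketch, not a model of any real gene: it assumes the first and last exons are always kept, each internal exon may be included or skipped, and exons never change their order. The function name `exon_combinations` and the single-letter exon labels are illustrative only.

```python
from itertools import combinations


def exon_combinations(exons: list[str]) -> list[str]:
    """List every mature mRNA that keeps the first and last exon plus any
    subset of the internal exons, always in their original order."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *internal, last = exons
    results = []
    for k in range(len(internal) + 1):
        # combinations() yields items in their input order, so exon order is kept.
        for chosen in combinations(internal, k):
            results.append(first + "".join(chosen) + last)
    return results


# Four hypothetical exons, labelled by letter for readability.
for mrna in exon_combinations(["A", "B", "C", "D"]):
    print(mrna)  # prints AD, ABD, ACD and ABCD, one per line
```

Under these assumptions a gene with n internal exons could give 2^n different mature mRNAs, which is one way to see how a few regions could yield several protein products.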
Some researchers are trying to determine whether introns exist in more primitive eukaryotes and in prokaryotes. They have found that the more primitive the eukaryote, the less likely a gene is to be interrupted by introns, and the introns that do exist are shorter. At first it was thought that introns did not exist at all in prokaryotes, but an intron has been discovered in the gene for a tRNA molecule in Anabaena, a cyanobacterium. This particular intron belongs to a class of introns called "self-splicing." Self-splicing introns, which have the enzymatic capability of splicing themselves out of an RNA transcript, were discovered in the early 1980s. The discovery of these ribozymes (RNA enzymes) did away with the belief that only proteins can function as enzymes. Ribozymes, however, are limited in function: each one acts only on RNA, and only at specific locations.
The discovery of ribozymes in prokaryotes is being used to support the hypothesis that RNA could have been both the first genetic material and the first enzyme in the history of life. For many years, scientists have puzzled over which came first: DNA, the genetic material, or proteins, the enzymes. Now this dilemma appears unnecessary, since RNA may have fulfilled both functions in the first cell or cells. RNA molecules (mRNA, rRNA, and tRNA) are transcribed from DNA templates. mRNA carries a copy of the genetic information needed for protein synthesis. Particularly in eukaryotes, the primary mRNA transcript is processed before it becomes a mature mRNA transcript.
Translation. Translation is the second step of gene expression, in which the information in mRNA is used to synthesize a protein. During translation, the sequence of codons in mRNA directs the sequence of amino acids in a protein. Two other types of RNA are needed for protein synthesis: rRNA, which is contained in the ribosomes, where the codons of mRNA are read, and tRNA, which carries amino acids to the ribosomes so that protein synthesis can occur.
The process of translation must be extremely orderly so that the amino acids of a polypeptide are sequenced correctly. Protein synthesis involves 3 steps: initiation, elongation, and termination.
1. Initiation of translation: A small ribosomal subunit attaches to the mRNA in the vicinity of the start codon (AUG). The first or initiator tRNA pairs with this codon. Then a large ribosomal subunit joins to the small subunit, and translation begins.
2. Chain elongation: Each ribosome contains 2 sites, the P (for polypeptide) site and the A (for amino acid) site. During elongation, a tRNA with the attached polypeptide chain is at the P site, and a tRNA-amino acid complex is just arriving at the A site. The polypeptide chain is transferred to the newly arrived amino acid and attached to it by a peptide bond. This transfer requires energy and an enzyme, peptidyl transferase, which is part of the larger ribosomal subunit. The tRNA molecule at the P site, now without its chain, then leaves.
Then translocation occurs: the peptide-bearing tRNA, together with the mRNA to which it is paired, moves from the A site to the empty P site. This makes it seem as if the ribosome has moved forward 3 nucleotides, especially since a new codon is now located at the empty A site. The complete cycle (pairing of a new tRNA-amino acid complex, transfer of the peptide chain, and translocation) is repeated at a rapid rate, about 15 times each second in E. coli.
3. Chain termination: Termination of polypeptide chain synthesis occurs at a stop codon (UAA, UAG, or UGA), a codon that does not code for an amino acid. The polypeptide chain is enzymatically cleaved from the last tRNA and leaves the ribosome, which dissociates into its 2 subunits.
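The three steps above (initiation at AUG, elongation one codon at a time, termination at a stop codon) can be mimicked by a short Python loop. This is a simplified sketch under stated assumptions: the first AUG in the message is taken as the start codon, and the codon table is deliberately partial (only the codons the example needs), so any other codon raises a `KeyError`. The function name `translate` and the example mRNA are hypothetical.

```python
# A small subset of the standard genetic code, enough for the example.
# The full code has 64 codons; this table is deliberately partial.
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UUU": "Phe", "GGA": "Gly", "AAA": "Lys",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Translate an mRNA, mimicking initiation, elongation and termination."""
    # Initiation (simplified): start at the first AUG start codon.
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so no polypeptide
    polypeptide = []
    # Elongation: read one codon per cycle, advancing 3 nucleotides each time.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # Termination: a stop codon adds no amino acid and ends the chain.
        if codon in STOP_CODONS:
            break
        polypeptide.append(CODON_TABLE[codon])  # KeyError if codon is not in the partial table
    return polypeptide


print(translate("GGAUGGCCUUUGGAUAAAAA"))  # ['Met', 'Ala', 'Phe', 'Gly']
```

Note that the leading GGA is never read: reading begins at the start codon, and the step of 3 keeps every later codon in the same reading frame.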
During translation, the codons of mRNA base-pair with the anticodons of tRNA molecules carrying specific amino acids. The order of the codons determines the order of the tRNA molecules and the sequence of amino acids in a polypeptide.
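Because codon and anticodon pair antiparallel, the anticodon (written 5'->3') is the complement of the codon read in reverse order. The sketch below assumes strict complementary pairing at all 3 positions; the function name `anticodon` is illustrative.

```python
# Complementary base pairs between RNA bases.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the tRNA anticodon, written 5'->3', for an mRNA codon (5'->3').

    The two strands pair antiparallel, so the anticodon is the
    complement of the codon read in reverse order.
    """
    return "".join(RNA_PAIRS[base] for base in reversed(codon))


print(anticodon("AUG"))  # CAU
```

For example, the start codon AUG is read by a tRNA whose anticodon is CAU when written 5'->3'.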
Gene Expression in Review
1. DNA contains genetic information. The sequence of its bases determines the sequence of amino acids in a protein.
2. During transcription, one strand of DNA serves as a template for the formation of messenger RNA (mRNA). The bases in mRNA are complementary to those in the template DNA; each group of 3 bases is a codon, which specifies an amino acid (or, for a stop codon, the end of the chain).
3. In eukaryotes, messenger RNA is processed before it leaves the nucleus; during processing, the introns are removed.
4. Messenger RNA carries a sequence of codons to the ribosomes, which are composed of ribosomal RNA (rRNA) and proteins.
5. Transfer RNA (tRNA) molecules, each of which is bonded to a particular amino acid, have anticodons that pair complementarily to the codons in mRNA.
6. During translation, tRNA molecules and their attached amino acids arrive at the ribosomes and the linear sequence of codons of the mRNA determines the order in which the amino acids become incorporated into a protein.
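A short count shows why a codon needs at least 3 bases. It assumes, as a simplification, that each of the 20 standard amino acids must get at least one distinct codon built from the 4 mRNA bases:

```latex
\begin{aligned}
4^{1} &= 4   < 20   && \text{(one-base codons: too few)}\\
4^{2} &= 16  < 20   && \text{(two-base codons: still too few)}\\
4^{3} &= 64  \ge 20 && \text{(three-base codons: enough, with some to spare)}
\end{aligned}
```

With 64 triplets available for 20 amino acids plus stop signals, most amino acids are specified by more than one codon.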