MOLECULAR BASIS OF HEREDITY. REALIZATION OF HEREDITARY INFORMATION.
Molecular biology is the study of biology at a molecular level. Molecular biology concerns itself with understanding the interactions between the various systems of a cell, including the interactions between DNA, RNA and protein biosynthesis and learning how these interactions are regulated.
The nucleus contains the genetic material, encoded in the DNA of the chromosomes. The nucleus directs protein synthesis in the cytoplasm via ribosomal RNA (rRNA), messenger RNA (mRNA) and transfer RNA (tRNA), all of which are synthesized in the nucleus.
Some organelles (mitochondria, chloroplasts) have their own chromosomes (DNA).
The nucleic acids are polymers of smaller units called nucleotides. There are 2 types of nucleic acids: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid).
Structure of nucleotide:
1) five-carbon sugar (deoxyribose in DNA and ribose in RNA);
2) a phosphate group (PO4);
3) one of five types of nitrogen-containing compounds called nitrogenous bases.
The nitrogenous bases are:
1) Purines, which are larger – Adenine (A), Guanine (G);
2) Pyrimidines, which are smaller – Thymine (T), Cytosine (C), Uracil (U).
DNA is a long, double-stranded, linear molecule composed of multiple nucleotide sequences. The DNA double helix consists of two complementary DNA strands held together by hydrogen bonds between the base pairs A-T and G-C.
Chargaff’s rules state that A = T and G = C. The model shows that A is hydrogen bonded to T and G is hydrogen bonded to C. This so-called complementary base pairing means that a purine is always bonded to a pyrimidine. Only in this way will the molecule have the width (2 nm) dictated by its X-ray diffraction pattern, since 2 pyrimidines together are too narrow and 2 purines together are too wide.
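The pairing rule can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical, and strand direction (5′ to 3′) is ignored for simplicity. The sketch builds the complementary strand and counts the bases over both strands, so that Chargaff’s equalities can be checked.

```python
# Watson-Crick pairing: each purine (A, G) pairs with one pyrimidine (T, C).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)

def duplex_base_counts(strand: str) -> dict:
    """Count each base over both strands of the duplex formed from `strand`."""
    both_strands = strand + complement(strand)
    return {base: both_strands.count(base) for base in "ATGC"}

# Hypothetical example sequence, chosen only for illustration.
counts = duplex_base_counts("ATGCGGTA")
print(complement("ATGCGGTA"))
print(counts)
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

Whatever sequence is supplied, the counts over the whole duplex satisfy A = T and G = C, because every base on one strand is matched by its partner on the other.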
In the formation of a nucleic acid chain, the phosphate group of one nucleotide binds to the hydroxyl group on the sugar of the next, forming what is called a phosphodiester bond, which is very strong.
The double helix structure of DNA was proposed in 1953 by James Watson and Francis Crick, who received the Nobel Prize for it in 1962. The Watson and Crick model shows that DNA is a double helix with sugar-phosphate backbones on the outside and paired bases on the inside. This arrangement fits the mathematical measurements provided by the X-ray diffraction data for the spacing between the base pairs (0.34 nm) and for a complete turn of the double helix (3.4 nm).
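The two X-ray measurements together imply how many base pairs fit into one turn of the helix. The short Python sketch below works this out; the function names are hypothetical, and it assumes an idealized, perfectly regular helix.

```python
RISE_PER_BASE_PAIR_NM = 0.34  # spacing between adjacent base pairs (from the text)
HELIX_PITCH_NM = 3.4          # length of one complete turn (from the text)

def base_pairs_per_turn() -> float:
    """Number of base pairs stacked within one complete turn."""
    return HELIX_PITCH_NM / RISE_PER_BASE_PAIR_NM

def contour_length_nm(n_base_pairs: int) -> float:
    """Length of an idealized straight helix with the given number of base pairs."""
    return n_base_pairs * RISE_PER_BASE_PAIR_NM

print(round(base_pairs_per_turn()))  # about 10 base pairs per turn
print(contour_length_nm(1000))       # length in nm of a 1000-base-pair segment
```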
Organization of DNA in cells. Classical geneticists who made chromosome maps would have predicted that the genome (all the genes of an organism) consists of DNA that directs the synthesis of cytoplasmic proteins. It appears, however, that at most 70% of eukaryotic DNA seems to function in this way. The rest either has no function or has a function we haven’t been able to determine yet.
The main DNA functions:
1. DNA stores hereditary information in the order of its bases. The order of the bases specifies the order of amino acids in polypeptides.
2. DNA replication – maintaining genetic information. DNA replication occurs in three stages: 1) enzymes unwind the two chains of the double helix from each other; 2) free nucleotides pair with their complementary bases on each exposed chain; 3) an enzyme called DNA polymerase joins the nucleotides of the new chain together – sugar to phosphate to sugar.
3. The transmission of hereditary information during transcription. Transcription is the RNA polymerase-catalyzed assembly of an mRNA molecule on a DNA template (that is, complementary to a strand of DNA).
DNA Replication. The Watson and Crick model suggests that DNA can be replicated by means of complementary base pairing. During replication, each old DNA strand of the parent molecule serves as a template for a new strand in a daughter molecule. A template is most often a mold used to produce a shape complementary to itself.
Replication of DNA requires the following steps:
1. Unwinding. The old strands that make up the parent DNA molecule are unwound and “unzipped” (i.e., the weak hydrogen bonds between the paired bases are broken). There is a special enzyme called helicase that unwinds the molecule.
2. Complementary base pairing. New complementary nucleotides, always present in the nucleus, are positioned by the process of complementary base pairing.
3. Joining. The complementary nucleotides become joined together to form new strands. Each daughter DNA molecule contains an old strand and a new strand. Steps 2 and 3 are carried out by the enzyme DNA polymerase.
DNA replication is termed semiconservative replication because one of the old strands is conserved, or present, in each daughter double helix. Semiconservative replication was experimentally confirmed by Matthew Meselson and Franklin Stahl in 1958.
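The semiconservative pattern can be traced with a small Python sketch. It is a simplified illustration, not a model of the enzymes involved: strands are represented as hypothetical (sequence, label) pairs, strand direction is ignored, and the labels "old" and "new" stand in for the parental and newly synthesized strands.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(sequence: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in sequence)

def replicate(duplex):
    """Unzip a duplex and build a new partner for each template strand.

    A duplex is a pair of strands; each strand is a (sequence, label) tuple.
    """
    first, second = duplex
    daughter_1 = (first, (complement(first[0]), "new"))
    daughter_2 = ((complement(second[0]), "new"), second)
    return daughter_1, daughter_2

def duplexes_containing(duplexes, label: str) -> int:
    """Count the duplexes that hold at least one strand with the given label."""
    return sum(any(strand[1] == label for strand in duplex) for duplex in duplexes)

# Hypothetical parent molecule; both strands start out labeled "old".
parent = (("ATGCCG", "old"), (complement("ATGCCG"), "old"))
generation_1 = list(replicate(parent))
generation_2 = [d for duplex in generation_1 for d in replicate(duplex)]

print(duplexes_containing(generation_1, "old"), "of", len(generation_1))
print(duplexes_containing(generation_2, "old"), "of", len(generation_2))
```

After one round every daughter duplex holds one original strand. After two rounds only half of the duplexes do, which is the pattern Meselson and Stahl observed.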
Accuracy of Replication.
DNA polymerase selects nucleotides by complementary base pairing and then proofreads its work. A mismatched nucleotide causes a pause in replication, and during this time the mismatched nucleotide is excised from the daughter strand. The errors that slip through nucleotide selection and proofreading cause gene mutations. Occasional mutations are in fact of benefit, because variation is the raw material for the evolutionary process.
DNA damage, due to environmental factors and normal metabolic processes inside the cell, occurs at a rate of 1,000 to 1,000,000 molecular lesions per cell per day. Even the upper figure amounts to less than 0.02% of the human genome’s approximately 6 billion bases (3 billion base pairs), but unrepaired lesions in critical genes (such as tumor suppressor genes) can impede a cell’s ability to carry out its function and appreciably increase the likelihood of tumor formation.
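The scale of the quoted lesion rates can be checked with simple arithmetic. The Python sketch below is a rough estimate under one stated assumption: each lesion affects exactly one base. The function name is hypothetical.

```python
GENOME_BASES = 6_000_000_000  # about 6 billion bases in the human genome (from the text)

def lesion_percent(lesions_per_day: int) -> float:
    """Percentage of genome bases hit per day, assuming one base per lesion."""
    return 100 * lesions_per_day / GENOME_BASES

# Lower and upper ends of the range quoted in the text.
for lesions in (1_000, 1_000_000):
    print(f"{lesions:>9,} lesions per day: {lesion_percent(lesions):.6f}% of bases")
```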
The vast majority of DNA damage affects the primary structure of the double helix; that is, the bases themselves are chemically modified. These modifications can in turn disrupt the molecules’ regular helical structure by introducing non-native chemical bonds or bulky adducts that do not fit in the standard double helix. Unlike proteins and RNA, DNA usually lacks tertiary structure and therefore damage or disturbance does not occur at that level. DNA is, however, supercoiled and wound around “packaging” proteins called histones (in eukaryotes), and both superstructures are vulnerable to the effects of DNA damage.
DNA damage can be subdivided into two main types:
1) endogenous damage, such as attack by reactive oxygen species produced as normal metabolic byproducts (spontaneous mutation), especially through oxidative deamination; this type also includes replication errors;
2) exogenous damage caused by external agents such as ultraviolet (UV, 200-400 nm) radiation from the sun; other radiation frequencies, including X-rays and gamma rays; hydrolysis or thermal disruption; certain plant toxins; human-made mutagenic chemicals, especially aromatic compounds that act as DNA intercalating agents; and viruses.
The replication of damaged DNA before cell division can lead to the incorporation of wrong bases opposite damaged ones. Daughter cells that inherit these wrong bases carry mutations from which the original DNA sequence is unrecoverable (except in the rare case of a back mutation, for example, through gene conversion).
Types of damage
There are five main types of damage to DNA due to endogenous cellular processes:
· oxidation of bases [e.g. 8-oxo-7,8-dihydroguanine (8-oxoG)] and generation of DNA strand interruptions from reactive oxygen species;
· alkylation of bases (usually methylation), such as formation of 7-methylguanine, 1-methyladenine and O6-methylguanine;
· hydrolysis of bases, such as deamination, depurination and depyrimidination;
· “bulky adduct formation” (e.g., benzo[a]pyrene diol epoxide-dG adduct, aristolactam I-dA adduct);
· mismatch of bases, due to errors in DNA replication, in which the wrong DNA base is stitched into place in a newly forming DNA strand, or a DNA base is skipped over or mistakenly inserted.
Damage caused by exogenous agents comes in many forms. Some examples are:
· UV-B light causes crosslinking between adjacent cytosine and thymine bases creating pyrimidine dimers. This is called direct DNA damage.
· UV-A light creates mostly free radicals. The damage caused by free radicals is called indirect DNA damage.
· Ionizing radiation, such as that created by radioactive decay or in cosmic rays, causes breaks in DNA strands. Low-level ionizing radiation may induce irreparable DNA damage (leading to the replicational and transcriptional errors needed for neoplasia, or possibly triggering viral interactions), leading to premature aging and cancer.
· Thermal disruption at elevated temperature increases the rate of depurination (loss of purine bases from the DNA backbone) and single-strand breaks. For example, hydrolytic depurination is seen in the thermophilic bacteria, which grow in hot springs at 40-80 °C.[9][10] The rate of depurination (300 purine residues per genome per generation) is too high in these species to be repaired by normal repair machinery, hence a possibility of an adaptive response cannot be ruled out.
· Industrial chemicals such as vinyl chloride and hydrogen peroxide, and environmental chemicals such as polycyclic aromatic hydrocarbons found in smoke, soot and tar, create a huge diversity of DNA adducts: ethenobases, oxidized bases, alkylated phosphotriesters and DNA crosslinks, to name a few.
· UV damage, alkylation/methylation, X-ray damage and oxidative damage are examples of induced damage. Spontaneous damage can include the loss of a base, deamination, sugar ring puckering and tautomeric shift.
Point Mutations.
A point mutation, or single base substitution, is a type of mutation that causes the replacement of a single base nucleotide with another nucleotide of the genetic material, DNA or RNA.
One can categorize point mutations as follows:
1) transitions: replacement of a purine base with another purine or replacement of a pyrimidine with another pyrimidine;
2) transversions: replacement of a purine with a pyrimidine or vice versa.
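The two categories can be told apart mechanically from the chemical class of the bases involved. The following Python sketch shows one way to do this; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

def classify_substitution(original: str, mutant: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    if original == mutant:
        raise ValueError("the two bases are identical, so nothing was substituted")
    pair = {original, mutant}
    # A transition keeps the chemical class: purine to purine,
    # or pyrimidine to pyrimidine.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("C", "T"))  # pyrimidine to pyrimidine
print(classify_substitution("A", "T"))  # purine to pyrimidine
```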
Transition mutations are about an order of magnitude more common than transversions. Point mutations can also be categorized functionally:
• nonsense mutations: code for a stop, which can truncate the protein
• missense mutations: code for a different amino acid
• silent mutations: code for the same or a different amino acid but without any functional change in the protein
For example, sickle-cell disease is caused by a single point mutation (a missense mutation) in the beta-hemoglobin gene that converts a GAG codon into GTG, which encodes the amino acid valine rather than glutamic acid.
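The functional categories can be sketched in Python using the standard genetic code, written here with DNA letters for the coding strand. The function name is hypothetical. The sketch treats only synonymous changes as silent, because it cannot judge whether an amino acid change leaves protein function intact.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G order;
# "*" marks a stop codon, letters are one-letter amino acid codes.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_point_mutation(original_codon: str, mutant_codon: str) -> str:
    """Classify a codon change as nonsense, silent (synonymous) or missense."""
    before = CODON_TABLE[original_codon]
    after = CODON_TABLE[mutant_codon]
    if after == "*" and before != "*":
        return "nonsense"
    if before == after:
        return "silent"
    return "missense"

# The sickle-cell change from the text: GAG (glutamic acid, E) to GTG (valine, V).
print(CODON_TABLE["GAG"], CODON_TABLE["GTG"])
print(classify_point_mutation("GAG", "GTG"))
```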
Sickle-cell disease (SCD), also called sickle-cell anaemia (SCA) or drepanocytosis, is a hereditary blood disorder characterized by red blood cells that assume an abnormal, rigid, sickle shape. Sickling decreases the cells’ flexibility and results in a risk of various complications. The sickling occurs because of a mutation in the haemoglobin gene. Individuals with one copy of the defective gene display both normal and abnormal haemoglobin. This is an example of codominance.
Life expectancy is shortened. In 1994, in the US, the average life expectancy of persons with this condition was estimated to be 42 years in males and 48 years in females, but today, thanks to better management of the disease, patients can live into their 70s or beyond.
Sickle-cell disease occurs more commonly among people whose ancestors lived in tropical and sub-tropical sub-Saharan regions where malaria is or was common. Where malaria is common, carrying a single sickle-cell gene (sickle cell trait) confers a fitness advantage. Specifically, humans with one of the two alleles of sickle-cell disease show less severe symptoms when infected with malaria.
Sickle-cell anaemia is a form of sickle-cell disease in which there is homozygosity for the mutation that causes HbS. Sickle-cell anaemia is also referred to as “HbSS”, “SS disease”, “haemoglobin S” or permutations of those names. In heterozygous people, that is, those who have only one sickle gene and one normal adult haemoglobin gene, the condition is referred to as “HbAS” or “sickle cell trait”. Other, rarer forms of sickle-cell disease are compound heterozygous states in which the person has only one copy of the mutation that causes HbS and one copy of another abnormal haemoglobin allele. They include sickle-haemoglobin C disease (HbSC), sickle beta-plus-thalassaemia (HbS/β+) and sickle beta-zero-thalassaemia (HbS/β0). The term disease is applied because the inherited abnormality causes a pathological condition that can lead to death and severe complications. Not all inherited variants of haemoglobin are detrimental, a concept known as genetic polymorphism.
Sickle-cell anaemia is caused by a point mutation in the β-globin chain of haemoglobin, causing the hydrophilic amino acid glutamic acid to be replaced with the hydrophobic amino acid valine at the sixth position. The β-globin gene is found on chromosome 11. The association of two wild-type α-globin subunits with two mutant β-globin subunits forms haemoglobin S (HbS). Under low-oxygen conditions (being at high altitude, for example), the absence of a polar amino acid at position six of the β-globin chain promotes the non-covalent polymerisation (aggregation) of haemoglobin, which distorts red blood cells into a sickle shape and decreases their elasticity.
The loss of red blood cell elasticity is central to the pathophysiology of sickle-cell disease. Normal red blood cells are quite elastic, which allows the cells to deform to pass through capillaries. In sickle-cell disease, low-oxygen tension promotes red blood cell sickling, and repeated episodes of sickling damage the cell membrane and decrease the cell’s elasticity. These cells fail to return to normal shape when normal oxygen tension is restored. As a consequence, these rigid blood cells are unable to deform as they pass through narrow capillaries, leading to vessel occlusion and ischaemia.
The actual anaemia of the illness is caused by haemolysis, the destruction of the red cells, because of their misshape. Although the bone marrow attempts to compensate by creating new red cells, it does not match the rate of destruction. Healthy red blood cells typically live 90–120 days, but sickle cells only survive 10–20 days. Normally, humans have Haemoglobin A, which consists of two alpha and two beta chains, Haemoglobin A2, which consists of two alpha and two delta chains and Haemoglobin F, consisting of two alpha and two gamma chains in their bodies. Of these, Haemoglobin A makes up around 96-97% of the normal haemoglobin in humans.
Sickle-cell gene mutation probably arose spontaneously in different geographic areas, as suggested by restriction endonuclease analysis. These variants are known as Cameroon, Senegal, Benin, Bantu and Saudi-Asian. Their clinical importance springs from the fact that some of them are associated with higher HbF levels, e.g., Senegal and Saudi-Asian variants, and tend to have milder disease.
In people heterozygous for HbS (carriers of sickling haemoglobin), the polymerisation problems are minor, because the normal allele is able to produce over 50% of the haemoglobin. In people homozygous for HbS, the presence of long-chain polymers of HbS distorts the shape of the red blood cell from a smooth doughnut-like shape to one that is ragged and full of spikes, making it fragile and susceptible to breaking within capillaries. Carriers have symptoms only if they are deprived of oxygen (for example, while climbing a mountain) or while severely dehydrated. Sickle-cell disease occurs when the sixth amino acid, glutamic acid, is replaced by valine, which changes the structure and function of the protein; for this reason the mutation is also written E6V. Valine is hydrophobic, causing the haemoglobin to collapse in on itself occasionally. The structure is not changed otherwise. When enough haemoglobin collapses in on itself, the red blood cells become sickle-shaped.
Point mutations may arise from spontaneous mutations that occur during DNA replication. The rate of mutation may be increased by mutagens. Mutagens can be physical, such as radiation from UV rays, X-rays or extreme heat, or chemical (molecules that misplace base pairs or disrupt the helical shape of DNA). Mutagens associated with cancers are often studied to learn about cancer and its prevention.
DNA repair is a collection of processes by which a cell identifies and corrects damage to the DNA molecules that encode its genome. In human cells, both normal metabolic activities and environmental factors such as UV light and radiation can cause DNA damage, resulting in as many as 1 million individual molecular lesions per cell per day. Many of these lesions cause structural damage to the DNA molecule and can alter or eliminate the cell’s ability to transcribe the gene that the affected DNA encodes. Other lesions induce potentially harmful mutations in the cell’s genome, which affect the survival of its daughter cells after it undergoes mitosis. As a consequence, the DNA repair process is constantly active as it responds to damage in the DNA structure. When normal repair processes fail, and when cellular apoptosis does not occur, irreparable DNA damage may occur, including double-strand breaks and DNA crosslinkages (interstrand crosslinks or ICLs).
The rate of DNA repair is dependent on many factors, including the cell type, the age of the cell, and the extracellular environment. A cell that has accumulated a large amount of DNA damage, or one that no longer effectively repairs damage incurred to its DNA, can enter one of three possible states:
· an irreversible state of dormancy, known as senescence
· cell suicide, also known as apoptosis or programmed cell death
· unregulated cell division, which can lead to the formation of a tumor that is cancerous
The DNA repair ability of a cell is vital to the integrity of its genome and thus to its normal functioning and that of the organism. Many genes that were initially shown to influence life span have turned out to be involved in DNA damage repair and protection. Failure to correct molecular lesions in cells that form gametes can introduce mutations into the genomes of the offspring and thus influence the rate of evolution.
Defects in DNA repair or replication
All of the following disorders are associated with a high frequency of chromosome and gene (base pair) mutations; most are also associated with a predisposition to cancer, particularly leukemias.
· Xeroderma pigmentosum: caused by mutations in genes involved in nucleotide excision repair; associated with a >1000-fold increase of sunlight-induced skin cancer and with other types of cancer such as melanoma.
· Ataxia telangiectasia: caused by mutations in a gene that detects DNA damage; increased risk from X-rays; associated with increased breast cancer in carriers.
· Fanconi anemia: caused by mutations in a gene involved in DNA repair; increased risk from X-rays; sensitivity to sunlight.
· Bloom syndrome: caused by mutations in a DNA helicase gene; increased risk from X-rays; sensitivity to sunlight.
· Cockayne syndrome: caused by a defect in transcription-linked DNA repair; sensitivity to sunlight.
· Werner’s syndrome: caused by mutations in a DNA helicase gene; premature aging.
RNA is a polymer of ribose-containing nucleotides that usually forms a single strand. RNA contains Adenine, Guanine, Cytosine and Uracil.
There are three types of RNA:
1) messenger RNA (mRNA)
2) ribosomal RNA (rRNA)
3) transfer RNA (tRNA).
RNA is synthesized by transcription of DNA. In eukaryotes, transcription is catalyzed by three RNA polymerases: I for rRNA, II for mRNA, and III for tRNA.
rRNA associates with proteins to form ribosomes, which provide structural support for protein synthesis.
Messenger RNA carries the genetic code to the cytoplasm to direct protein synthesis.
1. This single-stranded molecule consists of hundreds to thousands of nucleotides.
2. mRNA contains codons that are complementary to the DNA codons from which it was transcribed, including one codon (AUG) for initiating protein synthesis and one of three codons (UAA, UAG, or UGA) for terminating protein synthesis.
Transfer RNA is folded into a cloverleaf shape and contains about 80 nucleotides, terminating in adenylic acid (where amino acids attach).
1. Each tRNA combines with a specific amino acid that has been activated by an enzyme.
2. One end of the tRNA molecule possesses an anticodon, a triplet of nucleotides that recognizes the complementary codon in mRNA. If recognition occurs, the anticodon ensures that the tRNA transfers its activated amino acid molecule in the proper sequence to the growing polypeptide chain.
DNA and RNA differ in the following ways:
· RNA is single-stranded (but it can fold back upon itself to form secondary structure, e.g. tRNA)
· In RNA, the sugar molecule is ribose rather than deoxyribose
· In RNA, the fourth base is uracil rather than thymine.
Genetic code.
The uniqueness of every cell, individual or species lies in the uniqueness of its proteins. Cells are enabled to synthesize their specific proteins by the information flowing from the DNA. This information exists as the particular sequences of bases in the DNA strands and is called genetic code. It is sent to the protein-manufacturing machinery in the form of mRNA synthesized on DNA template. The order of bases in the mRNA decides the order of amino acids in the polypeptide to be synthesized.
Nature of Genetic Code. It has been found that a sequence of three consecutive bases in a DNA molecule codes for one specific amino acid. Thus, the genetic code is a triplet code. That a sequence of three nucleotides codes for one amino acid was first suggested by George Gamow in 1954. Crick in 1961 concluded that three consecutive nucleotides in an mRNA strand determine the position of a single amino acid in a polypeptide chain. Nirenberg and Matthaei soon provided experimental evidence to show that the genetic code is a triplet one.
Nirenberg and Matthaei determined experimentally which sequence of bases codes for which amino acid. By convention, codons are written in terms of messenger RNA and not of DNA. The reason for this practice is that the cell reads the code from the messenger RNA molecule, and not directly from the DNA of chromosomes. The mRNA is read from the 5′ end towards the 3′ end.
Characteristics of Genetic Code. The genetic code has certain well-established fundamental characteristics. These are given below:
1. Triplet Nature. The genetic code is a triplet code. Three adjacent bases, termed a codon, specify one amino acid.
2. No Overlapping. The adjacent codons do not overlap.
3. No Punctuation. The genetic code has no «punctuation marks» (gaps) between the coding triplets.
4. Universality. The genetic code is universal (with only minor exceptions, for example in mitochondria), i.e. a given codon in the DNA and mRNA specifies the same amino acid in the protein-synthesizing systems of all organisms, from bacteria to man, and also in viruses.
5. Degeneracy. The genetic code is degenerate, i.e. redundant: one amino acid often has more than one code triplet.
6. Terminator Codons or «Nonsense» Codons. Three of the 64 codons, namely UAA, UAG and UGA, do not specify an amino acid, but signal the end of a message. They are called the nonsense or terminator codons. Any one of these stops synthesis of the polypeptide chain.
7. Initiation or Start Codons. The codon AUG is the usual initiation or start codon, as it begins the synthesis of the polypeptide chain; a few other codons (such as GUG or CUG) occasionally serve this role.
8. Colinearity. DNA is a linear polynucleotide chain and a protein is a linear polypeptide chain. The sequence of amino acids in a polypeptide chain corresponds to the sequence of nucleotide bases in the gene (DNA) that codes for it. A change in a specific codon in DNA produces a change of amino acid in the corresponding position in the polypeptide. The gene and the polypeptide it codes for are said to be colinear.
9. Gene-Polypeptide parity. A specific gene transcribes a specific mRNA, which produces a specific polypeptide. On this basis, a cell can have only as many types of polypeptides as it has types of genes.
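Several of these characteristics can be checked directly against the standard codon table. The Python sketch below uses hypothetical function and variable names and one-letter amino acid codes. It counts the codons, the terminator codons and the number of codons per amino acid (degeneracy), and it splits a message into adjacent, non-overlapping triplets.

```python
from collections import Counter
from itertools import product

# Standard genetic code in mRNA letters, conventional U, C, A, G order;
# "*" marks a terminator codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def split_into_codons(mrna: str) -> list:
    """Read a message as adjacent, non-overlapping triplets with no gaps."""
    usable_length = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable_length, 3)]

codons_per_amino_acid = Counter(CODON_TABLE.values())
terminators = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

print(len(CODON_TABLE))                  # 4 bases in 3 positions give 64 codons
print(terminators)                       # the three terminator codons
print(codons_per_amino_acid["L"])        # leucine is specified by several codons
print(codons_per_amino_acid["M"])        # methionine has a single codon, AUG
print(split_into_codons("AUGGAGUAA"))
```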
Gene Expression. The process by which a gene produces a product, usually a protein, is called gene expression. DNA not only serves as a template for its own replication, it is also a template for RNA formation. Most often it is mRNA that is produced. The process by which an mRNA copy is made of a portion of DNA is called transcription. Following transcription, mRNA will have a sequence of bases that is complementary to that of DNA. Then, mRNA moves into the cytoplasm. Photographic data shows radioactively labeled RNA moving from the nucleus to the cytoplasm, where protein synthesis occurs.
The central dogma of molecular biology also says that mRNA directs the synthesis of a polypeptide. During translation, the sequence of bases in mRNA dictates the sequence of amino acids in a protein. Gene expression requires both transcription and translation. These terms are apt. Transcribing a document means making a copy of it, and translating a document means putting it in a different language.
Gene expression includes the processes of transcription and translation. During transcription, DNA serves as a template for the formation of complementary RNA. During translation, the sequence of bases in RNA determines the sequence of amino acids in a protein.
Transcription. Transcription is the first step required for gene expression, the process by which a gene product is made. Most often this gene product is a protein, but we should note that the molecules tRNA and rRNA are also transcribed off DNA templates. These molecules are also gene products. Just now we are interested in the formation of mRNA, which carries genetic information to the ribosomes, where protein synthesis occurs.
Messenger RNA. During transcription, an mRNA molecule is formed that has a sequence of bases complementary to a portion of one DNA strand; wherever A, T, G, or C is present in the DNA template, U, A, C, or G, respectively, is incorporated into the mRNA molecule. A segment of the DNA helix unwinds and unzips, and complementary RNA nucleotides pair with DNA nucleotides of the strand that is to be transcribed. When these RNA nucleotides are joined together by an RNA polymerase, an mRNA molecule results. This molecule now carries a sequence of codons that will be used to order the sequence of amino acids in a protein. Transcription begins at a region of DNA called a promoter. A promoter is a special sequence of DNA bases where RNA polymerase attaches and the transcribing process begins. A promoter is at the start end of the gene to be transcribed. Some genes are on one of the DNA strands, and some are on the other strand.
Elongation of the mRNA molecule occurs as long as transcription proceeds. Only the newest portion of an RNA molecule is bound to the DNA, and the rest dangles off to the side. Finally, RNA polymerase comes to a terminator sequence at the other end of the gene being transcribed. The terminator causes RNA polymerase to stop transcribing the DNA and to release the mRNA molecule, now called an RNA transcript. Many RNA polymerase molecules can be transcribing the same gene at the same time. This allows the cell to produce many thousands of copies of the same mRNA molecule and eventually many copies of the same protein within a shorter period of time than otherwise.
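The base-for-base rule of transcription can be written as a short Python sketch. The function name and the template sequence are hypothetical. The template strand is assumed to be given in the 3′ to 5′ direction, so the returned mRNA reads 5′ to 3′; promoters and terminators are not modeled.

```python
# Template DNA base -> RNA base incorporated opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Build the mRNA (5' to 3') complementary to a template strand (3' to 5')."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

# Hypothetical template strand, chosen only for illustration.
mrna = transcribe("TACGGACTCATT")
print(mrna)
```

For this template the result begins with the start codon AUG and ends with the terminator codon UAA.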
RNA Processing. Since the advent of modern molecular techniques, investigators can compare the structure of various eukaryotic RNA transcripts and their corresponding genes. They do this by first isolating the mRNA and its corresponding gene. Then, they separate the DNA molecule into single strands and allow the mRNA to bind to its complementary strand. If the 2 molecules are indeed colinear, then the mRNA should bind along the entire length of its template DNA. Much to their surprise, researchers have found that much of human template DNA does not bind to its mRNA.
The segments of a gene that do not bind to mRNA and therefore do not code for protein are called intervening sequences, or introns. The segments of a gene that do bind to mRNA and therefore do code for protein are called exons because they are expressed. By comparing the mRNA molecule present in the nuclei with that in the cytoplasm, it can be shown that both exons and introns are present in the primary mRNA transcript, but only exons are present in the mature mRNA transcript that leaves the nuclei and enters the cytoplasm. The introns are removed from the primary mRNA transcript by a process called RNA processing or RNA splicing.
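The effect of splicing on a transcript can be sketched in Python as follows. The sequence and the exon coordinates are invented for illustration, and the exon positions are simply supplied as input, whereas a real cell has to recognize the exon and intron boundaries from signals in the RNA itself.

```python
def splice(primary_transcript: str, exons: list) -> str:
    """Join the exon segments of a primary transcript, dropping the introns.

    `exons` lists (start, end) positions, counted from 0, with `end` excluded.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)

# Hypothetical primary transcript: exon 1, one intron, exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "GAGUAA"
primary = exon_1 + intron + exon_2

mature = splice(primary, [(0, 6), (18, 24)])
print(len(primary), len(mature))
print(mature)
```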
Since the discovery of split (interrupted) genes in eukaryotes, 2 essential questions have been asked: How is processing carried out? What is the function of introns in the first place? It was discovered that splicing of RNA may be done by spliceosomes, a complex that contains several kinds of ribonucleoproteins. A spliceosome cuts the primary mRNA and then rejoins the adjacent exons.
There has been much speculation about the possible role of introns in the eukaryotic genome. It’s possible that introns allow crossing-over within a gene during meiosis. It’s also possible that introns divide a gene up into regions that can be joined in different combinations to give novel genes and protein products, a process that perhaps facilitates evolution. Some researchers are trying to determine whether introns exist in more primitive eukaryotes and in prokaryotes. They have found that the more primitive the eukaryote, the less likely a gene is to be interrupted by introns and the introns that do exist are shorter. At first it was thought that introns do not exist at all in prokaryotes, but an intron has been discovered in the gene for a tRNA molecule in Anabaena, a cyanobacterium. This particular intron belongs to a class of introns called “self-splicing.” Self-splicing introns, which have the enzymatic capability of splicing themselves out of an RNA transcript, were discovered in the early 1980s. The finding of these so-called ribozymes did away with the belief that only proteins can function as enzymes. Ribozymes, however, are restricted in their function since each one cleaves only RNA at specific locations.
This discovery of ribozymes in prokaryotes is being used to substantiate the belief that RNA could have been the first genetic material and the first enzyme in the history of life. For many years, scientists have puzzled over which came first: DNA, which is the genetic material, or proteins, which are enzymes. Now it appears that this is an unnecessary dilemma. Possibly RNA could have fulfilled both functions in the first cell or cells. RNA molecules (mRNA, rRNA, and tRNA) are transcribed off of DNA templates. mRNA carries a copy of the genetic information needed for protein synthesis. Particularly in eukaryotes, the primary mRNA transcript is processed before it becomes a mature mRNA transcript.
Translation. Translation is the second step by which gene expression leads to protein synthesis. During translation, the sequence of codons in mRNA directs the sequence of amino acids in a protein. Two other types of RNA are needed for protein synthesis. rRNA is contained in the ribosomes, where the codons of mRNA are read, and tRNA carries amino acids to the ribosomes so that protein synthesis can occur.
The process of translation must be extremely orderly so that the amino acids of a polypeptide are sequenced correctly. Protein synthesis involves 3 steps: initiation, elongation, and termination.
1. Initiation of translation: A small ribosomal subunit attaches to the mRNA in the vicinity of the start codon (AUG). The first or initiator tRNA pairs with this codon. Then a large ribosomal subunit joins to the small subunit, and translation begins.
2. Chain elongation: Each ribosome contains 2 sites, the P (for polypeptide) site and the A (for amino acid) site. During elongation, a tRNA with the attached polypeptide chain is at the P site and a tRNA-amino acid complex is just arriving at the A site. The polypeptide chain is transferred and attached by a peptide bond to the newly arrived amino acid. An enzyme (peptidyl transferase), which is a part of the larger ribosomal subunit, and energy are needed to bring about this transfer. Now the tRNA molecule at the P site leaves.
Then translocation occurs: the mRNA along with the peptide-bearing tRNA moves from the A site to the empty P site. This makes it seem as if the ribosome has moved forward 3 nucleotides, especially since there is a new codon now located at the empty A site. The complete cycle—pairing of new tRNA-amino acid complex, transfer of peptide chain, translocation—is repeated at a rapid rate (about 15 times each second in E. coli).
3. Chain termination: Termination of polypeptide chain synthesis occurs at a stop codon, one of the codons that do not code for an amino acid. The polypeptide chain is enzymatically cleaved from the last tRNA, and it leaves the ribosome, which dissociates into its 2 subunits.
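The elongation rate quoted above (about 15 cycles per second in E. coli) allows a rough estimate of how long a polypeptide takes to build. The sketch below assumes a constant rate and ignores the time spent on initiation and termination; the function names and the 300-residue example are illustrative only.

```python
# Assumption: a constant elongation rate of 15 amino acids per second,
# the approximate E. coli figure quoted in the text.
ELONGATION_RATE_PER_S = 15


def elongation_time_seconds(n_residues, rate_per_s=ELONGATION_RATE_PER_S):
    """Approximate time needed to add n_residues amino acids, one per cycle."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return n_residues / rate_per_s


def nucleotides_read(n_residues):
    """Each elongation cycle advances the ribosome by one codon (3 nucleotides)."""
    return 3 * n_residues


# Hypothetical polypeptide of 300 amino acids.
print(elongation_time_seconds(300))  # 20.0
print(nucleotides_read(300))         # 900
```

Under these assumptions a 300-residue chain needs about 20 seconds of elongation while the ribosome moves along 900 nucleotides of mRNA.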
During translation, the codons of mRNA base pair with the anticodons of tRNA molecules carrying specific amino acids. The order of the codons determines the order of the tRNA molecules and the sequence of amino acids in a polypeptide.
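The three steps above can be mimicked in a few lines of Python. This is only a sketch: it uses the standard genetic code with one-letter amino acid symbols, treats the first AUG in the string as the start codon (real initiation is more selective), and the function name translate and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string (5' to 3') into a one-letter peptide string."""
    start = mrna.find("AUG")          # initiation: locate the start codon
    if start == -1:
        return ""                     # no start codon, no translation
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # elongation: one codon per cycle
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # termination: stop codon reached
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Made-up mRNA: GG + AUG GCU UUU UAA + GG
print(translate("GGAUGGCUUUUUAAGG"))  # MAF
```

The example reads AUG (Met, M), GCU (Ala, A) and UUU (Phe, F), then stops at UAA, so the order of codons alone fixes the order of amino acids.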
Gene Expression in Review
1. DNA contains genetic information. The sequence of its bases determines the sequence of amino acids in a protein.
2. During transcription, one strand of DNA serves as a template for the formation of messenger RNA (mRNA). The bases in mRNA are complementary to those in DNA; each group of 3 bases is a codon that codes for an amino acid.
3. Messenger RNA is processed before it leaves the nucleus, during which time the introns are removed.
4. Messenger RNA carries a sequence of codons to the ribosomes, which are composed of ribosomal RNA (rRNA) and proteins.
5. Transfer RNA (tRNA) molecules, each of which is bonded to a particular amino acid, have anticodons that pair complementarily to the codons in mRNA.
6. During translation, tRNA molecules and their attached amino acids arrive at the ribosomes and the linear sequence of codons of the mRNA determines the order in which the amino acids become incorporated into a protein.
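Point 2 of this review, complementary copying of the DNA template into mRNA, can be illustrated with a short sketch. It assumes the template strand is written 3' to 5', so that the complementary mRNA reads 5' to 3' in the same left-to-right order; it ignores promoters and mRNA processing, and the function names and the example sequence are illustrative.

```python
# DNA template base -> complementary RNA base (A pairs with U in RNA).
COMPLEMENT = str.maketrans("ATGC", "UACG")


def transcribe(template_3_to_5):
    """Return the mRNA (5' to 3') complementary to a DNA template written 3' to 5'."""
    return template_3_to_5.upper().translate(COMPLEMENT)


def codons(mrna):
    """Split an mRNA string into consecutive 3-base codons, dropping any remainder."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Made-up template strand.
mrna = transcribe("TACCGAAAAATT")
print(mrna)          # AUGGCUUUUUAA
print(codons(mrna))  # ['AUG', 'GCU', 'UUU', 'UAA']
```

Each codon in the output would then be matched by the anticodon of a tRNA, as described in points 5 and 6.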
Control of gene expression in bacteria. Many prokaryotic genes are organized into operons, or groups of genes whose products have related functions and which are transcribed as a unit.
The Jacob-Monod model of gene induction. Investigating enzyme synthesis in E. coli, the French biochemists François Jacob and Jacques Monod formulated a powerful model of gene regulation in bacterial cells. They worked mainly with the enzyme β-galactosidase, which catalyzes the breakdown of lactose to glucose and galactose, substances both used and produced by other pathways. Lactose is not continuously available to E. coli, and so, as would be expected, the gene for β-galactosidase is normally transcribed at a very low rate; in the absence of lactose there are only about ten β-galactosidase molecules per cell. Jacob and Monod found that the further production of this digestive enzyme is triggered by the presence of a so-called inducer, in this instance allolactose, an isomer of lactose automatically produced in the cell when lactose is present. Normally, then, β-galactosidase is an inducible enzyme. But they also found a mutant strain of E. coli in which the same enzyme is a constitutive enzyme, that is, an enzyme whose production is continuous, apparently uninfluenced by control substances such as inducers.
By means of recombination experiments, Jacob and Monod were eventually able to demonstrate the participation of four genes in the production of β-galactosidase and the two other enzymes involved in lactose breakdown: three so-called structural genes, each specifying the amino acid sequence of one of the three enzymes, and a regulator gene, which controls the activity of the structural genes. They proposed that the regulator gene, which is located at some distance from the structural genes, normally directs the synthesis of a repressor protein that inhibits transcription of the structural genes.
The allele of the regulator gene present in the mutant constitutive strain, they concluded, lacks the ability to direct synthesis of an effective repressor; hence it cannot prevent transcription of the structural genes, which are thus left free to direct continuous protein synthesis. Jacob and Monod also discovered that a special region of DNA contiguous to the structural gene for β-galactosidase determines whether transcription of the structural genes will be initiated; they called this special region the operator, and they called the combination of the operator and its three associated structural genes an operon. Subsequently it was found that the operator, which does not in itself constitute a gene since it doesn’t code for a specific product, is located between the two important sequences of the promoter, the region to which RNA polymerase binds; in fact the operator sequence overlaps the end of the first promoter sequence and the beginning of the second. Hence, when the repressor binds to the operator, RNA polymerase cannot physically bind to the promoter, and transcription is blocked. The operon thus consists of a promoter/operator region and three structural genes (Z, Y, and A), which are much longer than the promoter/operator region. The regulator gene codes for mRNA, which is translated on the ribosomes to produce the repressor protein. If inducer is present, it will bind to the repressor, thus causing a conformational change in the repressor that forces it to dissociate from the operator; in short, the inducer inactivates the repressor.
Now free to bind to the promoter, RNA polymerase can initiate transcription of the structural genes. These, transcribed as a unit, determine production of polycistronic mRNA, that is, a single mRNA that carries the instructions of all three structural genes and so codes for more than one gene product. The mRNA then complexes with ribosomes in the cytoplasm, where its information is translated and the three enzymes necessary for lactose metabolism are synthesized. Enzyme I is β-galactosidase; enzyme II is a permease that helps transport lactose into the cell; and enzyme III is a transacetylase, whose role in lactose utilization is not understood. The number of β-galactosidase molecules rises to about 5,000 per cell when the operon is not repressed.
According to the Jacob-Monod model, then, the condition of the operator region is the key to whether or not there will be activation of the so-called lac operon—the operon responsible for the synthesis of enzymes involved in the breakdown of lactose. If repressor protein is bound to the operator, there will be no transcription. If no repressor is bound to the operator (because the repressor has been inactivated by inducer), transcription can proceed freely. Notice that the three jointly controlled structural genes of the lac operon specify enzymes with closely related functions. It is characteristic for the structural genes of an operon to determine the enzymes of a single biochemical pathway; thus the whole pathway can be regulated as a unit. The adaptive advantage of such coordinated control is obvious.
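The logic of the Jacob-Monod model can be written as a small truth table. The sketch below is a deliberately simplified on/off model: it considers only the repressor and the inducer, ignores any other layer of control, and reuses the approximate figures from the text (about ten β-galactosidase molecules per cell when repressed, about 5,000 when not). All function names are illustrative.

```python
def repressor_bound(repressor_functional, inducer_present):
    """The repressor sits on the operator only if it is functional and not inactivated."""
    return repressor_functional and not inducer_present


def lac_operon_transcribed(repressor_functional=True, inducer_present=False):
    """Transcription of the structural genes proceeds when the operator is free."""
    return not repressor_bound(repressor_functional, inducer_present)


def approx_beta_gal_per_cell(repressor_functional=True, inducer_present=False):
    """Rough enzyme count per cell, using the approximate figures given in the text."""
    if lac_operon_transcribed(repressor_functional, inducer_present):
        return 5000
    return 10


for functional in (True, False):
    for inducer in (False, True):
        strain = "wild type" if functional else "constitutive mutant"
        state = "on" if lac_operon_transcribed(functional, inducer) else "off"
        print(f"{strain}, inducer={inducer}: operon {state}, "
              f"about {approx_beta_gal_per_cell(functional, inducer)} molecules")
```

Only the wild-type strain without inducer keeps the operon off; the constitutive mutant, lacking an effective repressor, transcribes the structural genes whether or not inducer is present.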
Proteins are large biological molecules consisting of one or more chains of amino acids. Proteins perform a vast array of functions within living organisms, including catalyzing metabolic reactions, replicating DNA, responding to stimuli, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in folding of the protein into a specific three-dimensional structure that determines its activity.
A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. The sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids; however, in certain organisms the genetic code can include selenocysteine and—in certain archaea—pyrrolysine. Shortly after or even during synthesis, the residues in a protein are often chemically modified by posttranslational modification, which alters the physical and chemical properties, folding, stability, activity, and ultimately, the function of the proteins. Sometimes proteins have non-peptide groups attached, which can be called prosthetic groups or cofactors. Proteins can also work together to achieve a particular function, and they often associate to form stable protein complexes.
Like other biological macromolecules such as polysaccharides and nucleic acids, proteins are essential parts of organisms and participate in virtually every process within cells. Many proteins are enzymes that catalyze biochemical reactions and are vital to metabolism. Proteins also have structural or mechanical functions, such as actin and myosin in muscle and the proteins in the cytoskeleton, which form a system of scaffolding that maintains cell shape. Other proteins are important in cell signaling, immune responses, cell adhesion, and the cell cycle. Proteins are also necessary in animals’ diets, since animals cannot synthesize all the amino acids they need and must obtain essential amino acids from food. Through the process of digestion, animals break down ingested protein into free amino acids that are then used in metabolism.
Biochemists often refer to four distinct aspects of a protein’s structure:
· Primary structure: the amino acid sequence. A protein is a polyamide.
· Secondary structure: regularly repeating local structures stabilized by hydrogen bonds. The most common examples are the alpha helix, beta sheet and turns. Because secondary structures are local, many regions of different secondary structure can be present in the same protein molecule.
· Tertiary structure: the overall shape of a single protein molecule; the spatial relationship of the secondary structures to one another. Tertiary structure is generally stabilized by nonlocal interactions, most commonly the formation of a hydrophobic core, but also through salt bridges, hydrogen bonds, disulfide bonds, and even posttranslational modifications. The term “tertiary structure” is often used as synonymous with the term fold. The tertiary structure is what controls the basic function of the protein.
· Quaternary structure: the structure formed by several protein molecules (polypeptide chains), usually called protein subunits in this context, which function as a single protein complex.
Functions of Proteins in the organism:
· Catalytic function: most enzymes are proteins (the exceptions are catalytic RNA molecules, the ribozymes).
· Storing amino acids as nutrients and as building blocks for the growing organism.
· Transport function (proteins transport fatty acids, bilirubin, ions, hormones, some drugs etc.).
· Proteins are essential elements in contractile and motile systems (actin, myosin).
· Protective or defensive function (fibrinogen, antibodies).
· Some hormones are proteins (insulin, somatotropin).
· Structural function (collagen, elastin).
Proteins may be purified from other cellular components using a variety of techniques such as ultracentrifugation, precipitation, electrophoresis, and chromatography; the advent of genetic engineering has made possible a number of methods to facilitate purification. Proteins are the chief actors within the cell, said to be carrying out the duties specified by the information encoded in genes. With the exception of certain types of RNA, most other biological molecules are relatively inert elements upon which proteins act.
[Figure: the enzyme hexokinase as a conventional ball-and-stick molecular model, with two of its substrates, ATP and glucose, drawn to scale.]
The chief characteristic of proteins that also allows their diverse set of functions is their ability to bind other molecules specifically and tightly. The region of the protein responsible for binding another molecule is known as the binding site and is often a depression or “pocket” on the molecular surface. This binding ability is mediated by the tertiary structure of the protein, which defines the binding site pocket, and by the chemical properties of the surrounding amino acids’ side chains.
Proteins can bind to other proteins as well as to small-molecule substrates. When proteins bind specifically to other copies of the same molecule, they can oligomerize to form fibrils; this process occurs often in structural proteins that consist of globular monomers that self-associate to form rigid fibers. Protein–protein interactions also regulate enzymatic activity, control progression through the cell cycle, and allow the assembly of large protein complexes that carry out many closely related reactions with a common biological function. Proteins can also bind to, or even be integrated into, cell membranes.
Energy source: Proteins are a source of energy like carbohydrates and fats. Equal weights of carbohydrates and proteins provide about the same amount of energy (in calories). When we eat protein-rich food, the proteins are broken down into their constituent amino acids by enzymes. These are then used by living cells as fuel and as material for building new muscle and cartilage and for repairing damaged tissues and cells. Dietary protein comes from plant sources (pulses, beans, soya, etc.) and animal sources (meat, fish, poultry). Casein, the main protein of milk, supplies the body with the nutrition required to develop bones and muscle. Stored proteins act as a source of nitrogen and energy for the growing embryo.
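The statement about equal weights can be checked with simple arithmetic. The sketch below assumes the commonly used general energy factors of about 4 kcal per gram for protein and carbohydrate and about 9 kcal per gram for fat; these rounded values and the function name are not taken from the text above.

```python
# Assumption: rounded general energy factors in kcal per gram.
KCAL_PER_GRAM = {"protein": 4, "carbohydrate": 4, "fat": 9}


def energy_kcal(grams_by_nutrient):
    """Total energy in kcal for a mapping of nutrient name -> grams."""
    return sum(KCAL_PER_GRAM[name] * grams
               for name, grams in grams_by_nutrient.items())


print(energy_kcal({"protein": 10}))       # 40
print(energy_kcal({"carbohydrate": 10}))  # 40
print(energy_kcal({"fat": 10}))           # 90
```

With these factors, 10 g of protein and 10 g of carbohydrate each yield about 40 kcal, while the same weight of fat yields more than twice as much.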
Enzymes are usually highly specific and accelerate only one or a few chemical reactions. Enzymes carry out most of the reactions involved in metabolism, as well as manipulating DNA in processes such as DNA replication, DNA repair, and transcription. Some enzymes act on other proteins to add or remove chemical groups in a process known as posttranslational modification. About 4,000 reactions are known to be catalyzed by enzymes. The region of the enzyme that binds the substrate and contains the catalytic residues is known as the active site.
Dirigent proteins are members of a class of proteins which dictate the stereochemistry of a compound synthesized by other enzymes.