1 BioInformatics Institute, Matrix, 30 Biopolis Street, Singapore
2 Nanyang Centre for Supercomputing and Visualization, School of Mechanical and Production Engineering, Nanyang Technological University, Singapore
3 Human Genome Laboratory, Department of Microbiology, Faculty of Medicine, National University of Singapore, Kent Ridge, Singapore 117597
Correspondence
Vincent T. K. Chow
micctk{at}nus.edu.sg
ABSTRACT
The overlapping gene pairs of the rickettsial species have been classified into four categories. The gene identification number, the gene name, the direction of overlap and the number of overlapping nucleotides have been tabulated for each category and are available as supplementary material in IJSEM Online.
INTRODUCTION
METHODS
RESULTS AND DISCUSSION
To investigate further how overlapping genes are retained in organisms with different and similar lifestyles, we determined the proportion of each genome represented by overlapping genes in five obligatory intracellular parasites, two reduced genomes, one endosymbiont and one free-living bacterium (Table 1). A substantial portion of the genome is represented by overlapping genes in every organism examined, suggesting an important role for overlapping gene pairs in bacterial genomes. Interestingly, Mycoplasma genitalium has the smallest genome but the largest percentage representation of overlapping gene pairs (Fig. 1). Mycoplasma pneumoniae, the closest relative of Mycoplasma genitalium, also shows a large proportion of overlapping gene pairs. Obligatory intracellular parasites follow this trend, supporting the view that overlapping genes are a means of compressing the maximum amount of information into a short sequence. Mycobacterium leprae is an exceptional facultative intracellular parasite that has a higher proportion of overlapping genes than the free-living anaerobe Clostridium perfringens. This may be explained by the notion that Mycobacterium leprae is still undergoing downsizing and genome reduction, as it is often considered a genome in decay. This explanation is supported by the fact that Mycobacterium leprae has the largest number of pseudogenes (>1000), compared with only 12 pseudogenes in R. prowazekii, the obligatory intracellular parasite with the most extensive genome degradation (Fig. 1). These observations implicate overlapping genes as contributors to genome reduction.
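The kind of tally behind such a comparison can be sketched as follows. This is a minimal illustration, not the authors' method: the `Gene` record, the 1-based inclusive coordinates and the choice to report the fraction of genes involved in at least one overlap are all assumptions, since the exact definition used for Table 1 is not given in this section.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    gene_id: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def overlapping_pairs(genes: List[Gene]) -> List[Tuple[Gene, Gene, int]]:
    """Return adjacent gene pairs that share at least one nucleotide.

    Only neighbours in coordinate order are compared, so a gene nested
    across several neighbours is not handled by this sketch.
    """
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        # Number of shared nucleotides; zero or negative means no overlap.
        shared = min(left.end, right.end) - right.start + 1
        if shared > 0:
            pairs.append((left, right, shared))
    return pairs


def fraction_in_overlaps(genes: List[Gene]) -> float:
    """Fraction of genes that take part in at least one overlapping pair."""
    involved = set()
    for left, right, _ in overlapping_pairs(genes):
        involved.update((left.gene_id, right.gene_id))
    return len(involved) / len(genes) if genes else 0.0
```

Applied to an annotated gene list for each genome, `fraction_in_overlaps` gives one number per organism that can be set against genome size.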
Overlapping gene pairs can assume one of three structures, namely convergent (→ ←), unidirectional (→ →) or divergent (← →) (Rogozin et al., 2002). More than 90 % of the overlapping gene pairs in all the genomes were found to be unidirectional. These results support the earlier hypothesis by Eyre-Walker (1995) that most overlapping gene pairs have a unidirectional structure (Table 1). The frequent occurrence of the unidirectional structure probably reflects the commonest orientation of adjacent genes in the chromosome, as prokaryotic genes are often organized into operons or clusters of genes that are transcribed together. Since all genes in an operon must be transcribed in the same direction, adjacent genes tend to have the same orientation. Fewer gene pairs have the two inverted orientations (convergent and divergent). The lower proportion of the divergent structure may be attributed to evolutionary constraints on the 5' end of the gene and the upstream region, which incorporate essential structures such as promoters. In addition, a frameshift mutation at the 5' end may destroy the entire gene. The unidirectional and convergent structures are formed more easily, through the loss of a stop codon or a frameshift. These results concur with those of Rogozin et al. (2002) and Fukuda et al. (2003), and highlight the fact that gene orientation, genome reduction and evolutionary constraints work together during the organism's adaptation to its niche.
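The three structures follow directly from the strands of the two genes once the pair is ordered along the chromosome. A minimal sketch, assuming strands are recorded as "+" and "-" and that the left gene is the one with the smaller start coordinate (the function names are illustrative):

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_pair(left_strand: str, right_strand: str) -> str:
    """Classify an overlapping pair from the strands of its two genes.

    `left_strand` belongs to the gene with the smaller start coordinate.
    "+" / "-": the 3' ends overlap  -> convergent
    "-" / "+": the 5' ends overlap  -> divergent
    same strand                     -> unidirectional
    """
    if left_strand not in "+-" or right_strand not in "+-" \
            or len(left_strand) != 1 or len(right_strand) != 1:
        raise ValueError("strand must be '+' or '-'")
    if left_strand == right_strand:
        return "unidirectional"
    if left_strand == "+":
        return "convergent"
    return "divergent"


def orientation_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many pairs fall into each of the three structures."""
    return Counter(classify_pair(left, right) for left, right in pairs)
```

Dividing the "unidirectional" count by the total number of pairs gives the kind of percentage quoted above.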
Comparative study of overlapping genes in R. prowazekii and R. conorii
Gene pairs that occur as overlapping genes in both genomes.
R. prowazekii is thought to have essentially appeared as a subset of R. conorii 40 to 80 million years ago. One hundred and thirty-seven genes of R. conorii have no sequence similarity with the R. prowazekii genome (Ogata et al., 2001). Supplementary Table A (in IJSEM Online) summarizes the overlapping gene pairs in the two genomes, R. prowazekii and R. conorii. The overlapping gene pairs are unidirectional and are found on the same strand in both genomes, except for RP884/RP885 ( ) and RC1373/RC1374 ( ). In both genomes, there are more unidirectional overlapping gene pairs on the leading strand than on the lagging strand (Supplementary Table A). This may be attributed to the larger number of genes on the leading strand (Rocha & Danchin, 2001). Of the unidirectional overlaps, 11 of 23 are 4 bp in length. Most of the 4 bp overlaps are ATGA, in which the termination codon of one gene is fused with the initiation codon of the next (translational coupling). This observation suggests that gene overlaps bring neighbouring genes into contact with the translational machinery to ensure some form of coordinated regulation. A similar case has been reported for the trp operon, where TrpC and TrpF overlap by several base pairs (Zheng et al., 2002). Six gene pairs were found to overlap in both genomes, with different numbers of bases in the overlap. The entries and the causes of change in overlapping nucleotides are shown in Supplementary Table B (in IJSEM Online).
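In an ATGA overlap, nucleotides 1 to 3 (ATG) are the start codon of the downstream gene and nucleotides 2 to 4 (TGA) are the stop codon of the upstream gene. The check below is a minimal sketch for genes on the forward strand; the function names are illustrative, and accepting GTG and TTG as alternative bacterial start codons is an assumption that goes beyond the ATGA case described in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: alternative bacterial start codons are accepted alongside ATG.
START_CODONS = {"ATG", "GTG", "TTG"}


def overlap_sequence(genome: str, upstream_end: int, downstream_start: int) -> str:
    """Return the shared nucleotides of two forward-strand genes.

    Coordinates are 1-based and inclusive; an empty string means no overlap.
    """
    if downstream_start > upstream_end:
        return ""
    return genome[downstream_start - 1:upstream_end]


def is_start_stop_fusion(overlap: str) -> bool:
    """True if a 4 nt overlap holds a start codon (nt 1-3) and a stop codon (nt 2-4)."""
    seq = overlap.upper()
    return len(seq) == 4 and seq[:3] in START_CODONS and seq[1:] in STOP_CODONS
```

For a pair on the reverse strand, the overlap would first need to be reverse-complemented before the same test is applied.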
To overlap or not to overlap.
Six gene pairs overlap in R. prowazekii but are split in R. conorii, whereas only three gene pairs overlap in R. conorii but are split in R. prowazekii, which has the smaller genome (Table 1). These differences support the notion that overlapping genes may be a means of compressing the maximum amount of information into the available short sequence space, and may be a result of evolutionary pressure to minimize genome size. Overlapping genes were observed to arise through loss of the stop codon or start codon of either gene, which results in extension of the 3' end or reassignment of the start codon. This can happen through deletion of a stop or start codon, a point mutation at a stop or start codon, or a frameshift anywhere in the coding region. The results are elaborated below.
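The stop-codon route can be illustrated with a toy sequence: a point mutation removes the stop codon of the upstream gene, translation reads through to the next in-frame stop, and the extended 3' end now lies inside the downstream gene. The sequence, positions and helper names below are invented for illustration and are not taken from either rickettsial genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_end(seq: str, start: int) -> int:
    """Return the 0-based index just past the first in-frame stop codon.

    Reading begins at `start`; -1 is returned if no in-frame stop is found.
    """
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return -1


def point_mutation(seq: str, position: int, base: str) -> str:
    """Return a copy of `seq` with the nucleotide at 0-based `position` replaced."""
    return seq[:position] + base + seq[position + 1:]


# Toy example: upstream gene ATG AAA TGA, one intergenic C, then the
# downstream gene ATG CCT AAG GGT TAA starting at 0-based index 10.
toy = "ATGAAATGA" + "C" + "ATGCCTAAGGGTTAA"
downstream_start = 10

end_before = orf_end(toy, 0)                 # stop codon TGA is intact
mutant = point_mutation(toy, 8, "G")         # TGA -> TGG, stop codon lost
end_after = orf_end(mutant, 0)               # read-through to the next stop

gap_before = downstream_start - end_before   # positive: genes are split
overlap_after = end_after - downstream_start # positive: genes now overlap
```

In this toy case the genes start out one nucleotide apart and end up sharing eight nucleotides after a single substitution in the stop codon.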
Gene pairs that overlap in R. prowazekii but are split in R. conorii.
Six gene pairs overlap in R. prowazekii, but are non-overlapping or split in R. conorii. Two of these have a 4 bp overlap in R. prowazekii and zero intergenic distance in R. conorii (Supplementary Table C, in IJSEM Online). The other four overlap in R. prowazekii and have intergenic distances ranging from 3 to 188 bp in R. conorii (Supplementary Table D, in IJSEM Online).
Gene pairs that overlap in R. conorii but are split in R. prowazekii.
Three gene pairs were identified as overlapping in R. conorii, but as split genes in R. prowazekii. The intergenic distance for these genes ranges from 1 to 40 bp. The possible causes for their emergence are elaborated in Supplementary Table E (in IJSEM Online).
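The comparison between the two genomes can be sketched as a lookup on a signed distance for each orthologous pair. The convention here is an assumption: a negative value means an overlap of that many nucleotides, zero means the genes abut, and a positive value is the intergenic distance. The pair labels and distances in the usage lines are placeholders, not entries from the supplementary tables.

```python
from typing import Dict, Tuple


def compare_pair(dist_a: int, dist_b: int) -> str:
    """Categorize one orthologous gene pair from its signed distance in two genomes.

    Negative distance = overlap, zero or positive = split (no shared nucleotides).
    """
    overlaps_a = dist_a < 0
    overlaps_b = dist_b < 0
    if overlaps_a and overlaps_b:
        return "overlap in both"
    if overlaps_a:
        return "overlap in A, split in B"
    if overlaps_b:
        return "split in A, overlap in B"
    return "split in both"


def categorize(pairs: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Apply `compare_pair` to every labelled pair."""
    return {label: compare_pair(a, b) for label, (a, b) in pairs.items()}


# Placeholder data: (distance in genome A, distance in genome B).
example = {
    "pair1": (-4, -11),  # overlaps in both, with different lengths
    "pair2": (-4, 0),    # 4 nt overlap in A, genes abut in B
    "pair3": (12, -1),   # split in A, overlaps in B
}
categories = categorize(example)
```

Keeping the signed distance rather than a yes/no flag also preserves the overlap length, which is what changes for pairs that overlap in both genomes.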
Conclusion
Whole genome sequencing of micro-organisms is providing an opportunity for computer-based genetic analysis that allows us to highlight important features such as overlapping genes in the genomes. From our analysis, mutations at the ends of the coding region are the main force that determines gene overlaps. It also appears that gene overlaps arise from the reduction or elimination of intergenic regions caused by mutational bias towards deletion that helps in genome compression while retaining information content. Furthermore, most of the overlapping genes are not mutually exclusive in function. These studies thus emphasize that there is substantial plasticity among obligatory intracellular parasites, and that overlapping genes facilitate genome reduction and functional coupling.
REFERENCES
Andersson, S. G., Zomorodipour, A., Andersson, J. O. & 7 other authors (1998). The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396, 133–140.
Burge, C. B. & Karlin, S. (1998). Finding the genes in genomic DNA. Curr Opin Struct Biol 8, 346–354.
Cebrat, S., Dudek, M. R., Mackiewicz, P., Kowalczuk, M. & Fita, M. (1997). Asymmetry of coding versus noncoding strand in coding sequences of different genomes. Microb Comp Genomics 2, 259–268.
Chen, S. M., Takiff, H. E., Barber, A. M., Dubois, G. C., Bardwell, J. C. & Court, D. L. (1990). Expression and characterization of RNase III and Era proteins. Products of the rnc operon of Escherichia coli. J Biol Chem 265, 2888–2895.
Clark, M. A., Baumann, L., Thao, M. L., Moran, N. A. & Baumann, P. (2001). Degenerative minimalism in the genome of a psyllid endosymbiont. J Bacteriol 183, 1853–1861.
Cole, S. T., Eiglmeier, K., Parkhill, J. & 41 other authors (2001). Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011.
Eyre-Walker, A. (1995). The distance between Escherichia coli genes is related to gene expression levels. J Bacteriol 177, 5368–5369.
Fraser, C. M., Gocayne, J. D., White, O. & 25 other authors (1995). The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403.
Fukuda, Y., Washio, T. & Tomita, M. (1999). Comparative study of overlapping genes in the genomes of Mycoplasma genitalium and Mycoplasma pneumoniae. Nucleic Acids Res 27, 1847–1853.
Fukuda, Y., Nakayama, Y. & Tomita, M. (2003). On dynamics of overlapping genes in bacterial genomes. Gene 323, 181–187.
Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B. C. & Herrmann, R. (1996). Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res 24, 4420–4449.
Inokuchi, Y., Hirashima, A., Sekine, Y., Janosi, L. & Kaji, A. (2000). Role of ribosome recycling factor (RRF) in translational coupling. EMBO J 19, 3788–3798.
Kalman, S., Mitchell, W., Marathe, R. & 7 other authors (1999). Comparative genomes of Chlamydia pneumoniae and C. trachomatis. Nat Genet 21, 385–389.
Keese, P. K. & Gibbs, A. (1992). Origins of genes: "big bang" or continuous creation? Proc Natl Acad Sci U S A 89, 9489–9493.
Krakauer, D. C. (2000). Stability and evolution of overlapping genes. Evolution Int J Org Evolution 54, 731–739.
Lipman, D. J. (1997). Making (anti)sense of non-coding sequence conservation. Nucleic Acids Res 25, 3580–3583.
Miyata, T. & Yasunaga, T. (1978). Evolution of overlapping genes. Nature 272, 532–535.
Normark, S., Bergstrom, S., Edlund, T., Grundstrom, T., Jaurin, B., Lindberg, F. P. & Olsson, O. (1983). Overlapping genes. Annu Rev Genet 17, 499–525.
Ogata, H., Audic, S., Renesto-Audiffren, P. & 8 other authors (2001). Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293, 2093–2098.
Pavesi, A., De Iaco, B., Granero, M. I. & Porati, A. (1997). On the informational content of overlapping genes in prokaryotic and eukaryotic viruses. J Mol Evol 44, 625–631.
Rocha, E. P. & Danchin, A. (2001). Ongoing evolution of strand composition in bacterial genomes. Mol Biol Evol 18, 1789–1799.
Rogozin, I. B., Spiridonov, A. N., Sorokin, A. V., Wolf, Y. I., Jordan, I. K., Tatusov, R. L. & Koonin, E. V. (2002). Purifying and directional selection in overlapping prokaryotic genes. Trends Genet 18, 228–232.
Sander, C. & Schulz, G. E. (1979). Degeneracy of the information contained in amino acid sequences: evidence from overlaid genes. J Mol Evol 13, 245–252.
Shigenobu, S., Watanabe, H., Hattori, M., Sakaki, Y. & Ishikawa, H. (2000). Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407, 81–86.
Shimizu, T., Ohtani, K., Hirakawa, H. & 7 other authors (2002). Complete genome sequence of Clostridium perfringens, an anaerobic flesh-eater. Proc Natl Acad Sci U S A 99, 996–1001.
Smith, T. F. & Waterman, M. S. (1981). Overlapping genes and information theory. J Theor Biol 91, 379–380.
Stephens, R. S., Kalman, S., Lammel, C. J. & 9 other authors (1998). Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282, 754–759.
Yelin, R., Dahary, D., Sorek, R. & 13 other authors (2003). Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol 21, 379–386.
Zheng, Y., Szustakowski, J. D., Fortnow, L., Roberts, R. J. & Kasif, S. (2002). Computational identification of operons in microbial genomes. Genome Res 12, 1221–1230.