1 Information Génomique et Structurale, CNRS UPR2589, Case 934, 163 Avenue de Luminy, 13288 Marseille cedex 09, France
2 Unité des rickettsies, IFR 48, CNRS UMR 6020, Faculté de Médecine, Université de la Méditerranée, 27 Boulevard Jean Moulin, 13385 Marseille cedex 05, France
Correspondence
Pierre-Edouard Fournier
Pierre-Edouard.Fournier{at}medecine.univ-mrs.fr
| ABSTRACT |
|---|
Published online ahead of print on 31 December 2005 as DOI 10.1099/ijs.0.63903-0.
Tables detailing the prokaryotic species for which complete genome sequences are available in GenBank, the differences between RGC and CGC values obtained in the study for 100 prokaryotic genomes and a list of genes conserved in prokaryotic genomes in the COG database are available as supplementary data in IJSEM Online.
| INTRODUCTION |
|---|
In our laboratory, we have been studying the rpoB gene, encoding the β-subunit of the DNA-dependent RNA polymerase (RNAP) (Cramer, 2002; Murakami & Darst, 2003), for several years (Drancourt et al., 2004; Khamis et al., 2003, 2004; Mollet et al., 1997, 1998; Renesto et al., 2000, 2001a, b; Taillardat-Bisch et al., 2003). We have observed that, in prokaryotes, the DNA G+C content of the rpoB gene correlates well with that of the genome. This suggested that the rpoB gene might serve as a measure of genomic DNA G+C content. We therefore investigated whether the genomic DNA G+C content of prokaryotes correlates well with that of their genes. More specifically, our objective was to identify the gene whose DNA G+C content provides the most suitable acceptable measure of prokaryotic genomic DNA G+C content, where acceptable means not differing by more than 5 mol% from that of the genome.
| METHODS |
|---|
Correlation between gene and genomic DNA G+C contents.
For a given prokaryote, the DNA G+C content of each essential single-copy gene (GGC) and of the genome (real DNA G+C content, RGC) was determined from its nucleotide sequence using the EMBOSS software package (Olson, 2002). For correlation analysis, we analysed only 157 genomes, one genome per species. This avoided the bias that could have resulted from the overrepresentation of species for which several strains have already been completely sequenced. Correlations between GGC and RGC were studied by means of scatter plots using EXCEL 2003 software (Microsoft). Tendency curves and coefficients of determination (r²) between GGC and RGC were examined for each gene. For the final analyses, we retained only those universally conserved single-copy genes that exhibited an r² value of >0.95.
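As an illustration of the quantity measured in this step, the following minimal Python sketch computes a G+C content in mol% from a nucleotide sequence. It is not the EMBOSS program used in the study; the function name `gc_content`, the toy sequences and the choice to ignore ambiguous bases such as N are assumptions made for this example.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C content of a nucleotide sequence in mol%.

    Assumption: only unambiguous bases (A, C, G, T) are counted,
    so characters such as N do not affect the result.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


# Hypothetical toy sequences, for illustration only.
gene_sequence = "ATGGCGCCTTAA"
genome_sequence = "ATGGCGCCTTAAATATATGCGC"

ggc = gc_content(gene_sequence)    # gene G+C content (GGC)
rgc = gc_content(genome_sequence)  # genome G+C content (RGC)
print(f"GGC = {ggc:.1f} mol%, RGC = {rgc:.1f} mol%")
```

The same function would be applied to every gene and genome pair before the values are plotted against each other.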
We also determined a calculated genomic DNA G+C content (CGC) for each gene and each species. This was inferred from the GGC using the tendency curve equation (Table 1). For each gene, the variable y in the tendency curve equation represents CGC and x represents GGC. The median mol% difference between CGC and RGC was calculated for each gene. In addition, the sensitivity of each gene (i.e. the proportion of species for which the CGC and RGC values differ by no more than 5 mol%) was calculated. The best candidate gene was defined as the gene that showed the highest r² and sensitivity and the smallest median mol% difference.
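The per-gene evaluation described here can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' EXCEL workflow: the tendency curve is assumed to be a straight line fitted by ordinary least squares, sensitivity is taken as the fraction of genomes whose CGC lies within 5 mol% of the RGC (which is how the Results and Discussion use the term), and the function names and input values are invented for the example.

```python
from statistics import mean, median


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns r^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def evaluate_gene(ggc, rgc, tolerance=5.0):
    """Fit RGC on GGC, infer CGC and summarise how well CGC matches RGC."""
    slope, intercept, r2 = fit_line(ggc, rgc)
    cgc = [slope * g + intercept for g in ggc]           # calculated G+C content
    diffs = [abs(c - r) for c, r in zip(cgc, rgc)]       # mol% differences
    within = sum(1 for d in diffs if d <= tolerance)     # genomes within tolerance
    return {
        "r2": r2,
        "median_diff": median(diffs),
        "sensitivity": within / len(diffs),
        "cgc": cgc,
    }


# Made-up GGC and RGC values (mol%) for five hypothetical genomes.
ggc_values = [30.0, 40.0, 50.0, 60.0, 70.0]
rgc_values = [31.0, 39.0, 50.0, 61.0, 69.0]

result = evaluate_gene(ggc_values, rgc_values)
print(result["r2"], result["median_diff"], result["sensitivity"])
```

Running `evaluate_gene` once per candidate gene and ranking the results by r², sensitivity and median difference mirrors the selection criteria stated above.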
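The statistical significance of the correlations was assessed using the t statistic [$t = r\sqrt{n-2}/\sqrt{1-r^{2}}$, where r is the correlation coefficient, r² is the coefficient of determination and n represents the number of genomes studied] (Bailey, 1995).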
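The significance statistic referred to above is read here as the standard t statistic for a correlation coefficient, t = r·√(n−2)/√(1−r²), with n−2 degrees of freedom. A minimal Python sketch follows; the function name `correlation_t` is an assumption for this example, and converting t to a P value would additionally require a Student's t distribution, which is not shown.

```python
import math


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing whether a correlation coefficient differs from 0.

    r is the correlation coefficient and n the number of genomes;
    the statistic has n - 2 degrees of freedom.
    """
    if n <= 2:
        raise ValueError("at least three observations are required")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Example with the figures reported for ftsY: r^2 = 0.98 over 157 genomes.
r = math.sqrt(0.98)
t = correlation_t(r, 157)
print(f"t = {t:.1f} with {157 - 2} degrees of freedom")
```

A large t value for a given n corresponds to a small P value, consistent with the strongly significant correlations reported in the Results.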
Estimation of the suitability of the selected gene.
To examine the suitability of the selected gene for estimating prokaryotic genomic DNA G+C content, we inferred the CGC from the GGC for each of 49 prokaryotic strains that had not been used in gene selection. These strains had been excluded because their species were already represented in the correlation analysis. We also tested 51 prokaryotic genomes that became available after our study had begun (Supplementary Table S2 in IJSEM Online).
| RESULTS |
|---|
Correlation between GGC and RGC
Among the 57 single-copy universally conserved genes, r² ranged from 0.73 for the rpmC gene to 0.98 for the ftsY gene (Supplementary Table S3 in IJSEM Online). For 20 of these genes, r² exceeded 0.95 (Supplementary Table S3 in IJSEM Online). These genes include pheS (COG0016), rpsB (COG0052), pheT (COG0072), nusA (COG0195), prfA (COG0216), frr (COG0233), tsf (COG0264), trmD (COG0336), rplI (COG0359), lepA (COG0481), valS (COG0525), pyrH (COG0528), obg (COG0536), ftsY (COG0552), smpB (COG0691), nusB (COG0781), rbfA (COG0858) and genes encoding a predicted GTPase (COG0012), a predicted S-adenosylmethionine-dependent methyltransferase involved in cell envelope biogenesis (COG0275) and a metal-dependent hydrolase (COG0319). For all 20 genes, the correlations between GGC and RGC were statistically significant (P < 10⁻²). The median mol% difference between RGC and CGC ranged from 1.06 for the ftsY gene to 1.87 for the rbfA gene (Table 1). The sensitivities of the 20 genes ranged from 93.6 % (147/157) for the tsf, pyrH and smpB genes to 100 % (157/157) for the ftsY gene (Table 1). The ftsY gene was significantly more sensitive (157/157) than each of the tsf, pyrH and smpB genes (147/157, P < 10⁻²) and the frr, trmD and rbfA genes (148/157, P = 0.01). The lepA gene was significantly more sensitive (156/157) than the tsf, pyrH and smpB genes (147/157, P = 0.01) and the frr, trmD and rbfA genes (148/157, P = 0.02). Individually, both the predicted metal-dependent hydrolase (COG0319) and nusB genes (155/157) were significantly more sensitive than the tsf, pyrH and smpB genes (147/157, P = 0.04). All other comparisons gave P values >0.05 and were thus considered non-significant.
We chose the ftsY gene as the best candidate gene. It had the highest r² value (0.98; Fig. 2) and sensitivity (100 %). Furthermore, it showed the smallest median mol% difference between RGC and CGC (1.06). The median size of the ftsY gene in prokaryotes was found to be 1144 nucleotides (range 669–2325 nucleotides).
| DISCUSSION |
|---|
DNA G+C content is relatively constant in prokaryotic genomes, in particular in coding regions (Forsdyke & Mortimer, 2000; Sandberg et al., 2003), and correlates well with synonymous codon choice (Knight et al., 2001), amino acid usage (Lobry, 2005) and genomic signatures (Deschavanne et al., 1999). This prompted us to investigate whether gene sequences could be used to extrapolate values of genomic DNA G+C content. In our study, we chose genes that were conserved in all prokaryotes as a single copy (Supplementary Table S3 in IJSEM Online; Koonin, 2003), as the objective was to identify a gene that could be used for all prokaryotes and that had a CGC value that did not differ from the RGC value by more than 5 mol%. The ftsY gene emerged as the best candidate gene, exhibiting the highest coefficient of determination between GGC and RGC, the smallest median mol% difference between CGC and RGC and a sensitivity of 100 %. In prokaryotes, the ftsY gene is present as a single copy (Cao & Saier, 2003) and there is no evidence to suggest horizontal transfer of this gene (Caldon et al., 2001; Gribaldo & Cammarano, 1998). The CGC values inferred, as above, from the GGC value for this gene were within 5 mol% of the RGC for 100 prokaryotic strains that had not previously been included in the determination of correlation. Furthermore, we observed that the CGC values obtained for such prokaryotic species as Campylobacter jejuni (31.0 and 31.2 mol%), Ureaplasma urealyticum (25.9 mol%) or T. whipplei (49.3 mol%) were closer to their RGC [30.3 and 30.5 (Owen, 1983), 25.5 (Razin, 1985) and 46.3 mol% (Raoult et al., 2003), respectively] than the DNA G+C content values obtained using traditional methods [31.5 (Owen, 1983), 26.9–28 (Razin, 1985) and 59.4 mol% (La Scola et al., 2001), respectively]. Finally, the ftsY gene has a median size of 1144 nucleotides and the genome sequences available cover all phylogenetic prokaryote clades. This makes it easy to select primers from phylogenetically close genomes and makes this gene easy to sequence. Thus, based on our results and the characteristics of the gene, we believe that the ftsY gene offers an accurate way of estimating genomic DNA G+C content.
Compared with conventional methods, our method is rapid, less labour-intensive and reproducible. It also requires a smaller quantity of DNA than the conventional methods, and the results are easily comparable between laboratories. It may even be suitable for uncultured bacteria. In addition, unlike recently described methods such as those using a LightCycler thermal cycler (Xu et al., 2000), our method does not require any specific equipment, as sequencing facilities are now available in many academic and non-academic laboratories. Such laboratories could even be sent PCR products from remote areas. Furthermore, our method is not affected by differences in affinity to the SYBR Green nucleic acid stain between prokaryotic chromosomes. In summary, the use of the ftsY gene GGC is a rapid and reliable means of estimating genomic DNA G+C content that may easily be used by any laboratory.
| ACKNOWLEDGEMENTS |
|---|
| REFERENCES |
|---|
Bailey, N. T. J. (1995). Statistical Methods in Biology. Cambridge: University Press.
Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. (2005). GenBank. Nucleic Acids Res 33 (Database Issue), D34–D38.
Caldon, C. E., Yoong, P. & March, P. E. (2001). Evolution of a molecular switch: universal bacterial GTPases regulate ribosome function. Mol Microbiol 41, 289–297.
Cao, T. B. & Saier, M. H., Jr (2003). The general protein secretory pathway: phylogenetic analyses leading to evolutionary conclusions. Biochim Biophys Acta 1609, 115–125.
Cramer, P. (2002). Multisubunit RNA polymerases. Curr Opin Struct Biol 12, 89–97.
De Ley, J. (1970). Reexamination of the association between melting point, buoyant density, and chemical base composition of deoxyribonucleic acid. J Bacteriol 101, 738–754.
Deschavanne, P. J., Giron, A., Vilain, J., Fagot, G. & Fertil, B. (1999). Genomic signature: characterization and classification of species assessed by chaos game representation of sequences. Mol Biol Evol 16, 1391–1399.
Drancourt, M., Roux, V., Fournier, P. E. & Raoult, D. (2004). rpoB gene sequence-based identification of aerobic Gram-positive cocci of the genera Streptococcus, Enterococcus, Gemella, Abiotrophia, and Granulicatella. J Clin Microbiol 42, 497–504.
Ezaki, T., Saidi, S. M., Liu, S. L., Hashimoto, Y., Yamamoto, H. & Yabuuchi, E. (1990). Rapid procedure to determine the DNA base composition from small amounts of gram-positive bacteria. FEMS Microbiol Lett 55, 127–130.
Forsdyke, D. R. & Mortimer, J. R. (2000). Chargaff's legacy. Gene 261, 127–137.
Goodfellow, M., Manfio, G. P. & Chun, J. (1997). Towards a practical species concept for cultivable bacteria. In Species: The Units of Biodiversity, pp. 25–29. Edited by M. F. Clarridge & H. A. Dawah. London: Chapman and Hall.
Gribaldo, S. & Cammarano, P. (1998). The root of the universal tree of life inferred from anciently duplicated genes encoding components of the protein-targeting machinery. J Mol Evol 47, 508–516.
Ishikawa, J., Yamashita, A., Mikami, Y., Hoshino, Y., Kurita, H., Hotta, K., Shiba, T. & Hattori, M. (2004). The complete genomic sequence of Nocardia farcinica IFM 10152. Proc Natl Acad Sci U S A 101, 14925–14930.
Khamis, A., Colson, P., Raoult, D. & La Scola, B. (2003). Usefulness of rpoB gene sequencing for identification of Afipia and Bosea species, including a strategy for choosing discriminative partial sequences. Appl Environ Microbiol 69, 6740–6749.
Khamis, A., Raoult, D. & La Scola, B. (2004). rpoB gene sequencing for identification of Corynebacterium species. J Clin Microbiol 42, 3925–3931.
Knight, R. D., Freeland, S. J. & Landweber, L. F. (2001). A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol 2, research0010.1–0010.13. doi:10.1186/gb-2001-2-4-research0010
Ko, C. Y., Johnson, J. L., Barnett, L. B., McNair, H. M. & Vercellotti, J. R. (1977). A sensitive estimation of the percentage of guanine plus cytosine in deoxyribonucleic acid by high performance liquid chromatography. Anal Biochem 80, 183–192.
Koonin, E. V. (2003). Comparative genomics, minimal gene-sets and the last common universal ancestor. Nat Rev Microbiol 1, 127–136.
La Scola, B., Fenollar, F., Fournier, P. E., Altwegg, M., Mallet, M. N. & Raoult, D. (2001). Description of Tropheryma whipplei gen. nov., sp. nov., the Whipple's disease bacillus. Int J Syst Evol Microbiol 51, 1471–1479.
Lobry, J. R. (2005). Influence of genomic G+C content on average amino acid composition of proteins from 59 bacterial species. Gene 205, 309–316.
Mandel, M., Igambi, L., Bergendahl, J., Dodson, M. L., Jr & Scheltgen, E. (1970). Correlation of melting temperature and cesium chloride buoyant density of bacterial deoxyribonucleic acid. J Bacteriol 101, 333–338.
Marmur, J. & Doty, P. (1962). Determination of the base composition of deoxyribonucleic acid from its thermal denaturation temperature. J Mol Biol 5, 109–118.
Mesbah, M. & Whitman, W. B. (1989). Measurement of deoxyguanosine/thymidine ratios in complex mixtures by high-performance liquid chromatography for determination of the mole percentage guanine + cytosine of DNA. J Chromatogr 479, 297–306.
Mollet, C., Drancourt, M. & Raoult, D. (1997). rpoB sequence analysis as a novel basis for bacterial identification. Mol Microbiol 26, 1005–1011.
Mollet, C., Drancourt, M. & Raoult, D. (1998). Determination of Coxiella burnetii rpoB sequence and its use for phylogenetic analysis. Gene 207, 97–103.
Murakami, K. S. & Darst, S. A. (2003). Bacterial RNA polymerases: the wholo story. Curr Opin Struct Biol 13, 31–39.
Olson, S. A. (2002). EMBOSS opens up sequence analysis. European Molecular Biology Open Software Suite. Brief Bioinform 3, 87–91.
Owen, R. J. (1983). Nucleic acids in the classification of campylobacters. Eur J Clin Microbiol 2, 367–377.
Owen, R. J., Hill, L. R. & Lapage, S. P. (1969). Determination of DNA base compositions from melting profiles in dilute buffers. Biopolymers 7, 503–516.
Raoult, D., Ogata, H., Audic, S., Robert, C., Suhre, K., Drancourt, M. & Claverie, J. M. (2003). Tropheryma whipplei Twist: a human pathogenic Actinobacteria with a reduced genome. Genome Res 13, 1800–1809.
Razin, S. (1985). Molecular biology and genetics of mycoplasmas (Mollicutes). Microbiol Rev 49, 419–455.
Renesto, P., Lorvellec-Guillon, K., Drancourt, M. & Raoult, D. (2000). rpoB gene analysis as a novel strategy for identification of spirochetes from the genera Borrelia, Treponema, and Leptospira. J Clin Microbiol 38, 2200–2203.
Renesto, P., Gautheret, D., Drancourt, M. & Raoult, D. (2001a). Determination of the rpoB gene sequences of Bartonella henselae and Bartonella quintana for phylogenic analysis. Res Microbiol 151, 831–836.
Renesto, P., Gouvernet, J., Drancourt, M., Roux, V. & Raoult, D. (2001b). Use of rpoB gene analysis for detection and identification of Bartonella species. J Clin Microbiol 39, 430–437.
Sandberg, R., Branden, C. I., Ernberg, I. & Coster, J. (2003). Quantifying the species-specificity in genomic signatures, synonymous codon choice, amino acid usage and G+C content. Gene 311, 35–42.
Schildkraut, C. L., Marmur, J. & Doty, P. (1962). Determination of the base composition of deoxyribonucleic acid from its buoyant density in CsCl. J Mol Biol 4, 430–443.
Stackebrandt, E., Frederiksen, W., Garrity, G. M. & 10 other authors (2002). Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int J Syst Evol Microbiol 52, 1043–1047.
Taillardat-Bisch, A. V., Raoult, D. & Drancourt, M. (2003). RNA polymerase beta-subunit-based phylogeny of Ehrlichia spp., Anaplasma spp., Neorickettsia spp. and Wolbachia pipientis. Int J Syst Evol Microbiol 53, 455–458.
Tatusov, R. L., Natale, D. A., Garkavtsev, I. V. & 7 other authors (2001). The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29, 22–28.
Vandamme, P., Pot, B., Gillis, M., De Vos, P., Kersters, K. & Swings, J. (1996). Polyphasic taxonomy, a consensus approach to bacterial systematics. Microbiol Rev 60, 407–438.
Xu, H. X., Kawamura, Y., Li, N., Zhao, L., Li, T. M., Li, Z. Y., Shu, S. & Ezaki, T. (2000). A rapid method for determining the G+C content of bacterial chromosomes by monitoring fluorescence intensity during DNA denaturation in a capillary tube. Int J Syst Evol Microbiol 50, 1463–1469.