Bacillus Genetic Stock Center, Department of Biochemistry, The Ohio State University, Columbus, OH 43210, USA
Correspondence
Daniel R. Zeigler
zeigler.1{at}osu.edu
| ABSTRACT |
|---|
A spreadsheet listing sequence identity scores for all candidates with each genome comparison is available as supplementary data in IJSEM Online.
| INTRODUCTION |
|---|
An alternative approach to quantification of genome relatedness is to compare selected DNA sequences for a group of bacterial strains. The core technology for this method, DNA sequencing, is relatively rapid and inexpensive, highly reproducible and readily available to virtually any research group through specialized sequencing centres. Databases of gene sequences and computer applications to compare them are, likewise, freely available. For these reasons, DNA sequence analysis has taken an increasingly important role in taxonomic studies in recent years.
The bacterial species concept, however, requires that entire genomes are compared. If sequence analysis is to augment, or even replace, DNA–DNA hybridization in defining species, it is paramount that taxonomists identify genes that can represent whole genomes reliably for the purposes of comparison. Recently, an ad hoc committee for the re-evaluation of the species definition in bacteria issued a call for identification of a set of such genes (Stackebrandt et al., 2002). The committee's consensus was that analysis of at least five genes of diverse chromosomal loci and wide distribution could provide sufficient information to distinguish a bacterial species from related taxa. Once a species was defined in this way, sequence information from a single member of this gene set may be enough to assign additional strains to the species (Stackebrandt et al., 2002).
It is an open question how much information any given gene sequence can provide about the genome that contains it. Sequence differences between related organisms within a given gene are presumably due to slow, continual acquisition of random mutations, which are subject to selection and inherited vertically. Differences seen in whole-genome comparisons, however, are the sum of this vertical inheritance and any number of horizontal transfer events that involve simultaneous acquisition of many genes through transformation, conjugation or bacteriophage infection. Genome reduction through gene inactivation and deletion further complicates the picture. The relative rates of these factors (random mutation, horizontal transfer and genome reduction) are poorly understood.
The goal of the current study is to obtain statistical evidence that individual gene sequences diverge at a rate that reflects the overall rate of genome divergence and to identify genes that could best serve as predictors of genome relatedness. Publicly available bacterial genome sequences were used to identify over 30 genes that satisfy the ad hoc committee's criteria (Stackebrandt et al., 2002). When closely related bacteria were compared, the frequency of identical residues in individual gene alignments correlated with the frequency of identical residues in whole-genome alignments with R² ≥ 0·9 for each of eight genes from this set. The highest-scoring sequence from the set, recN, could be used to predict whole-genome relatedness with high accuracy for a test set of 44 bacterial genomes. Combining data from two or three genes could further refine this prediction. It appears that a small number of carefully selected gene sequences can indeed equal, or perhaps even surpass, the precision of DNA–DNA hybridization for quantification of genome relatedness.
| METHODS |
|---|
Individual gene and whole-genome alignments.
For calculating DNA sequence identity for individual genes, sequences obtained from related organisms were aligned with CLUSTAL W and a distance matrix was computed (Thompson et al., 1994). Pairs of whole genomes were aligned by using the NUCMER application (Delcher et al., 2002) with the following parameters: breakLen=500, minCluster=40, diagFactor=0·15, maxGap=250 and minMatch=12. To ensure that the algorithm found all possible alignments, each pair was analysed twice with the reference and query files swapped. The resulting output files, giving the coordinates of regions of sequence similarity between the two genomes, were combined and duplicate regions were removed from the list. When two neighbouring regions shared overlapping end-points, the common segment was divided equally between them. Two similarity estimates were calculated from each genomic sequence comparison. DNA sequence identity for conserved regions was calculated as the mean sequence identity of the homologous regions, weighted by each region's length in nucleotides. DNA sequence identity for a pair of whole genomes was calculated by multiplying the sequence identity of their conserved regions by the ratio of the net length of the conserved regions to the mean length of the two genomes.
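The two similarity estimates described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the `Region` class, its field names and the example numbers are hypothetical, and the regions are assumed to have already been de-duplicated and trimmed so that they do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One conserved (aligned) region; names are hypothetical."""
    length: int      # aligned length in nucleotides
    identity: float  # fraction of identical residues, 0..1


def conserved_identity(regions):
    """Mean identity of the conserved regions, weighted by region length."""
    total = sum(r.length for r in regions)
    if total == 0:
        return 0.0
    return sum(r.identity * r.length for r in regions) / total


def whole_genome_identity(regions, len_a, len_b):
    """Conserved-region identity multiplied by the ratio of the net
    conserved length to the mean length of the two genomes."""
    net_length = sum(r.length for r in regions)
    mean_length = (len_a + len_b) / 2
    return conserved_identity(regions) * net_length / mean_length


# Made-up example: two conserved regions shared by two 5000-nt genomes.
regions = [Region(1000, 0.9), Region(3000, 0.8)]
print(conserved_identity(regions))                 # about 0.825
print(whole_genome_identity(regions, 5000, 5000))  # about 0.66
```

In this toy case the conserved regions are 82·5 % identical but cover only 80 % of the mean genome length, so the whole-genome estimate drops to about 66 %.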
Statistical analysis.
Univariate linear regression models were used to assess the predictive ability of sequence identities for each individual gene with respect to sequence identities for whole-genome alignments. Step-wise linear regression procedures were used to find the subset of genes that best predicted the whole-genome alignment. Prediction intervals were calculated and the upper/lower limits of these intervals were used to determine the cut-point for the desired 70 % alignment.
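As a rough illustration of the univariate step, the sketch below fits an ordinary least-squares line of whole-genome identity on single-gene identity and reports R². The function name and the data are invented for illustration; the step-wise procedure and the prediction intervals mentioned above are not implemented here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x.

    Returns (a, b, r_squared). x would hold single-gene sequence
    identities and y the matching whole-genome identities.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("x values are all identical")
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy if syy else 1.0
    return a, b, r_squared


# Invented, perfectly linear data, used only to exercise the function.
gene_identity = [0.70, 0.80, 0.90, 1.00]
genome_identity = [2 * g - 1 for g in gene_identity]
a, b, r2 = fit_line(gene_identity, genome_identity)
print(a, b, r2)
```

With real data, each candidate gene would be fitted in turn and the genes ranked by R².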
| RESULTS AND DISCUSSION |
|---|
These data allow us to estimate how well one factor involved in genome divergence (random mutation within vertically transferred genetic material) correlates with total divergence between pairs of genomes. Univariate regression analysis of whole-genome sequence identity with respect to conserved-region sequence identity (Fig. 1a) showed an excellent fit to the linear model (P<0·001, R²=0·871). The simplest interpretation of this result is that during bacterial speciation, the various forces that change genome sequence content act at discrete rates in a time-dependent manner. As a result, the ratio between sequence differences in conserved regions and overall genome sequence differences remains fairly constant, at least while the bacteria are related as closely as those analysed here. This interpretation, if true, means that the proposal of Stackebrandt et al. (2002) is quite reasonable: a rational definition of bacterial species could be based on sequence analysis of a set of conserved genes. If the relatedness of whole genomes can be measured by examining the subset of genes they share, then a small but representative set of shared genes should successfully predict genome relatedness.
For each gene, the plot of genome sequence identity versus individual gene sequence identity fit a linear model with P<0·01. Goodness of fit varied widely among the candidate genes, with R² values ranging from 0·536 to 0·965. Eight of the genes had an R² value of 0·9 or better, making them outstanding candidates for a species prediction sequence set (Table 2). The individual candidate gene with the poorest ability to predict genome relatedness on a genus or species level was 16S rDNA, the gene that encodes 16S rRNA (Fig. 1b); as others have noted (Fox et al., 1992; Stackebrandt & Goebel, 1994), it is simply too highly conserved to differentiate reliably between closely related taxa. The candidate gene with the greatest potential for predicting genome relatedness at the genus or subgenus level was recN (Fig. 1c), a recombination and repair protein-encoding gene that is found in each of the free-living bacterial genomes analysed, as well as in the two Rickettsia species. Among genes found in every bacterial genome analysed, the highest-scoring candidate was dnaX, a gene that encodes two subunits of DNA polymerase III. Whilst recN is slightly superior to dnaX in terms of fit to whole-genome data (R², 0·965 and 0·943, respectively), the latter sequence may prove to be particularly useful in analysis of taxa characterized by genome reduction.
A parallel study analysed amino acid sequence identities of the predicted products for each of the candidate genes (not shown). In each case, the gene sequences showed a better fit to genome relatedness than the predicted protein sequences. Whilst protein sequences have often been analysed to compare distantly related organisms or ancient gene duplications (Brown et al., 2001; Delcher et al., 2002), DNA sequences may be more useful for distinguishing close phylogenetic relationships.
This bacterial species prediction set differs from sequence sets that were assembled for the purpose of constructing universal phylogenetic trees (Brown et al., 2001). The purpose of those studies is to detect relationships among very distantly related organisms at the domain level, whereas the purpose of the current study is to distinguish between closely related organisms at the species level. Many of the highest scoring sequences in Table 2 have no known orthologues outside bacteria and so cannot be used to construct universal trees. Further, the practical aims of the current study impose selection criteria that would be unnecessary in a universal tree study, such as moderate gene length and absence of close paralogues. Nevertheless, there is some overlap, including certain tRNA synthetases and DNA and RNA polymerase subunits, between the bacterial species prediction set developed in this study and the universal tree set of Brown et al. (2001).
Predictive models
Linear regression analysis yielded simple predictive models for high-scoring candidates. Genome relatedness for two related bacterial strains can be predicted by the following formula:
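[Equation not recovered: linear regression model predicting whole-genome sequence identity from single-gene sequence identity]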
Step-wise linear regression procedures were used to determine whether inclusion of sequence data from other candidate genes improved the predictive power of recN. The results suggest that the best two-gene combination (that is, the combination with the highest R² value) was recN and thdF, which encodes a thiophene oxidation enzyme (Fig. 2b), whereas the best three-gene combination was recN, thdF and rpoA, which encodes the RNA polymerase α-subunit (Fig. 2c). Interestingly, the best sets used a combination of genes that were highly (rpoA), moderately (thdF) and weakly (recN) conserved. Prediction models resulting from this analysis were:
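[Equations not recovered: two-gene (recN, thdF) and three-gene (recN, thdF, rpoA) linear regression models]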
Bacterial genome sequences continue to become available in increasing numbers. Following the statistical modelling phase of this study, public databases were re-examined for new whole-genome sequences that either fit within the genus groups analysed before or defined new groups. Several new comparisons were possible, serving as a test of the validity of the predictive models (Table 3). The genomes were analysed as before and for each genome pair, sequence identity scores were computed for whole genomes and for the predictor genes recN, thdF and rpoA. The single-, two- and three-gene sets predicted whole-genome sequence identity with mean residual values of 0·049, 0·032 and 0·040, respectively. This second test set of comparisons yielded results that were completely in harmony with those of the first set of comparisons.
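One way to summarize how well a model performs on a test set is a mean residual. The text does not say exactly how its mean residual values were computed; the sketch below assumes the mean absolute difference between observed and predicted whole-genome identities, and both the function name and the numbers are hypothetical.

```python
def mean_abs_residual(observed, predicted):
    """Mean absolute difference between observed and predicted identities.

    Assumption: 'mean residual' is read here as the mean of |observed - predicted|.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Invented values for two genome pairs.
observed = [0.50, 0.70]
predicted = [0.55, 0.68]
print(mean_abs_residual(observed, predicted))  # about 0.035
```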
| ACKNOWLEDGEMENTS |
|---|
| REFERENCES |
|---|
Brenner, D. J., Fanning, G. R., Skerman, F. J. & Falkow, S. (1972). Polynucleotide sequence divergence among strains of Escherichia coli and closely related organisms. J Bacteriol 109, 933–965.
Brown, J. R., Douady, C. J., Italia, M. J., Marshall, W. E. & Stanhope, M. J. (2001). Universal trees based on large combined protein sequence data sets. Nat Genet 28, 281–285.
Crosa, J. H., Brenner, D. J., Ewing, W. H. & Falkow, S. (1973). Molecular relationships among the Salmonelleae. J Bacteriol 115, 307–315.
Delcher, A. L., Phillippy, A., Carlton, J. & Salzberg, S. L. (2002). Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30, 2478–2483.
Fox, G. E., Wisotzkey, J. D. & Jurtshuk, P., Jr (1992). How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. Int J Syst Bacteriol 42, 166–170.
Fukushi, H. & Hirai, K. (1989). Genetic diversity of avian and mammalian Chlamydia psittaci strains and relation to host origin. J Bacteriol 171, 2850–2855.
Gürtler, V. & Mayall, B. C. (2001). Genomic approaches to typing, taxonomy and evolution of bacterial isolates. Int J Syst Evol Microbiol 51, 3–16.
Johnson, J. L. (1994). Similarity analysis of DNAs. In Methods for General and Molecular Bacteriology, pp. 655–682. Edited by P. Gerhardt, R. G. E. Murray, W. A. Wood & N. R. Krieg. Washington, DC: American Society for Microbiology.
Rosselló-Mora, R. & Amann, R. (2001). The species concept for prokaryotes. FEMS Microbiol Rev 25, 39–67.
Somerville, H. J. & Jones, M. L. (1972). DNA competition studies within the Bacillus cereus group of bacilli. J Gen Microbiol 73, 257–265.
Stackebrandt, E. & Goebel, B. M. (1994). Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int J Syst Bacteriol 44, 846–849.
Stackebrandt, E., Frederiksen, W., Garrity, G. M. & 10 other authors (2002). Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int J Syst Evol Microbiol 52, 1043–1047.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
Wayne, L. G., Brenner, D. J., Colwell, R. R. & 9 other authors (1987). International Committee on Systematic Bacteriology. Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int J Syst Bacteriol 37, 463–464.
Weiss, E., Schramek, S., Wilson, N. N. & Newman, L. W. (1970). Deoxyribonucleic acid heterogeneity between human and murine strains of Chlamydia trachomatis. Infect Immun 2, 24–28.