Clinical Microbiology Reviews, January 1999, p. 126-146, Vol. 12, No. 1
Department of Plant and Microbial Biology, University of California, Berkeley, California 94720-3102,¹ and
Department of Biology, Imperial College, Ascot SL5 7PY, United Kingdom²
0893-8512/99/$04.00+0
Copyright © 1999, American Society for Microbiology. All rights reserved.
The Evolutionary Biology and Population Genetics Underlying Fungal Strain Typing

SUMMARY
INTRODUCTION
USING STRAIN-TYPING DATA TO ANALYZE THE REPRODUCTIVE MODE AND POPULATION STRUCTURE OF FUNGI
GATHERING DATA FOR STRAIN TYPING AND POPULATION GENETICS
Multilocus Enzyme Electrophoresis
Electrophoretic Karyotype Analysis
Restriction Fragment Length Polymorphism Analysis
Randomly Amplified Polymorphic DNA or Arbitrarily Primed PCR Analysis
Sequence-Confirmed Amplified Region Analysis
Nucleic Acid Sequencing of Known Genes
DNA Fingerprinting with Repetitive DNA Sequences
Microsatellite Analysis
Amplified Fragment Length Polymorphism Analysis
ANALYZING MOLECULAR VARIATION
Reproductive Mode
Genetic Differentiation and Isolation
SPECIFIC FUNGAL EXAMPLES
Coccidioides immitis
Strain typing.
Reproductive mode.
Genetic differentiation and isolation.
Cryptic species.
Histoplasma capsulatum
Strain typing.
Cryptic species.
Reproductive mode.
Genetic differentiation and isolation.
Candida albicans
Strain typing.
Cryptic species.
Reproductive mode.
Cryptococcus neoformans
Strain typing.
Reproductive mode.
Genetic differentiation and cryptic species.
Aspergillus fumigatus and A. flavus
Strain typing.
Cryptic species.
Reproductive mode.
FUTURE DIRECTIONS
Genomics
Clinical Variation
Amount of Recombination
Standardization of Approach
ACKNOWLEDGMENTS
REFERENCES
SUMMARY
Strain typing of medically important fungi and fungal population genetics have been stimulated by new methods of tapping DNA variation. The aim of this contribution is to show how awareness of fungal population genetics can increase the utility of strain typing to better serve the interests of medical mycology. Knowing two basic features of fungal population biology, the mode of reproduction and genetic differentiation or isolation, can give medical mycologists information about the intraspecific groups that are worth identifying and the number and type of markers that would be needed to do so. The same evolutionary information can be just as valuable for the selection of fungi for development and testing of pharmaceuticals or vaccines. The many methods of analyzing DNA variation are evaluated in light of the need for polymorphic loci that are well characterized, simple, independent, and stable. Traditional population genetic and new phylogenetic methods for analyzing mode of reproduction, genetic differentiation, and isolation are reviewed. Strain typing and population genetic reports are examined for six medically important species: Coccidioides immitis, Histoplasma capsulatum, Candida albicans, Cryptococcus neoformans, Aspergillus fumigatus, and A. flavus. Research opportunities in the areas of genomics, correlation of clinical variation with genetic variation, amount of recombination, and standardization of approach are suggested.
INTRODUCTION
Strain typing of medically important fungi, i.e., the ability to identify them to the species level and to discriminate among individuals within species, has been galvanized by new methods of tapping the tremendous variation found in fungal DNA. The aim of this review is to show how awareness of fungal population genetics can increase the utility of strain typing to better serve the interests of medical mycology. By paying attention to attributes of the life cycle, such as the mode of reproduction and genetic differentiation or isolation, mycologists can tell which intraspecific groups are worth identifying and can determine the number and type of markers that are needed to do so. The same evolutionary information can be just as valuable to the selection of fungi for development and testing of pharmaceuticals or vaccines. In the process, we should gain a better basic understanding of the behavior of pathogenic fungi in nature.
Initially, most of the application of fungal DNA variation to evolution was directed above the species level (7, 8, 11, 12, 13, 21, 45, 73, 91, 105), but that is no longer the case. Our ability to distinguish individuals in species of systemic and opportunistic pathogenic fungi is approaching the elusive goal of being able to identify every different genotype by a simple laboratory procedure. As discussed below, this goal is within our grasp because of recent reports of studies involving repetitive DNA sequences (33) or microsatellites (19, 25, 39). These advances in detection methods will make studies of nosocomial infections and epidemics much more convincing, and if thought is given to fungal biology, they will take full advantage of the power of DNA variation to identify fungi and track their medically relevant traits. In other words, with a little more effort, strain-typing studies could reveal a number of basic life history features of the fungi and, with that information, could have a greater impact on clinical identification and even research aimed at treatment or prevention.
Two fundamental life history traits that are of interest to medical mycologists are (i) the reproductive mode (i.e., is the fungus asexual and clonal or sexual and recombining?) and (ii) genetic differentiation or isolation (i.e., is the fungus one large interbreeding population, or is it subdivided into discrete populations which are genetically isolated?). Most fungi are sexual, and most are haploid (N), a medical example being Ajellomyces capsulatus (mitosporic name, Histoplasma capsulatum). However, many fungi are thought to be asexual (as many as 25% [58]), medical examples being Aspergillus fumigatus, Trichophyton rubrum, and Candida albicans. A few fungi are diploid (2N), C. albicans being a good medical example, and in the Basidiomycota, many species are dikaryotic (N+N), although the medically important basidiomycetes (e.g., Cryptococcus neoformans, anamorph of Filobasidiella neoformans, Trichosporon spp., and Malassezia spp.) are probably haploid when associated with the host (55, 71). Members of sexual species may mate with themselves (selfing) or with other individuals (outcrossing) or may do both. Members of different sexual species may hybridize in laboratory settings, if they are under strong selection for hybrid offspring, but hybridization in nature is rarely encountered in fungi; there are only five good examples of this process. Four of them involve plant-pathogenic fungi that could come in close contact while in the host (Ophiostoma [18], Epichloë [102], Tilletia [41], Typhula [29]), and the fifth, Allomyces (36), is a chytrid with motile gametes that locate each other with the aid of pheromones. Some fungal species have worldwide distributions (e.g., heterothallic Neurospora spp. [93]), but others have populations specific to different continents or hosts (e.g., Armillaria spp. [2, 27] or Pleurotus spp. [115]), and many others are narrowly endemic.
The pioneering work on fungal population genetics has been done with plant-pathogenic fungi, and there are good recent reviews of the field (1, 17, 85).
At the outset, it is worth considering the terms "recombination" and "clonality." Genetic recombination occurs during meiosis, when reassortment and crossing over produce gametes or progeny with different combinations of alleles from those in the parental genomes. In fungi, nonmeiotic recombination, or parasexuality, is also capable of making progeny with new combinations of genes and has been used routinely in laboratory genetic analysis (40). Whether parasexuality is important in nature is controversial, because it is initiated by fusion of vegetative hyphae (anastomosis) of different individuals, a step which appears to be limited to fungi with extremely similar genomes. If it is true that anastomoses are limited to fungi with identical or nearly identical genomes, parasexuality would not be able to make a significant contribution to recombination. Compared with the two mechanisms known for recombination, clonal reproduction is more complicated. Each hyphal apex in a mycelium is clonally related to the others, and fragmentation of a mycelium would qualify as clonal reproduction (for a discussion of this problem in plants, see references 57 and 63). Production of mitospores (asexual spores, often called conidia) is an improved form of mycelial fragmentation and would seem to be the most common type of fungal clonal reproduction. Yeasts manage to combine mitospore production with vegetative growth, so that every mitotic division leads to mitospore production. However, what about apomictic ascospore production or selfing (homothallism)? The production of ascospores in the absence of recombination seems to point to their other role as resistant propagules. These forms of reproduction would also be considered clonal reproduction by the broad definition that we favor, because the progeny genomes would be identical to the parental genome with no recombination.
As a result, if two variable regions of the genome were compared in a collection of individuals, they would appear to be associated in the clonal organism and not in the recombining one. Clonal reproduction, however, is not the only explanation for association between two variable regions. The regions might be in close physical proximity on a chromosome, or they might be more distant but located on a part of a chromosome where crossing over is suppressed. Selection could also lead to their being associated. If the fungal isolates were collected from different, genetically isolated populations, their loci would also appear associated when individuals from the two populations were compared. Similarly, recombination is not the only explanation for a lack of association of loci. Highly mutable loci (hypervariable loci) would mask the association of loci and give the appearance of recombination. Although it would be more accurate to use the terms "recombining" and "nonrecombining" or "associated" and "unassociated," we will use "recombining" and "clonal" because they are widely understood, if not precisely defined.
How can knowledge of the reproductive mode and genetic isolation leading to discrete populations or cryptic species make strain-typing data much more valuable to medical mycologists? Imagine an asexual fungus, which must reproduce clonally (Fig. 1 and 2a). There is no recombination in this fungus, and so the entire genome is transmitted intact from generation to generation and every region of the genome has the same evolutionary history (Fig. 2a). Therefore, only one variable region would suffice to identify different genotypes. Because every gene is associated, variation in traits such as pathogenicity can be followed by any variable DNA region, whether or not the two genes are in physical proximity in the genome. Due to clonality, the relatedness of individual fungi can be assessed through any variable region, and the relationship is straightforward. Without mating to keep populations genetically homogeneous or to incorporate foreign genes into local populations, the concepts of gene flow and genetic isolation take on a different meaning. However, the differential success of some clones and the extinctions of others can lead to populations that are distinct from neighboring ones, and gene flow between them certainly can be detected.
At the other extreme, imagine a sexual, outbreeding fungus. Here, due to recombination, every region of the genome might have a different evolutionary history, and it would be impossible for a single variable locus to identify genetically different individuals; a collection of loci would be necessary (Fig. 2b). Due to recombination, only loci associated with pathogenicity genes could be used to assay pathogenicity. Genetic distance among isolates could be assayed, but again, the accuracy of the estimate would be proportional to the number of loci assayed. Also, genetic distance would not necessarily have anything to do with clinically relevant characteristics. Gene flow among populations of sexual fungi is an important issue for strain typing. If populations are genetically isolated, a comparison of individual fungi taken from different populations would be like a comparison of clonal organisms, because they do not recombine (Fig. 3). As a consequence, loci with different alleles fixed in different populations could be used to identify individuals to their population of origin; only one such locus is needed, and any one is as good as any other. If differences in pathogenicity were found among populations, any of the loci fixed for different alleles in the populations could be used to track the trait.
Without knowledge of the reproductive mode or genetic differentiation and isolation, strain-typing studies often assume that the fungus is asexual, has clonal reproduction, and is not subdivided into genetically isolated populations. As seen below, these assumptions have not been supported in several recent studies of medically important fungi. Therefore, medical mycologists should be aware of the reproductive mode and population structure of their fungi, because the use of DNA variation to identify fungi and track clinically relevant behavior depends upon it. Researchers working to develop antifungal drugs or vaccines may also be interested in genetically isolated populations or divergent clones, to be sure that all genotypes in the species are affected by the treatment.
USING STRAIN-TYPING DATA TO ANALYZE THE REPRODUCTIVE MODE AND POPULATION STRUCTURE OF FUNGI
Students of charismatic megabiota can study reproductive mode and gene flow with a pair of binoculars or a collection of radio collars, but microbiologists must use genetics. The necessary features are variable traits, in our case polymorphic macromolecules, and methods of analysis that can be used to distinguish clonality from recombination and to detect genetic differentiation and isolation.
GATHERING DATA FOR STRAIN TYPING AND POPULATION GENETICS
Strain typing and population genetics both exploit genetic variation, and almost all studies now focus on genotypic characters instead of phenotypic ones. In the panoply of acronyms that follow, all but the first one analyze DNA variation. To assess the basic features of fungal populations that are essential to interpret strain typing, i.e., reproductive mode and genetic isolation, one needs a series of simple, well-characterized, independent, stable polymorphic loci. Some of the methods below, e.g., electrophoretic karyotyping and fingerprinting methods, cannot provide this type of data. However, once the evolutionary biology of the fungus has been analyzed, methods such as fingerprinting can be useful for distinguishing clones.
Multilocus Enzyme Electrophoresis
Multilocus enzyme electrophoresis (MLEE) hardly needs an introduction. Any protein (locus) that can be selectively stained can be isolated from a collection of individuals and electrophoresed. If the protein bands are variable, they are considered to be alleles on the basis of mobility (Fig. 4). The technique is robust and continues to be put to good use in studies of medically important fungi, e.g., C. albicans (10) and A. fumigatus (99). An important advantage of this technique is that all alleles are recovered, so that alleles are rarely missing, or null; in other words, the locus is codominant in that both alleles in a diploid can be observed. The criticisms of enzyme electrophoresis are that (i) it assays the genotype indirectly, so that much variation at the nucleotide level may go undetected because nucleotide substitutions do not necessarily change the amino acid composition; (ii) changes in amino acid composition do not necessarily change the electrophoretic mobility of the protein and, as a consequence, alleles that are considered to be the same protein alleles from different individuals may represent different gene alleles; and (iii) selection may be acting on the polymorphisms, so that anonymous DNA markers may give a very different picture from allozyme markers, presumably because the former are neutral and the latter are under some sort of selection (64, 94).
Electrophoretic Karyotype Analysis
Chromosomal size variation is assayed via an electrophoretic technique, electrophoretic karyotype (EK) analysis, which uses electric fields of alternating orientation to move intact chromosomes through the agarose gel matrix. This biologically interesting and complex topic has recently been reviewed (123). Classic reports for C. albicans pointed out the value of this technique (79, 83), and several other human pathogenic fungi have been studied in this way: Coccidioides immitis (90), Histoplasma capsulatum (107), and Cryptococcus neoformans (119). It may be argued that karyotypes should be less variable for sexual species than for asexual ones, due to the need for pairing at meiosis, but fungal EK are known to display some variation due to loss of dispensable chromosomes (84) and to vary in size and gene arrangement, even in sexual species (51) (Fig. 5).
Restriction Fragment Length Polymorphism Analysis
Restriction fragment length polymorphism analysis (RFLP) assays the DNA sequence variation of the genome by using restriction endonucleases to sample short pieces of DNA sequence. Restriction endonucleases recognize specific DNA sequences, usually 4 to 6 nucleotides in length, and cut the DNA in or near the recognition sequence. Alteration of the recognition sequence by nucleotide substitution, insertion or deletion (length mutation), or, for some restriction enzymes, methylation of nucleotides (provided that the DNA is genomic and not amplified or cloned) can prevent the restriction endonuclease from acting and change the fragment pattern (Fig. 6). Length mutations in the region between restriction sites also can change the pattern. Any region of DNA (locus) can be used for RFLP analysis if variation (alleles) can be visualized directly due to multiple copies (mitochondrial DNA [mtDNA], rDNA) or if variation can be visualized indirectly either by hybridization to probe DNA or amplification by PCR. If only one variable restriction endonuclease site is present in the DNA fragment, allele scoring is straightforward. When several variable sites are present, mapping the sites will improve the interpretation. RFLPs were the first DNA markers used for fungal evolutionary biology, and they continue to be put to good use in population genetic studies, e.g., for A. nidulans (48). Critics of RFLP analysis note that while restriction sites in different individuals are most likely to be identical by descent, that is not the case for missing sites, because it is easier to lose a site than to gain one. There are many ways in which a restriction endonuclease site can be lost: any of the several nucleotides in the recognition sequence can be substituted (Fig. 6), or the site can suffer length mutations. Missing sites that have arisen, unknowingly, by different routes confound evolutionary analysis.
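How a single nucleotide substitution in a recognition sequence changes the fragment pattern can be sketched in a few lines of Python. This is only an illustration: the two short "alleles" are invented, the function name `digest` is hypothetical, and for simplicity the cut is placed at the start of the recognition sequence rather than at the enzyme's true cleavage position.

```python
def digest(sequence, site):
    """Return fragment lengths after cutting a linear sequence at every `site`.

    Simplifying assumption: the cut falls at the first base of the
    recognition sequence, not at the enzyme's real cleavage position.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (e.g., a site at the very start).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

SITE = "GAATTC"  # EcoRI recognition sequence

# Invented example alleles: allele_b carries one substitution (A -> G)
# in the second recognition sequence, which abolishes that site.
allele_a = "AAAA" + SITE + "CCCCCCCC" + SITE + "GG"
allele_b = allele_a.replace("CCCCCCCCGAATTC", "CCCCCCCCGAGTTC")

fragments_a = digest(allele_a, SITE)
fragments_b = digest(allele_b, SITE)
print(fragments_a, fragments_b)
```

The allele that has lost the site yields one long fragment where the other allele yields two shorter ones. Any of the six positions in the recognition sequence could have been substituted with the same result, which is the reason that shared missing sites are weaker evidence of common descent than shared sites.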
Randomly Amplified Polymorphic DNA or Arbitrarily Primed PCR Analysis
Randomly amplified polymorphic DNA (RAPD) analysis or arbitrarily primed PCR (AP-PCR) analysis is similar to RFLP analysis in that it assays DNA sequence variation in short regions, but instead of analyzing restriction endonuclease recognition sequences, it focuses on PCR priming regions (120). Nucleotide substitutions in the PCR priming regions, particularly the 3' ends, can prevent primer annealing and PCR amplification. RAPD analysis uses one short PCR primer (ca. 10 bp) and a low annealing temperature to generate several fragments in one amplification. If a comparison of amplifications of several isolates shows a band (locus) that varies, alleles are assigned to the presence [1] and absence [0] of the band. RAPD analysis is technically simple and often detects variation among isolates that are invariant with RFLP analysis, and so it has become quite popular. For example, in comparisons of several DNA-based typing methods for Candida lusitaniae and Aspergillus fumigatus, RAPDs detected variation missed by RFLPs (66, 75). Criticisms of RAPDs initially focused on reproducibility. RAPD analysis succeeds because just one nucleotide substitution can allow or prevent priming, and so it is not surprising that small differences in any aspect of PCR have the same effect. Even if there were no problems with repeatability, there would still be the concern that bands of equal electrophoretic mobility may not be homologous and the related concern that missing bands may not be homologous because they can be lost by several possible nucleotide substitutions in either PCR priming site as well as by length mutations (Fig. 7). There is also the problem of dominant and null alleles; in haploid organisms both the dominant (presence) and null (absence) alleles can be scored, but in diploids it is not possible to distinguish genotypes that are homozygous for the dominant allele from those that are heterozygous. 
It is also tempting to score more than one variable band per RAPD reaction, although they may not be independent. A recent evaluation of RAPD bands from hybrid plants found that a disturbingly large fraction (13%) of bands with equal mobility were not homologous (97). One solution to this problem is to sequence the RAPD bands to confirm their identity, as has been done in several recent studies (22, 54, 86).
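The dominance problem described above can be made concrete with a toy scoring function. The sketch below assumes a single locus with an amplifiable allele written "+" and a null allele written "-"; the function name `band_present` and the notation are illustrative only.

```python
def band_present(genotype):
    """Score a RAPD band: 1 if any allele copy can be amplified, else 0."""
    return int("+" in genotype)

# Haploids: each genotype gives a distinct band phenotype.
haploid_scores = {g: band_present(g) for g in ("+", "-")}

# Diploids: the homozygote "++" and the heterozygote "+-" both give a band.
diploid_scores = {g: band_present(g) for g in ("++", "+-", "--")}

print(haploid_scores)
print(diploid_scores)
```

In the diploid case, two of the three genotypes collapse onto the same score, so allele frequencies cannot be read directly from band counts.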
Sequence-Confirmed Amplified Region Analysis
Starting with variable loci found with RFLP or RAPD analysis, sequence-confirmed amplified region (SCAR) analysis uses DNA sequencing to recover both positive alleles (band present) and null alleles (band absent) and to ensure that alleles from different individuals are homologous. If the RFLP loci are based on DNA hybridization and the variable restriction endonuclease site lies in the probe sequence, it is simple to sequence the probe, design PCR primers to amplify the probe region from every isolate, and use the restriction endonuclease to distinguish the alleles. If, however, the variable restriction endonuclease site lies adjacent to the probe sequence, where it cannot be easily sequenced, or if the variable DNA fragment is a RAPD fragment, where the modified priming site cannot be sequenced, developing the SCAR is more difficult. A robust strategy for finding SCARs involves using RFLPs or RAPDs to generate patterns of bands for a test group of 6 to 10 isolates and then searching for sequence variation among the bands of identical mobility. Sequencing with arbitrary primer pairs (SWAPP [23]) is a refinement of this method which uses two different, ca. 20-bp RAPD primers and low-stringency PCR annealing temperatures to generate PCR bands from every isolate. These bands, which show no variation in agarose gel electrophoresis, are heat denatured, snap cooled to promote single-strand folding, and then electrophoresed on acrylamide under nondenaturing conditions to see if the folded single strands now show variation. If this single-strand conformational polymorphism (SSCP) electrophoresis shows variation in the mobility of the single strands, it is very probably due to conformation changes caused by nucleotide substitutions (88). The variable bands are retrieved from the gel and sequenced. When polymorphic regions are found, new, specific PCR primers are designed and used to amplify the fragment from every isolate in the study. 
SSCP can be used to score the alleles; alternatively, if the variable nucleotide happens to lie in a restriction endonuclease recognition site, restriction digestion can be used for scoring. Of course, there may be more than one variable nucleotide position in the amplified fragment. However, they are likely to be associated, and only the most informative one is used, i.e., the one in which the allele frequencies are balanced. By this approach, the DNA variation is confined to a single variable nucleotide position or short length mutation, and there are usually just two alleles. Therefore, the problem of ambiguity about the null alleles is overcome because both alleles are recovered; i.e., the locus is codominant. It is still possible that the positive alleles are independently derived, but this is not likely unless the locus is hypervariable (22). This approach has been successfully applied to studies of Coccidioides immitis (22, 23), H. capsulatum (26), and C. albicans (54).
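Choosing the most informative variable position in an amplified fragment, i.e., the one with the most balanced allele frequencies, might be sketched as follows. The site labels and allele calls are invented, and `most_balanced_site` is a hypothetical helper, not part of any published pipeline.

```python
from collections import Counter

def most_balanced_site(alleles_by_site):
    """Return the site whose rarest allele is most frequent (closest to 0.5)."""
    def minor_frequency(alleles):
        counts = Counter(alleles)
        if len(counts) < 2:  # monomorphic site carries no information
            return 0.0
        return min(counts.values()) / len(alleles)
    return max(alleles_by_site, key=lambda s: minor_frequency(alleles_by_site[s]))

# Invented allele calls for eight isolates at three variable positions.
sites = {
    "pos_17": list("AAAAAAAG"),  # 7:1
    "pos_52": list("CCCCTTTT"),  # 4:4
    "pos_90": list("GGGGGGAA"),  # 6:2
}

best = most_balanced_site(sites)
print(best)
```

A site at which the two alleles are equally common gives the greatest chance of observing all allele combinations when it is later paired with other loci.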
Nucleic Acid Sequencing of Known Genes
Nucleic acid sequencing of known genes has been rendered accessible by inexpensive automated nucleic acid sequencing. Whereas all the preceding techniques used a variety of clever approaches to search for variation without resorting to sequencing, this approach uses brute force. PCR primers that amplify parts of genes from a set of representative isolates are designed by reference to GenBank sequences of the fungus of interest or close relatives. SSCP can be used to screen the PCR products for polymorphism prior to sequencing, and from the sequence all variation in the region is recorded. The technique has all the advantages of SCAR analysis because all the alleles are recovered. If several genes are sequenced, additional opportunities for analysis become available, as discussed below (see, e.g., reference 67). The use of known genes often appeals to a broader scientific audience, but there may be a bias in the evolution of genes that are under strong selection, such as proteins of pathogenic fungi that are recognized as antigens by hosts. While it is true that nucleotides in third codon positions and introns probably would not be subject to selection, their frequency could be affected by proximity to selected regions through genetic hitchhiking (60). The only other drawback comes into play as smaller and younger populations are sampled, because there may be insufficient variation in the genes, or even their introns, to address questions of population genetics. In this event, it is necessary to sample more of the genome to find variation, a task that may involve the use of arbitrary regions as described above.
DNA Fingerprinting with Repetitive DNA Sequences
The techniques mentioned so far are used to describe polymorphic loci one at a time, but both the RFLP and PCR approaches can be used to sample many loci at once if the loci involve repeated DNA sequences. With RFLP fingerprinting, the probe is simply a moderately repeated DNA sequence (ca. 10 to 20 copies per genome) which hybridizes to restriction fragments containing the repeated DNA. With PCR fingerprinting, the primer recognizes a repeated DNA sequence, also producing a large number of fragments with each reaction. Due to the large number of fragments and the mutability of repeated DNA sequences, both the RFLP and PCR fingerprints are quite variable and provide superior discrimination among individual fungal isolates. In comparisons of different strain-typing methods, the fingerprinting methods are usually the most discriminating (33). It is tempting to treat many variable fingerprint fragments as an instant multilocus genotype by assuming that each fragment is an independent locus. Unfortunately, the problems of homology of positive alleles and ambiguity about null alleles are magnified with the complex patterns, and the fragments are not necessarily independent. As a result, fingerprints are useful for identifying individual fungi but problematic for estimating genetic or phylogenetic relationships among individuals.
Microsatellite Analysis
One of the emerging strain-typing techniques exploits the hypervariability of DNA regions made of 10 to 20 or more tandem repeats of nucleotide couplets, triplets, or quadruplets. The ease with which strand slippage during DNA replication can change the number of short repeats makes the level of variability as high as or higher than that of fingerprinting. Use of microsatellites has been demonstrated with C. albicans (39) and H. capsulatum (25). One concern with microsatellites is that alleles may be identical not by descent but by mutation, as is expected with highly mutable sequences (89). Fingerprints of many microsatellite loci can be revealed at once by hybridization of restriction endonuclease-digested fungal DNA to synthetic microsatellite sequences (109). Unfortunately, homology between fragments with the same mobility in different isolates is difficult to establish, as is the case with most fingerprinting techniques.
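When a microsatellite locus has been sequenced, its alleles are usually summarized by the number of tandem repeats. A minimal sketch of that scoring step is given below; the flanking sequences, the CA motif, and the function name `longest_repeat_run` are assumptions made for illustration.

```python
import re

def longest_repeat_run(sequence, motif):
    """Return the largest number of uninterrupted tandem copies of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Invented alleles: identical flanks, different numbers of CA couplets.
allele_1 = "GGTC" + "CA" * 12 + "TTAG"
allele_2 = "GGTC" + "CA" * 15 + "TTAG"

repeats_1 = longest_repeat_run(allele_1, "CA")
repeats_2 = longest_repeat_run(allele_2, "CA")
print(repeats_1, repeats_2)
```

Because only the repeat count is recorded, two alleles that reached the same count by different slippage events are scored as identical, which is the homoplasy concern raised above.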
Amplified Fragment Length Polymorphism Analysis
The polymorphism revealed by amplified fragment length polymorphism (AFLP) analysis depends on restriction endonuclease site differences, just like RFLPs. However, the AFLP process requires that a PCR-amplified library of restriction fragments representative of the entire fungal genome be created (117). The library can be created from very small amounts of DNA, and so it may be useful for medically important fungi that are difficult to cultivate (100). The variable fragments (loci) have two alleles (positive and null), and so the same concerns raised about null alleles with RFLP and RAPD analysis apply. However, many more fragments are produced from each reaction than with RFLP analysis, and the reproducibility is much better than with RAPDs. It should be a good method for prospecting for SCARs and also could provide DNA for multiple gene sequences from pathogens that defy cultivation. Its utility has been demonstrated with several fungi (80, 87, 100).
Once the strain-typing data are in hand, they can be used to analyze genotype diversity among the fungal isolates to address questions about the source of the inoculum, be it reactivation of old infections or newly acquired ones, transfer from other infected humans, or acquisition from environmental sources. However, as discussed in the introduction, to get the most out of the data, they should first be used to analyze two features of the fungus life cycle, i.e., the mode of reproduction and the genetic differentiation or isolation of any groups within the species.
ANALYZING MOLECULAR VARIATION
Reproductive Mode
Almost all mycotic agents make mitospores, and so it is a given that asexual or clonal reproduction is a capability of medically important fungi. This fact can lead to the assumption that asexual reproduction is all that medical fungi can do and that strains of pathogenic fungi are responsible for epidemics. The question for medical mycologists is whether the fungus can also undergo recombination, the hallmark of sexual reproduction. The presence of sexual reproduction under laboratory conditions suggests that the same process is occurring in nature, but selfing or the presence of a preponderance of individuals of one mating type in nature could undermine that assumption.
There are several approaches to distinguishing recombination from clonality (113). Criteria for single-locus data apply to diploid organisms, where fixed heterozygotes and other deviations from Hardy-Weinberg equilibria can be detected. With the exception of C. albicans, medically important fungi are not diploid, and therefore single-locus studies are not informative. However, assaying variation at more than one locus to create a multilocus genotype for haploid fungi is straightforward, as discussed above. Tibayrenc et al. (113) point out that with multilocus genotypes, overrepresented genotypes and association among alleles at different loci or among alleles and phenotypic traits provide evidence of clonality. Overrepresentation of a genotype is certainly a direct consequence of clonal reproduction, but the predominance of one genotype does not rule out recombination in the relationships among the different genotypes. As stated above, it is known that mitosporic fungi can reproduce asexually; the important question is whether they also can recombine. Analyzing representatives of each different multilocus genotype for association among loci is a well-tested method of distinguishing between the null hypothesis of recombination and the alternate hypothesis of clonality (82). A simple way to assess association is to see if all possible combinations of alleles are found for a given pair of loci. This test is best suited for biallelic loci, where the allele frequencies are balanced, and has been used with Coccidioides immitis (22) and C. albicans (54).
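The simple test of whether all possible combinations of alleles are found for a pair of loci can be sketched as below. The genotypes are invented strings with one character per biallelic locus, the function names are hypothetical, and the input is assumed to contain one representative of each distinct multilocus genotype.

```python
from itertools import combinations

def all_combinations_found(genotypes, i, j):
    """True if every allele combination occurs at polymorphic loci i and j."""
    observed = {(g[i], g[j]) for g in genotypes}
    alleles_i = {g[i] for g in genotypes}
    alleles_j = {g[j] for g in genotypes}
    if len(alleles_i) < 2 or len(alleles_j) < 2:
        return False  # a monomorphic locus cannot be tested
    return len(observed) == len(alleles_i) * len(alleles_j)

def fraction_of_pairs_with_all_combinations(genotypes):
    """Fraction of locus pairs at which all allele combinations are seen."""
    pairs = list(combinations(range(len(genotypes[0])), 2))
    hits = sum(all_combinations_found(genotypes, i, j) for i, j in pairs)
    return hits / len(pairs)

# Invented data: two clonal lineages versus a freely recombining sample.
clonal = ["000", "111"]
recombining = ["000", "001", "010", "011", "100", "101", "110", "111"]

clonal_fraction = fraction_of_pairs_with_all_combinations(clonal)
recombining_fraction = fraction_of_pairs_with_all_combinations(recombining)
print(clonal_fraction, recombining_fraction)
```

With balanced biallelic loci, a strictly clonal history can produce at most three of the four combinations at any pair of loci unless a mutation has recurred, so a high fraction of pairs showing all four is difficult to reconcile with clonality.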
With genotypes made from several discrete loci, association among loci can be measured by a technique developed for barley (20) and adapted for microbes (82). Known as the index of association (IA) test, it is an application of genetic distance analysis. The calculation of IA is demonstrated (Fig. 9) with data sets simulated to represent observed multilocus genotypes for clonal and recombining organisms (Fig. 8). First, distances between multilocus genotypes are determined for all pairs of taxa, and the variance of the distances is calculated (Fig. 9a). The distribution of pairwise distances for recombining organisms should be normal, with a few close relatives, a few distant relatives, and most of the distances near the mean (Fig. 9b); the variance for this distribution is low. For clonal organisms, there should be many close relatives (clone mates), many distant relatives (those on different branches of the tree), and not many at the mean; the variance for this distribution is high (Fig. 9b). The observed data can be used to simulate the type of data expected for a recombining organism by shuffling (resampling without replacement) the alleles at each locus of the observed data (Fig. 10). The variance for the observed data is then compared to the distribution of variances for 1,000 or more independently shuffled data sets, and significance is determined by the fraction of shuffled data sets with higher variances than the observed one. The IA is simply a rescaled variance, with the mean of the distribution of variances for the shuffled data sets being defined as an IA of zero. As can be seen in Fig. 9c, the IA of the clonal data set is significantly higher than the distribution of IAs for the shuffled data sets (P = 0.003), whereas the IA of the recombining data set is not (P = 0.785).
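The shuffling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the function names are hypothetical, distance is taken as the number of loci at which two genotypes differ, the index is rescaled as observed variance divided by the mean shuffled variance, minus 1, and ties are counted toward the P value.

```python
import random
from itertools import combinations
from statistics import pvariance


def pairwise_distance_variance(genotypes):
    """Variance of the number of loci at which each pair of genotypes differs."""
    distances = [sum(x != y for x, y in zip(g, h))
                 for g, h in combinations(genotypes, 2)]
    return pvariance(distances)


def shuffle_loci(genotypes, rng):
    """Permute alleles among individuals independently at each locus
    (resampling without replacement), mimicking free recombination."""
    columns = [list(col) for col in zip(*genotypes)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]


def association_test(genotypes, n_shuffles=1000, seed=0):
    """Return (index, p_value) for the observed genotypes.

    index   : observed variance rescaled so the shuffled mean is zero
              (an assumed rescaling for this sketch).
    p_value : fraction of shuffled data sets whose variance is at least
              as high as the observed variance.
    """
    rng = random.Random(seed)
    observed = pairwise_distance_variance(genotypes)
    null = [pairwise_distance_variance(shuffle_loci(genotypes, rng))
            for _ in range(n_shuffles)]
    expected = sum(null) / len(null)
    if expected == 0:
        raise ValueError("no variation among shuffled data sets")
    index = observed / expected - 1
    p_value = sum(v >= observed for v in null) / n_shuffles
    return index, p_value


# Hypothetical clonal sample: two clones of four isolates, six biallelic loci.
clonal = [(0,) * 6] * 4 + [(1,) * 6] * 4
index, p_value = association_test(clonal)
print(index, p_value)
```

For this strictly clonal toy sample the observed variance is far above that of the shuffled data sets, so recombination is rejected. Real data sets would normally be reduced to one representative per multilocus genotype first, as the text recommends.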
The aspect of phylogenetics concerned with building evolutionary trees provides another set of useful methods for distinguishing between clonality and recombination, provided that the multilocus genotypes comprise discrete loci. Clonal organisms evolve like phylogenetic trees; each individual has one parent (ancestor), and there is no horizontal transfer of genes or recombination because there is no mating or meiosis. All loci should reflect the same phylogeny, and there will be little homoplasy in a phylogenetic tree based on multilocus genotypes. In a species of recombining organisms, evolution is not tree-like but is more like a net, with each individual having two parents and horizontal gene transfer occurring with each mating (6, 78). Each locus may have a different phylogeny, and a combined phylogenetic analysis of all loci will show more homoplasy than the same analysis of a clonal organism. As can be seen in Fig. 11, it is possible to compare the lengths of phylogenetic trees (which reflect the relative amount of homoplasy) to distinguish between clonal and recombining populations. This type of test was developed to test for evolutionary signal (or, conversely, homoplasy) in phylogenetic data sets (4) and was adapted for population genetics by Burt et al. (22). Here, too, if homoplasy from other sources could be excluded, a high phylogenetic signal would signify high association between alleles at the different loci used to build the tree, while a low signal would indicate low association and hence recombination between the alleles. The same data sets used for the IA test (Fig. 8), one representing a clonal organism and the second representing a recombining one, were used to make parsimony trees (Fig. 11a and b). The most parsimonious tree for the clonal species (Fig. 11a) is as short as possible (each step corresponds to one allele change for each of the seven variable [biallelic] loci), because in clonal species all loci have the same evolutionary history and trees made from different loci will have compatible topologies. In this example, there is no homoplasy in the clonal data set. In a real data set, some homoplasy might be expected, even for strictly clonal organisms. Figure 11b presents the result of parsimony analysis for the recombining population, in this case a consensus tree of the nine most parsimonious trees. Although trees based on each locus alone would be well resolved and short (a tree based on one biallelic locus would have just one branch that partitioned the individuals into two groups), they would not necessarily partition the individuals into the same groups, making trees based on all the loci poorly resolved and longer due to homoplasy. In this case, the tree is four steps longer than the shortest possible tree. As with the IA test, mimicking the effect of recombination by shuffling the alleles for each locus (resampling without replacement) can be used to determine if the tree length for the observed data is significantly shorter than would be expected for recombining populations. The position of the tree length for the observed data in a distribution of tree lengths for 1,000 resampled data sets provides the P value (Fig. 11c and d). The tree length for the clonal data set is significantly shorter than expected for recombining organisms (P = 0.002), allowing us to reject the null hypothesis of recombination; however, this was not so for the recombining data set (P = 0.996).
In the phylogenetic analysis described above, each locus was defined as one variable nucleotide position, or site. If gene sequences for several genes were available, so that the gene was the locus with several variable sites, the same principle of concordance of evolutionary histories could be used to distinguish clonality from recombination. Detection of microbial recombination by comparing phylogenetic trees built for different genes was first used with Escherichia coli (34). If the trees were concordant, there was strong evidence for clonality, but if the trees were in conflict, recombination was a likely explanation. As with all tests of the mode of reproduction, the sample of individuals must be from one population and the loci cannot be hypervariable. Violating the former will bias the analysis toward clonality, and violating the latter will bias the analysis toward recombination. A test developed to evaluate the congruence of phylogenetic trees of taxa above the species level, the partition homogeneity test (PHT), is ideal for this intraspecific comparison (37, 62). For congruent gene trees, i.e., those from a clonal species, the sum of the lengths of the most parsimonious trees for each gene should not change significantly if the different sites (polymorphic nucleotides in each gene) are swapped among genes. In other words, whether the polymorphic sites are contiguous in the same gene or are drawn from other genes throughout the genome, the sum of gene tree lengths should remain the same because the entire genome evolves as a unit. For incongruent gene trees, as would be expected for a recombining fungus, the sum of the trees for the observed data should be shorter than the sum of the trees made after the sites have been swapped among genes. This disparity would result from bringing together sites from distant regions of the genome into one resampled gene. 
Under recombination, the disparate regions would have different evolutionary histories, and making a tree with these loci would require extra steps due to homoplasy. As before, significance is established by making many randomly swapped data sets and comparing the observed sum of lengths to the distribution for 1,000 or more swapped data sets (Fig. 12 and 13) (50, 67).
A likelihood method to distinguish between clonality and recombination, one that has clonality as the null hypothesis, has also been demonstrated (22). For all possible phylogenetic trees, the likelihood of the data for each locus is calculated with and without the constraint that each locus must fit the same phylogenetic tree topology. The sum of the highest likelihoods for all of the loci, without the constraint of having the same phylogenetic tree topology, then is compared to the highest sum of likelihoods for all the loci, given the constraint of having to conform to the same tree topology. Similar likelihoods would be expected for clonal organisms because all loci evolve together; much higher likelihoods would be expected for the sum of individual loci without the constraint of a common tree topology if the organism was recombining (22). Unfortunately, the number of possible trees grows much faster than the number of individuals (6 individuals have 105 possible trees, 7 have 945, 10 have a bit over 2 × 10^6 [38]). At present, this method is limited to a small number of individuals and loci due to computational demands.
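The tree counts quoted above match the standard formula for unrooted, strictly bifurcating trees, (2n - 5)!! for n individuals. The short sketch below reproduces them; the function name is hypothetical, and the assumption that the counts refer to unrooted bifurcating trees is inferred from the numbers rather than stated in the text.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labeled tips,
    computed as the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (6, 7, 10):
    print(n, unrooted_tree_count(n))
```

Each additional individual multiplies the count by the next odd number, which is why an exhaustive likelihood comparison quickly becomes impractical.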
Genetic Differentiation and Isolation
Wright's work on F statistics (121) provided the foundation for studies of gene flow by pointing out that subdivision of a large population into smaller ones could be detected by comparing allele frequencies at polymorphic loci. He compared the probability of selecting identical alleles from two individuals in a single subgroup to the probability of selecting identical alleles from one individual in each of two subgroups. If individuals in both subgroups freely interbreed, so that gene flow is high, the probabilities of retrieving identical alleles in each sampling should be identical or nearly so; if the subgroups are isolated and gene flow is low, the probability of selecting the same allele from one subgroup should be much higher than that for individuals from the two groups. His statistic for this comparison, Fst, varies from 0 (no isolation) to 1 (complete isolation) and has been estimated in a variety of ways. A convenient method of estimating Wright's Fst is theta (31). Theta behaves similarly to Fst and has been used to detect genetic differentiation and isolation in human pathogenic fungi (22). The full range of theta, from 0 for free gene flow to 1 for no gene flow, assumes that loci have been sampled randomly, with no bias for polymorphic loci. However, it is often the case that only polymorphic loci are sampled in the first population, because monomorphic loci are not useful for tests of reproductive mode. In this case, theta cannot show the full range of values from 0 to 1, and lower values (e.g., 0.5) may indicate the absence of gene flow (Fig. 14).
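Wright's comparison can be illustrated with a deliberately simplified calculation for one biallelic locus. This sketch is not the theta estimator (31), which also corrects for sample size; it assumes known allele frequencies, equally weighted subgroups, and hypothetical function names. Gene diversity here is the probability that two alleles drawn at random differ.

```python
def gene_diversity(p):
    """Probability that two randomly drawn alleles differ at a biallelic
    locus where one allele has frequency p."""
    return 2 * p * (1 - p)


def fst_biallelic(freqs):
    """Fst from the frequency of one allele in each subgroup.

    Compares the diversity expected if the subgroups were pooled (h_total)
    with the mean diversity within subgroups (h_within).
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = gene_diversity(p_bar)
    if h_total == 0:
        raise ValueError("locus is monomorphic across all subgroups")
    h_within = sum(gene_diversity(p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


print(fst_biallelic([0.5, 0.5]))  # identical frequencies: no differentiation
print(fst_biallelic([0.2, 0.8]))  # divergent frequencies
print(fst_biallelic([0.0, 1.0]))  # fixed for alternate alleles
```

The bias noted in the text follows from the same arithmetic: if loci are chosen because they are polymorphic in one subgroup, that subgroup can never be fixed, so the statistic cannot reach 1 even when gene flow is absent.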
The availability of gene sequence data from several genes allows for phylogenetic tests of genetic isolation based on the same logic as was presented above for detecting recombination. With sexual fungi, when sequence data for several different genes are combined to make one phylogenetic tree, the expectation is that the combined gene tree will be poorly resolved compared to individual gene trees because the different genes have different evolutionary histories and their trees will have different topologies. However, if a branch in the combined gene tree is found and is strongly supported, there is an indication of genetic isolation. This partition in the tree reflects the fate of polymorphic loci when a population is divided by some event. At the time the population is divided, both progeny populations share the polymorphisms. However, as time goes on in the absence of selection to maintain the polymorphism, the polymorphism is lost, first from one population and then from the other, resulting in the two populations becoming fixed for different alleles (Fig. 3). Only when allele frequencies are significantly different, culminating when they have become fixed for different alleles, can they be detected. For every ancestral polymorphic locus that becomes fixed for alternate alleles in the two descendent populations, a greater number become fixed for the same allele (only those with a 50:50 balance of alleles in the ancestral population would have an equal chance of becoming fixed for alternate alleles). This approach has been used to demonstrate possible cryptic species for Coccidioides immitis (67).
As mentioned in the Introduction, there is a relationship between the detection of reproductive mode and genetic isolation. To search for recombining groups, it is essential to sample individuals from a small geographic area so that the fungi have the opportunity to interbreed, whether or not they are capable of doing so. If isolates are inadvertently taken from more than one genetically isolated population, their relationships will appear clonal because recombination between the populations is prevented. Therefore, if the goal of the study is to analyze the mode of reproduction, it is important to first test for genetic isolation.
SPECIFIC FUNGAL EXAMPLES
Cases in which the aforementioned methods of data acquisition and analysis have been used with pathogenic fungi are reviewed below. Highlighted are studies of strain typing, reproductive mode, genetic differentiation and isolation, and the uncovering of cryptic species.
Coccidioides immitis
Coccidioides immitis is a haploid, filamentous ascomycete that makes mitospores (arthroconidia) in the environment and spherules filled with endospores in patients (92). Neither mating nor meiosis has ever been found in this fungus.
Strain typing. C. immitis recently entered the molecular age with an RFLP study of repeated DNA (probably mtDNA and rDNA) from 1 Venezuelan and 14 Californian isolates (122). The technique sorted the isolates into two groups, one with two Californian clinical isolates, including the often studied Silveira isolate, and the other with the rest. Other studies shortly followed that provided information for strain typing set in the context of fungal evolutionary biology.
Reproductive mode. Clonality and recombination were examined in 25 isolates of C. immitis from Tucson, Az., through analysis of 14 polymorphic loci (22). Each locus was discovered via SWAPP (23), and the variation was localized to a single variable nucleotide position or small length mutation (22). The study found that isolates from different patients were genetically distinct, i.e., that there were no overrepresented genotypes. In addition to the 25 isolates, another 5 came from repeated isolations from the same patients, and these fungi showed no genetic changes over a maximum of 8 years. Comparison of the observed genotypes to artificially recombined data sets by the IA test failed to reject the null hypothesis of recombination, and the tree length test showed a much longer tree than the number of biallelic loci (38:14), although there was support for association at P = 0.04 (Table 1). The tree length test appears to be a more sensitive method of detecting association among loci than the IA test. C. immitis, with its abundant mitospores, should be capable of clonal reproduction. However, none of the isolates in this study were identical, and so other causes of association among loci may be responsible for the tree length result. Of the 14 loci, 3 had more than one variable nucleotide position within ca. 500 bp, but there was no evidence of recombination over this short distance. Among several pairs of loci, however, all possible combinations of alleles were found, as would be expected with recombination. Finally, using a subset of the data, a likelihood-ratio test with and without the constraint of associated loci allowed rejection of the clonal hypothesis. A criticism of this type of study concerns the possibility of hypervariable loci, which would give the observed result regardless of the reproductive mode. In this case, the authors presented several arguments against hypervariability in the loci, including the occurrence of only two of four possible alleles (nucleotides) at each of the 14 loci, the lack of recombination among variable nucleotide positions separated by 500 bp or less, and the lack of variation at the same loci in individual fungi collected from geographically distant locations. Another criticism concerns the relatively small number of isolates sampled. Given that C. immitis produces mitospores, it would be reassuring to find isolates with identical genomes; the fact that they have not been found suggests that the extent of genetic variation in C. immitis has not yet been thoroughly sampled.
Genetic differentiation and isolation. The same loci used in the Arizona study of C. immitis were characterized for collections of clinical isolates in California and Texas to study gene flow among the populations (24). At four of the loci, allele frequencies between California on the one hand and Arizona or Texas on the other were very different and F statistics (estimated as theta) showed little or no gene flow. Gene flow between Arizona and Texas was also significantly reduced, but not as much as between either of these states and California. A criticism of this study is that the test was biased because the Arizona loci were selected to be polymorphic, preventing any loci from being fixed for either allele in Arizona (cf. Fig. 14); in this case, the effect of the bias was to overestimate gene flow. It seems safe to say that genetically isolated populations of C. immitis are present, although sampling at locations between Tucson, Az., and San Antonio, Tex., might show a gradient of changing gene frequencies instead of a sharp boundary.
Cryptic species. Seventeen C. immitis isolates were separated into two groups, those inside and those outside California, in phylogenetic trees made from partial gene sequence of five protein-coding loci (chitin synthase, chitinase, orotidine monophosphate decarboxylase, serine proteinase, and a T-cell reactive protein with similarity to a mammalian dioxygenase) (67, 68). There were no shared polymorphisms between the two groups; i.e., polymorphisms in the ancestral population had been eliminated in at least one of the two groups over time, so that none remained. Assuming a mutation rate of 10^-9 per nucleotide per year, the two groups have been genetically isolated for ca. 11 × 10^6 years, considerably longer than for sibling Drosophila species (108). The Silveira isolate, collected in California in 1951, fell in the non-California group. This isolate was different from most California isolates in the RFLP study (122). It is possible that the patient acquired coccidioidomycosis outside California, in a manner similar to that of a Texas patient who had acquired C. immitis in California (24). It is also possible that isolates with non-California genotypes are found in California, necessitating a renaming of the groups. Because C. immitis is not transmitted from host to host and must be acquired from the environment, examining environmental isolates may provide the best data on which genotypes are endemic in any given locale.
Histoplasma capsulatum
Histoplasma capsulatum (anamorph or mitosporic state of Ajellomyces capsulatus) is a haploid, filamentous ascomycete that grows as a yeast in patients. It makes two types of mitospores as well as meiospores. Mating is regulated by two alleles at one locus and results in meiotic progeny (ascospores), as observed in cultivation (70).
Strain typing. H. capsulatum was an early subject of studies of molecular variation. Two RFLP studies involving mtDNA and rDNA probes (106, 116) revealed polymorphisms in H. capsulatum isolates, and the availability of a yeast phase-specific nuclear gene (yps-3) provided an additional tool to type strains (65). Types based on mtDNA variation or yps-3 variation are congruent for U.S. H. capsulatum isolates from individuals without AIDS. Central American isolates, or U.S. isolates from AIDS patients, which may have originated in Central America or the Caribbean, have a different yps-3 genotype and harbor several mtDNA subtypes (65). Recent characterization of 16 biallelic loci, each a single-nucleotide substitution, has produced unique multilocus genotypes for each of 30 H. capsulatum isolates from Indianapolis (26). The more recent discovery of multiallelic microsatellite loci indicates that individual isolates may be characterized with just a few loci (25).
Cryptic species. H. capsulatum has three varieties, capsulatum, duboisii, and farciminosum. H. capsulatum var. capsulatum and H. capsulatum var. duboisii have the same teleomorph, Ajellomyces capsulatus. Although H. capsulatum var. farciminosum has not reproduced meiotically in cultivation, it does respond to pairings with mating tester strains of var. capsulatum, but not to the point of making meiospores. H. capsulatum var. capsulatum is endemic in North America and South America; H. capsulatum var. duboisii is endemic in equatorial west Africa, and the yeast cells that it makes in vivo are reported to be larger than those of H. capsulatum var. capsulatum. H. capsulatum var. farciminosum is said to be found only in animals, whereas the two other varieties can be found in both humans and animals. H. capsulatum var. capsulatum is most commonly associated with lung disease, which H. capsulatum var. duboisii is not. Judging from their clinical behavior, they seem to be overt species, but morphologically there are few differences. Comparison of nucleic acid sequences from the large-subunit rDNA and DNA-DNA hybridization of single isolates of all three H. capsulatum varieties showed that they are closely related, but the relationships among them were not resolved (56).
Reproductive mode. Carter et al. (26) used the aforementioned biallelic nucleotide substitution loci to address the question of reproductive mode in 30 clinical H. capsulatum isolates. With the IA test, the null hypothesis of recombination could not be rejected, and, as with C. immitis, the tree length test showed a tree much longer than a strictly clonal tree (45:11), but one that was significantly (P = 0.05) shorter than those of most artificially recombined data (Table 1). This result is consistent with the knowledge that H. capsulatum mates in culture (70). Unlike the mtDNA and yps-3 data, the nucleotide substitution loci did not place isolates from AIDS patients into a group separate from the isolates from individuals without AIDS. However, the segregation of isolates from AIDS patients in previous studies may be related to their suspected Central American or Caribbean origin (65), and isolates from these locations were not included in the study by Carter et al. (26).
Genetic differentiation and isolation. A pioneering isoenzyme study of H. capsulatum surveyed 339 isolates for eight polymorphic enzyme loci, at least one of which probably contained more than one locus (46, 47). Based on allele frequencies reported for each of four sites (two in Missouri and one each in Kentucky and Michigan), there was as much variation within populations as between them, and the strongest genetic differentiation occurred between Michigan and the southern sites. Unfortunately, the data were not presented as multilocus genotypes for each isolate, and so tests of reproductive mode are not possible.
H. capsulatum var. capsulatum mates in cultivation and recombines in nature. The abundant production of mitospores also suggests that clonal reproduction is common in the fungus, although identical genotypes were not found in an 11-locus study of 30 clinical isolates (26). The three varieties based on phenotypic and clinical differences may represent species, but the necessary genetic studies have not been performed. If the varieties are as genetically distinct as species and they can still mate, the scarcity of fungal hybridization might be explained. That is, the fungal species that can hybridize are those that morphologically are so similar that they are not recognized as distinct species, letting their hybridization go undetected. Populations within varieties have not been identified, and isozyme data from North America do not support genetically isolated populations. Given the New World distribution of H. capsulatum var. capsulatum, it might be useful to compare isolates from North and South America.
Candida albicans
The ascomycete Candida albicans, a relative of Saccharomyces cerevisiae (7, 69), is apparently diploid and has never been observed to mate in the laboratory or in nature. It grows as a yeast and as a hypha.
Strain typing. The full complement of methods has been tried on C. albicans. A recent comparison of techniques (33) showed that RFLPs of middle repeated sequences (103) and PCR fingerprinting with M13 provided the best discrimination. Pujol et al. (95) took the comparison among techniques a step further by making comparisons among four methods, three that generate many bands in one procedure and one that analyzes loci singly, MLEE. They made the comparison with similarity coefficients and phenetic trees, which carry the assumption of clonal reproduction. Two of the multiband methods (RAPDs and hybridization of restriction digestions to the Ca3 repeat) showed agreement with MLEE at the deep branches of the phenograms but not in the fine relationships among closely related isolates. The third method, hybridization of CARE2 repeated DNA to restriction digests, was at odds with the other three. While it is clear that the multiband methods allow many different genotypes to be recognized, use of the data to infer genetic relationships is compromised by problems with homology of bands or fragments. Previous analysis of the Ca3 repeated sequence (101) showed that the sequence evolves extremely rapidly, even between transfers in the laboratory. This hypervariability is probably due to transposition of the repeated sequence used to type the strains (76). Another type of hypervariable locus, the microsatellites, has also been assayed in C. albicans, showing that single loci can be extremely polymorphic, with as many as 11 alleles in this diploid ascomycete (19, 39). A key question for the use of repeated sequences or microsatellites to do more than strain typing is, "Are the alleles identical by descent or by convergence?" A study comparing results from these extremely variable loci to more sedate, biallelic loci could answer the question about hypervariability and identity and provide a useful service to the field. 
Such a study in horseshoe crabs (Limulus polyphemus) found incongruence between genealogies based on the microsatellites or the flanking regions (89). However, these hypervariable loci may be just what is needed when the question concerns the origin of nosocomial infections, because they have so many alleles.
Cryptic species. Sullivan et al. (109) uncovered a group of C. albicans isolates that were genetically isolated and that could be distinguished from other Candida species by a combination of characters: chlamydospore aggregation, serotype, and carbohydrate utilization. Their discovery employed RFLPs in DNA hybridizing to microsatellite sequences and was corroborated by using RAPDs, isoenzyme analysis, and comparison of sequence of the V3 region of large-subunit rDNA. The genetically isolated group is now known as C. dubliniensis (110).
Reproductive mode. Pujol et al. (96) used isoenzyme variation to study the reproductive mode of oral C. albicans isolates from human immunodeficiency virus (HIV)-positive patients. Analysis of 55 isolates at 13 polymorphic loci found 2 significantly overrepresented genotypes and between 3 and 6 of 13 loci with genotypic frequencies that differed significantly from Hardy-Weinberg expectation; both results provide evidence of clonality. However, genotypic frequencies for most of the loci were not significantly different from the Hardy-Weinberg expectation, indicating that recombination may also be operating (95). Our analysis of their isozyme data (12 loci and 41 genotypes) by both the IA and the tree length test allowed rejection of the null hypothesis of recombination at P < 0.001 (Table 1); i.e., by these tests, the population is clonal.
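For a diploid such as C. albicans, the single-locus comparison with Hardy-Weinberg expectation can be sketched with a chi-square goodness-of-fit test on genotype counts at one biallelic locus. This is a generic textbook calculation, not the method of the cited studies; the function name and the example counts are hypothetical, and the chi-square approximation (1 degree of freedom) is rough for small samples.

```python
from math import erfc, sqrt


def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of genotype counts against Hardy-Weinberg proportions.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("locus is monomorphic")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: a sample in Hardy-Weinberg proportions, and a sample
# of fixed heterozygotes such as a single clone would produce.
print(hardy_weinberg_chi_square(25, 50, 25))
print(hardy_weinberg_chi_square(0, 100, 0))
```

Fixed heterozygosity gives a very large chi-square value, which is the single-locus signature of clonality mentioned earlier; a locus that fits Hardy-Weinberg proportions is compatible with recombination but, as the text notes for hypervariable loci, does not prove it.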
Boerlin et al. (10) characterized 10 polymorphic isoenzyme loci in a collection of 189 C. albicans isolates from oral cavities of both HIV-positive and HIV-negative patients, as well as some isolates from patients with invasive disease. They found 52 different genotypes, 1 of which was overrepresented. Of the 10 loci, 8 showed genotypic frequencies significantly different from Hardy-Weinberg expectations, and only 1 of 28 pairwise comparisons of loci showed a lack of association. Clearly, there is good evidence for clonality. The two loci with genotypic frequencies expected under Hardy-Weinberg equilibrium may argue for recombination or may represent hypervariable loci. The authors used a distance method (unweighted pair group method with arithmetic mean [UPGMA]) to make a phylogenetic tree based on the isoenzyme data, and it failed to show grouping of isolates from HIV-positive or HIV-negative patients or of invasive isolates. This type of analysis makes the assumption of purely clonal reproduction, which may be justified in this case. Gräser et al. (54) used 12 nucleic acid loci, characterized as single nucleotide substitutions or small length mutations on six PCR fragments, to study 52 C. albicans isolates taken from 52 asymptomatic subjects. These codominant loci were discovered by using SWAPP (23), and the isolates showed overrepresentation of some multilocus genotypes and significant deviation from Hardy-Weinberg expectations in 8 of the 12 loci; again, this provides evidence for clonality. However, genotypic frequencies were within Hardy-Weinberg expectations for 4 of the loci, and most of 66 pairs of loci were not associated (48 to 53, depending on the method of determining significance).
Our analysis of their data (Table 1, conservatively grouping 10 of their nucleotide sites into 5 independent loci for 26 genotypes) by IA test supported rejection of recombination (P < 0.05), while use of the tree length test did not support rejection of recombination (P < 0.1). As with the Pujol et al. (96)