
Clinical Microbiology Reviews, October 2006, p. 658-685, Vol. 19, No. 4
0893-8512/06/$08.00+0     doi:10.1128/CMR.00061-05
Copyright © 2006, American Society for Microbiology. All Rights Reserved.

Molecular Epidemiology of Tuberculosis: Current Insights

Barun Mathema,1,2 Natalia E. Kurepina,1 Pablo J. Bifani,3 and Barry N. Kreiswirth1*

Tuberculosis Center, Public Health Research Institute, Newark, New Jersey,1 Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, New York,2 Molecular Pathology of Tuberculosis, Pasteur Institute, Brussels, Belgium3

SUMMARY
INTRODUCTION
    Epidemiology of Tuberculosis
        Global incidence and prevalence.
        Drug resistance.
        HIV/AIDS.
    Natural Course of Tuberculosis
MOLECULAR EPIDEMIOLOGY
    Genotyping of M. tuberculosis: Current Methods
        IS6110.
        PGRS.
        Spacer oligonucleotide typing.
        VNTR and MIRU analysis.
        SNP.
        Genomic deletion analysis.
        Identification of strain-specific markers for rapid diagnosis.
MOLECULAR EPIDEMIOLOGY AND PUBLIC HEALTH
    Transmission dynamics.
    Molecular studies on drug resistance.
    Recurrent TB.
    Laboratory error/cross-contamination.
PHYLOGENY AND STRAIN FAMILIES OF M. TUBERCULOSIS
STRAIN-SPECIFIC VARIATIONS IN IMMUNITY AND PATHOGENESIS
VACCINES
STRAIN FITNESS
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES

   SUMMARY
 
Molecular epidemiologic studies of tuberculosis (TB) have focused largely on utilizing molecular techniques to address short- and long-term epidemiologic questions, such as in outbreak investigations and in assessing the global dissemination of strains, respectively. This is done primarily by examining the extent of genetic diversity of clinical strains of Mycobacterium tuberculosis. When molecular methods are used in conjunction with classical epidemiology, their utility for TB control has been realized. For instance, molecular epidemiologic studies have added much-needed accuracy and precision in describing transmission dynamics, and they have facilitated investigation of previously unresolved issues, such as estimates of recent-versus-reactive disease and the extent of exogenous reinfection. In addition, there is mounting evidence to suggest that specific strains of M. tuberculosis belonging to discrete phylogenetic clusters (lineages) may differ in virulence, pathogenesis, and epidemiologic characteristics, all of which may significantly impact TB control and vaccine development strategies. Here, we review the current methods, concepts, and applications of molecular approaches used to better understand the epidemiology of TB.


   INTRODUCTION
 
Consumption, King’s Evil, lupus vulgaris, and phthisis are some of the more colorful names for tuberculosis (TB) that have been used in the last several centuries. Archeological findings from a number of Neolithic sites in Europe and sites from ancient Egypt to the Greek and Roman empires show evidence of a disease consistent with modern TB. TB was described by Hippocrates (400 B.C.) in Of the Epidemics and was clearly documented by Claudius Galen during the Roman Empire. Likewise, TB has been more recently immortalized by artists such as John Keats, D. H. Lawrence, Anton Chekhov, Emily Brontë, Charlotte Brontë, Franz Kafka, Amedeo Modigliani, and Frédéric Chopin, all of whom were afflicted by the disease.

In 1882, Robert Koch made the landmark discovery that TB is caused by an infectious agent, Mycobacterium tuberculosis. Besides demystifying the disease, Koch's findings introduced the possibility that antimicrobial agents could be developed to combat this age-old scourge (144). Today, despite the availability of effective antituberculosis chemotherapy for over 50 years, TB remains a major global health problem. As the rates of TB infection have fallen dramatically in industrialized countries in the past century, resource-poor countries now bear over 90% of all cases globally. In fact, there are more cases of TB today than ever recorded. As such, there is a need for new therapeutics, diagnostics, and vaccines in conjunction with improved operational guidelines to enhance current TB control strategies. While much is known about the epidemiology of TB, key questions have eluded classical epidemiologists for decades. These include estimating the current rates of active transmission by differentiating disease due to recent infection from disease due to previous infection; determining whether recurrent tuberculosis is attributable to exogenous reinfection; establishing whether all M. tuberculosis strains exert similar epidemiologic characteristics in populations; and understanding transmission dynamics on a population- or group-specific level, including the identification of extensive transmission or outbreaks among what appear to be sporadic, epidemiologically unrelated cases. Molecular epidemiologic methods have facilitated studies that address some of these very questions. In this review, we present the current approaches and issues surrounding the molecular epidemiology of M. tuberculosis and the insights that this relatively new field has contributed to our general understanding of TB epidemiology, pathogenesis, and evolution.

Epidemiology of Tuberculosis

Global incidence and prevalence. The World Health Organization (WHO) estimates that approximately one-third of the global community is infected with M. tuberculosis (86). In 2000, an estimated 8 to 9 million incident cases and approximately 3 million deaths due to TB occurred worldwide (63). After human immunodeficiency virus (HIV)/AIDS, TB is the second most common cause of death due to an infectious disease, and current trends suggest that TB will still be among the 10 leading causes of global disease burden in the year 2020 (184).

The global distribution of TB cases is skewed heavily toward low-income and emerging economies. The highest prevalence of cases is in Asia, where China, India, Bangladesh, Indonesia, and Pakistan collectively make up over 50% of the global burden. Africa, and more specifically sub-Saharan Africa, have the highest incidence rate of TB, with approximately 83 and 290 per 100,000, respectively. TB cases occur predominantly (approximately 6 million of the 8 million) in the economically most productive 15- to 49-year-old age group (86). Our understanding of TB epidemiology and the efficacy of control activities have been complicated by the emergence of drug-resistant bacilli and by the synergism of TB with HIV coinfection.

Drug resistance. No sooner were the first antituberculosis agents introduced in humans than the emergence of drug-resistant isolates of M. tuberculosis was observed (172, 190, 293). In vitro studies showed that spontaneous mutations in M. tuberculosis can be associated with drug resistance, while selective (antibiotic) pressure can lead to enhanced accumulation of these drug-resistant mutants (72, 73). The efficient selection of drug resistance in the presence of a single antibiotic led investigators to recommend combination therapy using more than one antibiotic to reduce the emergence of drug resistance during treatment (40, 47, 88). Indeed, when adequate drug supplies are available and combination treatment is properly managed, TB control has been effective (145, 178).

Selection for drug-resistant mutants in patients mainly occurs when patients are treated inappropriately or are exposed to, even transiently, subtherapeutic drug levels, conditions that may provide adequate positive selection pressure for the emergence and maintenance of drug-resistant organisms de novo. One of the contributing factors is the exceptional length of chemotherapy required to treat and cure infection with M. tuberculosis (142). The need to maintain high drug levels over many months of treatment, combined with the inherent toxicity of the agents, results in reduced patient compliance and subsequently higher likelihood of acquisition of drug resistance (74). Therefore, in addition to identifying new antituberculosis agents, the need for shortening the length of chemotherapy is paramount, as it would greatly impact clinical management and the emergence of drug resistance. Since the early 1990s, an alarming trend and a growing source of public health concern has been the emergence of resistance to multiple drugs (MDR-TB), defined as an isolate that is resistant to at least isoniazid (INH) and rifampin (RIF), the two most potent antituberculosis drugs (133, 269). Recent estimates suggest that in 2003 there were 458,000 incident cases (including new and retreatment cases) of MDR-TB globally (95% confidence interval, 321,000 to 689,000) (85, 297). These figures suggest that prevalent cases may be two or three times more numerous than incident cases and that a far greater number of individuals are latently infected (33, 284). While treatment for MDR-TB has greatly improved (mainly in resource-rich settings), it is generally more difficult to treat and has been associated with very high morbidity and mortality, prolonged treatment to cure, and an increased risk of spreading drug-resistant isolates in the community (26, 67, 132, 178).
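
The MDR-TB definition above amounts to a simple rule on an isolate's drug susceptibility profile. A minimal Python sketch follows; the function name, the input representation, and the use of "EMB" as a third drug code are illustrative assumptions and are not taken from the cited surveillance reports.

```python
def is_mdr(resistant_to):
    """Return True if the isolate is resistant to at least INH and RIF.

    `resistant_to` is any iterable of drug abbreviations; this
    representation is an assumption made for the example.
    """
    drugs = {drug.upper() for drug in resistant_to}
    return {"INH", "RIF"} <= drugs


# Resistance to additional drugs does not change the classification.
print(is_mdr(["INH", "RIF", "EMB"]))  # True: INH and RIF are both present
print(is_mdr(["INH"]))                # False: INH resistance alone
```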

HIV/AIDS. HIV infection exerts immense influence on the natural course of TB disease. Individuals with latent M. tuberculosis infection who contract HIV are at risk of developing active TB at a rate of 7 to 10% per year, compared to approximately 8% per lifetime for HIV-negative individuals (219, 220). HIV-infected persons recently infected with M. tuberculosis may progress to active disease at a rate over 35% within the first 6 months, compared to 2 to 5% in the first 2 years among HIV-negative individuals (70). With the introduction of highly active antiretroviral therapy for HIV, the risk of progression to TB among those coinfected with M. tuberculosis, while still higher than among HIV-negative individuals, has been considerably reduced (8, 111). The role for CD4+ T cells in protecting against disease progression is underscored by the marked susceptibility to TB in patients with advanced HIV-induced CD4+ T-cell depletion (70, 77, 219). The natural course of HIV disease may also be influenced by M. tuberculosis infection. M. tuberculosis infection results in the activation of macrophages, which can harbor resident HIV virions, resulting in active expression of HIV antigens rather than prolonged latency without expression of HIV proteins (252). In support of this, Pape et al. observed more rapid progression to AIDS among tuberculin skin test (TST)-positive individuals not given INH treatment for latent TB infection than among those who were treated with INH (195). Thus, HIV infection tends to accelerate the progression of TB, while in turn, the host immune response to M. tuberculosis can enhance HIV replication and may accelerate the natural course of HIV/AIDS (252).
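
To put the annual and lifetime figures quoted above on a common scale, an annual risk can be compounded over several years. The sketch below assumes, purely for illustration, a constant annual risk that is independent from year to year; that simplification and the function name are ours and do not come from the cited studies.

```python
def cumulative_risk(annual_risk, years):
    """Probability of progressing to active TB within `years`, assuming a
    constant, independent annual risk (an illustrative simplification)."""
    return 1.0 - (1.0 - annual_risk) ** years


LIFETIME_RISK_HIV_NEGATIVE = 0.08  # approximate figure quoted in the text

for annual in (0.07, 0.10):  # range quoted for HIV-coinfected individuals
    for years in (1, 2, 5):
        risk = cumulative_risk(annual, years)
        print(f"annual {annual:.0%}, {years} y: cumulative {risk:.1%}")
```

Under this assumption, even the lower bound of 7% per year compounds to more than the approximate 8% lifetime figure for HIV-negative individuals by the second year.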

Natural Course of Tuberculosis

Historically, much of our understanding of TB has stemmed from descriptive epidemiological studies, limited animal studies, and clinical observations that were made in the early half of the 20th century. These studies have been central to formulating a generalized hypothesis regarding all phases of TB pathogenesis, from exposure to successful infection and subsequent disease (59, 61, 91, 170, 171, 237, 240). Infection is established in approximately one-third of individuals exposed to the tubercle bacillus, and among those infected only 10% ever become symptomatic (61, 134, 203). In most populations, TB involves a long latency period, with symptomatic presentation occurring from 3 months (mainly in the immunocompromised) to decades after the establishment of infection (61, 142). Latency is one of the main hallmarks of M. tuberculosis infection and pathogenesis and has been reviewed specifically elsewhere (4, 118).
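
The proportions above can be chained into a rough expected-value calculation. The following sketch uses only the two figures quoted in the text (about one-third of exposed individuals become infected, and about 10% of those infected ever become symptomatic); the function name and the cohort size are hypothetical.

```python
def expected_outcomes(exposed, p_infection=1 / 3, p_disease=0.10):
    """Expected numbers infected and ever symptomatic among `exposed` people."""
    infected = exposed * p_infection
    symptomatic = infected * p_disease
    return infected, symptomatic


infected, symptomatic = expected_outcomes(3000)  # hypothetical cohort size
print(f"infected: {infected:.0f}, ever symptomatic: {symptomatic:.0f}")
```

For a hypothetical cohort of 3,000 exposed individuals, this gives roughly 1,000 infections and 100 symptomatic cases, that is, about 1 in 30 of those exposed.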

TB is spread by aerosolization of droplet nuclei bearing M. tuberculosis particles released from the lungs of patients with cavitary pulmonary or laryngeal disease. Once the particles, of 1 to 5 µm in diameter, are inhaled and phagocytosed by resident alveolar macrophages, a vigorous host cellular immune response involving cytokines and a large number of chemokines ensues (126, 164, 212). This response presumably arrests and limits infection to the primary site of invasion, the lung parenchyma and the local draining lymph nodes ("Ghon complex"), in the majority (90%) of immunocompetent individuals (31, 110). Protective immunity is characterized by granuloma formation that consists primarily of activated M. tuberculosis-infected macrophages and T cells. In 10% of presumed immunocompetent individuals, the infection is not contained and continual bacillary replication (doubling time, 25 to 32 h) results in disease symptoms and associated pathology, including tissue necrosis and cavitation (175). In most instances, patients respond to antibiotic treatment by clearance of the bacilli from tissues and subsequently from sputum, partial reversal of the granulomatous process, and clinical cure (227). When disease ensues, the presentation is variable in regard to severity, duration, therapeutic response, and tissue tropism. Although commonly pulmonary, M. tuberculosis can infect a variety of tissues, such as the meninges, lymph nodes, and tissues of the spine (134, 221). A number of external factors may influence the progression and nature of disease. These include comorbid conditions that dampen the host immune system, such as poorly controlled diabetes mellitus, renal failure, chemotherapy, malnutrition, or intrinsic host susceptibility (19, 281).

Due to the variability in time from infection to disease between individuals, incident cases comprise both reactivation of historic infections and the results of recent transmission events (274, 275). While treatment of reactive and recent transmission cases is similar, the latter may be part of an ongoing outbreak or series of transmission events that warrants control measures. A central limitation in understanding the transmission dynamics of M. tuberculosis, however, is that patient links often become obscured as the concentric circles of traditional epidemiological relatedness (contact tracing) become further removed from the index case. In general, cases in low-incidence areas tend to comprise mostly reactive disease, while those in high-incidence regions include both reactive disease and recent transmission.

A crucial aspect in understanding the dynamics of a TB epidemic is the ability to track the spread of specific strains in the population. As discussed below (see "Molecular Epidemiology and Public Health"), over the past two decades, previously unresolved issues, such as population estimates of recent transmission and the ability to distinguish endogenous reactivation from exogenous reinfection, have been made possible by the use of a variety of molecular techniques (15, 35, 46, 109, 224, 263, 275).


   MOLECULAR EPIDEMIOLOGY
 
Molecular epidemiology is a field that has emerged largely from the integration of molecular biology, clinical medicine, statistics, and epidemiology. In essence, molecular epidemiology focuses on the role of genetic and environmental risk factors, at the molecular/cellular or biochemical level, in disease etiology and distribution among populations. More specifically to infectious diseases, molecular epidemiology attempts to utilize a multidisciplinary approach to identify factors that determine disease causation, propagation/dissemination, and distribution (in time and space). This is primarily achieved by associating epidemiologic characteristics with the biologic properties of clinical isolates recovered from symptomatic individuals.

The mid-1980s saw the first integration of molecular methods to discriminate between clinical isolates of M. tuberculosis. While previous methods, such as colony morphology, comparative growth rates, susceptibility to select antibiotics, and phage typing, were useful, they did not provide sufficient discrimination, thus limiting their utility in TB epidemiology. That is, prior to molecular methods, understanding the spread of TB was imprecise and relied on observational data or anecdotal correlations. However, given the plethora of molecular tools available, it is critical to choose an appropriate method(s) to address a particular study question, e.g., transmission dynamics, outbreaks, or phylogenetics. In general, the key aspects in choosing an adequate molecular approach for studying TB epidemiology are the observed rate of polymorphism (stability of biomarker) and the genetic diversity of strains in the population. That is, the rate of change of a biomarker must be adequate to distinguish nonepidemiologically related strains and yet sufficiently "slow" to reliably link related cases. This issue, coupled with general background TB prevalence, should be taken into consideration when choosing molecular epidemiologic methods or in evaluating data.

Genotyping of M. tuberculosis: Current Methods

The TB research community entered the genomic era in 1998 with the publication of the complete annotated genome of M. tuberculosis laboratory strain H37Rv (60). Since then, M. tuberculosis clinical strain CDC1551 and six related mycobacteria, M. leprae, M. ulcerans, M. avium, M. avium paratuberculosis, M. smegmatis, and M. bovis, have been fully sequenced; others, including M. microti, M. marinum, M. tuberculosis strain 210, and M. bovis BCG (bacillus Calmette-Guérin), are nearing completion.

Studies show that the M. tuberculosis complex (i.e., M. tuberculosis, M. bovis, M. microti, M. africanum, M. canettii, and, more recently, M. pinnipedii and M. caprae [7, 64, 103]) genomes are highly conserved: comparative sequence analysis of the 275-bp internal transcribed spacer (ITS) region, an otherwise highly polymorphic region which separates the 16S rRNA and the 23S rRNA, revealed complete conservation between members of the M. tuberculosis complex. Furthermore, sequence analysis of 56 structural genes in several hundred phylogenetically and geographically diverse M. tuberculosis complex isolates suggested that allelic polymorphisms are extremely rare (139, 188, 235). While the members of the M. tuberculosis complex display diverse phenotypic characteristics and host ranges, they represent an extreme example of interspecies genetic homogeneity, with an estimated rate of synonymous nucleotide polymorphisms of 0.01% to 0.03% (60, 98, 123, 235) and no significant evidence for horizontal genetic transfer between genomes, unlike most bacterial pathogens (3, 41, 123, 248).

While the M. tuberculosis complex genome is highly restricted (conserved) in relation to other bacterial pathogens, this monomorphic species does have polymorphic genomic regions. Much like eukaryotic genomes, those of prokaryotes (such as M. tuberculosis) are characteristically punctuated by monomeric sequences repeated periodically (repeated units). There are two types of repetitive units, interspersed repeats (IR) (direct repeats and insertion sequence-like repeats) and tandem repeats (TR) (head-to-tail direct uninterrupted repeats). Prokaryotic microsatellites (1- to 10-bp repeats) and minisatellites (10- to 100-bp repeats, commonly referred to as variable-number tandem repeats [VNTR]) are located in intergenic regions, in regulatory regions, or within open reading frames and are abundant throughout most bacterial genomes. Below, we describe some of the most common genotyping methods currently used. Table 1 summarizes the advantages, limitations, and applications of the various molecular techniques.


 
TABLE 1. Evaluation of methods currently used to study the molecular epidemiology of TB

 
IS6110. Insertion sequences (IS) are small mobile genetic elements, usually less than 2.5 kb in size, that are widely distributed in most bacterial genomes (52). IS elements are commonly defined as carrying only the genetic information related to their transposition and regulation, unlike transposons, which can also carry genes that encode phenotypic markers (e.g., antibiotic resistance). Transposition of IS elements often causes gene disruptions that can have strong polar effects and in other cases can lead to the activation or alteration of expression of adjacent genes due to the regulatory sequences, including promoters and protein-binding sequences (52, 216a). From an evolutionary perspective, there are at least two distinct hypotheses explaining the role of IS elements in genomes. One regards the elements as genomic parasites that, on balance, harm their hosts (i.e., bacteria) (53). In contrast, others postulate that IS elements are important to their hosts for adaptive evolution, which is maintained by selection of occasional advantageous IS-derived mutations (32).

IS elements in bacterial species are present in varying numbers of copies: IS1 in Escherichia coli strains is present in 2 to 17 copies, whereas the Shigella species contain from 2 to 40 copies (52). Thierry et al. first described IS6110, a 1,355-bp member of the IS3 family that, when intact, is unique to the M. tuberculosis complex (250). IS6110 has an imperfect 28-bp inverted repeat at its ends and generates a 3- to 4-bp target duplication on insertion. Although "hot spots" have been noted (regions in the M. tuberculosis chromosome where IS6110 seems to preferentially insert), IS6110 elements are more or less randomly distributed throughout the genome, with copy numbers ranging from rare clones lacking any IS6110 elements to those with 26 copies (Fig. 1) (152, 174). In 1993, van Embden and colleagues proposed a standardized method for performing IS6110-based Southern blot hybridization analysis (259). The recommendation was based on the use of a common restriction endonuclease (PvuII, which cleaves IS6110 at a single asymmetric site and yields reasonable-size M. tuberculosis chromosomal fragments), a hybridization probe (specific to the right side of IS6110, whereby each hybridizing band corresponds to a PvuII-PvuII chromosomal fragment with a single IS6110 insertion), and standardized molecular weight markers (127). The concurrent development of software applications that assist in the analysis of the resulting IS6110-based restriction fragment length polymorphism (RFLP) patterns has allowed for intra- and interlaboratory comparisons of clinical isolates and the establishment of large national and international strain (and genotype) archives (e.g., Centers for Disease Control and Prevention, Atlanta, GA; Public Health Research Institute, Newark, NJ; National Institute of Public Health and Environment, Bilthoven, The Netherlands) (125, 150, 244, 261).


 
FIG. 1. Chromosomal maps of three M. tuberculosis strains: CDC1551 (http://tigrblast.tigr.org/cmr-BLAST/), H37Rv (http://genolist.pasteur.fr/TubercuList/index.html), and 210 (http://tigrblast.tigr.org/ufmg/index.cgi?database=m_tuberculosis-strain210%7Cseq). Arrows show the positions and orientations of IS6110 insertions. Left- and right-oriented arrows on the maps indicate IS6110 insertions, confirmed by insertion site mapping at the Public Health Research Institute Tuberculosis Center (unpublished data) and the positions of IS6110 according to Beggs et al. (18). The coordinates of the IS6110 insertions in all three strains correspond to the H37Rv annotated sequence. IS6110 mapping indicates that despite the insertion-preferential loci ("hot spots"), the precise positions (flanking sequences of the IS element) and orientations of the inserted sequences are not repeated in the three strains analyzed; i.e., none of the flanking regions of the four IS6110 copies in CDC1551 correspond to the 17 insertion sites in H37Rv or to the 23 positions (resulting in 21 hybridization bands) of the IS element in strain 210. The structures of the DR loci within these three strains are shown; black dots indicate spacers in the DR locus of corresponding strains, and triangles indicate deleted spacers. The chromosomal loci oriC and DR were described previously (137, 152, 167).

 
Initially, the dynamics of IS6110 transposition juxtaposed with the stability required for use in epidemiologic investigations was a cause for concern. However, when strains were cultured in vitro (liquid media) for 6 months, in macrophages over a 4-week period, and in a guinea pig model for more than 2 months, their IS6110-based RFLP patterns remained stable (50, 267). These studies attest that IS6110 is stable over short time periods, although it transposes over longer time intervals. The IS6110 transposition half-life (t1/2) (the period over which the IS-specific hybridization pattern does not change), taken from sequentially positive cultures with sampling intervals ranging from days to months, was estimated to be between 3 and 4 years (75, 291). Warren et al. investigated the stability of IS6110 banding patterns in serial M. tuberculosis isolates collected from patients living in areas of high TB incidence and noted a half-life of 8.74 years when a constant rate of change was assumed (278). The authors note that the rate may be composed of the high rate of change seen during the early disease phase (t1/2 = 0.57 years), when the mycobacterial replication rate is presumably high, and the lower rate in the late disease phase (t1/2 = 10.69 years), when bacterial doubling times are longer during or after treatment. Therefore, they conclude that the observed IS6110 stability is strongly influenced by the time between onset of disease and sample collection. Another investigation of serial patient isolates used deterministic and stochastic simulation models to estimate an IS half-life of 2.4 years for a strain that has 10 IS6110 copies (215). Indeed, IS6110 transposition, which is a replicative process, and half-life may be heavily dependent on strain-specific in vivo replication rates, host-pathogen interactions, or anatomical properties. Nonetheless, IS6110-based RFLP patterns seem to be sufficiently stable (and polymorphic) for studying TB transmission dynamics at the local or population level and over time. For instance, Lillebaek et al. used IS6110 genotyping to demonstrate endogenous reactivation of TB after over 30 years of latency (156).
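
The half-life estimates above can be turned into the probability that a fingerprint is still unchanged after a given interval. The sketch below assumes a constant rate of change (simple exponential decay), which is the assumption behind the 8.74-year estimate but, as those authors note, may not hold across disease phases. The function name, the dictionary labels, and the 2-year interval are hypothetical.

```python
def prob_pattern_unchanged(years, half_life_years):
    """P(IS6110 RFLP pattern unchanged after `years`) under a constant-rate
    (exponential) model; the value is 0.5 at exactly one half-life."""
    return 0.5 ** (years / half_life_years)


# Half-life estimates (in years) quoted in the text.
estimates = {
    "serial cultures, lower bound": 3.0,
    "serial cultures, upper bound": 4.0,
    "high-incidence area, constant rate": 8.74,
    "simulation, 10-copy strain": 2.4,
}

interval = 2.0  # hypothetical time between two isolates, in years
for label, t_half in estimates.items():
    p = prob_pattern_unchanged(interval, t_half)
    print(f"{label}: P(unchanged after {interval:g} y) = {p:.2f}")
```

A shorter half-life means that two isolates from the same chain of transmission are more likely to differ by the time they are sampled, which is the trade-off between polymorphism and stability that governs the choice of a biomarker.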

The utility of any molecular epidemiologic method in population analysis, in addition to adequate stability/polymorphism, is reliant on sufficient biomarker-specific diversity of isolates. Assignment of a genotype is strengthened when there is adequate background strain diversity. In a population-based study in New Jersey, Bifani et al. noted that approximately one-third of the 1,207 clinical isolates subjected to IS6110-based RFLP analysis were unique (or "orphans") to the sample, while a third of the isolates were categorized into 11 major strain groups that consisted of isolates from 10 or more patients (25). Presumably there is a discrete number of distinct strain types circulating within any given population; classifying a genotype as rare or unique is heavily dependent on the isolate sampling schemes and the size and diversity of the reference database.
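
Sorting isolates into strain groups and unique ("orphan") patterns can be sketched as grouping by fingerprint. In the example below a fingerprint is represented as a tuple of band sizes and only identical patterns are grouped. That representation, the function name, and the toy data are assumptions for illustration; real analyses rely on dedicated pattern-matching software.

```python
from collections import defaultdict


def group_by_fingerprint(isolates):
    """Split (patient_id, fingerprint) pairs into clusters and orphans.

    A cluster is a fingerprint shared by at least two patients; an orphan
    is a fingerprint seen in a single patient within this sample.
    """
    groups = defaultdict(set)
    for patient_id, fingerprint in isolates:
        groups[fingerprint].add(patient_id)
    clusters = {fp: pts for fp, pts in groups.items() if len(pts) >= 2}
    orphans = {fp: pts for fp, pts in groups.items() if len(pts) == 1}
    return clusters, orphans


# Toy data: band sizes in kb are invented for the example.
sample = [
    ("P1", (1.2, 2.5, 4.0)),
    ("P2", (1.2, 2.5, 4.0)),
    ("P3", (0.9, 3.1)),
    ("P4", (1.2, 2.5, 4.0)),
    ("P5", (1.4,)),
]
clusters, orphans = group_by_fingerprint(sample)
print(len(clusters), "cluster(s);", len(orphans), "orphan pattern(s)")
```

Whether a pattern counts as an orphan depends entirely on which isolates are in the sample, which is why the sampling scheme and the size of the reference database matter.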

As with any genotyping system, there are limitations inherent to IS6110-based RFLP analysis. One such limitation, not unique to IS6110 genotyping, is the interpretation of molecular data in drawing epidemiologic conclusions. That is, genotypic clustering (identical/similar fingerprints of strains isolated from at least two patients) is not synonymous with epidemiologically defined clustering (patient-patient link). This is especially important to keep in mind in areas with low M. tuberculosis genetic diversity or in areas of high endemicity (13, 27, 29, 38). In such situations, strain clustering may involve a number of distinct transmission pathways that ultimately may not be epidemiologically informative (false-positive links). This shortcoming is similar to that of conventional field epidemiologic investigations where distinct transmission patterns are often elusive, particularly in areas of high TB incidence (29, 112, 224). Therefore, suggested molecular epidemiologic links are greatly strengthened when they are in concordance with conventional methods of TB control (25, 27, 29). A second limitation often cited is the limited resolution in analyzing clinical strains with six or fewer copies of IS6110 ("low-copy-number" strains, clusters I, IIA, IV, and V [Fig. 2A]) (14, 150, 288, 290). The resolution afforded by the IS6110 RFLP genotyping method falls as the number of IS elements decreases, such that identical hybridization patterns may not indicate clonality when six or fewer bands comigrate. Although an IS6110-probed band on a hybridization blot indicates the presence and size of the PvuII-PvuII IS6110-associated DNA fragment, it does not provide the chromosomal location of the IS element. Therefore, identical bands may be from distinct genomic locales. Low-copy-number isolates have been shown to be genetically distinct when secondary independent biomarkers were used (14, 54, 210, 288). In contrast, high-copy-number strains (i.e., bearing more than six IS6110 copies, clusters I to III and VI to VIII [Fig. 2A]) with identical patterns are more likely to be clonal, as the probability of hybridization bands of similar size originating from different IS6110 locations is low. There exist, albeit rarely, strains that lack IS6110, rendering this genotyping method irrelevant (71, 217). Additional limitations of this genotyping system include its inability to distinguish among M. tuberculosis complex members and its labor intensiveness (Table 1) (69).


 
FIG. 2. Representative genotypes superimposed on the SNP-derived phylogenetic framework of M. tuberculosis. Based on SNP analysis of M. tuberculosis clinical isolates (including 1,743 strains from Public Health Research Institute Tuberculosis Center strain collection), a phylogenetic tree with the nine clusters of M. tuberculosis isolates was used to illustrate common genotypic patterns (122). (A) IS6110-based RFLP images. (B) Spoligotype patterns (black dots show spacers present in the chromosomal DR region of strains, and open triangles indicate deleted spacers). The strain spoligofamily definition corresponds to the SpolDB4 (43). Cluster I includes M. tuberculosis complex strains and TbD1+ ancestral isolates. Cluster II is represented by the W-Beijing strain family, including strain 210. Cluster IIA comprises the CAS spoligotype isolates. Clusters I and II belong to PGG1, while IIA comprises both PGG1 and PGG2. The coclustering of isolates from PGG1 and PGG2 in cluster IIA is also shared by some spoligotypes (panel B). PGG2 is further delineated into clusters III, IV, V (including CDC1551), and VI, while PGG3 is represented by clusters VII and VIII (including H37Rv). Isolates with a single IS6110 insertion are found in clusters I, IIA, and IV. Likewise, some spoligotypes appear in more than one cluster. Similar/identical spoligopatterns may be found in unrelated strain clusters (e.g., "Beijing" spoligotypes in cluster VI or "Haarlem" spoligotypes in cluster VII) as a result of independent spacer deletion events; this convergence of spoligotypes could lead to the misinterpretation of genotyping results and illustrates the necessity of using two or more techniques in genotypic analysis. *, annotated laboratory strains (CDC1551 and H37Rv). (Adapted from reference 122 with permission. © 2005 by the Infectious Diseases Society of America. All rights reserved.)

 
PGRS. Like IS6110-based RFLP analysis, polymorphic GC-rich repetitive sequence (PGRS) genotyping, first described by Ross et al., is a Southern blot hybridization technique that utilizes the PGRS-specific probe (a 3.4-kb fragment of the PGRS sequence) cloned in plasmid pTBN12 (216). When pTBN12 is used on AluI-digested DNA, it can distinguish strains from unrelated cases of TB and demonstrate identical banding patterns for isolates from epidemiologically related cases (216, 288). In fact, isolates clustered by IS6110-based RFLP analysis were further discriminated by PGRS typing (54). This is particularly the case when IS6110 low-copy-number strains are further analyzed by PGRS genotyping (210, 289). This method, like IS6110 genotyping, is resource intensive, but unlike the IS6110 system, the hybridization patterns generated by PGRS typing are often too complex to computerize for standardization and analysis.

Spacer oligonucleotide typing. After IS6110-based RFLP analysis, spacer oligonucleotide typing (spoligotyping) is the most commonly used PCR-based technique for subspeciating M. tuberculosis strains (121). M. tuberculosis complex strains contain a distinct chromosomal region consisting of multiple 36-bp direct repeats (DRs) interspersed by unique spacer DNA sequences (35 to 41 bp) (Fig. 1). Two forms of genetic rearrangements have been observed: one type consists of variation in one or a few discrete, contiguous repeats plus spacer sequences (DVRs), which is probably driven by homologous recombination between adjacent or distant chromosomal DRs; the other is driven by transposition of IS6110, which is almost invariably present in the DR locus of M. tuberculosis complex strains (260). As a result of these events, some spacers may be deleted from the genome.

Spoligotyping is based on the detection of 43 interspersed spacer sequences (originally identified in laboratory strain H37Rv and M. bovis BCG vaccine strain P3) in the genomic DR region of M. tuberculosis complex strains. Additional spacers in this region have been reported (260). Membranes spotted with 43 synthetic oligonucleotides are hybridized with labeled PCR-amplified DR locus of the tested strain, resulting in a pattern that can be detected by chemiluminescence (137). The results are highly reproducible, and the binary (present/absent) data generated can be easily interpreted and computerized and are amenable to intralaboratory comparisons. A recent edition of the international spoligotyping database, SpolDB4, contains 1,939 different spoligotypes (ST) identified worldwide that are organized into large ST families (43). ST families are nominated based on the common motif of deleted spacers. Recently, a web-based program has been developed to place spoligotypes into ST families (273). Spoligotyping, unlike IS6110 genotyping, which requires approximately 2 µg of bacterial DNA, can be performed with considerably less DNA and in a fraction of the time; it also allows genotyping of boiling-prepared or impure DNA, nonviable specimens, paraffin-embedded material, and material from slides of Ziehl-Neelsen stainings (82, 205, 258). In some instances, spoligotyping can distinguish among members of the M. tuberculosis complex based on the species-specific presence/absence of spacers (129, 137). It is thought that DR regions irreversibly lose spacers due to homologous recombination or IS6110 transposition events and cannot gain additional DNA fragments. Of note, deletions of DRs and spacers can occur multiple times and independently in unrelated strains, leading to convergent evolution, i.e., the appearance of identical spoligopatterns in phylogenetically unrelated M. tuberculosis strains (Fig. 2B) (277).
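The binary (present/absent) nature of spoligotype data can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, spacers are numbered 1 to 43 in membrane order, and the descent check encodes the working hypothesis described above that spacers are lost but not regained. Because identical patterns can also arise by convergence, a positive result is compatible with descent but does not demonstrate it.

```python
N_SPACERS = 43  # spacers interrogated by the spoligotyping membrane described above


def spoligo_pattern(present_spacers):
    """Return a 43-character string: '1' = spacer detected, '0' = spacer absent."""
    present = set(present_spacers)
    if not all(1 <= s <= N_SPACERS for s in present):
        raise ValueError("spacer numbers must lie between 1 and 43")
    return "".join("1" if i in present else "0" for i in range(1, N_SPACERS + 1))


def lost_spacers(ancestor, descendant):
    """1-based numbers of spacers present in `ancestor` but absent in `descendant`."""
    return [i + 1 for i, (a, d) in enumerate(zip(ancestor, descendant))
            if a == "1" and d == "0"]


def could_descend_from(ancestor, descendant):
    """True if `descendant` carries no spacer that `ancestor` lacks.

    Assumption taken from the text: spacers can be deleted but not regained.
    """
    return all(not (a == "0" and d == "1") for a, d in zip(ancestor, descendant))


# Hypothetical patterns, invented for illustration only.
ancestor = spoligo_pattern(range(1, 44))
descendant = spoligo_pattern(s for s in range(1, 44) if s not in (10, 11, 12))

print(lost_spacers(ancestor, descendant))        # [10, 11, 12]
print(could_descend_from(ancestor, descendant))  # True
print(could_descend_from(descendant, ancestor))  # False
```

Storing the pattern as a fixed-length string keeps comparison between laboratories trivial: two isolates share a spoligotype exactly when their strings are equal.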

Although spoligotyping can be a powerful method to study the molecular epidemiology of M. tuberculosis, its discriminatory power in general is inferior to that afforded by IS6110-based RFLP analysis (150). Strains having identical spoligotype patterns yet distinct IS6110 fingerprint profiles are often encountered (22, 167, 260). For instance, the W-Beijing family of strains, a large phylogenetically related group of M. tuberculosis isolates that comprise hundreds of similar yet distinct IS6110 variations (Fig. 2A, cluster II), all have an almost identical spoligopattern lacking spacers 1 through 34 (Fig. 2B, Beijing) (24, 149, 268). In this case, spoligotyping may be useful in identifying W-Beijing strains in a population; however, this approach will not be able to discern transmission events, especially in regions where these genotypes are noted to be endemic, such as Russia, China, and South Africa (24, 115, 181). In contrast, spoligotyping has been shown to further discriminate IS6110 low-copy-number strains (14, 230). Kremer et al. have shown that spoligotyping together with IS6110 genotyping can provide an accurate and discriminatory genotyping system (150); this approach has been adopted for the universal genotyping program in New York, N.Y. (56). When used alone, the limited discriminatory power of spoligotyping is primarily because it targets a single locus that accounts for less than 0.1% of the M. tuberculosis genome (Fig. 1), unlike IS6110-based RFLP analysis, which examines the distribution of IS6110 throughout the entire genome.
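The loss of resolution described above, and the gain from combining two methods, can be sketched in Python. The isolates, their identifiers, and the IS6110 pattern labels below are invented for illustration; only the rule that the W-Beijing spoligopattern lacks spacers 1 through 34 comes from the text, and the strict check used here ignores the "almost identical" variants that occur in practice.

```python
from collections import defaultdict


def is_beijing_like(pattern):
    """True if spacers 1-34 are all absent and at least one of spacers 35-43 is present."""
    return len(pattern) == 43 and "1" not in pattern[:34] and "1" in pattern[34:]


def cluster(isolates, key):
    """Group isolate IDs that share the same value of key(isolate)."""
    groups = defaultdict(list)
    for iso in isolates:
        groups[key(iso)].append(iso["id"])
    return sorted(groups.values())


beijing = "0" * 34 + "1" * 9  # spacers 1-34 absent, 35-43 present

# Hypothetical isolates; the IS6110 labels stand in for RFLP banding patterns.
isolates = [
    {"id": "A", "spoligo": beijing, "is6110": "rflp-01"},
    {"id": "B", "spoligo": beijing, "is6110": "rflp-01"},
    {"id": "C", "spoligo": beijing, "is6110": "rflp-02"},
    {"id": "D", "spoligo": "1" * 43, "is6110": "rflp-03"},
]

by_spoligo = cluster(isolates, key=lambda i: i["spoligo"])
by_both = cluster(isolates, key=lambda i: (i["spoligo"], i["is6110"]))

print(by_spoligo)  # [['A', 'B', 'C'], ['D']]
print(by_both)     # [['A', 'B'], ['C'], ['D']]
```

Spoligotyping alone places isolate C in the same cluster as A and B, whereas the combined key separates it, which mirrors the argument for using the two methods together.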

VNTR and MIRU analysis. Frothingham and Meeker-O'Connell performed a systematic analysis of VNTR loci in M. tuberculosis complex strains and found 11 loci comprising five major polymorphic tandem repeats (MPTR) (A to E) and six exact tandem repeats (ETR) (A to F) ranging in size from 53 to 79 bp (104). Since then, additional VNTR loci have been reported (119, 128, 146, 159, 189, 223, 229). Supply et al. identified 41 VNTR of mycobacterial interspersed repetitive units (MIRU) (tandem repeats of 40 to 100 bp) located in mammalian-like minisatellite regions scattered around the chromosome of H37Rv, CDC1551, and AF2122/97 (169, 247), including loci 4 (VNTR0580) and 31 (VNTR3192), which correspond to ETR D and E, respectively (104). Twelve of the 41 MIRU loci were selected for genotyping of M. tuberculosis clinical isolates and were reported in a 12-digit format corresponding to the number of repeats at each chromosomal locus (169, 247). The digitized data generated by MIRU-VNTR profiling is highly amenable to inter- and intralaboratory comparisons. As additional M. tuberculosis VNTR loci have been included, the various nomenclature from one laboratory to another has created some confusion. As such, Smittipat et al. have proposed a standardization of the VNTR nomenclature based on the four digits of the locus position on the H37Rv genome (for an equivalence table, see reference 228).
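As a rough illustration of the 12-digit format, the following Python sketch joins per-locus repeat counts into a profile string and lists the loci at which two profiles differ. The function names and the example counts are hypothetical, the locus order is assumed to be fixed by prior agreement, and the sketch handles only counts from 0 to 9; how larger copy numbers are encoded is not specified in the text.

```python
def miru_code(repeat_counts):
    """Join 12 per-locus repeat counts into a 12-digit profile string."""
    if len(repeat_counts) != 12:
        raise ValueError("expected repeat counts for exactly 12 loci")
    if any(not 0 <= n <= 9 for n in repeat_counts):
        raise ValueError("this sketch only handles counts from 0 to 9")
    return "".join(str(n) for n in repeat_counts)


def loci_differing(code_a, code_b):
    """0-based positions (in the agreed locus order) at which two profiles differ."""
    if len(code_a) != len(code_b):
        raise ValueError("profiles must have the same length")
    return [i for i, (a, b) in enumerate(zip(code_a, code_b)) if a != b]


# Hypothetical repeat counts, invented for illustration.
isolate_1 = miru_code([2, 2, 3, 3, 2, 5, 1, 5, 3, 3, 2, 2])
isolate_2 = miru_code([2, 2, 3, 4, 2, 5, 1, 5, 3, 3, 2, 2])

print(isolate_1)                             # 223325153322
print(loci_differing(isolate_1, isolate_2))  # [3]
```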

The discriminatory power of MIRU-VNTR analysis is typically proportional to the number of loci evaluated; in general, when only the 12 loci are used, it is less discriminating relative to IS6110 RFLP genotyping for isolates with high-copy-number IS6110 insertions but more discriminating than IS6110 RFLP genotyping for isolates with low-copy-number IS6110. When more than 12 loci are used, or MIRU analysis is combined with spoligotyping, the discriminatory power approximates that of IS6110 RFLP analysis. Recently, a comparative study of genotyping methods aimed at evaluating novel PCR-based typing techniques found VNTR analysis to have the greatest discriminatory power among amplification-based approaches (147). MIRU-VNTR genotyping has been used in a number of molecular epidemiologic studies, as well as to elucidate the phylogenetic relationships of clinical isolates (148, 231, 246, 248, 280). VNTR analysis has also been used to evaluate M. bovis transmission (214). A high-resolution MIRU-VNTR genotyping system using an automated sequencer and PCR primers tagged with one of four fluorescent dyes (FAM, NED, VIC, and HEX) has been developed, allowing amplification of four different loci simultaneously by multiplex PCR.
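Discriminatory power is not given a formula in this section. One widely used measure, assumed here for illustration, is the Hunter-Gaston discriminatory index, D = 1 - [1/(N(N - 1))] Σ n_j(n_j - 1), where N is the number of isolates and n_j is the number of isolates with the j-th type. The Python sketch below computes it for hypothetical typing results; the function name and the data are not from the text.

```python
from collections import Counter


def hunter_gaston_index(types):
    """Probability that two isolates drawn without replacement have different types."""
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical results for the same six isolates typed by two methods.
method_low_resolution = ["t1", "t1", "t1", "t1", "t2", "t2"]
method_high_resolution = ["a", "a", "b", "c", "d", "e"]

print(round(hunter_gaston_index(method_low_resolution), 3))   # 0.533
print(round(hunter_gaston_index(method_high_resolution), 3))  # 0.933
```

A value of 1.0 means every isolate has a distinct type and 0.0 means all isolates share one type, so the index allows methods or locus sets to be compared on the same collection.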

VNTR loci have a variable range of alleles; for example, within the 12 MIRUs, MIRU loci 2 (VNTR0154) and 24 (VNTR2687) have mostly 1 or 2 copies, while VNTR3820 can have from 3 to 32 copies (66, 228, 231, 246). Likewise, the discriminating capacity of a given locus, the molecular clock, or variability in alleles also varies extensively among the loci. For example, MIRU10 (VNTR0960) has been found to be the most polymorphic, having mostly 1 to 7 copies or up to 12 alleles in the M. tuberculosis collections analyzed (66, 231, 246; also, unpublished data). Variability at specific MIRU loci often depends on the sample collection (e.g., nationwide, population based, or convenience sampling), geographic origin, and inherent genetic diversity of the strains. For example, VNTR2059 has been found to be polymorphic in some studies but not in others (66, 228). An alternative selection of VNTRs should consider the intrinsic differences and variability within different genetic groups and the endemicity or predominance of clones in specific geographic and demographic populations. The use of different sets of VNTR from one collection to another would hamper the ease of interlaboratory analysis, one of the advantages of VNTR analysis. On the other hand, broadly increasing the overall number of loci for genotyping would increase the cost and labor required for analysis and complicate analysis and interpretation, not to mention reducing enthusiasm for routine epidemiological investigations. Presently, there is a concerted effort to select a better combination of VNTR for genotyping (248a). Fifteen of 29 MIRU-VNTR were selected, and >800 clinical isolates of diverse origin were analyzed for discriminatory power relative to IS6110 genotyping. Although promising, this new selection of MIRU-VNTR has yet to be evaluated in different settings.

SNP. As extensive comparative genomic analysis of M. tuberculosis has revealed remarkable DNA conservation between chromosomes, noted genetic polymorphisms at the nucleotide level have provided researchers with markers to differentiate clinical isolates as well as to study the phylogenetic relatedness of clinical strains. Both nonsynonymous single-nucleotide polymorphisms (nsSNP) and synonymous SNP (sSNP) provide useful genetic information that can be applied to differentiate M. tuberculosis strains; however, they address different biologic questions.

In general, nonsynonymous polymorphisms create an amino acid change that might be subject to internal or external selection pressure. As such, nonsynonymous changes in drug resistance-determining genetic loci can result in phenotypic drug resistance. Accordingly, M. tuberculosis resistance to antituberculosis agents nearly always correlates with genetic alterations (nonsynonymous point mutations, small duplications, or deletions) in resistance-conferring chromosomal regions (Table 2) (168, 206, 208, 295). nsSNP in genes that confer drug resistance can aid in understanding the nature and spread of resistance between and within populations (see "Molecular studies on drug resistance," below).


TABLE 2. Genomic regions associated with decreased susceptibility to antituberculosis agents

 
In contrast, synonymous changes, which are considered functionally neutral, do not alter the amino acid profile. These neutral alterations, when in structural or housekeeping genes, can provide the basis to study genetic drift and evolutionary relationships among mycobacterial strains. Sreevatsan et al. exploited two functionally neutral nsSNP in codon 463 (Leu463Arg) of the catalase-peroxidase-encoding gene katG and codon 95 (Thr95Ser) of the A subunit of DNA gyrase gene gyrA to divide the modern M. tuberculosis complex into three principal genetic groups (PGGs), designated PGG1 (katG463 CTC [Leu] gyrA95 ACC [Thr]), PGG2 (katG463 CGG [Arg] gyrA95 ACC [Thr]), and PGG3 (katG463 CGG [Arg] gyrA95 AGC [Ser]) (235). A more robust analysis by Gutacker et al. further divided the three PGGs into nine major clusters (I to VIII and II.A) (122, 123). Other investigators have similarly used sSNP analysis to infer the phylogenetic structure of M. tuberculosis populations and have largely reported consistent findings (3, 9) (see "Phylogeny and Strain Families of M. tuberculosis," below). While these studies have shed more light on the phylogenetic relatedness of clinical isolates, they also serve as a broad framework to examine whether different lineages display different epidemiologies in populations. Furthermore, SNP analysis is amenable to targeting multiple polymorphisms that are informative in one platform, such as phylogenetic grouping, drug resistance, virulence, and other epidemiologically instructive markers.
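The PGG assignment described above reduces to a lookup on two codons. The following Python sketch shows that lookup; the codon combinations are those given in the text, while the function name and the "unclassified" label for any other combination are assumptions made for illustration.

```python
# Codon combinations as listed in the text: (katG codon 463, gyrA codon 95).
PGG_BY_CODONS = {
    ("CTC", "ACC"): "PGG1",  # katG463 Leu, gyrA95 Thr
    ("CGG", "ACC"): "PGG2",  # katG463 Arg, gyrA95 Thr
    ("CGG", "AGC"): "PGG3",  # katG463 Arg, gyrA95 Ser
}


def genetic_group(katg463, gyra95):
    """Return the PGG for a codon pair, or 'unclassified' for any other combination."""
    key = (katg463.strip().upper(), gyra95.strip().upper())
    return PGG_BY_CODONS.get(key, "unclassified")


print(genetic_group("CTC", "ACC"))  # PGG1
print(genetic_group("cgg", "acc"))  # PGG2
print(genetic_group("CGG", "AGC"))  # PGG3
print(genetic_group("CTC", "AGC"))  # unclassified
```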

Genomic deletion analysis. Comparative genomic analysis of strains H37Rv and CDC1551 has revealed large-sequence polymorphisms (LSP) in addition to SNP (98). LSP are thought to mainly occur as a result of genomic deletions and rearrangements rather than through recombination following horizontal transfer (42). In the absence of horizontal gene transfer, deletions are irreversible and often unique events and therefore have been proposed for genotyping as well as for constructing phylogenies (41, 117, 255). It was found that up to 4.2% of the entire genome can be deleted in clinical isolates compared to the genome of laboratory strain H37Rv (255). Brosch and colleagues were able to discern the M. tuberculosis complex by deletion analysis by showing that the majority of deletions are not the outcome of independent events but rather are scars of successive deletions (41). Once a deletion occurs in the progenitor strain, the specific deletion can serve as a genetic marker for genotyping the progeny of this strain. For instance, deletion of TbD1 (for "M. tuberculosis specific deletion 1," a 2,153-bp fragment) was identified in all modern M. tuberculosis strains; in contrast, ancestral strains tested have this locus present (246). Studies of genomic LSP have indicated that deletions are not always randomly distributed in the chromosome but tend to be aggregated (141, 255). Some loci are "hot spots" for DNA deletions, and deletions at these loci can occur independently in unrelated strains or lineages. Some chromosomal deletions are associated with IS transposition; this is particularly true of loci which are hot spots for IS6110 insertions, such as in the RvD5 and DR regions (41, 218). For other deletions (such as TbD1), the correlation with IS elements has not been determined. Deleted sequences can include putative open reading frames as well as intergenic regions and housekeeping genes (41, 141). Using deleted fragments as genetic markers, this analysis can be performed by a simple PCR-based method or by automated GeneChip techniques (255).

Both ancestral and frequent deletions can correlate with clonal lineages and be used to examine strain relatedness (141, 254, 255). However, careful selection of the deletions should be made when undertaking such studies. For example, deletions within the above-mentioned hot spots for IS insertion can occur independently in different strains, hence a form of convergent evolution. Nonetheless, analysis of chromosomal deletions has proven to be a powerful tool in investigating the global evolution and phylogeny of the M. tuberculosis complex (41, 182, 197). The resolution of deletion analysis can be greatly improved when the exact flanking sequence of lost DNA is determined, especially when analyzing deletions in hot-spot loci. The use of deletion analysis (or deligotyping) for epidemiological investigations is still nascent. This approach has proven very efficient when the presence of a specific deletion associated with a single strain has been predetermined. Under such circumstances, a single PCR may suffice to track down the spread of a single strain (99, 254). However, in studies in which no particular clone or strain is targeted, simultaneous analysis of multiple deletions is required. Recently, a high-throughput method for detecting large polymorphic deletions was developed (117). Here, 43 genomic regions for large-scale deligotyping analysis were selected, and amplicons generated from these 43 deligosites were hybridized to a membrane containing the target sequences of the 43 loci. This approach proved to be highly sensitive and efficient for the rapid screening of clinical isolates. As is the case for other techniques, high-throughput deligotyping needs to be evaluated against different panels of clinical strains and in different epidemiologic and geographic settings.
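Deletion-based typing can be sketched in Python by treating each isolate as the set of regions found to be deleted. Only TbD1 and its modern/ancestral interpretation come from the text; the other region labels ("delX", "delY") and the function names are hypothetical placeholders, and the sketch does not address the hot-spot caveat that identical deletions may arise independently.

```python
def tbd1_group(deleted_regions):
    """'modern' if TbD1 is deleted, 'ancestral' if the locus is still present."""
    return "modern" if "TbD1" in deleted_regions else "ancestral"


def carries_marker(deleted_regions, marker):
    """True if a predetermined strain-specific deletion is found in this isolate."""
    return marker in deleted_regions


def shared_deletions(profile_a, profile_b):
    """Deletions common to two isolates; candidate markers of shared ancestry."""
    return set(profile_a) & set(profile_b)


# Hypothetical deletion profiles, invented for illustration.
isolate_1 = {"TbD1", "delX"}
isolate_2 = {"TbD1", "delX", "delY"}
isolate_3 = set()

print(tbd1_group(isolate_1))                           # modern
print(tbd1_group(isolate_3))                           # ancestral
print(carries_marker(isolate_2, "delY"))               # True
print(sorted(shared_deletions(isolate_1, isolate_2)))  # ['TbD1', 'delX']
```

Tracking a single known strain corresponds to one call to `carries_marker`, whereas untargeted studies need the full profile for every isolate, as in the multi-locus approach described above.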

Identification of strain-specific markers for rapid diagnosis. Rapid identification of TB transmission is greatly facilitated when strain-specific properties are targeted, as in the case of MDR-TB outbreaks. Genetic markers can comprise any "unique" characteristic that can distinguish target isolates, including unique fragment sequences, duplications, deletions, neutral SNP or polymorphisms associated with IS6110, or a drug resistance phenotype. For example, insertion site mapping (ISM) is a method that can be applied using IS6110 junctions for such purposes. Kurepina et al. used the unique IS6110 insertion site (A1) in the intergenic region in the origin of chromosome replication (oriC) as a marker to identify and classify members of the W-Beijing strain family (Fig. 1, strain 210) (151, 152). Likewise, Plikaytis and coworkers used multiplex PCR to determine two IS6110 insertions within an NTF locus to identify the W-MDR outbreak isolates from New York, N.Y. (201). In addition, an investigation of a strain with a single IS6110 insertion with others possessing two and three insertions was made possible through ISM. This approach (with spoligotyping) allowed the detection of an otherwise unsuspected M. tuberculosis strain cluster (167). In another application, van Rie et al. described a PCR-based method identifying mixed infections from primary samples (262). Other variations on this technique, such as insertion site typing ("Insite"), which uses PCR amplification of IS6110-flanking sequences followed by hybridization against known IS6110-flanking regions, have been reported, allowing for large-scale screening of clinical isolates (241). Therefore, the use of specific markers is highly amenable to studying transmission, aiding in public health activities, and providing valuable evolutionary information.


   MOLECULAR EPIDEMIOLOGY AND PUBLIC HEALTH
The field of molecular epidemiology generally aims to investigate whether naturally occurring strains differ in epidemiology. For instance, do specific clinical strains differ in their infectiousness, severity of disease, or susceptibility to antituberculosis agents? In general, the increased resolution afforded by molecular techniques has enabled both short-term (local epidemiological), such as in suspected outbreaks or laboratory error (26, 89, 193), and long-term (global epidemiological) investigations, such as understanding spatiotemporal transmission and evolutionary dynamics (41, 123, 124, 182, 232, 235).

In addition, molecular epidemiology can serve to better inform routine TB control activities. Successful molecular epidemiological investigations have sought to estimate the fraction of cases attributable to recent transmission or reactivation (12, 35, 224), confirm laboratory-based errors (39, 193), distinguish between endogenous reactivation and exogenous reinfection (10, 46, 233, 263), investigate properties and patterns of drug resistance with specific populations or groups of strains (26, 89, 107, 158, 180, 191, 202, 264), and better understand transmission dynamics within specific populations (25, 109, 167, 177). Since molecular techniques do not substitute for classical approaches, the direct utility of molecular epidemiologic investigations for TB control activities is best illustrated when both molecular and epidemiologic data sources are used. In addition to use in the study of transmission patterns within populations, molecular markers can be used to evaluate host- and strain-specific risk factors and possible genotypic-specific differences in phenotypes such as virulence, organ tropism, and transmissibility (108, 207, 209, 235, 256). Below we highlight some instances in which the utility of molecular epidemiologic methods has been realized. Table 3 summarizes some of the applications of molecular techniques in the study of TB epidemiology.


TABLE 3. Applications of molecular techniques in studies of TB

 
Transmission dynamics.

The difficulty in studying the transmission dynamics of M. tuberculosis within a given population stems partly from the natural history of the pathogen itself. Since most successful infections are followed by a variable latency period, the timing of transmission events often remains elusive. Indeed, most immunocompetent individuals (approximately 90%) infected with M. tuberculosis remain disease-free during the course of their lives. Therefore, the long-term persistence of this organism, juxtaposed with the generally high reproductive number (i.e., the average number of new infections that one case causes annually), makes charting transmission pathways within and between communities extremely difficult. Most TB control programs (especially in more developed countries) rely on contact tracing, whereby individuals named by the index case are screened (using purified protein derivative [PPD]-based tuberculin skin testing or chest X rays) and, if indicated, recommended for treatment of latent TB infection (4). While these prevention activities in low-incidence communities have been useful (101), they are often imprecise and tend to underestimate the level of transmission (25, 29, 167, 224).

The imprecision of contact-tracing investigations has been highlighted by several reports that have indicated that limited or casual contact is sufficient for M. tuberculosis transmission. Large population-based molecular epidemiologic studies conducted in San Francisco, Calif., and Baltimore, Md., uncovered, through extensive contact investigations, approximately 10% to 25% epidemiologic links between patients in designated molecular clusters (29, 224). Similarly, Mathema et al., reporting on a molecular cluster of closely related strains identified from a population-based study in New Jersey, were able to uncover only 30% case links within the prescribed cluster with clinical, demographic, and contact-tracing information (167). The Baltimore study reported that within molecular clusters, patients with no known epidemiologic links shared similar risk and demographic profiles. Furthermore, Valway et al., investigating an outbreak of strain CDC1551 in a small, rural community with low TB incidence, documented extensive transmission associated with casual contact (256). Yaganehdoost and colleagues reported on complex transmission patterns among "bar-hopping" patrons in a specific neighborhood in Houston, Tex., as part of a population-based molecular epidemiology study (286). These studies suggest that the imprecision of contact tracing may in part be due to complex transmission patterns in which casual contact may account for a considerable proportion of molecularly clustered yet epidemiologically unlinked cases, highlighting one of the advantages of performing population-based molecular epidemiologic studies: identifying high-risk groups or areas where transmission is ongoing.

Levels of molecular clustering, much like epidemiologic definitions, are subject to the stringency imposed (i.e., identical, related, or unrelated) by the investigators as well as by the specific molecular method used. The criteria of strain relatedness should be assessed with the appropriate study question and population in mind. In regions where the genetic diversity of the bacillary population is limited, for instance, in East Asia, where W-Beijing strains are predominant, clustering based on identical spoligotyping may lead to gross overestimations of true clustering rates since the W-Beijing strains have mostly identical spoligotype patterns. Similarly, clustering based on identical IS6110-based RFLP patterns may yield numerous hybridization profiles with various degrees of similarity and provide an underestimate of true clustering rates, since some strains that have similar IS6110 profiles may in fact be related by a recent common progenitor. For instance, a population-based study conducted in New Jersey identified 68 strains belonging to the W-Beijing family (25). Upon closer inspection, using subtle motifs in the IS6110 banding patterns and secondary molecular methods (VNTR and PGRS), the investigators divided the 68 strains into two groups, A and B. Unlike group B, group A consisted of five closely related IS6110 profiles that shared epidemiologic and demographic characteristics. Here the authors suggested that while groups A and B are phylogenetically related (i.e., both are members of the W-Beijing lineage, PGG-1, and sSNP cluster II [123, 151]), group A represents a clone that has recently spread and evolved in specific U.S.-born communities (i.e., clonal expansion) whereas group B was recovered from mainly non-U.S.-born patients of East Asian origin. 
Unlike group A, the diverse patterns and contrasting demographic profiles (i.e., mainly HIV seronegative, older age, no known risk factors for TB) noted among group B patients suggest that this group represents mainly reactivation cases. Similarly, another study conducted with the same New Jersey population uncovered a large cluster based on a distinct spoligotype that displayed three distinct IS6110 profiles (167). Here, use of ISM and sequencing of "naked" IS6110 flanking regions revealed a stepwise acquisition of IS6110 elements from one to two to three copies. The clonality of the three strains was confirmed by multiple molecular methods. Patients harboring these three IS6110 patterns and distinct spoligotype were similar with respect to geographic and epidemiologic characteristics and differed from patients with unrelated strains, further supporting the molecular method-based clustering. These studies indicate that strains with similar yet nonidentical IS6110 fingerprint patterns may share a recent progenitor and therefore be closely related and part of or an extension of an ongoing series of transmission events.

As highlighted in the New Jersey and Houston studies, population-based studies, when used in conjunction with traditional TB control activities, may facilitate the identification of previously unrecognized transmission events or even outbreaks in populations where there is a high background of reported cases (25, 160, 167, 173, 286). Until recently, it was thought that in low-incidence countries, such as the United States, the majority of TB cases were due to endogenous reactivation. Population-based molecular epidemiologic studies from San Francisco, New York, The Netherlands, and Denmark noted that molecular clustering ranged on average between 35% and 45%, indicating a substantial proportion most probably due to recent transmission (2, 17, 224, 265). This is in contrast to high-incidence populations where clustering is substantially higher, with proportions approaching 70% in some situations (113, 116, 271). It is important to note that molecular clustering is not synonymous with levels of recent transmission. Although molecular clustering may approximate epidemiologic clustering or recent transmission in low-incidence populations, the approximations tend to be more divergent in high-incidence populations due to high rates of infection and multiple transmission pathways. The precise proportion of disease due to recent transmission or endogenous reactivation is variable and heavily dependent on a number of factors, including the annual rate of TB infection, the molecular method employed, effective TB control programs, the size of the infected pool of individuals, age cohort effects (275), immigration history (109, 155), population susceptibility (e.g., genetic susceptibility, HIV prevalence, BCG vaccination), and the sampling strategies employed to derive estimates (114, 185, 186). 
Thus, when studies account for these independent factors and are used in conjunction with conventional epidemiologic methods, greater resolution and insight into transmission dynamics can be gleaned for the specific communities or populations studied. As such, universal genotyping has been implemented in some cities and countries (15, 56, 65, 136, 265).
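The clustering proportions discussed above can be computed from a list of genotypes as in the Python sketch below. Two quantities are shown: the fraction of isolates that fall in any cluster, and the so-called "n - 1" estimate, which subtracts one presumed source case per cluster. The n - 1 convention is not described in this text and is included as an assumption; as noted above, neither figure should be read as the true level of recent transmission. The function name and genotype labels are hypothetical.

```python
from collections import Counter


def clustering_summary(genotypes):
    """Summarize molecular clustering for a list of per-isolate genotype labels."""
    total = len(genotypes)
    if total == 0:
        raise ValueError("no isolates supplied")
    # A cluster is any genotype shared by two or more isolates.
    cluster_sizes = [c for c in Counter(genotypes).values() if c >= 2]
    clustered = sum(cluster_sizes)
    return {
        "total": total,
        "clusters": len(cluster_sizes),
        "clustered_fraction": clustered / total,
        # Assumed convention: one source case per cluster is not counted.
        "n_minus_one_fraction": (clustered - len(cluster_sizes)) / total,
    }


# Hypothetical genotype labels for ten isolates.
genotypes = ["g1"] * 4 + ["g2"] * 2 + ["g3", "g4", "g5", "g6"]
summary = clustering_summary(genotypes)

print(summary["clusters"])              # 2
print(summary["clustered_fraction"])    # 0.6
print(summary["n_minus_one_fraction"])  # 0.4
```

The result depends entirely on how "same genotype" is defined, so the stringency and typing method used to produce the labels should be reported alongside any clustering figure.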

Molecular studies on drug resistance.

Over the past decade, much has been learned of the drug targets and mechanisms of resistance to first-line and several second-line antituberculosis agents (Table 2) (168, 206, 208). As mentioned above, M. tuberculosis generally acquires drug resistance via de novo nsSNP, small deletions, or insertions in specific chromosomal loci, unlike most other pathogenic bacteria, which often acquire drug resistance via horizontal transfer. This attribute of M. tuberculosis drug resistance, coupled with fast and efficient DNA sequencing methods, makes studying drug resistance highly amenable for molecular epidemiologic investigations. Molecular epidemiologic studies on drug resistance have generally sought to examine the nature (e.g., genotype-specific mutations, association of specific mutations with phenotypic resistance) and extent (e.g., prevalence of specific mutations in a population) of drug resistance and patient risk factors (e.g., HIV) for acquiring resistance. Some studies have queried the contribution of primary (infection by an already-resistant organism) versus acquired (acquisition of drug resistance within a patient, de novo) drug resistance in specific populations (158, 285), while others have aimed to describe the evolutionary dynamics of drug resistance during clonal expansion or dissemination between and within patients (22, 26, 202). Of note, the terms primary and acquired resistance are not used in the current WHO guidelines, since they cannot be distinguished by most public health programs (unlike molecular epidemiologic studies); rather, they are divided by treatment history. Patients never treated or treated for <1 month and harboring drug-resistant TB are considered primary, and those previously treated or treated for >1 month are labeled acquired (283).
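The treatment-history rule at the end of the paragraph above can be written as a small Python function. This is only a sketch: the function name and return labels are hypothetical, a never-treated patient is represented as 0 months, and a history of exactly 1 month is returned as "unspecified" because the text gives only the <1 month and >1 month cases.

```python
def category_by_treatment_history(months_of_prior_treatment):
    """Classify a drug-resistant case by prior treatment duration in months."""
    if months_of_prior_treatment < 0:
        raise ValueError("treatment duration cannot be negative")
    if months_of_prior_treatment < 1:
        return "primary"    # never treated, or treated for <1 month
    if months_of_prior_treatment > 1:
        return "acquired"   # previously treated for >1 month
    return "unspecified"    # exactly 1 month: boundary not defined in the text


print(category_by_treatment_history(0))    # primary
print(category_by_treatment_history(0.5))  # primary
print(category_by_treatment_history(6))    # acquired
print(category_by_treatment_history(1))    # unspecified
```

A rule of this kind uses no genotype information, which is why molecular studies that compare resistance mutations can reach a different conclusion for the same patient.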

The report by Bifani et al. provides an example of a study of the nature and evolution of drug resistance during a clonal MDR-TB outbreak. Here the investigators describe the genotypic drug resistance profile of strain W and its variants during the outbreak in the early 1990s in New York, N.Y. (26, 101). Of the 357 patient isolates that were invariably resistant to INH, RIF, ethambutol (EMB), streptomycin (STR), and pyrazinamide (PZA) and often resistant to kanamycin (KAN), 253 isolates displayed 18 identical hybridizing bands, strongly indicating clonality and an outbreak. Analysis of five drug resistance chromosomal targets among 50 randomly chosen isolates revealed an identical array of polymorphisms, including a rare dinucleotide substitution in katG315 further supporting clonality and suggesting that the spread of this clone was primary in nature (i.e., acquired resistance prior to dissemination). Since then, at least 11 MDR W variants with subtle variations in IS6110 RFLP profiles have been recovered from New York patients. DNA sequence analyses of drug resistance targets confirm these variants as descendants of the original outbreak strain (i.e., mutations identical to those of strain W); however, in some variants additional resistance to fluoroquinolone and capreomycin was noted (26, 183, 245). Sequence analysis of the fluoroquinolone-resistant strains revealed five different gyrA polymorphisms, indicating that strain W had acquired resistance to first-line agents prior to dissemination and subsequently acquired resistance to fluoroquinolone de novo (285). Here, documentation of primary spread was critical in devising appropriate TB prevention and control measures (1, 100).
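The reasoning in this example, that a shared array of resistance mutations points to resistance acquired before dissemination while mutations that differ among variants point to later de novo acquisition, can be sketched in Python. The isolate names and mutation labels below are invented placeholders rather than data from the study, and the function name is hypothetical.

```python
def core_and_extras(profiles):
    """Split per-isolate mutation sets into a shared core and isolate-specific extras.

    profiles: dict mapping isolate ID -> iterable of mutation labels.
    """
    if not profiles:
        raise ValueError("no isolates supplied")
    sets = {iso: set(muts) for iso, muts in profiles.items()}
    core = set.intersection(*sets.values())
    extras = {iso: muts - core for iso, muts in sets.items()}
    return core, extras


# Placeholder labels only; they do not correspond to real reported mutations.
profiles = {
    "outbreak_strain": {"mut_INH", "mut_RIF", "mut_STR"},
    "variant_1": {"mut_INH", "mut_RIF", "mut_STR", "gyrA_variant_a"},
    "variant_2": {"mut_INH", "mut_RIF", "mut_STR", "gyrA_variant_b"},
}

core, extras = core_and_extras(profiles)

print(sorted(core))         # ['mut_INH', 'mut_RIF', 'mut_STR']
print(extras["variant_1"])  # {'gyrA_variant_a'}
print(extras["variant_2"])  # {'gyrA_variant_b'}
```

In this toy data set the shared core is consistent with a single resistant progenitor, while the different additional labels in the two variants are consistent with independent later events.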

Molecular epidemiologic studies of drug resistance have also focused on describing the nature of resistance within patients. For instance, Post et al. sought to better understand the dynamics of drug-resistant subpopulations resident within a patient by characterizing serial isolates recovered from 13 chronic HIV-negative MDR-TB patients (202). Serial isolates were characterized by IS6110-based RFLP analysis, spoligotyping, and sequencing of a number of drug resistance-determining regions. It was found that while all cases were infected by a single strain of M. tuberculosis, sputum-derived isolates of 4 of the 13 patients had acquired additional drug resistance mutations during the study. Heterogeneous populations of bacilli with different resistance mutations, as well as mixtures of drug-susceptible and drug-resistant genotypes at specific genetic targets, were noted. This observation was extended by another study that noted bacilli with additional drug resistance mutations recovered from different human lung lesions (138). Taken together, these studies suggest that a single founder strain of M. tuberculosis may undergo genetic changes during treatment, leading to the accumulation of additional drug resistance independently in discrete physical locales. In addition, it is possible that in patients with mixed infections (more than one infecting strain), the drug resistance profile may be composed of strains with different susceptibilities (e.g., simultaneous infection with mono-INH- and mono-RIF-resistant strains), leading to incorrect MDR profiles (211, 262). Therefore, genetic heterogeneity may require therapeutic targeting of both drug-resistant and drug-susceptible phenotypes, especially with first-line agents.
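
The mixed-infection scenario can be made concrete with a small sketch. It rests on a simplifying assumption that is not stated in the text: that a phenotypic test on a mixed specimen reports resistance to a drug whenever any strain present is resistant to it. MDR is taken here as resistance to at least INH and RIF. The strain names and helper functions are hypothetical.

```python
# MDR is taken as resistance to at least isoniazid (INH) and rifampin (RIF).
MDR_DRUGS = {"INH", "RIF"}


def is_mdr(resistances):
    """True if the resistance set includes both INH and RIF."""
    return MDR_DRUGS <= set(resistances)


def pooled_profile(strains):
    """Union of resistances over all strains in one specimen.

    Assumption: the specimen-level result shows resistance to a drug if
    any strain in the mixture is resistant to that drug.
    """
    pooled = set()
    for resistances in strains.values():
        pooled |= set(resistances)
    return pooled


# Hypothetical mixed infection: one mono-INH- and one mono-RIF-resistant strain.
strains = {"strain_A": {"INH"}, "strain_B": {"RIF"}}

pooled = pooled_profile(strains)
print(is_mdr(pooled))                             # True: specimen looks MDR
print(any(is_mdr(r) for r in strains.values()))   # False: no single strain is MDR
```

Under this assumption the specimen-level profile reads as MDR although neither strain is, which is the kind of misclassification that strain-level genotyping can help to detect.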

As shown in the examples above, examinations of target mutations have enabled investigators to determine whether the occurrence of drug resistance among clinical samples is due to primary infection or to de novo acquisition (acquired); the former would implicate active transmission of already-resistant strains in the community studied, and the latter would suggest suboptimal case management or other patient factors, such as poor therapeutic compliance, drug malabsorption, or low drug bioavailability (90, 133, 199, 200). Additionally, drug resistance genetic markers have been used to demonstrate clonality and transmission and to elucidate microevolutionary pathways (22, 26, 183, 272). Other mechanisms or characteristics not discussed in this review, such as increased expression of specific genes or drug tolerance, which are specific to the infecting strain or restricted to specific phylogenetic lineages, may also contribute to reduced susceptibility to antituberculosis agents (36, 79, 198, 276).

Recurrent TB.

Recurrent TB is the reoccurrence of disease after a previous episode has been considered clinically cured or resolved. In general, active TB may develop due to either recent infection or endogenous reactivation of historic infection. However, for the past few decades the role of exogenous reinfection, i.e., disease caused by a new strain of M. tuberculosis, in TB patients who had previous disease has been heavily debated (48, 236, 243, 274). Stead and others in the 1960s reported that TB always begins with primary infection and that subsequent episodes (of disease) are due to reactivation of these dormant organisms (also known as the unitary concept of pathogenesis) (236). Romeyn suggested that in environments of high TB infectivity, exogenous reinfection does have a role in active TB cases, unlike in communities or countries where TB incidence is low (213). Furthermore, Canetti used the epidemiologic concept of cohort effect to support the notion of exogenous reinfection (48). Between 1962 and 1970 in France, drug resistance to the available antituberculosis agents was 7.8% among patients over the age of 60 years. Since the prevalence of TB infection was as high as 90% 30 years prior (1935), a cohort member aged 60 would have been 30 years old in 1935 and would have invariably been infected with M. tuberculosis. As resistant forms of the bacilli did not appear until the middle to late 1940s, Canetti suggested that these patients harboring drug-resistant organisms acquired them exogenously.

Nevertheless, the long-standing belief was that the majority of recurrent TB is due to endogenous reactivation of the primary M. tuberculosis infection. Although the issue is unsettled, clinical cure or successful completion of an appropriate chemotherapy regimen may not result in mycobacterial sterilization (161, 236). Guinea pig studies of M. tuberculosis infection do not support the notion that secondary or tertiary exposure to the bacilli leads to an adverse effect on the host response to the primary infecting strain (296). As such, there have been limited case reports documenting exogenous reinfection occurring among immunocompromised individuals (97, 226); however, host-pathogen factors influencing this seemingly uncommon event have yet to be elucidated (142).

Distinguishing the cause of recurrent TB (exogenous reinfection or endogenous reactivation) raises several epidemiologic questions regarding the level of active transmission, the infectious burden, and environmental and specific host susceptibility factors in a given population. Clinical, epidemiological, and/or microbiological data cannot conclusively differentiate recurrent TB caused by reactivation from that caused by reinfection. Molecular techniques can, when primary (or historical) and secondary samples are available, help distinguish endogenous from exogenous infection. van Rie et al. reported the first comprehensive study of recurrent TB attributable to exogenous reinfection in an ongoing population-based study in Cape Town, South Africa (263). Here, roughly 700 patient isolates were genotyped by IS6110-based RFLP analysis over a 6-year period from a high-TB-incidence region (225 cases/100,000). M. tuberculosis isolates recovered from primary and secondary episodes were available and genotyped for 16 of the 48 recurrent patients with previous TB treatment and documented cure. All but one of the 16 patients were HIV-seronegative. IS6110 analysis indicated that in this group, 75% (12 of 16) of the recurrences were due to an exogenous insult, i.e., a new strain of M. tuberculosis. Concern over the interpretation of data has been noted, as the study included patients with a single positive culture as recurrent cases (76, 95; W. W. Stead and J. H. Bates, Letter, N. Engl. J. Med. 342:1050, 2000; A. van Rie et al., Author's Reply, N. Engl. J. Med. 342:1051, 2000). Nevertheless, Vynnycky and Fine suggested, as did Romeyn, that the contribution of exogenous reinfection increases proportionally to the regional incidence of TB, thus supporting the findings of van Rie et al. (213, 274).
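
The underlying comparison of paired isolates can be sketched in a few lines of Python. This is a minimal illustration and not the method of van Rie et al.: RFLP patterns are represented as collections of hypothetical band labels, a recurrence is called reactivation when the two patterns match and reinfection otherwise, and the optional tolerance parameter is an assumption added here to allow for minor band changes within a single strain.

```python
def classify_recurrence(first_bands, second_bands, max_band_differences=0):
    """Compare IS6110 RFLP patterns from the first and second episodes.

    Patterns are collections of band labels. Bands present in only one of
    the two patterns are counted; a shifted band therefore counts twice.
    """
    differing = set(first_bands) ^ set(second_bands)
    if len(differing) <= max_band_differences:
        return "reactivation"
    return "reinfection"


def reinfection_fraction(pairs, max_band_differences=0):
    """Fraction of paired episodes classified as reinfection."""
    labels = [classify_recurrence(a, b, max_band_differences) for a, b in pairs]
    return labels.count("reinfection") / len(labels)


# Hypothetical paired patterns (arbitrary band labels, not study data).
example_pairs = [
    ((1, 4, 7, 9), (1, 4, 7, 9)),   # identical pattern
    ((1, 4, 7, 9), (2, 5, 8)),      # different pattern
    ((2, 5, 8), (1, 4, 7, 9)),      # different pattern
    ((3, 6), (3, 6, 10, 11)),       # different pattern
]

print(reinfection_fraction(example_pairs))  # 0.75
print(12 / 16)                              # 0.75, the proportion reported in the text
```

Any such comparison is only as reliable as the isolates behind it, which is why the inclusion of patients with a single positive culture drew the criticism noted above.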

Studies conducted in countries with different rates of TB incidence have demonstrated various levels of recurrent disease attributable to exogenous reinfection (46, 135, 233). Sonnenberg et al. reported on the incidence of recurrence among a cohort of HIV-1-infected and uninfected South African mine workers (233). Of the 65 patients with recurrent disease, 39 were available for IS6110 fingerprinting. Of these, 14 patients (36%) were considered exogenously reinfected. The authors found the recurrence rate to be about 2.4 times higher among the HIV-1-infected subgroup than among the uninfected group, suggesting that in regions with a high incidence of TB infection, HIV enhances the rate of recurrence due to a high risk of exogenous reinfection. Recently, Verver et al. reported that the age-adjusted incidence rate of TB due to reinfection among patients with successful prior treatment was four times higher than the rate of new TB in Cape Town, South Africa (270). At least in this high-incidence community (313 cases/100,000), previous TB is strongly associated with an increased risk of developing disease upon reinfection. Although potential confounding by HIV status or socioeconomic background may have biased the estimates, this study raises the possibility that there may exist a subgroup of individuals with a predisposition to TB infection and/or that TB disease itself increases the susceptibility to recurrence (19, 49, 292).

Molecular tools have not only provided direct evidence for the occurrence of exogenous reinfection among both immunocompetent and immunocompromised individuals but have also provided a platform for studies aimed at assessing the current rates of active transmission in the community and the rate at which reinfection occurs in various epidemiologic scenarios (e.g., low TB incidence) among individuals with different risk factors and comorbidities or of different racial and/or ethnic groups. Although limited, there is evidence for racial variation in the susceptibility or level of innate immunity to M. tuberculosis infection (68, 239); more studies on host susceptibility to reinfection/infection that utilize molecular epidemiologic approaches need to be performed. As such, the finding that M. tuberculosis infection or disease does not afford sufficient acquired immunity against further insults will have profound implications for TB control activities, such as enhanced case finding and preventive therapy among at-risk population groups in areas of high disease prevalence, and for vaccine development. In addition, a study from a high-incidence region reported that multiple infections with different strains are common, suggesting significant reinfection rates and the absence of effective protective immunity conferred by the initial episode, further reinforcing the role of exogenous reinfection in the epidemiology and control of TB (279). Comprehensive reviews of this topic have been published elsewhere (55, 153).

Laboratory error/cross-contamination.

Laboratory error resulting in false-positive cultures of M. tuberculosis can cause erroneous administration of medications, disruption of daily life, and expenditure of resources required for isolation and contact investigations. Laboratory error generally occurs when clinical samples are mislabeled, when medical devices are contaminated, or during the handling or processing of primary patient samples subjected to mycobacterial analyses (laboratory cross-contamination) (39, 45). As this error is often random in nature, the rate of occurrence may be quite variable from one clinical laboratory to another.

Investigating false-positive cultures typically begins by determining whether the patient has only a single positive acid-fast stain (out of three smears taken on consecutive days) that results in a single positive culture and whether the laboratory had processed any other M. tuberculosis isolates during the same time period. When multiple M. tuberculosis patient isolates have been processed in the laboratory during the same time frame, genotyping may be used to determine the possibility of cross-contamination. A study by Small et al. suggested that two identical fingerprints cultured from separate patients within 7 days should be investigated (225). The use of DNA fingerprinting has markedly improved the timeliness and ability to confirm or refute false-positive cultures (16, 56, 96, 192, 225).
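
The 7-day screening rule lends itself to a simple pairwise check. The Python sketch below is an illustration under stated assumptions, not a published algorithm: the record fields (`patient`, `fingerprint`, `processed`) are hypothetical, fingerprints are compared as exact string matches, and "within 7 days" is treated as inclusive.

```python
from datetime import date
from itertools import combinations


def flag_possible_cross_contamination(isolates, window_days=7):
    """Return pairs of patients whose isolates warrant investigation.

    A pair is flagged when the isolates come from different patients,
    have identical fingerprints, and were processed within `window_days`
    of each other (inclusive; an assumption).
    """
    flagged = []
    for a, b in combinations(isolates, 2):
        different_patients = a["patient"] != b["patient"]
        same_fingerprint = a["fingerprint"] == b["fingerprint"]
        close_in_time = abs((a["processed"] - b["processed"]).days) <= window_days
        if different_patients and same_fingerprint and close_in_time:
            flagged.append((a["patient"], b["patient"]))
    return flagged


# Hypothetical laboratory log (not study data).
isolates = [
    {"patient": "P1", "fingerprint": "pattern-X", "processed": date(2000, 1, 3)},
    {"patient": "P2", "fingerprint": "pattern-X", "processed": date(2000, 1, 5)},
    {"patient": "P3", "fingerprint": "pattern-X", "processed": date(2000, 2, 20)},
    {"patient": "P4", "fingerprint": "pattern-Y", "processed": date(2000, 1, 4)},
]

print(flag_possible_cross_contamination(isolates))  # [('P1', 'P2')]
```

A flagged pair is a prompt for review of clinical and laboratory records, not proof of contamination, since identical fingerprints may also reflect genuine transmission.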

Several studies have shown, by molecular techniques, that laboratory error occurs more frequently than previously thought (21, 39, 45). Molecular confirmation of false-positive cultures is often based on sufficient strain diversity in the TB population and distinction between clinical and laboratory strains. That is, when there is sufficient heterogeneity in M. tuberculosis genotypes in a population, the chance of processing two patient isolates with identical fingerprints (within a short time frame) is low and warrants further investigation. This is especially the case in communities with relatively low TB incidence. It is important to note that genotypic heterogeneity and the ability to discriminate genotypes may be dependent on the molecular method. For instance, while most members of the W-Beijing family have identical spoligotypes, they exhibit similar yet distinct IS6110-based RFLP patterns. The