Clostridium (Clostridioides) difficile infection (CDI) remains a significant global One Health threat. The genetic heterogeneity seen across the species (>600 sequence types, some with access to massive gene pools) underscores its wide ecological versatility and sympatric lifestyle, and has driven the significant changes in CDI epidemiology seen over the last two decades. Given this enormous diversity and recent contentious taxonomic revisions, this study explored the hypothesis that C. difficile comprises a complex of distinct species divided along the major evolutionary clades. Whole-genome average nucleotide identity (ANI), pangenomic and Bayesian analyses were used to explore an international collection of over 12,000 C. difficile genomes spanning the eight currently defined phylogenetic clades (major clades 1-5 and cryptic clades I-III), providing new insights into the ancestry, genetic diversity and evolution of pathogenicity of this enigmatic pathogen.
We identified major taxonomic incoherence and clear species boundaries separating the three cryptic clades I-III into three novel genomospecies. The emergence of these three independent genomospecies predates clades 1-5 by millions of years, rewriting the global population structure of C. difficile and the taxonomy of the Peptostreptococcaceae. Divergence of these genomospecies was likely due to a separation in their habitats or hosts, as the new genomospecies possess several genetic loci that allow them to thrive in different ecological niches. These genomospecies also show unique and highly divergent toxin gene architecture, which may escape current diagnostic tests; this finding advances our understanding of the evolution of C. difficile and its close relatives. Beyond the taxonomic ramifications, this work may impact the diagnosis of CDI worldwide.