Beyond observation: genomic traits and machine learning algorithms for predicting fungal lifestyles

Author full names: "Chen, Y. P.; Su, P. W.; Stadler, M.; Xiang, R.; Hyde, K. D.; Tian, W. H.; Maharachchikumbura, S. S. N."

Author addresses: "[Chen, Y. P.; Su, P. W.; Tian, W. H.; Maharachchikumbura, S. S. N.] Univ Elect Sci & Technol China, Ctr Informat Biol, Sch Life Sci & Technol, Chengdu 610054, Peoples R China; [Stadler, M.] Helmholtz Ctr Infect Res GmbH, Dept Microbial Drugs, Braunschweig, Germany; [Stadler, M.] German Ctr Infect Res DZIF, Partner Site Hannover Braunschweig, Inhoffenstr 7, D-38124 Braunschweig, Germany; [Stadler, M.] Tech Univ Carolo Wilhelmina Braunschweig, Inst Microbiol, Spielmannstr 7, D-38106 Braunschweig, Germany; [Xiang, R.] Chongqing Med Univ, Precis Med Ctr, Affiliated Hosp 2, Chongqing 404100, Peoples R China; [Hyde, K. D.] Mae Fah Luang Univ, Ctr Excellence Fungal Res, Chiang Rai 57100, Thailand; [Hyde, K. D.] Zhongkai Univ Agr & Engn, Innovat Inst Plant Hlth, Guangzhou 510225, Peoples R China"

Corresponding author: "Maharachchikumbura, SSN (corresponding author), Univ Elect Sci & Technol China, Ctr Informat Biol, Sch Life Sci & Technol, Chengdu 610054, Peoples R China."

Source: MYCOSPHERE

ESI subject category: PLANT & ANIMAL SCIENCE

WOS accession number: WOS:001095656100001

JCR quartile: Q1

Impact factor: 10

Year: 2023

Volume: 14

Issue: 1

Start page: 1530

End page: 1563

Document type: Article

Keywords: CAZymes; FCWDEs; Genomics; genomic profile; PCWDEs; secretome; TEs

Abstract: "Economically and agriculturally important fungal species exhibit various lifestyles, and they can switch their life modes depending on the habitat, host tolerance, and resource availability. Traditionally, fungal lifestyles have been determined based on observation at a particular host or habitat. As a result, potential fungal pathogens have been neglected until they cause devastating impacts on human health, food security, and ecosystem stability. This study focused on the class Sordariomycetes to explore the genomic traits that could be used to determine the lifestyles of fungi and the possibility of predicting fungal lifestyles using machine learning algorithms. A total of 638 representative genomes encompassing 5 subclasses, 17 orders, and 50 families were selected and annotated. Through an extensive literature survey, the lifestyles of 553 genomes were determined, including plant pathogens, saprotrophs, entomopathogens, mycoparasites, endophytes, human pathogens, and nematophagous fungi. We first examined the relationship between fungal lifestyles and transposable elements. We unexpectedly discovered that second-generation sequencing technologies tend to result in a reduced size of transposable elements while having no discernible impact on the content of protein-coding genes. We then constructed three numerical matrices: 1) a basic genomic feature matrix including 25 features; 2) a functional protein matrix including 24 features; and 3) a combined matrix. In parallel, we reconstructed a genome-scale phylogeny, across which comprehensive comparative analyses were conducted. The results indicated that basic genomic features reflected phylogeny more than lifestyle, whereas the abundance of functional proteins discriminated relatively well not only among taxonomic groups at the higher levels but also among lifestyles.
Among the lifestyles compared (plant pathogens, saprotrophs, entomopathogens, mycoparasites, endophytes, and human pathogens), plant pathogens exhibited the largest secretomes, while entomopathogens had the smallest. Secretome abundance served as a valuable indicator for differentiating plant pathogens from mycoparasites, saprotrophs, and entomopathogens, as well as for discriminating endophytes from entomopathogens. Effectors have long been considered disease determinants, and indeed, we observed a higher presence of effectors in plant pathogens than in saprotrophs and entomopathogens. Surprisingly, however, endophytes exhibited a similar abundance of effectors, challenging the role of effectors as a reliable indicator of pathogenic fungi. No single functional protein group could differentiate all lifestyles, but their combinations resulted in accurate differentiation for most lifestyles. Furthermore, models of six machine learning algorithms were trained, optimized, and evaluated based on the labeled genomes. The best-performing model was used to predict the lifestyle of 83 unlabeled genomes. Despite insufficient genome sampling for several lifestyles and inaccurate lifestyle assignments for some genomes, the predictive model still achieved a high degree of accuracy in differentiating plant pathogens. The predictive model can be further optimized as more genomes are sequenced and should then provide more reliable predictions. It can serve as an early warning system, enabling the identification of potentially devastating fungi and facilitating the implementation of appropriate measures to prevent their spread."
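The workflow summarized in the abstract (train several classifiers on labeled genomes, evaluate them, keep the best, then predict unlabeled genomes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not name the six algorithms, so two toy classifiers (nearest centroid and 1-nearest-neighbour) stand in for them; the two feature columns and every number are synthetic placeholders, not data from the paper; and all function names are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_nearest_centroid(train_X, train_y):
    """Toy stand-in classifier: assign a sample to the closest class mean."""
    groups = defaultdict(list)
    for x, label in zip(train_X, train_y):
        groups[label].append(x)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return lambda x: min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))


def fit_one_nn(train_X, train_y):
    """Toy stand-in classifier: copy the label of the closest training sample."""
    def predict(x):
        nearest = min(range(len(train_X)), key=lambda i: sq_dist(x, train_X[i]))
        return train_y[nearest]
    return predict


def cv_accuracy(fit, X, y, k=5):
    """Accuracy pooled over k folds; fold f holds out samples f, f+k, f+2k, ..."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test)
    return correct / len(X)


# Synthetic, perfectly separable toy data (NOT values from the paper).
# Each row: [secretome size, effector count], both hypothetical columns.
# Real features on different scales would normally be standardized first.
X = [[1000 + 10 * i, 300 + i] for i in range(20)] + \
    [[500 + 10 * i, 100 + i] for i in range(20)]
y = ["plant pathogen"] * 20 + ["entomopathogen"] * 20

# Evaluate every candidate on the labeled genomes and keep the best scorer.
candidates = {"nearest_centroid": fit_nearest_centroid, "one_nn": fit_one_nn}
scores = {name: cv_accuracy(fit, X, y) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)

# Refit the selected model on all labeled data, then label unlabeled genomes.
best_model = candidates[best_name](X, y)
unlabeled = [[1100, 320], [520, 105]]
predictions = [best_model(x) for x in unlabeled]
print(scores, best_name, predictions)
```

With real data, the combined matrix described in the abstract would replace `X`, and the lifestyle labels from the literature survey would replace `y`; class imbalance across lifestyles, which the abstract flags as a limitation, would also need to be handled during evaluation.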

Funding agency: "Talent Introduction and Cultivation Project, University of Electronic Science and Technology of China [A1098531023601245]"

Funding text: "This research was funded by the Talent Introduction and Cultivation Project, University of Electronic Science and Technology of China, grant number A1098531023601245."