"Machine learning for identifying benign and malignant of thyroid tumors: A retrospective study of 2,423 patients"

Author full names: "Guo, Yuan-yuan; Li, Zhi-jie; Du, Chao; Gong, Jun; Liao, Pu; Zhang, Jia-xing; Shao, Cong"

Author addresses: "[Guo, Yuan-yuan; Li, Zhi-jie; Liao, Pu; Zhang, Jia-xing] Chongqing Gen Hosp, Dept Lab Med, Chongqing, Peoples R China; [Du, Chao] Fuling Ctr Hosp Chongqing City, Dept Lab Med, Chongqing, Peoples R China; [Gong, Jun] Chongqing Med Univ, University Town Hosp, Dept Informat Ctr, Chongqing, Peoples R China; [Shao, Cong] Chongqing Gen Hosp, Dept Breast & Thyroid Surg, Chongqing, Peoples R China"

Corresponding author: "Liao, P (corresponding author), Chongqing Gen Hosp, Dept Lab Med, Chongqing, Peoples R China."

Source: FRONTIERS IN PUBLIC HEALTH

ESI subject category: SOCIAL SCIENCES, GENERAL

WOS accession number: WOS:000861667000001

JCR quartile: Q1

Impact factor: 5.2

Year: 2022

Volume: 10

Issue: 

Start page: 

End page: 

Document type: Article

Keywords: thyroid tumor; machine learning; predictive model; BRAFV600E gene mutation; risk-factors

Abstract: "Thyroid tumors are among the most common tumors of the endocrine system, yet the discrimination between benign and malignant thyroid tumors remains insufficient. The aim of this study is to construct a diagnostic model for benign and malignant thyroid tumors, in order to provide an emerging auxiliary diagnostic method for patients with thyroid tumors. Patients were selected from Chongqing General Hospital (Chongqing, China) between July 2020 and September 2021. Peripheral blood, BRAFV600E gene, and demographic indicators were selected, including sex, age, BRAFV600E gene, lymphocyte count (Lymph#), neutrophil count (Neu#), neutrophil/lymphocyte ratio (NLR), platelet/lymphocyte ratio (PLR), red blood cell distribution width (RDW), platelet count (PLT), red blood cell distribution width-coefficient of variation (RDW-CV), alkaline phosphatase (ALP), and parathyroid hormone (PTH). First, feature selection was performed by univariate analysis combined with least absolute shrinkage and selection operator (LASSO) analysis. Afterward, we used machine learning algorithms to establish three types of models: the first model contains all predictors, the second model contains the indicators retained after feature selection, and the third model contains only the patients' peripheral blood indicators. Four machine learning algorithms were used to build the predictive models: extreme gradient boosting (XGBoost), random forest (RF), light gradient boosting machine (LightGBM), and adaptive boosting (AdaBoost). A grid search algorithm was used to find the optimal parameters of the machine learning algorithms. A series of indicators, such as the area under the curve (AUC), were used to assess model performance. A total of 2,042 patients met the criteria and were enrolled in this study, and 12 variables were included. Sex, age, Lymph#, PLR, RDW, and BRAFV600E were identified as statistically significant indicators by univariate and LASSO analysis. 
Among the models we constructed, RF, XGBoost, LightGBM, and AdaBoost achieved AUCs of 0.874 (95% CI, 0.841-0.906), 0.868 (95% CI, 0.834-0.901), 0.861 (95% CI, 0.826-0.895), and 0.837 (95% CI, 0.802-0.873), respectively, in the first model; 0.853 (95% CI, 0.818-0.888), 0.853 (95% CI, 0.818-0.889), 0.837 (95% CI, 0.800-0.873), and 0.832 (95% CI, 0.797-0.867) in the second model; and 0.698 (95% CI, 0.651-0.745), 0.688 (95% CI, 0.639-0.736), 0.693 (95% CI, 0.645-0.741), and 0.666 (95% CI, 0.618-0.714) in the third model. Compared with existing models, our study proposes a model incorporating novel biomarkers, which could be a powerful and promising tool for predicting benign and malignant thyroid tumors."
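
The abstract reports each model's AUC together with a 95% confidence interval but does not state how the intervals were obtained. The following is a minimal sketch, not the authors' method: it computes the AUC as the probability that a randomly chosen malignant case scores higher than a randomly chosen benign case, and estimates an interval with a percentile bootstrap (an assumption made here; other approaches, such as DeLong's method, are also common). The function names `auc_score` and `bootstrap_auc_ci` and the toy labels and scores are hypothetical and carry no data from the study.

```python
import random


def auc_score(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement, keeping label and score paired.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = malignant, 0 = benign; scores are model outputs.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]

print(auc_score(toy_labels, toy_scores))  # 0.9375 (15 of 16 pairs ranked correctly)
print(bootstrap_auc_ci(toy_labels, toy_scores))
```

With only eight toy cases the bootstrap interval is very wide; on a held-out test set of realistic size the same procedure yields intervals of the kind quoted in the abstract.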

Funding agency: Science and Technology and Health Commission program of Chongqing; [2020FYYX157]

Funding text: This work was supported by a grant from the Science and Technology and Health Commission program of Chongqing (2020FYYX157).