Long short-term memory model - A deep learning approach for medical data with irregularity in cancer predication with tumor markers

Author full names: "Wu, Xiaoxing; Wang, Hsin-Yao; Shi, Peichang; Sun, Rong; Wang, Xiaolin; Luo, Zhixiao; Zeng, Fanling; Lebowitz, Michael S.; Lin, Wan-Ying; Lu, Jang-Jih; Scherer, Richard; Price, Olivia; Wang, Ziwei; Zhou, Jiming; Wang, Yonghong"

Author addresses: "[Wu, Xiaoxing; Wang, Ziwei; Wang, Yonghong] Chongqing Med Univ, Affiliated Hosp 1, Chongqing 400016, Peoples R China; [Wang, Hsin-Yao; Shi, Peichang; Lebowitz, Michael S.; Lu, Jang-Jih; Scherer, Richard; Price, Olivia; Zhou, Jiming] 20 20 GeneSyst Inc, Gaithersburg, MD 20877 USA; [Wang, Hsin-Yao; Lu, Jang-Jih] Chang Gung Mem Hosp Linkou, Dept Lab Med, Taoyuan 33305, Taiwan; [Wang, Hsin-Yao] Chang Gung Univ, PhD Program Biomed Engn, Taoyuan 33301, Taiwan; [Sun, Rong; Wang, Xiaolin; Luo, Zhixiao; Zeng, Fanling; Wang, Yonghong] Chongqing Med Univ, Affiliated Hosp 1, Hlth Management Ctr, Chongqing 400016, Peoples R China; [Wang, Ziwei] Chongqing Med Univ, Affiliated Hosp 1, Dept Gastrointestinal Surg, Chongqing 400016, Peoples R China; [Lin, Wan-Ying] Syu Kang Sport Clin, Taipei 11217, Taiwan; [Shi, Peichang] Univ Maryland, Baltimore, MD 20723 USA"

Corresponding authors: "Wang, YH (corresponding author), Chongqing Med Univ, Affiliated Hosp 1, Chongqing 400016, Peoples R China.; Zhou, JM (corresponding author), 15810 Gaither Dr, Suite 235, Gaithersburg, MD 20877 USA."

Source: COMPUTERS IN BIOLOGY AND MEDICINE

ESI subject category: COMPUTER SCIENCE

WOS accession number: WOS:000806840900006

JCR quartile: Q1

Impact factor: 7.7

Year: 2022

Volume: 144

Issue:

Start page:

End page:

Document type: Article

Keywords:

Abstract: "Background: Machine learning (ML) has emerged as a superior method for the analysis of large datasets. Application of ML is often hindered by incompleteness of the data, which is particularly evident in disease screening data because testing regimens vary across medical institutions. Here we explored the utility of multiple ML algorithms to predict cancer risk when trained using a large but incomplete real-world dataset of tumor marker (TM) values. Methods: TM screening data were collected from a large asymptomatic cohort (n = 163,174) at two independent medical centers. The cohort included 785 individuals who were subsequently diagnosed with cancer. Data included levels of up to eight TMs, but for most subjects only a subset of the biomarkers was tested. In some instances, TM values were available at multiple time points, but intervals between tests varied widely. The data were used to train and test various machine learning models to evaluate their robustness for predicting cancer risk. Multiple methods for data imputation were explored, and models were developed for both single time-point and time-series data. Results: The ML algorithm long short-term memory (LSTM) demonstrated superiority over other models for dealing with irregular medical data. A cancer risk prediction tool was trained and validated for a single time-point test of a TM panel including up to four biomarkers (AUROC = 0.831, 95% CI: 0.827-0.835), which outperformed a single threshold method using the same biomarkers. A second model relying on time-series data of up to four time points for five TMs had an AUROC of 0.931. Conclusions: A cancer risk prediction tool was developed by training an LSTM model using a large but incomplete real-world dataset of TM values. The LSTM model was best able to handle irregular data compared to other ML models. The use of time-series TM data can further improve the predictive performance of LSTM models even when the intervals between tests vary widely. These risk prediction tools are useful to direct subjects to further screening sooner, resulting in earlier detection of occult tumors."

Funding agencies: "National Natural Science Foundation of China [81974385]; Chang Gung Memorial Hospital (Linkou), Taiwan [CMRPG3J1791]; 20/20 GeneSystems, Inc. US [G2021]"

Funding text: "This study was funded by the National Natural Science Foundation of China (No. 81974385), Chang Gung Memorial Hospital (Linkou), Taiwan (CMRPG3J1791), and 20/20 GeneSystems, Inc. US (G2021)."