MATNet: Exploiting Multi-Modal Features for Radiology Report Generation

Author full names:"Shang, Caozhi; Cui, Shaoguo; Li, Tiansong; Wang, Xi; Li, Yongmei; Jiang, Jingfeng"

Author addresses:"[Shang, Caozhi; Cui, Shaoguo; Li, Tiansong] Chongqing Normal Univ, Sch Comp & Informat Sci, Chongqing 401331, Peoples R China; [Wang, Xi] Peking Univ, Dept Econ, Beijing 100000, Peoples R China; [Li, Yongmei] Chongqing Med Univ, Affiliated Hosp 1, Dept Radiol, Chongqing 400016, Peoples R China; [Jiang, Jingfeng] Michigan Technol Univ, Dept Biomed Engn, Houghton, MI 49931 USA"

Corresponding author:"Cui, SG (corresponding author), Chongqing Normal Univ, Sch Comp & Informat Sci, Chongqing 401331, Peoples R China."

Source:IEEE SIGNAL PROCESSING LETTERS

ESI subject category:ENGINEERING

WOS accession number:WOS:000915831400003

JCR quartile:Q2

Impact factor:3.9

Year:2022

Volume:29

Issue: 

Start page:2692

End page:2696

Document type:Article

Keywords:Diseases; Transformers; Decoding; Visualization; Feature extraction; Radiology; MIMICs; Radiology report generation; multi-modal learning; medical image processing

Abstract:"Medical imaging is widely used in hospital clinical workflows. Assisting physicians in diagnosis by automatically generating reports from radiological images is an unmet clinical demand that requires urgent attention. However, this task suffers from two significant problems: 1) visual and textual data biases, and 2) the Transformer decoder makes no distinction between visual and non-visual words. To meet this clinical need, i.e., creating fluent and accurate radiology reports, we propose a novel multi-task approach that combines natural language processing with machine learning techniques. We name our system the Multi-modal Adaptive Transformer (MATNet); it consists of three key modules. First, the Multi-Modal Encoder (MME) explores the relationship between radiology images and clinical notes. Second, the Disease Classifier (DC) classifies the state of each disease topic and provides state-aware disease embeddings to alleviate visual data bias. Last, the Adaptive Decoder (AD) dynamically measures the contribution of source signals and target signals when generating the next word. In our evaluations on the benchmark IU-XRay and MIMIC-CXR datasets, the proposed MATNet outperformed previous state-of-the-art models on language fluency and clinical accuracy metrics such as BLEU scores."

Funding agencies:"National Natural Science Foundation of China [62003065]; Natural Science Foundation Project of Chongqing Science and Technology Bureau [CSTB2022NSCQ-MSX1206, CSTB2022TFII-OFX0042, cstc2019jscx-mbdxX0061]; Key Science and Technology Research Program of Chongqing Municipal Education Commission [KJZD-K202200510]"

Funding text:"This work was supported in part by the National Natural Science Foundation of China under Grant 62003065, in part by the Natural Science Foundation Project of Chongqing Science and Technology Bureau under Grants CSTB2022NSCQ-MSX1206, CSTB2022TFII-OFX0042, and cstc2019jscx-mbdxX0061, and in part by the Key Science and Technology Research Program of Chongqing Municipal Education Commission under Grant KJZD-K202200510."