The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard

Authors: Daraqel, Baraa; Wafaie, Khaled; Mohammed, Hisham; Cao, Li; Mheissen, Samer; Liu, Yang; Zheng, Leilei

Author addresses: [Daraqel, Baraa; Cao, Li; Liu, Yang; Zheng, Leilei] Chongqing Med Univ, Stomatol Hosp, Dept Orthodont, 426 Songshibei Rd, Chongqing 401147, Peoples R China; [Daraqel, Baraa; Cao, Li; Liu, Yang; Zheng, Leilei] Chongqing Med Univ, Chongqing Key Lab Oral Dis & Biomed Sci, Chongqing, Peoples R China; [Daraqel, Baraa; Cao, Li; Liu, Yang; Zheng, Leilei] Chongqing Med Univ, Chongqing Municipal Key Lab Oral Biomed Engn Highe, Chongqing, Peoples R China; [Daraqel, Baraa; Mohammed, Hisham] Al Quds Univ, Oral Hlth Res & Promot Unit, Jerusalem, Palestine; [Wafaie, Khaled; Mheissen, Samer] Zhengzhou Univ, Affiliated Hosp 1, Fac Dent, Dept Orthodont, Zhengzhou, Henan, Peoples R China

Corresponding authors: Daraqel, B; Zheng, LL, Chongqing Med Univ, Stomatol Hosp, Dept Orthodont, 426 Songshibei Rd, Chongqing 401147, Peoples R China.

Source: AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS

ESI subject category: CLINICAL MEDICINE

WOS accession number: WOS:001247324400001

JCR quartile: Q1

Impact factor: 2.7

Year: 2024

Volume: 165

Issue: 6

Start page: 652

End page: 662

Document type: Article

Keywords:

Abstract: Introduction: This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bidirectional Encoder Representations from Transformers (Google Bard; Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions. Methods: A team of orthodontic specialists developed a set of 100 questions in 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded and independent assessors. The quality of AI-generated responses was evaluated for accuracy of information and completeness using a newly developed tool. In addition, response generation time and length were recorded. Results: The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR]: 8-9) for ChatGPT and 8 (IQR: 8-9) for Google Bard (median difference: 1; P <0.001). The median completeness score was similar in both models, with 8 (IQR: 8-9) for ChatGPT and 8 (IQR: 7-9) for Google Bard. The odds of accuracy and completeness were higher by 31% and 23%, respectively, in ChatGPT than in Google Bard. Google Bard's response generation time was significantly shorter than that of ChatGPT, by 10.4 seconds per question. However, both models were similar in terms of response length. Conclusions: Responses generated by both ChatGPT and Google Bard were rated as highly accurate and complete for the posed general orthodontic questions. However, acquiring answers was generally faster using the Google Bard model. (Am J Orthod Dentofacial Orthop 2024;165:652-62)

Funding agencies:

Funding text: