Evaluation of large language models in extracting clinical information

Authors

  • Carlos Eduardo Rodrigues Mello, Pontifícia Universidade Católica do Paraná
  • Elisa Terumi Rubel Schneider, Instituto do Coração
  • Lucas Emanuel Silva e Oliveira, Comsentimento
  • Juliana Nabbouh do Nascimento, PUC-PR
  • Yohan Bonescki Gumie, HC FMUSP
  • Isabela Fontes de Araújo, PUC-PR
  • Claudia Moro, PUC-PR

DOI:

https://doi.org/10.59681/2175-4411.v16.iEspecial.2024.1306

Keywords:

Syndrome, Signs and Symptoms, Machine Learning, Natural Language Processing

Abstract

Objective: To investigate the effectiveness of large language models (LLMs) in named entity recognition (NER) on clinical notes written in Brazilian Portuguese. Method: We evaluated the NER task on 30 clinical notes using precision, recall, and F-score as metrics. In the experiment, we compared the performance of the models GPT-3.5, Gemini, Llama-3, and Sabiá-2 in extracting the entities "Signs or Symptoms," "Diseases or Syndromes," and "Negated Data." Results: Llama-3 showed the best overall performance, especially in sensitivity (recall), achieving an F-score of 0.538. GPT-3.5 demonstrated balanced performance, while Gemini showed higher precision but lower sensitivity. Conclusion: Our results indicate that the choice of model depends on how precision and sensitivity are weighted against the requirements of each clinical application.
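
The abstract does not describe how entity matches were counted when computing these metrics. The sketch below is a minimal, illustrative Python example of entity-level exact-match scoring, assuming gold and predicted annotations are compared as (label, start, end) spans; the label names and offsets are hypothetical, not taken from the study.

    # Minimal sketch of entity-level exact-match evaluation (precision, recall, F-score).
    # Assumption: entities are (label, start, end) tuples; the paper does not specify
    # its matching criterion, so this is illustrative only.
    def ner_scores(gold, predicted):
        gold_set, pred_set = set(gold), set(predicted)
        true_pos = len(gold_set & pred_set)          # exact matches on label and span
        precision = true_pos / len(pred_set) if pred_set else 0.0
        recall = true_pos / len(gold_set) if gold_set else 0.0   # a.k.a. sensitivity
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        return precision, recall, f_score

    # Hypothetical example: one correct span, one missed entity, one spurious prediction.
    gold = [("SignOrSymptom", 10, 18), ("DiseaseOrSyndrome", 32, 45)]
    pred = [("SignOrSymptom", 10, 18), ("NegatedData", 50, 60)]
    print(ner_scores(gold, pred))  # (0.5, 0.5, 0.5)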

Author Biographies

Carlos Eduardo Rodrigues Mello, Pontifícia Universidade Católica do Paraná

Undergraduate student in Computer Science, Pontifícia Universidade Católica do Paraná (PUCPR), Curitiba, PR, Brazil

Elisa Terumi Rubel Schneider, Instituto do Coração

PhD in Informatics, Researcher, Instituto do Coração (HC FMUSP), São Paulo, SP, Brazil

Lucas Emanuel Silva e Oliveira, Comsentimento

PhD in Health Technology, Comsentimento, Curitiba, PR, Brazil

Juliana Nabbouh do Nascimento, PUC-PR

Undergraduate student in Biomedical Engineering, PUCPR, Curitiba, PR, Brazil

Yohan Bonescki Gumie, HC FMUSP

PhD in Health Technology, Researcher, Instituto do Coração (HC FMUSP), São Paulo, SP, Brazil

Isabela Fontes de Araújo, PUC-PR

Master's student, PPGTS/PUCPR, Curitiba, PR, Brazil

Claudia Moro, PUC-PR

PhD in Electrical Engineering, Full Professor, PPGTS/PUCPR, Curitiba, PR, Brazil

References

Yadav, P., Steinbach, M., Kumar, V., & Simon, G. (2018). Mining Electronic Health Records (EHRs). ACM Computing Surveys, 50(6), 1–40. https://doi.org/10.1145/3127881

Jensen, P. B., Jensen, L. J., & Brunak, S. (2012). Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics, 13(6), 395–405. https://doi.org/10.1038/nrg3208

Assale, M., Dui, L. G., Cina, A., Seveso, A., & Cabitza, F. (2019). The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records. Frontiers in Medicine, 6. https://doi.org/10.3389/fmed.2019.00066

Sun, P., et al. (2018). An overview of named entity recognition. In 2018 International Conference on Asian Language Processing (IALP) (pp. 273–278). IEEE. https://doi.org/10.1109/IALP.2018.8629225

da Silva, D. P., et al. (2023). Exploring named entity recognition and relation extraction for ontology and medical records integration. Informatics in Medicine Unlocked, 43, 101381. https://doi.org/10.1016/j.imu.2023.101381

Liu, Z., et al. (2023). DeID-GPT: zero-shot medical text de-identification by GPT-4. arXiv preprint arXiv:2303.11032.

Schneider, E. T. R., et al. (2020). BioBERTpt: a Portuguese neural language model for clinical named entity recognition. In Proceedings of the 3rd Clinical Natural Language Processing Workshop. https://doi.org/10.18653/v1/2020.clinicalnlp-1.7

Schneider, E. T. R., et al. (2023). CardioBERTpt: Transformer-based models for cardiology language representation in Portuguese. In 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), L'Aquila, Italy (pp. 378–381). https://doi.org/10.1109/CBMS58004.2023.00247

Oliveira, L. E. S. e, Peters, A. C., da Silva, A. M. P., et al. (2022). SemClinBr: a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks. Journal of Biomedical Semantics, 13(1), 13. https://doi.org/10.1186/s13326-022-00269-1

OpenAI. ChatGPT [Internet]. San Francisco: OpenAI; c2024 [cited 2024 May 31]. Available from: https://openai.com/index/chatgpt/

Google. Introducing Gemini: our largest and most capable AI model [Internet]. California: Google; c2024 [cited 2024 May 31]. Available from: https://blog.google/intl/pt-br/novidades/tecnologia/apresentando-o-gemini-nosso-maior-e-mais-habil-modelo-de-ia/#mensagem-sundar

Meta. Llama 3 [Internet]. California: Meta; c2024 [cited 2024 May 31]. Available from: https://llama.meta.com/llama3/

Maritaca AI. Sabiá-2 [Internet]. São Paulo: Maritaca AI; c2024 [cited 2024 May 31]. Available from: https://www.maritaca.ai/sabia-2

Ge, Y., et al. (2023). Few-shot learning for medical text: a review of advances, trends, and opportunities. Journal of Biomedical Informatics, 144, 104458. https://doi.org/10.1016/j.jbi.2023.104458

Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.

Published

2024-11-19

How to Cite

Mello, C. E. R., Schneider, E. T. R., Silva e Oliveira, L. E., do Nascimento, J. N., Gumie, Y. B., de Araújo, I. F., & Moro, C. (2024). Evaluation of large language models in extracting clinical information. Journal of Health Informatics, 16(Especial). https://doi.org/10.59681/2175-4411.v16.iEspecial.2024.1306
