Machine Learning Algorithms for Prediction of Breast Cancer Survival
DOI:
https://doi.org/10.59681/2175-4411.v15.iEspecial.2023.1091
Keywords:
Survival Analysis, Machine Learning, Breast Cancer
Abstract
Objective: To compare Machine Learning algorithms applied to the prediction of breast cancer survival. Methods: Descriptive study using data from 1,570 patients with stage I-III breast cancer. Because the dataset was imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) was applied. Naive Bayes, Random Forest, Multilayer Perceptron, and AdaBoost models were trained and evaluated with cross-validation. Results: The Random Forest model showed the highest accuracy (96.2%; 95% CI: 95.5%-96.9%) and specificity (97.4%; 95% CI: 96.6%-98.2%), while the AdaBoost model showed the highest sensitivity (95.3%; 95% CI: 94.3%-96.4%). Conclusion: Among the models compared in this study, the Random Forest model had the best overall evaluation measures for predicting breast cancer survival.
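The two quantitative steps named in the abstract, oversampling the minority class and reporting evaluation measures with 95% confidence intervals, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names (`smote`, `proportion_ci`, `evaluate`), the toy data, and the confusion-matrix counts are all hypothetical. The abstract does not state how the intervals were computed, so a Wald (normal-approximation) interval is assumed here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours.
    Requires at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def proportion_ci(successes, total, z=1.96):
    """Point estimate and Wald interval (assumed method) for a proportion."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)


def evaluate(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": proportion_ci(tp + tn, tp + fn + tn + fp),
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
    }


# Hypothetical toy data, not taken from the study.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=3)

# Hypothetical confusion-matrix counts, not the study's results.
metrics = evaluate(tp=90, fn=10, tn=95, fp=5)
for name, (est, low, high) in metrics.items():
    print(f"{name}: {est:.3f} (95% CI {low:.3f}-{high:.3f})")
```

Each synthetic point lies on the segment between two real minority samples, which is why SMOTE adds new cases rather than duplicating existing ones. In a cross-validated study, oversampling is normally applied only to the training folds so that synthetic cases do not leak into the evaluation data.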
License
Copyright (c) 2023 Pablo Deoclecia dos Santos, Erika Yahata, Talita Santos Pinheiro, Fellipe Soares de Oliveira, Priscyla Waleska Simões
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.