Hate speech detection for mental health support
DOI:
https://doi.org/10.59681/2175-4411.v16.iEspecial.2024.1255
Keywords:
Natural Language Processing, Artificial Intelligence, Hate
Abstract
Objective: This article explores the classification of social media comments that contain offensive language and hate speech. Exposure to such content on social networks can harm the population's mental health. Method: We applied Natural Language Processing and Machine Learning techniques to a Brazilian dataset. We investigated the use of embeddings, Long Short-Term Memory (LSTM) neural networks, and a hybrid approach that combines LSTM with a Convolutional Neural Network (CNN). The analysis also evaluates data imbalance and applies undersampling and oversampling techniques. Results and conclusion: Optimizing the LSTM yielded modest gains; the LSTM was more effective when combined with a CNN, especially with oversampling. However, oversampling raises concerns about overfitting. The results indicate that the developed model is more reliable for detecting offensive language than hate speech.
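The undersampling and oversampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `resample`, the label names, and the toy comments are hypothetical, and plain random resampling is assumed because the abstract does not say which variant was used.

```python
import random
from collections import Counter, defaultdict


def resample(texts, labels, strategy="over", seed=0):
    """Balance classes by random over- or undersampling (illustrative only)."""
    if strategy not in ("over", "under"):
        raise ValueError("strategy must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group the comments by class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Oversampling grows every class to the largest class size;
    # undersampling shrinks every class to the smallest class size.
    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if strategy == "over" else min(sizes)

    balanced = []
    for label, items in by_class.items():
        if len(items) >= target:
            # Draw without replacement (drops examples when undersampling).
            chosen = rng.sample(items, target)
        else:
            # Keep all originals and add duplicates drawn with replacement.
            chosen = items + rng.choices(items, k=target - len(items))
        balanced.extend((text, label) for text in chosen)

    rng.shuffle(balanced)
    return [t for t, _ in balanced], [l for _, l in balanced]


# Hypothetical toy data: 6 neutral, 3 offensive, 1 hate speech comment.
texts = ["c1", "c2", "c3", "c4", "c5", "c6", "o1", "o2", "o3", "h1"]
labels = ["none"] * 6 + ["offensive"] * 3 + ["hate"] * 1

over_texts, over_labels = resample(texts, labels, strategy="over")
under_texts, under_labels = resample(texts, labels, strategy="under")
print(Counter(over_labels))   # every class has 6 examples
print(Counter(under_labels))  # every class has 1 example
```

Because random oversampling only duplicates existing minority examples, it would normally be applied to the training split alone; duplicates that leak into validation or test data inflate the scores, which is one way the overfitting concern raised in the abstract can arise.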
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Submission of a paper to the Journal of Health Informatics is understood to imply that it is not being considered for publication elsewhere and that the authors' permission to publish their article(s) in this Journal implies the exclusive authorization of the publishers to deal with all issues concerning the copyright therein. Upon the submission of an article, authors will be asked to sign a Copyright Notice. Acceptance of the agreement will ensure the widest possible dissemination of information. An e-mail will be sent to the corresponding author confirming receipt of the manuscript and acceptance of the agreement.