An ensemble approach to detect hate speech in Tamil tweets

Shibly, F. H. A.; Sharma, Uzzal; Naleer, H. M. M.

Please use this identifier to cite or link to this item: http://ir.lib.seu.ac.lk/handle/123456789/6463

Title:	An ensemble approach to detect hate speech in Tamil tweets
Authors:	Shibly, F. H. A.; Sharma, Uzzal; Naleer, H. M. M.
Keywords:	Machine Learning; Deep Learning; Algorithms; Hate Speech Detection; Ensemble Model
Issue Date:	28-Sep-2022
Publisher:	Faculty of Islamic Studies and Arabic Language, South Eastern University of Sri Lanka, University Park, Oluvil.
Citation:	Proceedings of the 9th International Symposium - 2022 on “Socio-Economic Development through Arabic and Islamic Studies”. 28th September 2022. South Eastern University of Sri Lanka, University Park, Oluvil, Sri Lanka. pp. 511.
Abstract:	Advancements in communication technologies have brought people together on a worldwide level. These technologies are critical in ensuring freedom of speech because they allow individuals to express their thoughts, behaviors, and opinions openly. However, they also create opportunities for racism, trolling, and exposure to a flood of offensive online content. As a result, the exponential growth of hate speech on social media significantly impacts society. In this research, we applied machine learning and deep learning algorithms to detect hate speech and compared their performance in order to develop an ensemble model. The researchers collected and combined two Tamil-language hate speech tweet datasets created by Bharathi Raja Chakravarthi et al. Tweets in the combined dataset, which contains 10,129 tweets, are classified into two categories: not offensive and offensive. Six machine learning and deep learning algorithms were selected for this study: Support Vector Machine (SVM), Logistic Regression (LR), Naïve Bayes (NB), Bidirectional LSTM, Multi-layer Perceptron (MLP) and Multilingual BERT. Of these, SVM (82%) and LR (82%) achieved the best accuracy in detecting hate speech. The researchers then developed two ensemble models in search of the most efficient model. The first ensemble combined SVM, LR and NB, and the second combined SVM and LR. Four models, including the two ensembles, obtained the same accuracy. Therefore, the researchers compared F1 scores and found that ensemble model 02 outperformed the other classifiers. These findings are important because they can serve as a reference study for Tamil-language hate speech detection, against which future work using different machine learning algorithms can be evaluated in the pursuit of more accurate and efficient detection.
URI:	http://ir.lib.seu.ac.lk/handle/123456789/6463
ISBN:	978-624-5736-55-3
Appears in Collections:	9th International Symposium
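
The abstract notes that several models reached the same accuracy and were then separated by F1 score. The following is a minimal sketch of why that can happen, not the authors' implementation. It assumes hard (majority) voting over three classifiers, which the abstract does not specify, and it uses invented toy labels and hypothetical model names (`model_a`, `model_b`, `model_c`) rather than the Tamil tweet data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by hard (majority) voting."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumption): 1 = offensive, 0 = not offensive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical predictions; each model makes exactly two mistakes.
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two offensive tweets
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # flags two harmless tweets
model_c = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # one miss, one false alarm

ensemble = majority_vote([model_a, model_b, model_c])

for name, pred in [("model_a", model_a), ("model_b", model_b),
                   ("model_c", model_c), ("ensemble", ensemble)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f}, "
          f"F1={f1_score(y_true, pred):.2f}")
```

In this toy setup every model has the same accuracy while the F1 scores differ, because F1 weighs missed offensive tweets and false alarms on the minority class rather than counting all errors equally. A two-model ensemble such as SVM plus LR would need a tie-breaking rule or probability averaging instead of a plain majority vote.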

Files in This Item:

File	Description	Size	Format
9th intsymfia - 2022 (finalized UNICODE - Proceeding) 511.pdf		128.07 kB	Adobe PDF
