Please use this identifier to cite or link to this item:
http://ir.lib.seu.ac.lk/handle/123456789/6463
Title: | An ensemble approach to detect hate speech in Tamil tweets |
Authors: | Shibly, F. H. A.; Sharma, Uzzal; Naleer, H. M. M. |
Keywords: | Machine Learning; Deep Learning Algorithms; Hate Speech Detection; Ensemble Model |
Issue Date: | 28-Sep-2022 |
Publisher: | Faculty of Islamic Studies and Arabic Language, South Eastern University of Sri Lanka, University Park, Oluvil. |
Citation: | Proceedings of the 9th International Symposium - 2022 on “Socio-Economic Development through Arabic and Islamic Studies”. 28th September 2022. South Eastern University of Sri Lanka, University Park, Oluvil, Sri Lanka. p. 511. |
Abstract: | Advances in communication technologies have brought people together on a global scale. These technologies are critical to freedom of speech because they allow individuals to express their thoughts, behaviors, and opinions openly. However, they also create opportunities for racism, trolling, and exposure to a flood of offensive online content. As a result, the exponential growth of hate speech on social media significantly impacts society. In this research, we applied machine learning and deep learning algorithms to detect hate speech and compared their performance in order to develop an ensemble model. The researchers collected and combined two Tamil-language hate speech tweet datasets created by Bharathi Raja Chakravarthi et al. Tweets in the combined dataset, which contains 10,129 tweets, are classified into two categories: not offensive and offensive. The researchers selected six machine learning and deep learning algorithms for this study: Support Vector Machine (SVM), Logistic Regression (LR), Naïve Bayes (NB), Bidirectional LSTM, Multi-layer Perceptron (MLP), and Multilingual BERT. Among these, SVM (82%) and LR (82%) achieved the best accuracy in detecting hate speech. The researchers then developed two ensemble models to construct the most efficient model. The first ensemble combined SVM, LR, and NB, and the second combined SVM and LR. Four algorithms, including the two ensemble models, obtained the same accuracy. Therefore, the researchers compared F1 scores and found that ensemble model 02 outperformed the other classifiers. These findings are important because they can serve as a reference study for Tamil-language hate speech detection, against which future research using different machine learning algorithms can be evaluated for accuracy and efficiency. |
URI: | http://ir.lib.seu.ac.lk/handle/123456789/6463 |
ISBN: | 978-624-5736-55-3 |
Appears in Collections: | 9th International Symposium |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
9th intsymfia - 2022 (finalized UNICODE - Proceeding) 511.pdf | | 128.07 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
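
The abstract describes combining classifiers into an ensemble and then using the F1 score to separate models that tie on accuracy. The abstract does not state how the ensembles combine their members, so the following is only a minimal sketch that assumes simple majority (hard) voting. The function names, the tie-breaking rule, and the toy labels (1 = offensive, 0 = not offensive) are all hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: merge per-model label lists into one list.

    Assumption: ties are broken in favour of the earliest-listed model.
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        combined.append(next(lab for lab in labels if counts[lab] == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two offensive tweets
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # flags two harmless tweets
model_c = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # misses one offensive tweet

ensemble = majority_vote([model_a, model_b, model_c])

for name, pred in [("model_a", model_a), ("model_b", model_b),
                   ("model_c", model_c), ("ensemble", ensemble)]:
    print(name, accuracy(y_true, pred), round(f1_score(y_true, pred), 3))
```

In this toy data, `model_a` and `model_b` reach the same accuracy but different F1 scores, which shows why F1 can act as a tie-breaker when accuracies match. With only two members, as in the second ensemble the abstract mentions, plain majority voting needs an explicit tie rule; the one used here is just one possible choice.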