Enhanced toxic comment detection model through Deep Learning models using Word embeddings and transformer architectures

Sushma S; Sasmita Kumari Nayak; M. Vamsi Krishna

Abstract

The proliferation of harmful and toxic comments on social media platforms calls for robust methods to detect and classify such content automatically. This paper investigates natural language processing (NLP) and machine learning (ML) techniques for toxic comment classification on the Jigsaw Toxic Comment Dataset. Several recurrent deep learning models (RNN, LSTM, and GRU) are evaluated in combination with three feature extraction methods: TF-IDF, Word2Vec, and BERT embeddings. Rather than producing a combined ensemble output, the study compares model-embedding combinations to determine the most effective pairings. Results indicate that pairing BERT embeddings with the recurrent models (RNN+BERT, LSTM+BERT, GRU+BERT) leads to significant improvements in classification accuracy, precision, recall, and F1-score, which suggests that BERT embeddings capture nuanced text features effectively. Among all configurations, LSTM with Word2Vec and LSTM with BERT yielded the highest performance. This comparative approach highlights the potential of combining classical recurrent models with transformer-based embeddings for detecting toxic comments. The findings offer insights into applying deep learning to toxic comment detection and suggest directions for refining such models in real-world applications.
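The comparative evaluation described in the abstract can be illustrated with a minimal sketch. It assumes a simplified binary setting (toxic = 1, non-toxic = 0), and the names `binary_metrics`, `rank_configurations`, `config_a`, and `config_b` are hypothetical. The labels and predictions are toy values chosen for illustration and are not results from the paper.

```python
from typing import Dict, List, Tuple


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (toxic = 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def rank_configurations(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> List[Tuple[str, Dict[str, float]]]:
    """Score each model-embedding configuration and sort by F1, best first."""
    scored = [(name, binary_metrics(y_true, y_pred))
              for name, y_pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1]["f1"], reverse=True)


# Toy data only: hypothetical labels and predictions, not the paper's results.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "config_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "config_b": [1, 1, 0, 0, 0, 1, 1, 0],
}
ranking = rank_configurations(y_true, predictions)
for name, scores in ranking:
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Sorting by F1 is one possible design choice. A study that reports all four metrics may weigh them differently, and a multi-label setting would need these scores computed per label and then averaged.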

Keywords

Toxic comment classification; Word embeddings; Ensemble modeling

How to Cite

S, S., Nayak, S. K., & Krishna, M. V. (2025). Enhanced toxic comment detection model through Deep Learning models using Word embeddings and transformer architectures. Future Technology, 4(3), 76–84. Retrieved from https://fupubco.com/futech/article/view/324
