The Role of Data Pre-Processing Techniques and Classification Algorithms on the Accuracy of Sentiment Analysis in Social Media: A Literature Review

Authors

  • Salsabila Dwi Fitri, Jambi University, Jambi, Indonesia
  • Yorasakhi Ananta, Andalas University, Padang, West Sumatra, Indonesia

Keywords:

Sentiment Analysis, Social Media, Data Preprocessing, Classification Algorithm, Model Accuracy

Abstract

The growth of digital technology and the explosion of data on social media have increased the need for accurate sentiment analysis to understand public opinion. This article systematically reviews the role of data pre-processing techniques and classification algorithms in improving the accuracy of sentiment analysis on social media. Using a Systematic Literature Review (SLR) approach, more than 30 scientific articles from trusted sources, dating from 2018 to 2024, were reviewed. The results show that effective pre-processing, such as tokenization, stemming, and stop word removal, significantly improves the quality of the input data, while algorithms such as SVM, Random Forest, and deep learning models provide the best performance in sentiment classification. This article is intended as a conceptual reference for further research and for the development of more precise sentiment analysis systems.
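
As a rough illustration of the three pre-processing steps named in the abstract (tokenization, stop word removal, and stemming), the following minimal Python sketch chains them into a single function. It is not taken from the reviewed studies: the stop word list, the suffix list, and the function names are all hypothetical and chosen only for illustration. The stemmer is a naive suffix stripper, not the Porter algorithm or any library implementation.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are far larger).
STOP_WORDS = {"the", "is", "a", "an", "and", "this", "i", "it", "was", "to", "of"}

# Suffixes tried in order by the toy stemmer below (an assumption, not Porter's rules).
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


if __name__ == "__main__":
    print(preprocess("I loved this phone, the camera is amazing!"))
```

The resulting token list would typically be converted into numeric features (for example, term counts or TF-IDF weights) before being passed to a classifier such as an SVM or Random Forest.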

Published

2025-06-15