An Ensemble-based Model for Sentiment Analysis of Kurdish Tweets

Authors

DOI:

https://doi.org/10.14500/aro.12255

Keywords:

Deep learning, Ensemble method, Kurdish language, Machine learning, RoBERTa word embedding, Sentiment analysis

Abstract

Thousands of comments are generated daily on social media in the Kurdistan Region, and sentiment analysis (SA) of these comments is valuable for organizations. The Kurdish language has three main dialects: Sorani (Central), Northern, and Southern. This study focuses on Sorani SA, where existing methods have limited accuracy. The proposed ensemble combines diverse models to improve sentiment classification. The first phase of the method is preprocessing and word embedding using RoBERTa. The second phase consists of four proposed models, namely K-nearest neighbor, support vector machine, multilayer perceptron long short-term memory (LSTM), and bidirectional-LSTM (Bi-LSTM), which are used as classifiers. Finally, an ensemble weighted averaging technique generates the final classification. The proposed model is evaluated first on an unbalanced dataset of 24,211 Sorani tweets and then on the same dataset after balancing. The Bi-LSTM model attained an accuracy of 89.87% on the balanced dataset, and the proposed ensemble method increased the accuracy to 91.76%, which exceeds the established state-of-the-art methods for Kurdish SA.
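As a rough illustration of the weighted averaging step described in the abstract, the sketch below combines per-model class probabilities for a single tweet. It is not the authors' implementation: the function names, the two class labels, the example probabilities, and the weights are all hypothetical, and the abstract does not state how the weights were chosen.

```python
from typing import List, Sequence


def weighted_average_ensemble(
    probas: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine one probability vector per model into a single vector.

    Each model's vector is scaled by its weight, and the weights are
    normalized so the result is still a probability distribution.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    combined = [0.0] * n_classes
    for model_probs, weight in zip(probas, weights):
        if len(model_probs) != n_classes:
            raise ValueError("all models must predict the same classes")
        for k in range(n_classes):
            combined[k] += weight * model_probs[k] / total
    return combined


def predict_label(combined: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest combined probability."""
    best = max(range(len(combined)), key=lambda k: combined[k])
    return labels[best]


# Hypothetical labels and outputs from three classifiers for one tweet.
labels = ["negative", "positive"]
probas = [
    [0.6, 0.4],  # e.g. a weaker classifier
    [0.3, 0.7],
    [0.2, 0.8],  # e.g. the strongest classifier, given a larger weight
]
weights = [1.0, 1.0, 2.0]  # assumed values, for illustration only

combined = weighted_average_ensemble(probas, weights)
label = predict_label(combined, labels)
```

In this sketch a model with a larger weight pulls the combined prediction toward its own output, which is the general idea behind weighting stronger classifiers more heavily in an ensemble.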

Author Biographies

Sabat S. Muhamad, Department of Computer Science, Faculty of Science, Soran University, Soran, Kurdistan Region – F.R. Iraq

Sabat Salih Muhamad is a Lecturer at the Department of Computer Science, Faculty of Science, Soran University. She received the B.Sc. degree in statistics and computer science at the University of Sulaymaniyah in 2006 and the M.Sc. degree in artificial intelligence at Soran University in 2022. Her research interests include text-to-speech systems, machine learning, natural language processing and image processing.

Abdulhady A. Abdullah, Artificial Intelligence and Innovation Centre, University of Kurdistan Hewlêr, Erbil, Kurdistan Region – F.R. Iraq

Abdulhady A. Abdullah is a Researcher at the Artificial Intelligence and Innovation Centre, University of Kurdistan Hewlêr, Erbil, Kurdistan Region – F.R. Iraq. He received the B.Sc. and M.Sc. degrees in Computer Science. His research interests include natural language processing (NLP).

Hakem Beitollahi, Department of Computer Science, Faculty of Science, Soran University, Soran, Kurdistan Region – F.R. Iraq

Hakem Beitollahi is an Assistant Professor at the Department of Computer Science, Soran University, Kurdistan Region, Iraq. He received the B.S. degree in Computer Engineering from the University of Tehran in 2002, the M.S. degree from Sharif University of Technology, Tehran, Iran, in 2005, and the Ph.D. degree from Katholieke Universiteit Leuven (KU Leuven), Belgium, in 2012. His research interests include artificial intelligence, cybersecurity, and hardware accelerators for deep learning.

Shamal A. Abdullah, Department of English, Faculty of Arts, Soran University, Soran, Kurdistan Region – F.R. Iraq

Shamal A. Abdullah is an Assistant Lecturer at the Department of English, Faculty of Arts, Soran University. He received the B.A. degree in English Language and the M.A. degree in English Linguistics, and he is currently a Ph.D. student in Computational Linguistics at the University of Arizona, USA. His research interests are computational linguistics, the English language, general linguistics, and corpus linguistics.

Rezhin S. Shahab, Department of Computer Science, Faculty of Science, Soran University, Soran, Kurdistan Region – F.R. Iraq

Rezhin S. Sleman is a lecturer at the Department of Computer Science at Zagros Institute. During the academic year 2025-2026, she obtained a bachelor's degree in computer science from Soran University. Throughout her studies, she has been actively involved in various volunteer activities related to her field. She has also earned several degrees in various fields.

Ashna D. Zrar, Department of Computer Science, Faculty of Science, Soran University, Soran, Kurdistan Region – F.R. Iraq

Ashna D. Zrar holds a Bachelor's degree in Computer Science from Soran University (2025) and works in the field of graphic design. Her research has focused on Kurdish text-related NLP projects.

Published

2025-12-11

How to Cite

Muhamad, S. S. (2025) “An Ensemble-based Model for Sentiment Analysis of Kurdish Tweets”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 13(2), pp. 349–358. doi: 10.14500/aro.12255.
Received 2025-05-07
Accepted 2025-10-24
Published 2025-12-11
