Exploring COVID-19 public perceptions in South Africa through sentiment analysis and topic modelling of Twitter posts
DOI:
https://doi.org/10.23962/ajic.i31.14834
Keywords:
sentiment analysis, sentiment classification, topic modelling, social media, Twitter, natural language processing (NLP), COVID-19, South Africa, government response, public perceptions
Abstract
The narratives shared on social media during a health crisis such as COVID-19 reflect public perceptions of the crisis. This article presents findings from a study of South African citizens’ perceptions of the government’s response to the COVID-19 pandemic from March to May 2020. The study analysed Twitter data from posts by government officials and the public in South Africa to measure the public’s confidence in how the government was handling the pandemic. Four popular machine-learning classifiers were applied to sentiment analysis, namely logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), and the results demonstrated these classifiers’ levels of effectiveness. In addition, the study used, and evaluated the effectiveness of, two topic-modelling algorithms, latent Dirichlet allocation (LDA) and non-negative matrix factorisation (NMF), in classifying social media discourse by frequently occurring topics. On South African public sentiment towards COVID-19 and the government’s response, the Twitter data indicated that South Africans held predominantly negative views.
License
Copyright (c) 2022 Temitope Kekere, Marie Hattingh, Vukosi Marivate
This work is licensed under a Creative Commons Attribution 4.0 International License.