Sentimental analysis of COVID-19 twitter data using deep learning and machine learning models

Simran Darad https://orcid.org/0000-0003-4629-3980
Sridhar Krishnan https://orcid.org/0000-0002-4659-564X

Abstract

The novel coronavirus disease (COVID-19) is an ongoing pandemic that has drawn worldwide attention. However, the spread of fake news on social media sites such as Twitter is creating unnecessary anxiety and panic about the disease. In this paper, we apply machine learning (ML) techniques to predict the sentiment of people using Twitter during the COVID-19 peak in April 2021. The data consist of tweets collected between 16 April 2021 and 26 April 2021; the tweet texts were labelled as positive, negative, or neutral by models trained on an already labelled dataset of coronavirus tweets. Sentiment analysis was conducted with a deep learning model, Bidirectional Encoder Representations from Transformers (BERT), and with several ML models for text analysis, and their performance was then compared. The ML models used were Naïve Bayes, Logistic Regression, Random Forest, Support Vector Machines, Stochastic Gradient Descent, and Extreme Gradient Boosting. Accuracy was calculated separately for each sentiment. The classification accuracies of the ML models were 66.4%, 77.7%, 74.5%, 74.7%, 78.6%, and 75.5%, respectively, and the BERT model achieved 84.2%. Each sentiment-classified model has an accuracy around or above 75%, which is a substantial value for text mining algorithms. We infer that most people tweeting take a positive or neutral approach.
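
As an illustration of the evaluation step, the following minimal sketch computes overall accuracy and a separate accuracy for each sentiment. It assumes that per-sentiment accuracy means the share of tweets with a given reference label that the model predicted correctly; the abstract does not state the exact definition. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("negative", "neutral", "positive")


def overall_accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted sentiment equals the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled tweet is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_sentiment_accuracy(y_true, y_pred, labels=LABELS):
    """Per-class accuracy (assumed definition): correct predictions for a
    sentiment divided by the number of tweets carrying that reference label.
    Sentiments that never occur in y_true are omitted from the result."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in labels if totals[label]}


# Hypothetical toy data, for illustration only (not results from the paper).
y_true = ["positive", "positive", "neutral", "negative",
          "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "neutral", "neutral", "negative",
          "positive", "positive", "neutral", "neutral"]

if __name__ == "__main__":
    print(overall_accuracy(y_true, y_pred))
    print(per_sentiment_accuracy(y_true, y_pred))
```

Reporting the per-sentiment values alongside the overall figure shows whether a model's accuracy is driven mainly by one dominant class, which matters when positive and neutral tweets outnumber negative ones.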