“Sentiment Analysis” in Google Colab using ChatGPT

Santhosh Gandhi

8 min read · Feb 22, 2023

What is Sentiment Analysis?

Sometimes people use different words or body language to show how they’re feeling. Sentiment Analysis is a computer’s way of understanding how people feel when they write something down, like in an email or on social media. The computer looks at the words and tries to figure out whether the text is Positive (happy), Negative (sad), or somewhere in between.

This can be useful in various applications, such as understanding customer feedback, monitoring public opinion on social media, or analyzing product reviews, especially if you have a large amount of data.

What are the Shortcomings of Sentiment Analysis?

One of the main challenges is that it can be difficult for computers to accurately identify the sentiment of text, particularly when it comes to language that is full of nuance, sarcasm, or irony. This can lead to errors in the analysis and potentially misleading results. Additionally, the quality of the sentiment analysis model is crucial, as a poorly trained model can produce unreliable results. Therefore, it’s important to use high-quality models that have been trained on diverse and representative datasets and to carefully evaluate the results to ensure that they are accurate and reliable.

How to overcome the shortcomings?

To overcome the shortcomings of sentiment analysis, triangulation with qualitative research can be used. Triangulation is the process of using multiple methods to verify and validate research findings. In the case of sentiment analysis, this can involve using the results of automated sentiment analysis tools as a starting point, and then conducting qualitative research (such as interviews, surveys, or focus groups) to gather more in-depth insights about people’s opinions and experiences. By combining both quantitative and qualitative data, researchers can gain a more comprehensive understanding of sentiment and avoid potential biases or limitations in the automated sentiment analysis tools. Ultimately, this approach can lead to more accurate and reliable insights that can inform decision-making and improve user experiences.

What is Google Colab?

Google Colab is an online platform provided by Google that allows you to write and run computer code in your web browser, without needing to install any software on your computer. It’s like having a virtual computer where you can write and run programs or analyses, and it’s accessible from anywhere with an internet connection. You can use Colab to work on projects, collaborate with others, or learn new programming skills.

Google Colab is used by a variety of people, including students, researchers, data scientists, and developers who work with machine learning and data analysis. It is also used by businesses and organizations for collaboration and sharing of data science projects. Since it is a free and accessible platform, anyone with an internet connection can use it for their machine learning projects.

How did I do “Sentiment Analysis” in Google Colab using ChatGPT?

I used ChatGPT to write all the code you are going to see. Code generated by ChatGPT is not always correct, so you need a basic understanding of programming and the patience to iterate on the code until it gives the right results.

You can start with a ChatGPT prompt like “Can you please provide a step-by-step guide for performing sentiment analysis using the VADER lexicon and NLTK library in Google Colab?”

Let's start doing Sentiment Analysis in Google Colab

In this blog post, I will be using the Women’s E-Commerce Clothing Reviews dataset from Kaggle in Google Colaboratory.

Upload the CSV file

1. Copy the command below and paste it into a new code cell in Google Colab.

from google.colab import files
uploaded = files.upload()

Click on the “play” button to run the code cell. A “Choose Files” button will appear. Click on this button to select the CSV file you want to upload. Once you’ve selected the file, click on “Open” to upload it to Google Colab.

2. Once the file has finished uploading, you can use the pd.read_csv() command to read the file into a pandas data frame.

For example, if the CSV file is named "data.csv", you can use the following command to read it into a data frame. In this use case, the CSV file name is Women Clothing E-Commerce Reviews.
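Here is a minimal sketch of this step. So that it runs anywhere, it reads a tiny made-up three-row sample instead of the uploaded file; the sample rows are assumptions for illustration only. In Colab, the uploaded file is saved in the working directory, so you would pass its file name (including the .csv extension) to pd.read_csv() instead.

```python
import io

import pandas as pd

# Made-up stand-in for the uploaded CSV so the sketch runs anywhere.
# In Colab you would instead pass the uploaded file's name, for example:
#     df = pd.read_csv("Women Clothing E-Commerce Reviews.csv")
sample_csv = io.StringIO(
    "Review Text,Rating\n"
    "Love this dress! It fits perfectly.,5\n"
    "The fabric felt cheap and it ripped.,1\n"
    ",3\n"  # a row with a missing review, as can happen in real data
)

# Read the CSV into a pandas DataFrame
df = pd.read_csv(sample_csv)

# Number of rows and columns, then the column names
print(df.shape)
print(list(df.columns))
```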

3. Next, we can use the df.head(10) command to display the first 10 rows of the data frame.

This will give us a glimpse of the data and help us understand the structure of the dataset. The output will show us the column names and the first few rows of the dataset.

4. I want to do sentiment analysis on the “Review Text” column. First, I will import the necessary libraries to perform sentiment analysis.

In Python, a library is a collection of code that someone else has already written to perform a specific task. By importing a library, we can reuse that code and save time and effort instead of writing it from scratch.

The libraries we are importing here are:

numpy: This library provides support for large, multi-dimensional arrays and matrices. It is used for numerical computing and data analysis.
nltk: This library is used for natural language processing (NLP). It provides a variety of tools and techniques to analyze human language.
VADER lexicon: This is a pre-trained lexicon (a dictionary of words and their sentiment scores) that is used for sentiment analysis. It is provided by the nltk library.
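The imports described above might look like the sketch below. It assumes the VADER lexicon is fetched with nltk.download(), which needs an internet connection and has to be repeated in each new Colab session.

```python
import numpy as np
import nltk

# Download the VADER lexicon (a one-time step per Colab session)
nltk.download("vader_lexicon")

# The analyzer class that uses the VADER lexicon to score text
from nltk.sentiment.vader import SentimentIntensityAnalyzer
```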

5. In the next step, we will prepare the ‘Review Text’ column of our dataset for sentiment analysis and calculate sentiment scores using the below code.

We will first check if there are any missing values in the ‘Review Text’ column and replace them with empty strings. This is important because missing values can cause errors while analyzing the sentiment of the text.

After replacing the missing values, we will create a sentiment analyzer object and use it to iterate over each review in the ‘Review Text’ column. The sentiment analyzer will calculate the sentiment scores for each review. We will then store these scores in a list called ‘sentiment_scores’.

Finally, we will add the ‘sentiment_scores’ list as a new column in our DataFrame. This will allow us to analyze the sentiment of each review easily.

6. After calculating the sentiment scores for each review in the ‘Review Text’ column, we define a function called “get_sentiment_label” to map the sentiment scores to three sentiment labels: “Positive”, “Negative”, and “Neutral” using the below code.

We use the compound score to determine the sentiment label. If the compound score is greater than or equal to 0.05, we classify it as “Positive”. If the compound score is less than or equal to -0.05, we classify it as “Negative”. Otherwise, we classify it as “Neutral”.

We then apply the “get_sentiment_label” function to the sentiment scores to get the corresponding sentiment labels for each review. Finally, we add the sentiment labels as a new column named “Sentiment Label” to the DataFrame. This allows us to easily analyze the sentiment of each review and gain insights into customer opinions about the product.
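A minimal sketch of the labeling step with the thresholds described above. To keep the example focused on the mapping, the score dictionaries here are made-up values rather than real VADER output.

```python
import pandas as pd


def get_sentiment_label(score):
    """Map a VADER score dict to 'Positive', 'Negative' or 'Neutral'."""
    compound = score["compound"]
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"


# Made-up scores for illustration; in the notebook this column
# comes from the sentiment analyzer
df = pd.DataFrame({"Review Text": ["great", "awful", "it is a dress"]})
df["Sentiment Score"] = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.8},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.6},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]

# Apply the function to every score and store the labels in a new column
df["Sentiment Label"] = df["Sentiment Score"].apply(get_sentiment_label)
print(df[["Review Text", "Sentiment Label"]])
```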

7. The code df.head(10) will display the first 10 rows of the DataFrame df with newly added columns 'Sentiment Score' and 'Sentiment Label'

This lets us verify that the sentiment analysis has been performed correctly. By using the head() method, we can quickly check a small sample of the data to make sure everything looks as expected before moving on to further analysis.

We have now successfully completed the sentiment analysis on our data. Let's move on to how we can visualize the data.

We can also export the data using this code:

from google.colab import files
df.to_csv('filename.csv', index=False)  # save the DataFrame to a CSV file first
files.download('filename.csv')

Data Visualization

1. Now we create a pie chart to visualize the distribution of sentiment labels in the dataset with the below code.
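One way to draw this chart with matplotlib is sketched below. The five labels are made-up stand-ins; in the notebook, df already has the 'Sentiment Label' column. The figure size, title, and start angle are my own choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up labels for illustration; in the notebook, df already
# contains the 'Sentiment Label' column
df = pd.DataFrame({
    "Sentiment Label": ["Positive", "Positive", "Positive", "Negative", "Neutral"]
})

# Count how many reviews fall under each label
label_counts = df["Sentiment Label"].value_counts()

# Draw the pie chart with the percentage shown on each slice
fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(label_counts, labels=label_counts.index, autopct="%1.1f%%", startangle=90)
ax.set_title("Distribution of Sentiment Labels")
plt.show()
```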

This visualization helps us to understand the distribution of sentiment labels in the Review text dataset.

2. Let's find out how sentiment is distributed for each rating with the below code.
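A sketch of one way to do this, using pd.crosstab() to count the sentiment labels within each rating. The six rows are made up for illustration, and the normalized table is an optional extra that shows shares instead of counts.

```python
import pandas as pd

# Made-up ratings and labels; in the notebook, df is the full dataset
df = pd.DataFrame({
    "Rating": [5, 5, 4, 1, 1, 3],
    "Sentiment Label": [
        "Positive", "Positive", "Positive", "Negative", "Positive", "Neutral",
    ],
})

# Rows are ratings, columns are sentiment labels, cells are review counts
rating_sentiment = pd.crosstab(df["Rating"], df["Sentiment Label"])
print(rating_sentiment)

# The same table as a share of each rating's reviews (each row sums to 1)
rating_share = pd.crosstab(df["Rating"], df["Sentiment Label"], normalize="index")
print(rating_share)
```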

Now you can see how sentiment is distributed for each rating: rating 1, rating 2, and so on.

3. We can also plot the sentiment label against each review rating. This allows us to see if there is any correlation between sentiment and rating scores. We can use the below code to create this plot.
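A possible version of this plot is sketched below as a stacked bar chart, with one bar per rating split by sentiment label. The chart type, axis labels, and made-up rows are assumptions; a grouped bar chart (stacked=False) would work as well.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ratings and labels; in the notebook, df is the full dataset
df = pd.DataFrame({
    "Rating": [5, 5, 4, 1, 1, 3],
    "Sentiment Label": [
        "Positive", "Positive", "Positive", "Negative", "Positive", "Neutral",
    ],
})

# Count the sentiment labels within each rating
counts = pd.crosstab(df["Rating"], df["Sentiment Label"])

# One bar per rating, stacked by sentiment label
ax = counts.plot(kind="bar", stacked=True, figsize=(8, 5))
ax.set_xlabel("Rating")
ax.set_ylabel("Number of reviews")
ax.set_title("Sentiment Label by Rating")
plt.show()
```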

Now you will be able to see the results.

In conclusion, sentiment analysis can provide valuable insights into customer reviews and feedback. By using tools like the VADER lexicon and Python libraries like pandas and matplotlib, we can perform sentiment analysis on text data and visualize the sentiment distribution.

When working with datasets, it’s important to first clean and prepare the data for analysis. This involves handling missing values and converting the data into a format that can be easily analyzed. It’s also important to choose the right analysis techniques and visualizations to effectively communicate the insights.

As a UX Researcher, it is important to consider not only the overall sentiment distribution but also how sentiment varies across different customer segments, such as users with different demographics or usage patterns.

It is also important to take a holistic approach to data analysis, combining sentiment analysis with other methods such as usability testing and surveys to gain a more comprehensive understanding of the user experience.

Feel free to connect with me on LinkedIn! https://www.linkedin.com/in/isanthoshgandhi

Special mention to my dear friend Varun Anabarasu, who introduced me to Google Colab, Jupyter Notebook, and their basics.

Sources

https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews?resource=download