Welcome to the Linux Foundation Forum!
Unbalanced Sentiment Analysis Classes
vikash11
Posts: 18
Hello,
I’m presently working on a Sentiment Analysis project that uses a movie reviews dataset similar to this one, and I’m having trouble with class imbalance. The dataset comprises both good and negative evaluations, however the positive ratings outnumber the negative ones by a large margin. Here’s an example of my code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Load the movie reviews dataset
data = pd.read_csv('movie_reviews.csv')
# Preprocess the data
# ... (code for data preprocessing)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data['review'], data['sentiment'], test_size=0.2, random_state=42)
# Vectorize the text data using CountVectorizer
vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)
# Train the Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_vectorized, y_train)
# Evaluate the model
accuracy = model.score(X_test_vectorized, y_test)
print(f"Accuracy: {accuracy}")
The code’s accuracy is approximately 90%, however I assume it is biassed towards the majority class (positive reviews) owing to the class imbalance issue. I want to make sure that my model responds well to both positive and negative feedback. How can I handle the issue of class imbalance while also improving the overall effectiveness of my sentiment analysis model?
Thank you for your assistance!
0
Categories
- All Categories
- 175 LFX Mentorship
- 175 LFX Mentorship: Linux Kernel
- 745 Linux Foundation IT Professional Programs
- 372 Cloud Engineer IT Professional Program
- 168 Advanced Cloud Engineer IT Professional Program
- 73 DevOps IT Professional Program - Discontinued
- 3 DevOps & GitOps IT Professional Program
- 98 Cloud Native Developer IT Professional Program
- 7.6K Training Courses & Learning Paths
- AI & ML Training
- Blockchain & Decentralized Identity Training
- 1 Cloud & Containers Training
- Cybersecurity Training
- DevOps & Site-Reliability Training
- Linux Kernel Development Training
- Networking Training
- Open Source Best Practice Training
- System Administration Training
- System Engineering Training
- Web & Application Development Training
- 2 LFD103-JP クラス フォーラム
- 4 LFD210-CN Class Forum
- 764 LFD259 Class Forum
- 681 LFS101 Class Forum
- 2 LFS158-JP クラス フォーラム
- 162 LFS207 Class Forum
- 3 LFS207-DE-Klassenforum
- 4 LFS207-JP クラス フォーラム
- 61 LFS241 Class Forum
- 52 LFS242 Class Forum
- 42 LFS243 Class Forum
- 19 LFS244 Class Forum
- 4 LFS250-JP クラス フォーラム
- 166 LFS253 Class Forum
- 1.4K LFS258 Class Forum
- 792 Hardware
- 202 Drivers
- 68 I/O Devices
- 37 Monitors
- 95 Multimedia
- 173 Networking
- 91 Printers & Scanners
- 87 Storage
- 768 Linux Distributions
- 81 Debian
- 67 Fedora
- 22 Linux Mint
- 13 Mageia
- 24 openSUSE
- 150 Red Hat Enterprise
- 31 Slackware
- 13 SUSE Enterprise
- 356 Ubuntu
- 465 Linux System Administration
- 31 Cloud Computing
- 73 Command Line/Scripting
- Github systems admin projects
- 98 Linux Security
- 78 Network Management
- 101 System Management
- 46 Web Management
- 106 Mobile Computing
- 18 Android
- 73 Development
- 1.2K New to Linux
- 1K Getting Started with Linux
- 392 Off Topic
- 121 Introductions
- 181 Small Talk
- 29 Study Material
- 946 Programming and Development
- 310 Kernel Development
- 618 Software Development
- 979 Software
- 371 Applications
- 182 Command Line
- 5 Compiling/Installing
- 68 Games
- 317 Installation
- Archived
- 2 LFD140 Class Forum
Upcoming Training
-
August 20, 2018
Kubernetes Administration (LFS458)
-
August 20, 2018
Linux System Administration (LFS301)
-
August 27, 2018
Open Source Virtualization (LFS462)
-
August 27, 2018
Linux Kernel Debugging and Security (LFD440)