The Journey In Words
When I started this project, I faced the challenge of analyzing thousands of Amazon product reviews to understand customer sentiment. I needed a solution that could accurately classify reviews as positive, neutral, or negative while handling the nuances of customer feedback. Here's how I tackled it.

Starting with Raw Data
The starting point was three distinct Amazon review datasets that had to be merged and cleaned before any analysis could begin.
PART 1: Data Pre-processing
I began by cleaning and combining these datasets, standardizing columns, and removing duplicates to create a reliable master dataset.
import pandas as pd

# Load datasets safely (safe_read_csv is the project's loading helper,
# assumed to wrap pd.read_csv with basic error handling)
file1_data = safe_read_csv('1429_1.csv')
# ... (rest of pre-processing code) ...

# Save the cleaned, combined dataset
combined_data.to_csv('combined_reviews_cleaned.csv', index=False)
print("Dataset successfully cleaned and saved.")
After further cleanup to handle missing values and remove unnecessary columns, I had a reliable dataset of 7,253 rows, ready for analysis.
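For illustration, that cleanup step typically looks like the sketch below, continuing from the snippet above; the column names (name, reviews.text, reviews.rating) are assumptions based on common Amazon review exports, not the project's exact schema.

# Illustrative cleanup sketch; column names are assumed, not from the project
combined_data = combined_data.drop_duplicates()

# Keep only the columns needed for the analysis (assumed schema)
combined_data = combined_data[['name', 'reviews.text', 'reviews.rating']]

# Drop rows that are missing the review text or the star rating
combined_data = combined_data.dropna(subset=['reviews.text', 'reviews.rating'])

print(f"Rows remaining after cleanup: {len(combined_data)}")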
PART 2: Sentiment Analysis with RoBERTa
To accurately analyze sentiment, I chose RoBERTa for its strong performance. I mapped the 5-star ratings to sentiment categories (Positive, Neutral, Negative) and addressed class imbalance by oversampling the minority classes.
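As a rough sketch of this step (the rating column name, the exact mapping thresholds, and the use of scikit-learn's resample are assumptions; the write-up does not spell them out):

import pandas as pd
from sklearn.utils import resample

# Map 5-star ratings to sentiment labels
# (assumed thresholds: 1-2 Negative, 3 Neutral, 4-5 Positive)
def rating_to_sentiment(rating):
    if rating <= 2:
        return 'Negative'
    if rating == 3:
        return 'Neutral'
    return 'Positive'

combined_data['sentiment'] = combined_data['reviews.rating'].apply(rating_to_sentiment)

# Oversample the minority classes up to the size of the largest class
max_size = combined_data['sentiment'].value_counts().max()
balanced_data = pd.concat([
    resample(group, replace=True, n_samples=max_size, random_state=42)
    for _, group in combined_data.groupby('sentiment')
])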
Results & Key Achievements
The model's performance exceeded expectations, achieving 98.6% overall accuracy. F1-scores were balanced across all three classes, indicating that no single sentiment dominated the model's predictions.
Classification Report:

              precision    recall  f1-score   support

    Negative       0.99      1.00      0.99      1307
     Neutral       0.97      1.00      0.98      1358
    Positive       1.00      0.96      0.98      1330
PART 3: Zero-Shot Classification with DeBERTa-V3
Next, to categorize products into market segments without pre-labeled data, I used zero-shot classification with the DeBERTa-v3-large model. I defined six target categories, including "Entertainment and Immersion" and "Creative and Productivity Tools."
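A minimal sketch of the zero-shot setup, assuming the Hugging Face pipeline API and an NLI-fine-tuned DeBERTa-v3-large checkpoint (the exact checkpoint and the four remaining category labels are not named in the write-up):

from transformers import pipeline

# Zero-shot classifier; the checkpoint name below is an assumption
# (any NLI-fine-tuned DeBERTa-v3-large variant works with this pipeline)
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli",
    device=0,  # GPU-enabled environment (Google Colab)
)

candidate_labels = [
    "Entertainment and Immersion",
    "Creative and Productivity Tools",
    # ... four more project-defined categories ...
]

result = classifier("Great tablet for drawing and taking notes.", candidate_labels)
print(result["labels"][0], round(result["scores"][0], 3))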
PART 4: Statistical Validation
To ensure the findings were not due to random chance, I applied Chi-Square testing. This validated that certain words were truly characteristic of specific product categories, with 10,744 terms showing statistical significance (p < 0.05).
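A hedged sketch of the per-term test is below, assuming a binary document-term matrix and scipy's chi2_contingency over a term-presence by category contingency table; column names such as reviews.text and category are placeholders, not the project's actual schema.

import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.feature_extraction.text import CountVectorizer

# Binary document-term matrix over the review texts (column name assumed)
vectorizer = CountVectorizer(binary=True, min_df=5, stop_words='english')
dtm = vectorizer.fit_transform(combined_data['reviews.text'])
categories = combined_data['category']  # zero-shot labels from Part 3 (assumed column)

results = []
for idx, term in enumerate(vectorizer.get_feature_names_out()):
    present = dtm[:, idx].toarray().ravel()
    # 2 x K table: rows = term absent/present, columns = product categories
    table = pd.crosstab(present, categories)
    chi2, p, _, _ = chi2_contingency(table)
    results.append((term, chi2, p))

significant = [r for r in results if r[2] < 0.05]
print(f"{len(significant)} terms significant at p < 0.05")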

Most Significant Terms (p < 0.001)
- Creative & Productivity Tools: "tablet" (χ² = 89,481), "great" (χ² = 88,951)
- Health & Wellness: "kindle" (χ² = 749), "duracell" (χ² = 683)
These results translate directly into insights. For example, the strong association of "tablet" with Creative & Productivity Tools suggests marketing strategies for that segment should center on the device, while the surprising appearance of "kindle" in Health & Wellness hints that customers associate reading with their wellness routines, a potential marketing angle.
Conclusion & Technical Notes
This project demonstrated how to turn raw customer feedback into actionable business intelligence by combining advanced NLP models with robust statistical validation. The high accuracy and clear category differentiation provide a solid foundation for strategic recommendations.
- Implementation: Python with Transformers, Pandas, Scikit-learn
- Models: RoBERTa (sentiment classification) & DeBERTa-v3-large (zero-shot product categorization)
- Hardware: GPU-enabled environment (Google Colab)
- Overall Accuracy: 98.6%