Roboreviews Project: A Case Study

A deep dive into sentiment analysis, zero-shot classification, and statistical validation using Amazon product reviews.

The Journey In Words

When I started this project, I faced the challenge of analyzing thousands of Amazon product reviews to understand customer sentiment. I needed a solution that could accurately classify reviews as positive, neutral, or negative while handling the nuances of customer feedback. Here's how I tackled it.

A flowchart diagram illustrating the Roboreviews project pipeline, from data sourcing to final insights.

Starting with Raw Data

I started with three distinct Amazon review datasets.

PART 1: Data Pre-processing

I began by cleaning and combining these datasets, standardizing columns, and removing duplicates to create a reliable master dataset.


import pandas as pd

# Load datasets safely with the safe_read_csv helper (defined elsewhere in the project)
file1_data = safe_read_csv('1429_1.csv')
# ... (rest of pre-processing code) ...

# Save the cleaned, combined dataset
combined_data.to_csv('combined_reviews_cleaned.csv', index=False)
print("Dataset successfully cleaned and saved.")

After further cleanup to handle missing values and remove unnecessary columns, I had a reliable dataset of 7,253 rows, ready for analysis.
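
The elided middle of that script is essentially column standardization, deduplication, and missing-value handling. Here is a minimal sketch of that step; the helper definition, the second and third file names, and the Datafiniti-style column names ('reviews.text', 'reviews.rating') are my assumptions for illustration, not a verbatim excerpt of the project code.

import pandas as pd

def safe_read_csv(path: str) -> pd.DataFrame:
    # Guarded loader: return an empty frame instead of raising on a bad or missing file
    try:
        return pd.read_csv(path, low_memory=False)
    except (FileNotFoundError, pd.errors.ParserError) as err:
        print(f"Skipping {path}: {err}")
        return pd.DataFrame()

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    # Rename and keep a consistent subset of columns across the three source files
    # (column names assumed from the Datafiniti Amazon review schema)
    df = df.rename(columns={'reviews.text': 'review_text', 'reviews.rating': 'rating'})
    return df[['name', 'review_text', 'rating']]

paths = ['1429_1.csv', 'dataset_2.csv', 'dataset_3.csv']  # last two names are placeholders
frames = [standardize(df) for df in map(safe_read_csv, paths) if not df.empty]

combined_data = (
    pd.concat(frames, ignore_index=True)
      .dropna(subset=['review_text', 'rating'])  # drop rows missing text or rating
      .drop_duplicates(subset=['review_text'])   # remove duplicate reviews
)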

PART 2: Sentiment Analysis with RoBERTa

To analyze sentiment accurately, I chose RoBERTa for its strong performance on text classification tasks. I mapped the 5-star ratings to three sentiment categories (Positive, Neutral, Negative) and addressed class imbalance by oversampling the minority classes.
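
The label mapping and the oversampling step look roughly like the sketch below; the exact star-rating cut-offs (1-2 stars negative, 3 neutral, 4-5 positive) and the use of scikit-learn's resample are assumptions about the implementation rather than the project's literal code.

import pandas as pd
from sklearn.utils import resample

def rating_to_sentiment(rating: float) -> str:
    # Assumed cut-offs: 1-2 stars -> Negative, 3 -> Neutral, 4-5 -> Positive
    if rating <= 2:
        return 'Negative'
    if rating == 3:
        return 'Neutral'
    return 'Positive'

# combined_data comes from the pre-processing step; a tiny stand-in frame for illustration
combined_data = pd.DataFrame({
    'review_text': ['love it', 'it is okay', 'broke in a week', 'great value', 'amazing'],
    'rating': [5, 3, 1, 4, 5],
})
combined_data['sentiment'] = combined_data['rating'].apply(rating_to_sentiment)

# Oversample each minority class up to the size of the largest class
largest = combined_data['sentiment'].value_counts().max()
balanced = pd.concat(
    [resample(group, replace=True, n_samples=largest, random_state=42)
     for _, group in combined_data.groupby('sentiment')],
    ignore_index=True,
)
print(balanced['sentiment'].value_counts())  # all three classes now equally represented

Oversampling to the size of the largest class is one common way to balance labels before fine-tuning; class weighting would be an alternative, and the write-up does not specify which variant was used.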

Results & Key Achievements

The model's performance exceeded expectations, achieving 98.6% overall accuracy. F1-scores were balanced across all three classes (0.98-0.99), indicating that the oversampling strategy kept the model from favoring any single class.

Classification Report:
               precision    recall  f1-score   support
    Negative       0.99      1.00      0.99      1307
     Neutral       0.97      1.00      0.98      1358
    Positive       1.00      0.96      0.98      1330

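For reference, a report in this format comes straight out of scikit-learn; a minimal, self-contained sketch follows, with dummy labels standing in for the held-out split and the fine-tuned model's predictions.

from sklearn.metrics import accuracy_score, classification_report

def evaluate(y_true, y_pred):
    # y_true / y_pred: class labels for the held-out split and the model's predictions
    print(f"Overall accuracy: {accuracy_score(y_true, y_pred):.3f}")
    print(classification_report(y_true, y_pred, digits=2))

# Dummy labels for illustration only; the real evaluation used the ~4,000-review test split
evaluate(
    ['Positive', 'Negative', 'Neutral', 'Positive'],
    ['Positive', 'Negative', 'Neutral', 'Negative'],
)
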
PART 3: Zero-Shot Classification with DeBERTa-V3

Next, to categorize products into market segments without pre-labeled data, I used zero-shot classification with the DeBERTa-v3-large model. I defined six target categories, including "Entertainment and Immersion" and "Creative and Productivity Tools."
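
Operationally this is only a few lines with the Hugging Face pipeline API. The checkpoint name below is an assumption (the write-up only specifies DeBERTa-v3-large); any DeBERTa-v3-large model fine-tuned on NLI behaves the same way, and the category list is abbreviated to the two named above.

from transformers import pipeline

# Checkpoint is an assumption: any NLI-fine-tuned DeBERTa-v3-large works for zero-shot use
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli",
    device=0,  # GPU, matching the Colab environment used for the project
)

candidate_labels = [
    "Entertainment and Immersion",
    "Creative and Productivity Tools",
    # ... the four remaining project categories ...
]

result = classifier(
    "This tablet is perfect for sketching and note-taking on the go.",
    candidate_labels=candidate_labels,
)
print(result["labels"][0], round(result["scores"][0], 3))  # top category and its score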

PART 4: Statistical Validation

To ensure the findings were not due to random chance, I applied Chi-Square testing. This validated that certain words are truly characteristic of specific product categories, with 10,744 terms showing statistical significance (p < 0.05).
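
As an illustration of the mechanics, here is a sketch of that test using a bag-of-words matrix and scikit-learn's chi2 scorer; the toy reviews and labels are placeholders, and the project may instead have built per-term contingency tables with scipy.stats.chi2_contingency.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

# Placeholder data: in the project, texts are the reviews and labels the zero-shot categories
texts = [
    "great tablet for drawing and note taking",
    "the kindle helps me unwind before bed",
    "immersive sound for movie nights",
]
labels = [
    "Creative and Productivity Tools",
    "Health and Wellness",
    "Entertainment and Immersion",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# chi2 tests each term's independence from the category labels
chi2_scores, p_values = chi2(X, labels)
results = pd.DataFrame({
    "term": vectorizer.get_feature_names_out(),
    "chi2": chi2_scores,
    "p_value": p_values,
}).sort_values("chi2", ascending=False)

significant = results[results["p_value"] < 0.05]  # the project found 10,744 such terms
print(results.head(10))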

A chart showing the results of the statistical analysis, highlighting key terms and their chi-square values.

Most Significant Terms (p < 0.001)

  • Creative & Productivity Tools: "tablet" (χ² = 89,481), "great" (χ² = 88,951)
  • Health & Wellness: "kindle" (χ² = 749), "duracell" (χ² = 683)

These results provide clear insights. For example, the strong association of "tablet" with "Creative & Productivity Tools" suggests marketing strategies for that segment should center on this device. The surprising appearance of "kindle" in "Health & Wellness" suggests customers associate reading with their wellness routines, a potential marketing angle.

Conclusion & Technical Notes

This project successfully demonstrated how to turn raw customer feedback into actionable business intelligence, combining advanced NLP models with robust statistical validation. The high accuracy and clear category differentiation provide a confident foundation for strategic recommendations.

  • Implementation: Python with Transformers, Pandas, Scikit-learn
  • Models: RoBERTa & DeBERTa-v3-large
  • Hardware: GPU-enabled environment (Google Colab)
  • Overall Accuracy: 98.6%

Adeteju Enunwa is an Engineering Program Lead who leverages emerging technologies to architect human-centric solutions and products, all built on a foundation of trust and responsible development.