Machine Learning Based Sentiment Analysis of Distributed Customer Product Reviews Data on Amazon


Varsha V, Akram Pasha


Recognizing and analyzing textual data generated on social media platforms has become one of the most essential requirements of today’s Big Data era. The results of such analysis help businesses gain clear insight into their business models and make crucial business decisions. In this paper, we attempt to perform sentiment analysis on a distributed computing framework using Machine Learning (ML) models and the Hadoop-based Spark programming model. Existing approaches to sentiment analysis are limited to only a few brands and their products. Therefore, to integrate learning abilities with distributed computing models on large textual data, we developed a recommendation framework that recommends products to users according to their feature requirements, which are collected as large volumes of textual data. The study implemented Gaussian Naive Bayes (GNB) and Random Forest (RF) on the Spark Big Data analytics platform to process this data. The experimental results show that the two algorithms are more efficient than other methods when processing large sentiment datasets.

Keywords: Big Data; Sentiment Analysis; Machine Learning; Apache Spark; ML Pipeline; GNB; RF