
Sentiment analysis of Amazon product reviews | by Sameer Bairwa | Sep, 2020

November 1, 2020

Hey folks, in this article I walk you through sentiment analysis of Amazon Electronics product reviews.

Contents:

  • What is sentiment analysis?
  • Download dataset
  • Analyze the dataset
  • Data pre-processing
  • Apply different machine learning algorithms
  • Training the dataset
  • Prediction
  • Confusion matrix
  • Plot ROC Curve

Sentiment analysis is the process of using natural language processing, text analysis, and statistics to analyze customer sentiment. The best businesses understand the sentiment of their customers — what people are saying, how they’re saying it, and what they mean. Customer sentiment can be found in tweets, comments, reviews, or other places where people mention your brand. Sentiment Analysis is the domain of understanding these emotions with software, and it’s a must-understand for developers and business leaders in a modern workplace.

As with many other fields, advances in deep learning have brought sentiment analysis into the foreground of cutting-edge algorithms. Today we use natural language processing, statistics, and text analysis to identify the sentiment of a text and classify it as positive, negative, or neutral.

Before we move forward, let’s download the dataset used in this project.

You can download the dataset from here: http://deepyeti.ucsd.edu/jianmo/amazon/categoryFilesSmall/Electronics_5.json.gz
The download size of the dataset is 1.2GB.

The dataset is gzip-compressed, so you need to decompress it on your computer. The decompressed file is around 2.5GB.

A file this large may not open in Microsoft Excel.

If you still want to open it, you can use the Delimit software. Download link: http://delimitware.com/download.html

Delimit: Handle large delimited data files with ease.
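
As an alternative to decompressing the file first, the reviews can probably be read straight from the compressed download. The sketch below assumes the file holds one JSON object per line (which is how the reading code later in this article treats it). The helper name read_json_lines_gz and its limit parameter are made up for illustration.

```python
import gzip
import json

def read_json_lines_gz(path, limit=None):
    """Yield one parsed review per line from a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break  # stop early so a quick look does not read the whole file
            yield json.loads(line)

# Usage (assumes the download sits in the working directory):
# first_reviews = list(read_json_lines_gz("Electronics_5.json.gz", limit=5))
```

Reading lazily with a generator keeps memory use low, which matters for a file of this size.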

Let’s analyze the dataset

The dataset contains these columns

reviewerID — ID of the reviewer, e.g. A2SUAM1J3GNN3B
asin — ID of the product, e.g. 0000013714
reviewerName — name of the reviewer
vote — helpful votes of the review
style — a dictionary of the product metadata, e.g., “Format” is “Hardcover”
reviewText — text of the review
overall — rating of the product
summary — summary of the review
unixReviewTime — time of the review (unix time)
reviewTime — time of the review (raw)
image — images that users post after they have received the product

The dataset has lots of features, but for sentiment analysis we only need the review text and the rating.

# read the JSON file into a list (one JSON object per line)
import json

values = []
with open("Electronics_5.json", "r") as f:
    for line in f:
        values.append(json.loads(line))
print(values[:5])

Now we create a dataset that has the id, review text, and rating of each product review for sentiment analysis.

We save the filtered dataset in the Electronic_review.csv file.
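
The filtering code itself is not shown here, so the following is only a minimal sketch of how such a file could be produced. It assumes that the id is the reviewerID field and that records without review text are skipped. Both choices, and the helper names to_rows and write_reviews, are assumptions for illustration.

```python
import csv

def to_rows(records):
    """Keep only the reviewer id, review text, and rating of each review record."""
    rows = []
    for rec in records:
        # Assumption: records lacking text or rating are dropped
        if "reviewText" in rec and "overall" in rec:
            rows.append([rec.get("reviewerID", ""), rec["reviewText"], rec["overall"]])
    return rows

def write_reviews(records, path="Electronic_review.csv"):
    """Write the filtered rows without a header row (the file is later read with header=None)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(to_rows(records))

sample = [
    {"reviewerID": "A2SUAM1J3GNN3B", "asin": "0000013714", "reviewText": "Works well.", "overall": 5.0},
    {"reviewerID": "A1", "asin": "0000013714", "overall": 2.0},  # hypothetical record without text
]
print(to_rows(sample))
```

With the full list loaded above, the call would be write_reviews(values).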

Now we read the Electronic_review data into a data frame.

# read the dataset into a df
import pandas as pd

colnames = ["id", "text", "overall"]
df = pd.read_csv("Electronic_review.csv", names=colnames, header=None)

The division of sentiment, on the basis of the rating value (the overall column), is as follows:

  • 0 < rating < 3 => Negative sentiment (-1)
  • rating = 3 => Neutral sentiment (0)
  • 3 < rating <= 5 => Positive sentiment (1)
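
The rule above can be sketched as a small function. The name rating_to_sentiment is an assumption for illustration. The thresholds come from the list above.

```python
def rating_to_sentiment(overall):
    """Map a 1-5 star rating to -1 (negative), 0 (neutral), or 1 (positive)."""
    if overall < 3:
        return -1
    if overall == 3:
        return 0
    return 1

print([rating_to_sentiment(r) for r in [1.0, 2.0, 3.0, 4.0, 5.0]])
```

On a data frame this could be applied as df["Sentiment"] = df["overall"].apply(rating_to_sentiment), where Sentiment is the column name used later when splitting the data.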

Let’s save this data frame as processedData.csv.

newdf.to_csv("processedData.csv",chunksize=100000)

Let’s see what our processed data looks like.

df = pd.read_csv("processedData.csv",nrows = 100000)
print(df.head(5))

Preprocess the text data samples

Steps for preprocessing

  • Filter out numbers
  • Stemming and lemmatization
  • Remove stopwords
  • Other root level token changes
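
To make these steps concrete, here is a minimal sketch of the number filtering and stopword removal in plain Python. The tiny STOPWORDS set is only a stand-in for the NLTK stopword list, and the function name clean_review is made up for illustration. Stemming and lemmatization are left out of this sketch.

```python
import re

# Tiny stand-in for nltk.corpus.stopwords.words("english") (assumption for illustration)
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to", "for", "i", "in"}

def clean_review(text):
    """Lowercase the text, drop digits and punctuation, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS]

print(clean_review("I bought 2 of these cables, and the quality is GREAT!"))
```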

Let’s import some important libraries.

from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.preprocessing import LabelEncoder
from nltk.corpus import wordnet as wn
import nltk
nltk.download("stopwords")
import re
nltk.download("punkt")

Now read processedData.csv.

df = pd.read_csv("processedData.csv")

  • Stemming algorithms work by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in an inflected word. This indiscriminate cutting can be successful on some occasions, but not always, which is why this approach has some limitations.
  • Developing a stemmer is far simpler than building a lemmatizer. The latter requires deep linguistic knowledge to create the dictionaries that allow the algorithm to look up the proper form of the word. Once this is done, noise is reduced and the results of the information retrieval process are more accurate.
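
The difference between the two approaches can be illustrated with a toy example. This is not how NLTK implements either technique. The suffix list and the dictionary entries below are deliberately tiny and hypothetical.

```python
# Toy illustration only: real stemmers (e.g. Porter) use far more elaborate rules.
SUFFIXES = ("ing", "ies", "ed", "es", "s")  # hypothetical, deliberately small

def toy_stem(word):
    """Strip the first matching suffix without checking that the result is a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A lemmatizer relies on a dictionary that maps inflected forms to their lemma.
LEMMA_DICT = {"studies": "study", "running": "run", "better": "good", "was": "be"}  # hypothetical entries

def toy_lemmatize(word):
    """Look the word up in the dictionary and fall back to the word itself."""
    return LEMMA_DICT.get(word, word)

for w in ["studies", "running", "better", "was"]:
    print(w, "->", toy_stem(w), "|", toy_lemmatize(w))
```

The stemmer returns fragments such as "stud" and "runn", while the dictionary lookup returns proper words, at the cost of having to build the dictionary.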
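# fragment of the batched preprocessing loop: write each processed batch (fin) back to the data frame
df.loc[count:count+batch-1,'reviewText_final'] = fin
# keep only the first 100000 rows
lat_df = df[:100000]
lat_df.to_csv("CurrentUsedFile.csv")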

We saved the first 100000 rows of data as CurrentUsedFile.csv so that we can easily process the data.

Split the dataset into train and test set

# importing the new dataset
import pandas as pd
from sklearn import model_selection
from sklearn.feature_extraction.text import TfidfVectorizer

lat_df = pd.read_csv("CurrentUsedFile.csv")
print(lat_df.head(5))

# create x and y => x: review text, y: sentiment
Train_X, Test_X, Train_Y, Test_Y = model_selection.train_test_split(
    lat_df['reviewText_final'], lat_df['Sentiment'], test_size=0.2, random_state=42)
print(Train_X.shape, Train_Y.shape)
print(Test_X.shape, Test_Y.shape)

# Vectorize the words with the TF-IDF vectorizer - this measures how important a word in a document is compared with the rest of the dataset
Tfidf_vect = TfidfVectorizer(max_features=500000)  # tweak features based on the dataset
Tfidf_vect.fit(lat_df['reviewText_final'])
Train_X_Tfidf = Tfidf_vect.transform(Train_X)
Test_X_Tfidf = Tfidf_vect.transform(Test_X)
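
To show what a TF-IDF weight expresses, here is a hand-rolled sketch on three made-up token lists. It uses the textbook form (term frequency times log of inverse document frequency). scikit-learn's TfidfVectorizer applies a smoothed IDF and normalizes each row by default, so its numbers will differ from these.

```python
import math
from collections import Counter

# Three made-up, already tokenized reviews
docs = [
    ["great", "sound", "great", "price"],
    ["poor", "sound"],
    ["great", "battery"],
]

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

weights = tf_idf(docs)
print(weights[0])
```

In the first review, "price" appears once but in no other review, so it gets a higher weight than "sound", which also appears in the second review.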

Before going ahead, let’s create a model evaluation function.

from sklearn import metrics
from sklearn.metrics import accuracy_score

def modelEvaluation(predictions, y_test_set):
    # Print accuracy, classification report, and confusion matrix for the predicted result
    print("\nAccuracy on validation set: {:.4f}".format(accuracy_score(y_test_set, predictions)))
    print("\nClassification report : \n", metrics.classification_report(y_test_set, predictions))
    print("\nConfusion Matrix : \n", metrics.confusion_matrix(y_test_set, predictions))

# Classifier - Algorithm - Naive Bayes
import time
from sklearn import naive_bayes
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

second = time.time()  # timestamp taken before training this model
# fit the training dataset on the classifier
Naive = naive_bayes.MultinomialNB()
historyNB = Naive.fit(Train_X_Tfidf, Train_Y)
# predict the labels on validation dataset
predictions_NB = Naive.predict(Test_X_Tfidf)
modelEvaluation(predictions_NB, Test_Y)

# macro-averaged precision, recall, and F1 (d is the support, which is None when averaging)
a, b, c, d = precision_recall_fscore_support(Test_Y, predictions_NB, average='macro')
# Use accuracy_score function to get the accuracy
print("Naive Bayes Accuracy Score -> ", accuracy_score(Test_Y, predictions_NB)*100)
print("Precision is: ", a)
print("Recall is: ", b)
print("F-1 Score is: ", c)
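
To make the macro average concrete, the following sketch computes macro precision and recall by hand on a few made-up labels. The function name and the sample labels are assumptions for illustration. For real evaluation the scikit-learn function used above is the better choice.

```python
def macro_precision_recall(y_true, y_pred, labels=(-1, 0, 1)):
    """Average per-class precision and recall, giving each class equal weight."""
    precisions, recalls = [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

# Made-up labels: 1 = positive, 0 = neutral, -1 = negative
y_true = [1, 1, 1, 0, -1, -1]
y_pred = [1, 1, 0, 0, -1, 1]
precision, recall = macro_precision_recall(y_true, y_pred)
print(round(precision, 4), round(recall, 4))
```

Because every class counts equally, a small class such as neutral influences the macro score as much as the large positive class.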

Now let’s plot the ROC curve for Naive Bayes
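
The plotting code is not included here, so as a rough sketch, this is how the points of a ROC curve can be computed for a binary "positive vs. rest" view. The labels and scores are made up, and the function names are assumptions. With three sentiment classes, one such curve per class (one-vs-rest) would be needed. Tied scores are not handled specially.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, lowering the threshold one score at a time."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for score, label in sorted(zip(scores, y_true), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up binary labels (1 = positive review) and predicted probabilities
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
points = roc_points(y_true, scores)
print(points)
print(round(auc(points), 4))
```

The resulting points can be passed to any plotting library, with the false positive rate on the x axis and the true positive rate on the y axis.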

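# Classifier - Algorithm - SVM
# predictions_SVM holds the SVM predictions on the test set (the SVM fitting step is not included here)
asvm, bsvm, csvm, dsvm = precision_recall_fscore_support(Test_Y, predictions_SVM, average='macro')
# Use accuracy_score function to get the accuracy
print("SVM Accuracy Score -> ", accuracy_score(Test_Y, predictions_SVM)*100)
print("Precision is: ", asvm)
print("Recall is: ", bsvm)

# Classifier - Algorithm - Decision Tree
from sklearn.tree import DecisionTreeClassifier

third = time.time()  # timestamp taken before training this model
decTree = DecisionTreeClassifier()
decTree.fit(Train_X_Tfidf, Train_Y)
y_decTree_predicted = decTree.predict(Test_X_Tfidf)
modelEvaluation(y_decTree_predicted, Test_Y)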

That’s all about sentiment analysis using machine learning.
In the next article, we will apply deep-learning techniques to the dataset.

Credit: BecomingHuman By: Sameer Bairwa
