Friday, March 5, 2021
NikolaNews

COVID-19 Fake News Detection using Naïve Bayes Classifier | by Muhammad Ardi | Aug, 2020

August 14, 2020

Machine learning and deep learning algorithms cannot work directly with words, so all texts in title_text_source need to be converted into numbers. In this project, I am going to use a count vectorizer to do it. The concept is simple: we count the occurrences of each vocabulary word in every text and use those counts as the feature vector of that text. If you still don’t get the idea of count vectorizer, I recommend you read this simple explanation.
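To make the counting idea concrete, here is a minimal sketch on a two-sentence toy corpus. The sentences and the names toy_vectorizer, counts and vocab are invented for illustration and are not part of the dataset or code used in this article.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (hypothetical sentences, not taken from the dataset)
corpus = [
    "covid vaccine is safe",
    "vaccine vaccine hoax",
]

toy_vectorizer = CountVectorizer()
counts = toy_vectorizer.fit_transform(corpus).toarray()

# Vocabulary terms ordered by their column index in the count matrix
vocab = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)

print(vocab)   # ['covid', 'hoax', 'is', 'safe', 'vaccine']
print(counts)  # [[1 0 1 1 1]
               #  [0 1 0 0 2]]
```

Each row is one text and each column is one vocabulary word, so the 2 in the last column of the second row says that "vaccine" appears twice in the second sentence.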

The implementation is very simple thanks to the Scikit-learn module.


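from sklearn.feature_extraction.text import CountVectorizer
# Learn the vocabulary and count word occurrences in every text
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['title_text_source'].values)
# Convert the sparse result into a dense NumPy array
X = X.toarray()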

Look at the code above. The first thing we do is initialize a count vectorizer object, which I call vectorizer. Then I use this vectorizer to convert all values of the title_text_source column (which are still text) into an array of word occurrences. Now if we print out this X variable, we will get the following output:

array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64)

The shape of the array above is (1159, 21117), which represents the number of samples and the feature vector size of each sample, respectively.
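A side note on that shape: a dense int64 array of this size holds 1159 × 21117 values at 8 bytes each, which is roughly 196 MB. fit_transform() returns a SciPy sparse matrix, and MultinomialNB accepts sparse input, so the toarray() call is optional if memory becomes a concern. The sketch below uses invented toy texts (not the article's dataset) to show the arithmetic and to check that sparse and dense input lead to the same predictions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Memory needed for a dense int64 array with the shape reported above
n_samples, n_features = 1159, 21117
dense_bytes = n_samples * n_features * np.dtype(np.int64).itemsize
print(dense_bytes / 1e6)  # size in megabytes

# Toy data (hypothetical) to compare sparse and dense input
texts = ["fake cure claim", "official health report",
         "fake claim again", "health report update"]
labels = ["FAKE", "TRUE", "FAKE", "TRUE"]

X_sparse = CountVectorizer().fit_transform(texts)  # SciPy sparse matrix
X_dense = X_sparse.toarray()                       # dense NumPy array

pred_sparse = MultinomialNB().fit(X_sparse, labels).predict(X_sparse)
pred_dense = MultinomialNB().fit(X_dense, labels).predict(X_dense)
print((pred_sparse == pred_dense).all())
```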

Up to this point we have feature vectors for all samples stored in the X variable. To make things more intuitive, I will also define a y variable to store all ground truths (a.k.a. labels).

y = df['label'].values

Now we can use y in place of df['label'].values.

Before training a classifier, we are going to split the data into train and test sets, where 20% of the samples in the dataset are used to test the overall performance of the model. This splitting can easily be done using the train_test_split function from the Sklearn module:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, test_size=0.2, random_state=11)

After running the code above, we get 4 new variables, which I guess are all self-explanatory.
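For reference, the four variables are the training features, test features, training labels and test labels. With 1159 samples and test_size=0.2, the split should come out at 927 training and 232 test samples. A minimal sketch with placeholder arrays of the same length (the zeros and alternating labels are stand-ins, not the real data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of samples as the article's dataset
X_demo = np.zeros((1159, 5))
y_demo = np.array(["FAKE", "TRUE"] * 579 + ["FAKE"])  # 1159 labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, shuffle=True, test_size=0.2, random_state=11
)

print(X_tr.shape, X_te.shape)  # (927, 5) (232, 5)
print(y_tr.shape, y_te.shape)  # (927,) (232,)
```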

Now that the data is split, it’s time to define a model, which in this project will be Naïve Bayes. Mathematically speaking, this algorithm works by calculating the probability of each class (label) given the features (text) of each sample using Bayes’ theorem. If you want to better understand the mathematical concept of this algorithm, you can open up this page. In my opinion that’s the best site that explains Naïve Bayes in depth.

Bayes’ theorem.
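For a class c (here FAKE or TRUE) and a feature vector x, Bayes' theorem can be written as follows. The symbols c and x are notation chosen here for illustration:

```latex
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}
```

The "naïve" part is the assumption that the features are conditionally independent given the class, so P(x | c) factorizes into a product of per-word terms.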

In addition, there are several types of Naïve Bayes algorithm: Gaussian, Multinomial and Bernoulli. In this project I will be using Multinomial Naïve Bayes, since it is well suited to this text classification task: it models the number of times each word occurs in a document rather than only its presence. Fortunately, Sklearn provides an easy-to-use class called MultinomialNB, so we don’t have to code the algorithm from scratch.
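To show roughly what MultinomialNB computes under the hood, here is a minimal NumPy sketch of the multinomial model with Laplace smoothing, compared against Sklearn on a tiny invented count matrix. The data and all variable names are hypothetical, and the sketch is a simplified illustration rather than Sklearn's actual implementation.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count matrix (hypothetical): 4 documents, 3 vocabulary terms
X_toy = np.array([
    [2, 1, 0],
    [3, 0, 0],
    [0, 1, 2],
    [0, 0, 3],
])
y_toy = np.array(["FAKE", "FAKE", "TRUE", "TRUE"])

classes = np.unique(y_toy)
alpha = 1.0  # Laplace smoothing, the default in MultinomialNB

# log P(c): fraction of documents in each class
log_prior = np.log(np.array([(y_toy == c).mean() for c in classes]))

# log P(word | c): smoothed share of each word among all words of the class
word_counts = np.array([X_toy[y_toy == c].sum(axis=0) for c in classes])
smoothed = word_counts + alpha
log_likelihood = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

# Class score for a new count vector: log P(c) + sum(count * log P(word | c))
x_new = np.array([[1, 1, 0]])
scores = x_new @ log_likelihood.T + log_prior
manual_pred = classes[scores.argmax(axis=1)]

sk_pred = MultinomialNB(alpha=alpha).fit(X_toy, y_toy).predict(x_new)
print(manual_pred, sk_pred)
```

Because the word counts enter the score as multipliers, a word that appears three times contributes three times as much evidence as a word that appears once.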

The code below shows how I train a Multinomial Naïve Bayes classifier on train data:

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X_train, y_train)

Next, we can calculate the accuracy score of the classifier using the score() method.

print('Accuracy on train data :', clf.score(X_train, y_train))
print('Accuracy on test data  :', clf.score(X_test, y_test))

The output of the code above shows that our model is pretty good! The train accuracy (about 96.3%) is somewhat higher than the test accuracy (about 93.5%), so the model is slightly overfitting, but I guess it’s still a good one.

Accuracy on train data : 0.9633225458468176
Accuracy on test data  : 0.9353448275862069
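For a classifier, score() returns the mean accuracy, that is, the fraction of samples whose predicted label equals the true label. Assuming the split sizes of 927 training and 232 test samples implied by test_size=0.2 on 1159 samples, the two figures above correspond to 893 and 217 correct predictions. A small sketch of the definition, using made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration only
y_true_demo = np.array(["FAKE", "TRUE", "TRUE", "FAKE", "TRUE"])
y_pred_demo = np.array(["FAKE", "TRUE", "FAKE", "FAKE", "TRUE"])

# Accuracy is the share of positions where prediction and truth agree
manual_acc = (y_true_demo == y_pred_demo).mean()
print(manual_acc)                                # 0.8
print(accuracy_score(y_true_demo, y_pred_demo))  # 0.8

# The reported scores expressed as fractions (split sizes are an assumption)
train_acc = 893 / 927
test_acc = 217 / 232
print(train_acc, test_acc)
```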

After training a model, I usually also create a confusion matrix to see the misclassified samples in more detail. To do so, I need to predict the classes of the test data first:

predictions = clf.predict(X_test)

Next, we can compare the values of the predictions variable with the ground truth y_test using the confusion_matrix() function from the Sklearn module.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, predictions)

Since the return value of confusion_matrix() is a square array, we can plot it using the heatmap() function from the Seaborn module.

import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(6,6))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=['FAKE', 'TRUE'], yticklabels=['FAKE', 'TRUE'], cmap=plt.cm.Blues, cbar=False)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

We will see the following output after running the code above.

Confusion matrix of our Naïve Bayes model on COVID-19 news dataset.
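To read the matrix: rows are true labels and columns are predicted labels, so the diagonal holds the correct predictions. By default confusion_matrix() orders the classes by sorted label value; passing labels= explicitly guarantees that the order matches the tick labels used in the heatmap. A minimal sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
y_true_demo = ["FAKE", "FAKE", "TRUE", "TRUE", "TRUE"]
y_pred_demo = ["FAKE", "TRUE", "TRUE", "TRUE", "FAKE"]

# Fix the class order explicitly so row/column 0 is FAKE and 1 is TRUE
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=["FAKE", "TRUE"])
print(cm_demo)
# [[1 1]
#  [1 2]]
```

Here one FAKE item was wrongly predicted as TRUE (top right) and one TRUE item was wrongly predicted as FAKE (bottom left).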

Now what if we get a new piece of news and want to find out whether it is fake? In this part I would like to demonstrate how to perform prediction on new data. Here I store the text in the sentence variable.

sentence = 'The Corona virus is a man made virus created in a Wuhan laboratory. Doesn’t @BillGates finance research at the Wuhan lab?'
sentence = clean(sentence)
vectorized_sentence = vectorizer.transform([sentence]).toarray()
clf.predict(vectorized_sentence)

As the code above shows, in order to predict new data we first have to clean the sentence using the clean() function I defined in an earlier part of this article. Next, the cleaned sentence is transformed into an array of numbers using our vectorizer object, which in this case is a CountVectorizer. Lastly, once the sentence has been converted into a vector, we can predict its class, and in this case the final output is:

array(['FAKE'], dtype='<U4')

According to the output above, the sentence is classified as fake news by our Naïve Bayes model.
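One way to keep the vectorizer and the classifier together for this kind of prediction is a Scikit-learn pipeline. This is an alternative to the approach above, not what the article's code does. The toy training texts are invented, and the clean() step is left out because its definition is not shown in this section.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy training data, for illustration only
train_texts = [
    "virus created in lab by billionaire",
    "miracle cure hidden by government",
    "health ministry reports new cases",
    "who publishes vaccine trial results",
]
train_labels = ["FAKE", "FAKE", "TRUE", "TRUE"]

# The pipeline vectorizes raw text and then classifies it in one call
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

new_text = ["billionaire created the virus in a lab"]
prediction = model.predict(new_text)
probabilities = model.predict_proba(new_text)  # columns follow model.classes_

print(prediction)
print(dict(zip(model.classes_, probabilities[0])))
```

Besides the predicted label, predict_proba() gives the estimated probability of each class, which can help judge how confident the model is about a new sentence.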


Credit: BecomingHuman By: Muhammad Ardi
