Sunday, February 28, 2021

3 minute read to ‘How to find optimal number of clusters using K-means Algorithm’ | by Kavya Gajjar | Oct, 2020

October 16, 2020
in Neural Networks

In any K-means clustering problem, the first question is usually how many clusters (or classes) the data should be split into. Three common metrics (or methods) help decide the optimal value of ‘k’. They are:

  1. Elbow Curve Method
  2. Silhouette Score
  3. Davies Bouldin Index

Let us take a sample dataset and implement the above-mentioned methods to understand how they work.

We will use the make_blobs function from the sklearn.datasets module to generate a sample dataset for illustrating the above methods.

Now let’s look at what these methods are and what results they give on the created dataset.
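The article does not give the parameters used to create the dataset, so the following is only a minimal sketch. The sample count, the three hand-picked centers, the cluster spread, and the random seed are all assumptions chosen so that the blobs are clearly separated.

```python
from sklearn.datasets import make_blobs

# Assumed parameters (not stated in the article): 300 points in 2 dimensions,
# three hand-picked centers, unit spread, and a fixed seed for reproducibility.
CENTERS = [(-5, -5), (0, 5), (5, -5)]
X, y_true = make_blobs(
    n_samples=300,
    centers=CENTERS,
    cluster_std=1.0,
    random_state=42,
)

print("Feature matrix shape:", X.shape)
print("Distinct generating blobs:", len(set(y_true.tolist())))
```

Only X is needed for clustering. The generating labels y_true are kept here just to confirm how many blobs were created.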

A) Elbow Curve
The main idea is to define clusters such that the total within-cluster sum of squares (WSS) is minimized. WSS measures the compactness of the clustering, and we want it to be as small as possible. Because WSS keeps shrinking as clusters are added, we choose the number of clusters (k) beyond which adding another cluster no longer reduces the total WSS by much.

***What does Within-cluster Sum of Squares (WSS) mean?
It is the sum of the squared distances (usually Euclidean) between each data point and its nearest centroid (the center point of its cluster).

WSS decreases as the number of clusters (k) increases. The aim is to find the bend (like an elbow joint) in the graph.

***This method is called the elbow curve because the plot of WSS against the number of clusters (k) looks like a human elbow.

Here, we find that k=3 is the bend (elbow) point.
*** The elbow curve method is often a little ambiguous, as the bend point is not clearly visible for some datasets.
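As a rough sketch of how the WSS values for the elbow curve could be computed, the snippet below fits K-means for k = 1 to 10 and records the inertia_ attribute of scikit-learn's KMeans, which is the sum of squared distances of samples to their closest centroid. The dataset parameters, the range of k, n_init, and the seed are assumptions, not values taken from the article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed dataset: three well-separated blobs (parameters are illustrative).
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=1.0,
    random_state=42,
)

k_values = list(range(1, 11))  # assumed search range for k
wss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is the within-cluster sum of squares for this fit
    wss.append(model.inertia_)

for k, value in zip(k_values, wss):
    print(f"k={k:2d}  WSS={value:10.2f}")
```

Plotting wss against k_values (for example with matplotlib) gives the elbow curve. On data like this, the drop in WSS should flatten sharply after the true number of blobs.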

B) Silhouette Score
The Silhouette Score is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b), and the Silhouette Score is the mean of this coefficient over all samples. To clarify, (a) is the mean distance from the sample point to the other points in its own cluster, and (b) is the mean distance from the sample point to the points in the nearest cluster that it is not a part of. The coefficient ranges from -1 to 1, and a higher value means the sample sits well inside its own cluster and far from the others. Hence, we want the silhouette score to be as high as possible, so we look for the global maximum over the candidate values of k.

The silhouette score exhibits a clear peak, in contrast to the gentle bend in the elbow method. This makes it easier to visualize and reason about.

Here, we can easily visualize the peak point at k=3.
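The search for the silhouette peak might look like the sketch below, which uses silhouette_score from sklearn.metrics. The score needs at least two clusters, so the loop starts at k = 2. The dataset parameters, the range of k, and the seed are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumed dataset: three well-separated blobs (parameters are illustrative).
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=1.0,
    random_state=42,
)

silhouette_by_k = {}
for k in range(2, 11):  # silhouette is undefined for a single cluster
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    # Mean of (b - a) / max(a, b) over all samples
    silhouette_by_k[k] = silhouette_score(X, labels)

# Higher is better, so pick the k with the maximum score.
best_k = max(silhouette_by_k, key=silhouette_by_k.get)

for k, score in silhouette_by_k.items():
    print(f"k={k:2d}  silhouette={score:.3f}")
print("Best k by silhouette score:", best_k)
```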

C) Davies Bouldin Index
It is defined as the ratio between cluster scatter and cluster separation, i.e. the ratio of within-cluster distances to between-cluster distances. For each cluster, the worst (largest) such ratio against any other cluster is taken, and these values are averaged over all clusters. The aim is to find the value of k at which clusters are less dispersed internally and farther apart from each other (i.e. the distance between two clusters is high). Hence, a lower value of the Davies Bouldin index means that the clustering is better.

As mentioned earlier, a lower value is desired, so we look for the global minimum, which is at k=3.
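A minimal sketch of the same search with the Davies Bouldin index follows, using davies_bouldin_score from sklearn.metrics. Unlike the silhouette score, the best k here is the one with the lowest value. As before, the dataset parameters, the range of k, and the seed are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Assumed dataset: three well-separated blobs (parameters are illustrative).
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=1.0,
    random_state=42,
)

db_by_k = {}
for k in range(2, 11):  # the index needs at least two clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    db_by_k[k] = davies_bouldin_score(X, labels)

# Lower is better, so pick the k with the minimum index.
best_k = min(db_by_k, key=db_by_k.get)

for k, score in db_by_k.items():
    print(f"k={k:2d}  Davies-Bouldin={score:.3f}")
print("Best k by Davies Bouldin index:", best_k)
```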

After using all of the above-mentioned methods, we conclude that the optimal value of ‘k’ is 3. Implementing the k-means clustering algorithm on the dataset with this value, we get the following clusters.

We can see from the above graph that all points are assigned to three clusters appropriately. Hence, k=3 was an optimal value for clustering.
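The final fit with k=3 could be sketched as follows. The dataset parameters and seed are assumptions, and the cluster sizes and centroids are printed in place of a plot.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed dataset: three well-separated blobs (parameters are illustrative).
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=1.0,
    random_state=42,
)

# Fit K-means with the chosen number of clusters.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X)

# Number of points assigned to each cluster, and the learned centroids.
sizes = np.bincount(labels)
print("Cluster sizes:", sizes.tolist())
print("Centroids:")
print(np.round(model.cluster_centers_, 2))
```

A scatter plot of X colored by labels, with model.cluster_centers_ overlaid, would show the three clusters visually.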

Credit: BecomingHuman By: Kavya Gajjar
