I promise it’s not just another “ML Article.”
- Naive Bayes: The Naive Bayes classifier technique derives from Bayes’ theorem.
- Bayes Theorem: Bayes’ theorem is a mathematical equation used in probability and statistics to calculate the conditional probability. In other words, it is used to calculate the probability of an event based on its association with another event — Prof. Helmenstine.
- Conditional Independence: A great example from Wikipedia: “A and B are conditionally independent given C if and only if, given the knowledge that C occurs, knowledge of whether A occurs provides no information on the likelihood of B occurring, and knowledge of whether B occurs provides no information on the likelihood of A occurring.”
- Posterior Probability: The probability that an event will occur after all information is learned.
- Prior Probability (Marginal Probability): The probability that an event will occur before any new information is learned.
Intuitively, the idea behind Naive Bayes is how you probably already approach life. As in all my articles, I believe a simple, intuitive understanding of a model should come first, before diving into the mathematics and practical jargon.
How do you respond or think when making a decision?
Let’s say you’re responsible for Thanksgiving dinner. You have cooked Thanksgiving dinner for the last ten years. Within those ten years, you have prepared three desserts: pumpkin pie, chocolate cheesecake, and white macadamia cookies. More often than not, you will base your decision on how those recipes were received over the last ten years. For example, your uncle is bringing his new girlfriend. What should you make?
You recall that you have relatives who are allergic to macadamia. You also remember that friends and relatives with a Latino background tend to like cookies whereas friends and relatives with an Asian origin tend to enjoy the pumpkin pie more. Please note that these are not actual statistics!
You ask your uncle where his girlfriend is from and whether she is allergic to macadamia. Her parents are from Argentina, and she is not allergic to macadamia. From this simple reasoning, we can intuitively conclude that she will probably like the cookies!
You realize using previous encounters might help you identify insightful patterns.
- Supervised/Unsupervised: Supervised
- Regression/Classification: Classification
You must first know some basic probability concepts.
- The syntax “P(A|B)” means the probability of event A occurring given event B. For example, the likelihood that I will have an umbrella given that it rains is 100%.
- Independent events are events that do not influence one another. True independence is rare in practice, but it is a required assumption for a Naive Bayes model. For example, the event that I will take the train is independent of whether someone from Paris takes the train (an extreme case).
- Dependent events are events that influence one another. For example, rain should affect if I carry an umbrella or not.
- Multiplying probabilities. If we combine two independent events, say a coin landing on heads and a die landing on 4, the probability that both occur is ½ * ⅙ = 1/12. One requirement is that these events must be independent of one another!
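As a quick check of the arithmetic, here is a minimal Python sketch (the `Fraction` type is used only to keep the math exact):

```python
from fractions import Fraction

# Probability of a fair coin landing heads.
p_heads = Fraction(1, 2)
# Probability of a fair die landing on 4.
p_four = Fraction(1, 6)

# For independent events, P(A and B) = P(A) * P(B).
p_both = p_heads * p_four
print(p_both)  # 1/12
```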
If you would like to know how we derive the Bayes Theorem formula, the image below should help.
Bayes Theorem (Naive Bayes derives from Bayes Theorem)
If you have never heard of Bayes’ Theorem, let me explain.
Mr. X went to the doctor. Given his symptoms, the doctor had Mr. X take a test for a severe illness, and the test came back positive. The test detects the illness 80% of the time when it is present. Should Mr. X be concerned?
Not before he applies Bayes’ Theorem. Why? Because Bayes’ Theorem lets us incorporate not just the probability of testing positive given that you have the illness, but also other probabilities that the doctor is not including.
Prior Probability P(A): The probability that anyone has a rare disease.
- P(A) = 1%
Conditional Probability P(B|A): The probability that you test positive given that you have the rare disease. Remember, a positive test does not by itself mean you have the rare disease.
- P(B|A) = 80%
Marginal Probability P(B): Lastly, we need to compute the probability of a person testing positive. A person could test positive under two scenarios:
- The person has the disease. (True Positive)
- The person does not have the disease. (False Positive)
- Formula: [P(B|A)*P(A)] + [P(B|NOT A)*P(NOT A)] = P(B)
- (.80 * .01) + (.096 * .99) = .103 = 10.3%, where P(B|NOT A) = 9.6% is the false-positive rate
- P(A|B) = [P(B|A) * P(A)] / P(B) = ((.8)(.01)) / (.103) = .0776 = 7.76%
- If you tested positive, there is a 7.76% chance you have the disease.
- Notice how the actual probability of having the disease is far lower than the test’s 80% accuracy suggests, which is a result of the false positives. Just because you tested positive does not mean you have the disease.
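The calculation above can be reproduced in a few lines of Python (a sketch using this example’s numbers; the 9.6% false-positive rate is taken as given):

```python
p_a = 0.01               # prior: probability anyone has the rare disease
p_b_given_a = 0.80       # true-positive rate: test positive given disease
p_b_given_not_a = 0.096  # false-positive rate: test positive without disease

# Total probability of testing positive, P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b, 3))          # ~0.103
print(round(p_a_given_b, 4))  # ~0.0776
```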
The idea remains the same. Back to our Thanksgiving example. Let’s presume that out of our 20 friends and relatives:
- 4: Cheesecake, 10: White macadamia cookies, 6: Pumpkin Pie
Since your uncle’s girlfriend is an adult, let’s calculate the probability of each dessert. Assume that 3 of the 6 pie fans, 2 of the 10 cookie fans, and 1 of the 4 cheesecake fans are adults, so P(Adult) = 6/20.
1. The probability that your uncle’s girlfriend will like pie given she’s an adult.
- P(Pie | Adult) = (P(Adult | Pie) * P(Pie)) / P(Adult)
- ((3/6) * (6/20)) / (6/20) = ½
2. The probability that your uncle’s girlfriend will like cookies given she’s an adult.
- P(Cookie | Adult) = (P(Adult | Cookie) * P(Cookie)) / P(Adult)
- ((2/10) * (10/20)) / (6/20) = ⅓
3. The probability that your uncle’s girlfriend will like cheesecake given she’s an adult.
- P(Cheesecake | Adult) = (P(Adult | Cheesecake) * P(Cheesecake)) / P(Adult)
- ((1/4) * (4/20)) / (6/20) = ⅙
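These three posteriors can be checked with a short Python sketch (the adult counts per dessert, 3 of 6, 2 of 10, and 1 of 4, are the assumed figures from the example):

```python
from fractions import Fraction

total = 20
# (fans of each dessert, adults among those fans) -- assumed counts
desserts = {
    "pie": (6, 3),
    "cookies": (10, 2),
    "cheesecake": (4, 1),
}

p_adult = Fraction(3 + 2 + 1, total)  # P(Adult) = 6/20

posteriors = {}
for name, (fans, adult_fans) in desserts.items():
    p_adult_given_d = Fraction(adult_fans, fans)  # P(Adult | dessert)
    p_d = Fraction(fans, total)                   # P(dessert)
    # Bayes' theorem: P(dessert | Adult)
    posteriors[name] = p_adult_given_d * p_d / p_adult

print(posteriors)  # pie: 1/2, cookies: 1/3, cheesecake: 1/6
```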
Based on this single feature, go for the pumpkin pie: it has the highest posterior probability (½).
However, this is a simple case. More often than not, we would have to deal with multiple layers.
The Naive Bayes is a great model that has been heavily used to detect whether an email is spam.
The Naive Bayes is computationally fast, simple to implement, and works well in high dimensions. Its drawback is that it assumes independence among the features, which is usually inaccurate: more often than not, a person’s value on one dimension influences the others.
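To make the spam use case concrete, here is a minimal from-scratch sketch (the tiny training set and whitespace tokenization are purely illustrative, not a production filter): each word’s class-conditional probability is estimated with Laplace smoothing, and the “naive” independence assumption lets us simply sum the log-probabilities.

```python
import math
from collections import Counter

# Tiny illustrative training set: (label, text).
train = [
    ("spam", "win money now"),
    ("spam", "free money offer"),
    ("ham",  "meeting at noon"),
    ("ham",  "lunch at noon tomorrow"),
]

# Count words per class and how often each class appears.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for label, text in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the class with the highest (log) posterior."""
    scores = {}
    for label in class_counts:
        # Log prior P(class).
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            # Laplace-smoothed P(word | class); the naive independence
            # assumption lets us just add the log probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("free money"))    # spam
print(predict("noon meeting"))  # ham
```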
Machine Learning Series Day 3 (Naive Bayes) was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.