Credit: Data Science Central
In practice, the Data Scientist wants to know which formula they will write in their Excel sheet when they enter all the data available into it: Bayes’ or usual?
The answer is that it depends: if all the data is well determined, so numerical, with closed functions, normal conditional probability. If there are doubts about the data or how to connect its parts, then it is better using the Bayesian probability.
When you use Bayesian probability, you are counting on some amount of guessing, so that one could safely say that whatever is sure to be the case will be expressed through normal conditional probability, and whatever is expressed in Bayesian probability may still be proven to be improper or not as good as something else.
Bayesian probability is then a statistical probability, and conditional probability is a mathematical probability.
The same difference one observes in non-strictly-objective research, so that research that deals with things beyond the World of Mathematics, the World of the Computers, and so on, and objective research, so that research that fully fits inside of the World of Algebra, or the World of the PCs.
If I want to know if the value of x in x+5=7 is 2, I work totally inside of the World of Mathematics, and I decide: yes, it is.
If I want to know how much chance I have to get liver disease if I am an alcoholic, and Science has progressed to the level of establishing a perfect mathematical relationship between both items, I work totally inside of the World of Mathematics. If doubts remain as to the correlation between ‘being an alcoholic’ and ‘having liver disease’, then I work inside of the ‘Bayesian World’ instead.
The way to apply the formula is:
In normal conditional probability:
A = having liver disease
P(A) = 0.1
B = being an alcoholic
P(B) = 0.05
P (A and B) = 0.07
P(A|B) = 0.07 x 0.1/0.05 = 0.14 = 14%
However, if the next paragraph is the truth instead, so:
P(AB) = P(having liver disease given that the patient is alcoholic) = P (being an alcoholic given that the patient has the disease) x P(having liver disease)/P(being alcoholic)
P(AB) = 0.07 x 0.1/0.05 = 0.14 = 14%
Notice that the formula above is the same as isolating P(B and A), which has to be identical to P(A and B), in the second formula of the previous picture, and replacing the numerator with the result.
In the Monty Hall Problem, we have:
- About 33% of the doors have a car on the first question. Exactly 50% on the second.
- All are doors, 3 of them on the first question, 2 of them on the second.
- Are doors, and have a car is about 33% on the first question, and exactly 50% on the second.
P(having a car/a door is opened) = P(a door is opened/having a car) x P(having a car)/P(door being opened)
P(having a car/a door is opened) = 100% x 0%/100% on the first question, since a door is always opened at that stage (Silvio or Monty Hall does that), and that door has no car, invariably. This probability is then 0, never a car on the first question.
P(having a car/a door is opened) = 100% x 50%/100% = 50%
Now, it is known that, as for results of the show, every 10 in 100 win the car, and 8 in 10 of the winners switched doors instead of sticking.
First question, all the same.
Second question moment, then:
P(having a car/other door is chosen) = P(other door is chosen/having a car) x P(having a car)/P(other door is chosen)
P(having a car/other door is chosen) = 0.8 x 50%/50% = 80%
On the other hand,
P(having a car/same door is chosen) = P(same door is chosen/having a car) x P(having a car)/P(same door is chosen)
P(having a car/same door is chosen) = 0.2 x 50%/50% = 20%