Last week, I took Google’s TensorFlow certification exam, a grueling five-hour endeavor in which the candidate must build highly accurate models for image classification, text classification, and time-series prediction.
I ended up passing, but the exam was a beast: it took incremental accumulation of subject-matter knowledge over several years, along with a very structured study schedule and plenty of hands-on practice.
I want to quickly share what I’ve learned along the way about TensorFlow and about neural networks in general.
Artificial Intelligence (AI) is all about probabilities at the most basic level, but we can sum it up in one sentence:
Artificial Intelligence is about arriving at the probability distribution of the truth.
You don’t see this at the surface level because APIs like Keras abstract the higher-level math away, but ultimately it’s good to understand exactly what machine learning is doing under the hood, so that we know when to put faith in what the model is telling us, and when not to.
So how are neural networks using probability distributions? Let’s take an example of using an AI model to predict heart disease.
Let’s say I have a training dataset of 1,000 patients: 980 are labeled as not having heart disease and 20 are labeled as having it. As far as the model is concerned, this dataset represents the ground truth, the maximum amount of information a machine learning model can learn from to approximate the real world.
What a neural network is doing internally is that it’s shifting its weights around to arrive at the expected distribution of the original training dataset. In other words, when you feed data into a neural network, what you’re saying to the network is this:
“Okay, neural network, I’m going to feed you some data where 980 of the inputs have class ‘0’ (‘does not have heart disease’) and 20 have class ‘1’ (‘has heart disease’). Please shift your network weights around when evaluating the input data so that your predictions typically come out to around 20 class ‘1’s for every 980 class ‘0’s.”
That’s all that’s happening under the hood. The neural network is converging on a probability distribution for its predictions that most closely matches the probability distribution of the ground truth you fed into it.
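We can sketch this convergence with a toy example (all numbers invented for illustration). A minimal logistic “network” trained with plain gradient descent on a 980/20 dataset, where the features are purely random noise, has nothing to learn from the inputs, so it converges on predicting the training labels’ 2% base rate:

```python
import numpy as np

# Hypothetical sketch: 1,000 patients, 20 labeled "has heart disease" (class 1).
# The four features are random noise standing in for age, BMI, blood
# pressure, etc. -- deliberately uninformative.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.zeros(1000)
y[:20] = 1.0                                  # 20 positives out of 1,000

# A one-neuron logistic model trained by gradient descent on log loss.
w = np.zeros(4)
b = 0.0
lr = 0.5
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid activation
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print(f"mean predicted probability: {p.mean():.3f}")  # lands near the 0.02 base rate
```

With nothing useful in the features, the bias term does all the work: at convergence the model’s average prediction matches the class prior of the dataset, which is exactly the “matching the ground-truth distribution” behavior described above.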
This is important to understand, because if you feed a machine learning model a dataset in which the distribution of classes does not approximate the true distribution of those classes in nature, your model will be pretty “untruthful” (useless) when you try to predict on new data.
This is the ultimate challenge of data science and artificial intelligence: it’s not just about building efficient models, it’s actually more about what you’re feeding into them.
This is analogous to diet and exercise for people: if you’re training hard in the gym, but eating fattening and unhealthy foods, it doesn’t matter how great your training program is, you’re ultimately not going to get the results you’re looking for.
The hardest part about machine learning is understanding whether the distribution of your data matches that which occurs naturally in the physical world.
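One practical habit this suggests is a sanity check before training: compare your dataset’s class prior against whatever real-world prevalence estimate you can find. A minimal sketch (the 7% “real-world” figure below is a made-up placeholder, not an actual statistic):

```python
import numpy as np

# Labels for the hypothetical 980/20 dataset from earlier.
y_train = np.zeros(1000, dtype=int)
y_train[:20] = 1

dataset_prior = y_train.mean()            # fraction of class-1 labels: 0.02
assumed_real_world_prevalence = 0.07      # placeholder, not a real statistic

# Flag a mismatch between the data you collected and the world you
# intend to predict on.
if abs(dataset_prior - assumed_real_world_prevalence) > 0.01:
    print(f"Warning: dataset prior {dataset_prior:.3f} differs from "
          f"assumed real-world prevalence {assumed_real_world_prevalence:.3f}")
```

The check is trivial, but it forces you to write down what you believe the true distribution is, which is exactly the question the rest of this post is about.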
If I went out and sampled 1000 people for heart disease using their age, BMI, blood pressure, and other feature inputs to predict whether they have heart disease, how many of them would truly have heart disease?
But here’s the twist:
- What if I sampled those 1,000 people standing outside my local gym? How might that affect the number of people in the dataset I record as truly having heart disease?
- Conversely, what if I sampled the entire dataset standing outside of a McDonald’s?
Depending on how I collect the dataset, the ground-truth distribution of “doesn’t have heart disease” versus “has heart disease” in my training data will change. I may encounter far fewer people who truly have heart disease standing outside my local gym than among regular diners at McDonald’s.
If I fed either of these datasets into a machine learning model, the model would converge on the same probability distribution as the dataset I fed into it. The model cannot learn any more information than what I originally gave it. It won’t tell me whether I’m biasing it or not.
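A quick simulation makes the gym-versus-McDonald’s point concrete. The prevalence numbers here are invented for illustration: suppose heart disease is rare among gym-goers and common among frequent fast-food diners, and I draw 1,000 labels from each setting:

```python
import numpy as np

# Hypothetical sampling-bias sketch. All rates are made up for illustration.
rng = np.random.default_rng(1)

# Draw 1,000 people outside the gym (assumed 2% prevalence) and 1,000
# outside a McDonald's (assumed 25% prevalence).
gym_sample = rng.random(1000) < 0.02
mcd_sample = rng.random(1000) < 0.25

print(f"gym sample base rate:       {gym_sample.mean():.3f}")
print(f"fast-food sample base rate: {mcd_sample.mean():.3f}")
```

A model fit to either sample will converge on that sample’s base rate, not on the population’s true prevalence, and nothing in the training process will warn you which one you collected.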
Neural networks cannot learn any more information than what you give them. They cannot tell you whether your dataset represents actual reality or not.
If my dataset is biased toward not having heart disease, the model will shift its internal probability distribution to put a heavier emphasis on predicting that someone does not have heart disease, which could yield a lot of false negatives if I started using that model in a clinical setting with cardiology patients.
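The danger is easy to quantify. On the 980/20 dataset, even a degenerate model that leans entirely on the skewed prior and predicts “no heart disease” for everyone looks excellent by accuracy while missing every sick patient:

```python
import numpy as np

# 1,000 patients, 20 of whom truly have heart disease.
y_true = np.zeros(1000, dtype=int)
y_true[:20] = 1

# A degenerate model that always predicts class 0 ("no heart disease").
y_pred = np.zeros(1000, dtype=int)

accuracy = (y_pred == y_true).mean()
false_negatives = ((y_pred == 0) & (y_true == 1)).sum()
print(accuracy)          # 0.98
print(false_negatives)   # 20 -- every sick patient missed
```

98% accuracy, zero clinical value: this is why headline accuracy on an imbalanced dataset tells you almost nothing about whether the model is safe to deploy.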