In the previous article, we saw how we could replace M in our inequality with the growth function, and thereby the VC dimension. The VC dimension shows that it is possible to learn with an infinite hypothesis set, because we can replace the infinite count of hypotheses with something finite. The resulting bound is also tighter than one based on M, since we no longer rely on the union bound but instead account for overlaps between different hypotheses. In this article, we will give a few examples of how to use the inequalities. First, we will quickly look at the two versions of the inequality again and see how to rewrite them into another form.
Inspecting the Formulas
We ended up with the following inequality when we replaced M with the growth function, which we could not substitute in directly:

E_out ≤ E_in + √( (8/N) · ln( 4 · m_H(2N) / δ ) )
The formula consists of the following components:
- E_out: The out-of-sample error we want to estimate.
- E_in: The in-sample error we measure on the training sample.
- N: The sample size.
- δ: The tolerated failure probability of the bound; for example, δ = 0.05 corresponds to a confidence level of 0.95.
- Growth Function: A bound on how many dichotomies we can produce on a given sample. The growth function is bounded in terms of the breakpoint k, so we can find it using the following formula:

m_H(N) ≤ Σ_{i=0}^{k−1} C(N, i)
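As a quick sanity check, the bound on the growth function can be computed directly. Here is a minimal sketch (the function name `growth_bound` is my own, not a standard API):

```python
from math import comb

def growth_bound(N, k):
    """Upper bound on the growth function m_H(N) for a hypothesis
    set with breakpoint k: the sum of C(N, i) for i = 0..k-1."""
    return sum(comb(N, i) for i in range(k))

# Below the breakpoint (N < k) the bound equals 2^N,
# i.e. all dichotomies are still possible.
print(growth_bound(2, 3))    # 4 = 2^2
# At the breakpoint and beyond, it grows only polynomially in N:
print(growth_bound(100, 3))  # 1 + 100 + 4950 = 5051
```

The key point the formula captures is the switch from exponential growth (2^N) to polynomial growth once a breakpoint exists.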
If we use the VC dimension instead of the growth function, we have m_H(N) ≤ N^d_VC + 1, and the inequality becomes:

E_out ≤ E_in + √( (8/N) · ln( 4 · ((2N)^d_VC + 1) / δ ) )
One way to think about the expression inside the square root is as a penalty for the complexity of our model. It penalizes us by worsening the bound on E_out when we use a more complex hypothesis set (i.e., a larger VC dimension). If someone manages to fit a simpler model with the same training error, they will get a more favorable estimate of E_out. Another thing to keep in mind is that if we demand high confidence in our bound, we need more training samples to get a good one. This leads to a rewriting of the bound, so that we can check how big our sample needs to be if we want a certain generalization performance. Calling the generalization error we are willing to tolerate ε, the requirement becomes:

N ≥ (8/ε²) · ln( 4 · m_H(2N) / δ )
We could again replace the growth function with the VC dimension if we want. Let us take some examples of how to use the bounds:
Example of Model Complexity — growth function
Imagine that we have an in-sample error of 0.1, i.e., E_in = 0.1. Our model has breakpoint k = 5 and the sample size is 1000, i.e., N = 1000. We want to be 95% confident in the bound we find, hence we have δ = 0.05.
The first thing we need to do is calculate the growth function, remembering that it is evaluated at 2N and not simply N. Inserting into the formula for the growth function, we get:

m_H(2N) ≤ Σ_{i=0}^{4} C(2000, i) ≈ 6.66 × 10^11
We can then insert this into the inequality:

E_out ≤ 0.1 + √( (8/1000) · ln( 4 · 6.66×10^11 / 0.05 ) ) ≈ 0.60
As we can see, it is not a particularly tight bound, but it could be improved with a larger sample. Now, let us do the same with the VC dimension instead.
Example of Model Complexity — VC dimension
We have the same values as before; we just need to remember that the VC dimension is equal to k − 1, hence we have d_VC = 4. We insert into our formula:

E_out ≤ 0.1 + √( (8/1000) · ln( 4 · (2000^4 + 1) / 0.05 ) ) ≈ 0.63
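The same calculation with the polynomial VC bound in place of the growth function can be sketched as follows (again, the function name is my own):

```python
from math import log, sqrt

def eout_bound_vc(e_in, N, d_vc, delta):
    # VC generalization bound with m_H(2N) <= (2N)^d_vc + 1
    m = (2 * N) ** d_vc + 1
    return e_in + sqrt(8 / N * log(4 * m / delta))

# E_in = 0.1, N = 1000, d_VC = 4, delta = 0.05
print(round(eout_bound_vc(0.1, 1000, 4, 0.05), 2))  # 0.63
```

The polynomial (2N)^d_VC + 1 overestimates the growth function, so this bound (≈ 0.63) is slightly looser than the growth-function version (≈ 0.60).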
Example of Sample Complexity — growth function
Imagine that we want our generalization error to be at most 0.1, and we want to be 95% confident that it does not exceed it. That means we have ε = 0.1 and δ = 0.05. This time, our breakpoint is k = 4. Since N appears on both sides of the bound, we solve for it iteratively, starting from an initial guess of the needed sample size. Let us start with N = 1000. We then calculate our growth function:

m_H(2N) ≤ Σ_{i=0}^{3} C(2000, i) ≈ 1.33 × 10^9
Then we insert this into the formula for our sample complexity:

N ≥ (8/0.1²) · ln( 4 · 1.33×10^9 / 0.05 ) ≈ 20314
We then repeat the process and calculate the growth function for N = 20314.
We do this until the value for N stabilizes, which ends up being around N = 28340.
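The fixed-point iteration described above can be sketched in Python (the function names are my own, chosen for illustration):

```python
from math import comb, log

def growth_bound(N, k):
    # Bound on the growth function: sum_{i=0}^{k-1} C(N, i)
    return sum(comb(N, i) for i in range(k))

def sample_complexity(k, eps, delta, N=1000):
    # Iterate N >= (8 / eps^2) * ln(4 * m_H(2N) / delta)
    # until the estimate stabilizes
    for _ in range(100):
        N_new = round(8 / eps ** 2 * log(4 * growth_bound(2 * N, k) / delta))
        if abs(N_new - N) <= 1:  # converged
            return N_new
        N = N_new
    return N

# breakpoint k = 4, eps = 0.1, delta = 0.05; the first pass
# gives about 20,314 and the iteration settles near 28,340
print(sample_complexity(4, 0.1, 0.05))
```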
Example of Sample Complexity — VC dimension
We then do the same, but this time we replace the growth function with the VC dimension. So we have d_VC = 3, ε = 0.1, and δ = 0.05. We again start with a guess of N = 1000 and insert into the formula:
Again, this is done until the value of N stabilizes, here at around 28088. As expected, the two values differ only slightly from each other.
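For completeness, here is the same iteration with the polynomial VC bound substituted for the growth function (a sketch; the function name is my own). One caveat: the converged value depends on exactly which polynomial form is plugged in. Using m_H(2N) ≤ (2N)^d_VC + 1, consistent with the generalization bound stated earlier, the iteration settles somewhat above the figure quoted in the text, since the polynomial overestimates the growth function:

```python
from math import log

def sample_complexity_vc(d_vc, eps, delta, N=1000):
    # Iterate N >= (8 / eps^2) * ln(4 * ((2N)^d_vc + 1) / delta)
    # until the estimate stabilizes
    for _ in range(100):
        N_new = round(8 / eps ** 2 * log(4 * ((2 * N) ** d_vc + 1) / delta))
        if abs(N_new - N) <= 1:  # converged
            return N_new
        N = N_new
    return N

# d_VC = 3, eps = 0.1, delta = 0.05; with this form of the
# bound the iteration converges near 29,900
print(sample_complexity_vc(3, 0.1, 0.05))
```

Either way, the sample sizes from the growth-function bound and the VC bound end up in the same ballpark, which is the practical takeaway.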
We have now seen how the different bounds can be used in practice and what they are used for. In the next article, we will talk about overfitting and how it can be remedied with a validation set.