The block contains two branches: (i) the identity branch, which refers to the input itself, i.e. x, and (ii) F(x), the network part, called the residual mapping.
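To make the two branches concrete, here is a minimal sketch of a residual block in PyTorch. This is an illustrative assumption, not the paper's exact code: the class name `BasicBlock` and the two-convolution layout are mine, chosen to show how F(x) and the identity x are added together.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Branch (ii): F(x), the residual mapping (two 3x3 convolutions here)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                       # branch (i): the identity, x itself
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))    # F(x) computed by the network part
        return self.relu(out + identity)   # y = F(x) + x
```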
Assume x is the input. If the activations the residual branch produces during training are negative, the block can simply skip over them: those values are passed into the ReLU activation function, which does not allow them through for further calculation, while the identity branch still carries x forward.
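Here is a tiny numeric illustration of that behavior (the tensor values are made up for demonstration): the ReLU zeroes out the negative residual outputs, but adding the identity x first means the input still flows through.

```python
import torch

x   = torch.tensor([1.0, -2.0, 3.0])    # input
f_x = torch.tensor([-0.5, -1.0, 0.5])   # hypothetical residual-branch output

print(torch.relu(f_x))        # tensor([0.0, 0.0, 0.5]) -> negatives are blocked
print(torch.relu(f_x + x))    # tensor([0.5, 0.0, 3.5]) -> identity carries x forward
```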
Why do we use an identity block if the ReLU already chops off all negative activations?
The main architecture follows image → convolution → ReLU. For negative activations, if I am able to stop them from even passing into the convolution layer, where they would trigger unnecessary calculations before being sent to the ReLU, then it can be said that I am able to reduce the parameters as well as the calculations.
This is equivalent to reducing the number of parameters for the same number of layers, so the design can be extended to deeper models. The authors therefore proposed ResNets with 50, 101, and 152 layers, which not only avoided degradation problems but also greatly reduced the error rate, while keeping the computational complexity at a very low level.
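The way the 50/101/152-layer ResNets keep complexity low is the bottleneck block: a 1x1 convolution shrinks the channel count, the 3x3 convolution works on that smaller tensor, and a second 1x1 convolution expands it back. Below is a hedged sketch of that idea; batch norm is omitted for brevity, and the module and argument names are my own, not the paper's.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Sketch of a bottleneck residual block (as in ResNet-50/101/152)."""
    expansion = 4

    def __init__(self, in_channels, mid_channels):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.reduce = nn.Conv2d(in_channels, mid_channels, 1, bias=False)   # 1x1: shrink channels
        self.conv   = nn.Conv2d(mid_channels, mid_channels, 3, padding=1, bias=False)
        self.expand = nn.Conv2d(mid_channels, out_channels, 1, bias=False)  # 1x1: restore width
        self.relu = nn.ReLU(inplace=True)
        # Projection so the skip connection matches the expanded width
        self.shortcut = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, 1, bias=False))

    def forward(self, x):
        out = self.relu(self.reduce(x))
        out = self.relu(self.conv(out))   # the expensive 3x3 runs on fewer channels
        out = self.expand(out)
        return self.relu(out + self.shortcut(x))  # residual addition as before
```

Because the 3x3 convolution sees only the reduced channel count, stacking many of these blocks adds depth without a proportional blow-up in computation.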