Capsule Networks provide a way of detecting parts of objects in an image and representing cognitive links between those parts. Which means that capsule networks in several poses will recognize the same purpose even though they haven’t seen the pose in the training data.
CNN’s work by accumulating feature sets at every layer. It starts by identifying points, then shapes, then actual objects. However, the information on spatial relationships of all these features is lost.
You can think of a CNN like this:
CNN = 2 eye’s + 1 Nose + 1 Mouth
CNN = Face
There are two eyes, one nose and one mouth, but something is wrong.
Can you spot it? We can quickly tell that there are an eye and a mouth in the wrong position and that this is not what a person should look. A well-trained CNN has trouble with this idea, though.
CNN apart from being easily tricked by images with features in the wrong place, when viewing an image in a different orientation, a CNN is also easily confused. One way to overcome this is by an intense preparation from all angles, but this takes a lot of time and too much of computational power.
1. AI for CFD: Intro (part 1)
2. Using Artificial Intelligence to detect COVID-19
3. Real vs Fake Tweet Detection using a BERT Transformer Model in few lines of code
4. Machine Learning System Design
“Convolutional neural networks are doomed” — Geoffrey Hinton.
The emergence of Capsule Networks allows us to take full advantage of spatial relationships, and we can start looking at stuff like:
CNN = 2 adjacent eye’s + 1 Nose under eyes + 1 Mouth under nose
CNN = Face
Capsule Networks provide a way of detecting parts of objects in an image and representing spatial relations between those parts. That means capsule networks can recognize the same purpose in several different poses, even if they haven’t seen the pose in training data. So, what is a capsule, and how do they work?
Visual Parse Tree
They design capsule Networks according to how people appear to concentrate. Our visual system is excellent for focusing on small sections of a scene at a time and building up a whole piece-by-piece image. For instance, think about what happens when you speak to someone and look at them; at one point, your eyes can not concentrate on the entire face of that person. Instead, the eyes typically focus on only one part of the person’s face; the right or left eye, or even their nose bridge. You can solely focus your eyes on one point at a time.
We will model this piece-by-piece emphasis as a tree for capsule networks because its shape is of tree branches. Trees are composed of parent nodes, children nodes and branch nodes.
Below is an illustration of a more complex tree. Note that the child nodes in the higher layers are parents of their child nodes; these have been colour-coded in the diagram so that when a node is a child and parent, its child node text colour matches the text colour of the corresponding parent node. E.g., a node that has a blue “parent” text will be attached with a blue “child” label to a similar child node. Parent nodes can have over one child and growing parents may have children nodes attached. A “leaf” is also a kind of node without children.
Parent nodes combine child node observations to construct an intricate image. You will also see these trees rotated in a neural network layout, so they are on their side. This ends up looking like a similar picture of a neural network, with node layers processing some input data, generating outputs and transferring those outputs to the next node layer.
Note that each node in the above tree represents a capsule and in a capsule network, and as we move forward through these layers, the capsules in each layer will create a complete representation of an object in a piece-by-piece picture.
Capsules are a small group of neurons which have several key characteristics:
- Each neuron in a capsule represents unique properties of a particular part of an image; the properties such as parts colour, width, etc.
- Every capsule produces a vector which has an absolute magnitude (which represents the probability of the presence of a part) and orientation (which represents the generalized pose of a part).
- A capsule network comprises several capsule layers; during training, this network aims to learn the spatial relationships between the elements and the whole of an object (e.g. how the location of the eyes and the nose contribute to the site of the entire face in an image).
- Capsules reflect relationships between parts of an entire object by using dynamic routing to measure the connections between one capsule layer and the next, creating powerful connections between object parts linked to space.
Capsule Network can categorize into two parts:
- Convolutional Encoder
- Fully connected, a linear decoder