A brief on our visual system and its promising contribution to the future study of artificial visual perception
So it’s been quite a while since I last updated my blog. I had been busy preparing for the final year of my master’s, and it took me a long time to decide what my main research interest would be. Artificial Intelligence is a very broad field and can be studied from many approaches: the computational neuroscience or cognitive science approach, which takes our own cognition and intelligence into account, or a purely computational and mathematical one. I personally prefer a balanced mixture of both. I have recently come to the realisation that doing machine learning alone is probably “not enough” (for me). The ability to learn is surely one of the puzzle pieces that make up our intelligence, but I’m more fascinated by human cognition and perception, by how we make sense of the world around us. To do so, one must receive information from the external environment through the five senses, which is then channelled to and processed inside the brain, and the human brain, with its roughly 80 billion neurons, is one of the most complex systems on Earth. Of the five senses that our brain is constantly processing and integrating, one plays the grandest role: it takes over more than half of our brain’s computational capacity, and more of our neurons are devoted to it than to the other four combined. Yup, you’re right. I’m talking about Vision.
It is quite funny that computer vision used to be taken lightly. In the 1950s and 60s, the human visual system was thought to be easy to replicate; scientists expected that teaching a computer to play chess would be harder. That makes intuitive sense: for a human being like you and me, chess requires a lot of practice, and not everyone has the time and skill to become a master. Seeing, on the other hand, feels effortless, and just like that, visual abilities such as binocular integration, depth perception, hand-eye coordination, and visual pattern recognition, some of the most biologically complex tasks we undertake as humans, were taken for granted. Years flew by, and now we know better than ever how remarkably complex the biological visual system really is. Today we have witnessed AlphaGo beat a Go champion and IBM Watson beat human contestants at Jeopardy!, yet the computer vision field is still stuck trying to (accurately) recognise objects and has barely scratched the surface of human vision. At this moment, most computer vision software performs, at best, like a three-year-old and does not yet possess any essence of human-level perception. I guess this is the reason why I can’t wait to put both feet inside this exciting field. As a fan of the computational cognitive neuroscience of vision and of artificial intelligence, computer vision is where I can get the best of both worlds.
Enough yada yada, let’s dig a little bit deeper.
One key ability of the human brain is invariant object recognition: the instantaneous and accurate recognition of objects despite variations in size, rotation, illumination, and position. In simple words, it allows us to identify objects in complex, distorted scenes in a fraction of a second. Despite decades of research into the topic, very little is known about how the brain constructs invariant representations of objects.
To put into perspective what our eyes can do that machines (and probably many other animals out there) cannot, take the picture below as an example. In a split second, we can identify that the creature in the picture is a cat (probably at the vet) without closely analysing the shape of its paws, and without even having to see its pointy ears or eyes. Our brain tells us almost instantly what situation the image was taken in, without us having to consciously think about it.
In this next picture, which I blurred to the point where a computer system (and many animals) would have a hard time recognising the objects and the context behind the image, most of us can still guess right away that it’s a picture of a crime scene investigation and that the people in it are probably detectives or other relevant officers.
But how do we do that? Well, we still haven’t completely figured out the computations and mechanisms behind visual perception and cognition. What we do know so far is how our visual system is organised to recognise an object, which brings us to the next point.
“More than 50 percent of the cortex, the surface of the brain, is devoted to processing visual information”
This was said by David Williams, the William G. Allyn Professor of Medical Optics at the University of Rochester, whose belief matches mine: by understanding how vision works, we may come to understand how the brain works as a whole. Our visual cortex contains some 140 million neurons and is among the most complex parts of the brain. It segments and integrates visual data relayed from the retina, then passes those data on to other regions of the brain for further analysis, quickly yielding perception (object and pattern recognition) and forming memories with little effort. Up to 50% of the neural tissue in our brain is, directly or indirectly, related to vision, and over 66% of our neural activity is involved in visual processing alone. Amazing, isn’t it?
Our visual cortex (the highlighted area in the figure) is differentiated into five regions, V1 to V5, based on structural and functional classification. It is theorised that the complexity of visual processing increases from region to region as the visual information gets passed along. Different neurons in different regions of the visual cortex have been found to respond only to certain specific stimuli. The most studied examples are the simple and complex cells. Simple cells mostly reside in V1 and respond to specific visual cues such as the orientation of edges and lines, while complex cells, found in V1–V3, do not represent a single receptive field like simple cells but a summation of several receptive fields integrated from many simple cells; they also respond preferentially to movement in certain directions. Cells in V2 have been shown to respond to differences in colour, spatial frequency, moderately complex patterns, and object orientation.
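The orientation selectivity of a simple cell is, to a first approximation, what an oriented linear filter computes. Here is a minimal sketch in Python (NumPy) of that idea; the kernels, stimulus, and the rectified-sum “firing rate” are toy choices of mine for illustration, not a biophysical model:

```python
import numpy as np

def response(image, kernel):
    """Crude stand-in for a simple cell's firing rate: slide the
    oriented 'receptive field' over the image and sum the rectified
    (non-negative) responses."""
    h, w = kernel.shape
    total = 0.0
    for i in range(image.shape[0] - h + 1):
        for j in range(image.shape[1] - w + 1):
            total += max(0.0, np.sum(image[i:i+h, j:j+w] * kernel))
    return total

# Two toy "receptive fields": one tuned to vertical edges, one to horizontal.
vertical_cell = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)
horizontal_cell = vertical_cell.T

# A stimulus containing a single vertical edge (dark left, bright right).
stimulus = np.zeros((8, 8))
stimulus[:, 4:] = 1.0

print(response(stimulus, vertical_cell))    # strong response to its preferred orientation
print(response(stimulus, horizontal_cell))  # no response to the wrong orientation
```

The point is the preference: the same stimulus drives the vertically tuned filter strongly and leaves the horizontally tuned one silent, which is qualitatively what orientation-selective neurons in V1 do.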
It has been hypothesized that there are special cells or groups of cells that learn to respond to certain objects, thus allowing instantaneous recognition of objects and patterns that have been previously seen.
The two-streams hypothesis proposes that information leaves the occipital lobe via white matter pathways called streams, and that the two streams are segregated in function. The ventral stream (known as the “what pathway”), which runs into the temporal lobe, is involved with object identification and visual recognition. The dorsal stream (the “where pathway”), which runs into the parietal lobe, is involved with processing an object’s spatial location, attention, and action-related processing. More studies are still needed to confirm the precise function of the two.
I will not go too deep, to avoid confusing readers who are not familiar with neuroscience. The following experiment deserves a page of its own.
In the late 1950s, two neuroscientists at Harvard Medical School, David Hubel and Torsten Wiesel, conducted experiments on the cat visual cortex whose spectacular findings won them a Nobel Prize and revolutionised our understanding of the visual cortex. In the experiments, they recorded from neurons in a cat’s visual cortex while moving a bright line across its retina. They found that: (1) a neuron fired only when the line was in a particular place on the retina, (2) the activity of the neuron changed depending on the orientation of the line, and (3) sometimes the neuron fired only when the line was moving in a particular direction (different neurons responded to different angles). For example, some fired when exposed to vertical edges and some when shown horizontal or diagonal ones. This is also when they discovered the three types of cells in the visual cortex: simple, complex, and hypercomplex (two of which we mentioned earlier). It was here that we learned that simple cells respond to simple features such as edges and corners, and that as the cells become more complex, the features they respond to become more and more specific. This finding paved the way for modern computer vision and has been implemented in the convolutional neural network model, which I shall discuss in my next story.
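The simple-to-complex hierarchy maps loosely onto the convolution-plus-pooling pattern in convolutional neural networks: a convolution acts like a bank of simple cells (one oriented filter response per position), and pooling over positions acts like a complex cell, which keeps responding when the stimulus shifts. A minimal sketch of that correspondence in Python (NumPy); the kernel and images are again toy values of mine, not anything from the original experiments:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Simple cell' layer: the oriented filter's response at every
    valid position of the image."""
    h, w = kernel.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return out

def complex_cell(image, kernel):
    """'Complex cell': pool (max) the simple-cell responses over
    position, yielding a response tolerant to shifts of the stimulus."""
    return conv2d_valid(image, kernel).max()

vertical = np.array([[-1, 0, 1]] * 3, dtype=float)  # vertical-edge filter

# The same vertical edge at two different positions.
left, right = np.zeros((8, 8)), np.zeros((8, 8))
left[:, 2:] = 1.0
right[:, 5:] = 1.0

# Identical responses: the pooled cell ignores where the edge is.
print(complex_cell(left, vertical), complex_cell(right, vertical))
```

This position tolerance, built up layer by layer, is one plausible ingredient of the invariant object recognition discussed earlier, and it is exactly the trick CNNs borrow.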
For a video of the Hubel and Wiesel experiment, please click on the following link: https://www.youtube.com/watch?v=y_l4kQ5wjiw
DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron, 73(3), 415–434. doi:10.1016/j.neuron.2012.01.010
Unravelling the development of the visual cortex: implications for plasticity and repair. J Anat. 2010;217(4):449–468. doi:10.1111/j.1469-7580.2010.01275.x