Should one teach AI the same way one would teach a child?
“Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.” — Alan Turing, 1950
From around the age of 3, most young children start to speak. Not just uttering random words, but actually forming complete sentences. They start to ask questions: from the simple “what” and “where” to those annoying “how” and “why” questions. But you should be amazed. Behind these pure and unbiased questions, the child is slowly constructing its own knowledge base, a mental library of symbolic representations of the world, from forming and storing mental images of objects to recalling and replicating them. This is the prime ingredient of creativity and imagination, of logic and reasoning. Piaget named these sub-stages the Symbolic Function phase (forming mental images/pictorial symbols) and the Intuitive Thought phase (asking questions/knowledge construction). He used the term “intuitive” because children at this phase are unaware of how they obtained their knowledge and tend to be very certain that they are correct.
Do note that at this stage, children have also developed very little grasp of Conservation (certain physical characteristics of an object remain the same regardless of changes in its appearance) and Transformation (the actual change in physical characteristics). They may think that pouring liquid into a cup of a different shape changes the amount of liquid in it (a taller glass means more water), or, if you squish a clay ball right in front of them from round to flat, they are likely to think the ball has become a different object.
A child continues in this “preoperational period” until about the age of 7, when middle childhood (age 7–12) begins. By this stage, they should already have quite a large (common) knowledge base and a very active imagination. They should already know common concepts such as time, location, numbers, etc. Now, mark this phase, because it will be the key point of this article.
In middle childhood, logic and reasoning start to take shape. This is called the Concrete Operational Stage. Children begin to understand the concepts of “right” and “wrong”. They start to see themselves as more autonomous beings and will try to solve problems on their own. They begin to infer other people’s thoughts and opinions. They are now able to use inductive logic (inferring a general rule from the cause and effect of past experience) and manipulate simple mathematical and symbolic knowledge: they know that if you take away 2 cookies from a jar of 5, there will be 3 left, without having to see the cookies actually being taken away. However, their mental operations are “concrete”, that is, limited to tangible, touchable objects. They cannot yet reason abstractly (unless they are parroting some YouTube video) about, say, what would happen to the family if their father died, or what freedom or free will means. Their knowledge representation is limited to things they have touched or seen.
Now, let’s stop here for a second before we move on to the more complex realm of abstract reasoning and formal logic, and think for a bit about the current state of today’s artificial intelligence. Can it rival a 7-year-old child?
Ever since the breakthroughs of AlexNet (a convolutional neural network) and Google DeepMind’s AlphaGo, the terms “AI” and “machine learning” have been thrown around a great deal. Then came another big advancement in natural language processing (NLP) in 2017, when the paper “Attention Is All You Need” by a team of Google Brain and University of Toronto researchers was published. They introduced the “transformer”, an architecture that allows the creation of much deeper neural networks with billions of parameters. The idea is very interesting as it centres on “attention”, meaning the model learns what to select and what to discard. This is exactly what we do. I have mentioned the importance of attention before in the earlier series. It also learns in an unsupervised manner, meaning that the model does not need hand-generated labels and can learn to generate text on its own. These works are indeed very impressive, and I am amazed at how far we have come and how fast the field is progressing.
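The “attention” idea at the heart of the transformer can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product attention only, not the full multi-head architecture from the paper, and the toy dimensions are my own choice:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each value by how relevant its key is to the query,
    so the model 'attends' to some inputs and discards others."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of queries to keys
    # Softmax over each row: the selection weights sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three token vectors attending over each other (toy sizes).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)       # one context-mixed vector per token
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```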
Built on the same architecture, GPT-3 by OpenAI, the third generation of its kind of language prediction model, uses up to 175 billion parameters, more than 100 times its predecessor GPT-2. It is able to do exquisite things like summarising text, translating more accurately, or generating a blueprint for an application. Again, amazing (and also very computationally expensive to run). This wowed the whole tech world, and for sure wowed many of my geek friends.
But if we were to ask whether this model is really “intelligent”, the answer would be “no”. Even compared to a 3-year-old. Why?
Just like many other expert systems that do exceptionally well in a very specific and narrow task (AlphaGo plays Go, GPT-3 predicts text, AlexNet classifies images), they did what they did based on the specific task assigned to them. They also rely on extremely large datasets. At this point, according to Pearl, they are still statistical software simply doing a curve-fitting exercise, offering no intuition about higher-order intelligence. It’s a constant competition over who can fit the curve better and increase the accuracy, even by just 1%. Most of the time I spend on machine learning these days actually goes not into the model architecture itself, but into the data-preprocessing step.
“From the point of view of the mathematical hierarchy, no matter how skillfully you manipulate the data and what you read into the data when you manipulate it, it’s still a curve-fitting exercise, albeit complex and nontrivial.” — Judea Pearl
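Pearl’s point is easy to demonstrate. In this toy curve-fitting exercise (NumPy polynomials fitted to noisy sine data; the setup is purely illustrative), adding parameters drives the training error down, yet the fitted model captures nothing about why the points lie where they do:

```python
import numpy as np

# Noisy samples of an underlying process the fitter knows nothing about.
rng = np.random.default_rng(42)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

# More parameters -> the curve is fitted ever more tightly.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {mse:.4f}")
```

The degree-9 fit wins the accuracy contest, but it has no concept of a sine wave, only coefficients.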
If you were to ask today’s CNN why it labelled a certain image as a cat and not a dog, it won’t be able to answer you, nor can you trace back its train of thought (and to both answer and classify an image at once is another complicated topic, that of visuo-linguistic models). Can it tell you the difference between a cat and a dog? No. Because it has no concept of a dog, or a cat, or an animal. Or anything. Children, on the other hand, can form new concepts, unsupervised, after seeing the same entities only a couple of times. Does an AI know the concepts of conservation and transformation? I have not come across a computer vision paper that aims to implement such concepts in a model. This is because we can barely solve a normal object recognition task without relying on absurdly large datasets. You just feed in loads of data and hope that the model does whatever it should to achieve the best accuracy.
Let us illustrate with an example in which a DNN was assigned the task of food container recognition (below). It is logically obvious to us that, given its function, a food container should not be hanging from the ceiling, nor should it be upside down, since that violates the physical rules we know. It should always rest on a solid surface. However, this knowledge is unknown to the network; thus, the network can be seen as learning ‘blindly’ and ‘freely’, without any knowledge of the real world. Common sense so basic that even a young child has it is still lacking in today’s models.
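To make the idea concrete, here is a hypothetical sketch of what such a commonsense check could look like: a hand-written rule that vetoes physically implausible detections. The detection format and the rule are illustrative inventions of mine, not part of any real model:

```python
# Hypothetical post-hoc sanity filter: reject detections that violate
# a simple physical rule ("a food container rests upright on a surface").
def plausible_container(det):
    return det["support"] == "surface" and det["orientation"] == "upright"

detections = [
    {"label": "food container", "support": "surface", "orientation": "upright"},
    {"label": "food container", "support": "ceiling", "orientation": "upside-down"},
]

kept = [d for d in detections if plausible_container(d)]
print(len(kept))  # 1: the ceiling-mounted, upside-down container is rejected
```

A child applies this kind of rule effortlessly; today’s networks have to be given it by hand, if at all.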
A language predictor like GPT-3 is tricky: it seems to have already mastered this linguistic game. It appears intelligent, giving you meaningful answers. But true intelligence is a lot more than mastering a language game. This is the case in the real world as well: it’s actually very easy to trick people into believing that you are intelligent. You can simply read smart facts somewhere and put them into your own words. You can use someone else’s work as the template for your speech. But did the model actually understand the poem it just wrote, or did it simply base it on some original piece and then predict a similar choice of words and style? Does it really have creativity of its own, or is it just like a man plagiarising someone else’s work without knowing what he’s really doing?
This applies to us humans as well, because not all of us are considered intelligent either.
So, many predictors/classifiers today focus solely on the accuracy of their predictions, seeking answers to “what” and “where” while neglecting the “why” and “how”, which require transparency in the reasoning process that leads to a decision. AI researchers are well aware of this “black box” issue, which is why there has recently been a major shift towards transparent/explainable AI, which attempts to bring back the dated symbolic AI approach and integrate it into today’s deep learning models so that the decision-making process can be traced. This is known as “Neuro-Symbolic AI”. In simple words, researchers are attempting to inject logic so that the model can start to reason.
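As a toy illustration of the neuro-symbolic idea (entirely hypothetical, not any published system): a “neural” stage emits attribute probabilities, and a symbolic stage applies human-readable rules, so every decision comes with a traceable reason:

```python
# Stand-in for a neural perception module: attribute probabilities.
perception = {"has_whiskers": 0.92, "barks": 0.04, "retractable_claws": 0.88}

# Symbolic stage: explicit, human-readable rules.
rules = [
    ("cat", ["has_whiskers", "retractable_claws"]),
    ("dog", ["barks"]),
]

def classify(percepts, rules, threshold=0.5):
    """Return a label plus the rule that fired, making the decision traceable."""
    for label, required in rules:
        if all(percepts[attr] > threshold for attr in required):
            return label, "because " + " and ".join(required)
    return "unknown", "no rule matched"

label, why = classify(perception, rules)
print(label, why)  # cat because has_whiskers and retractable_claws
```

Unlike a pure curve-fitter, this pipeline can answer the “why” question, exactly the way you could ask a 7-year-old.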
The definition of intelligence is, admittedly, extremely broad and subjective. But in the sense of symbolic reasoning and common sense, with a transparent and traceable decision-making process, the logical intelligence of a model can definitely be tested the same way you can ask a 7-year-old how he or she arrived at a certain conclusion.
My point is, until now it seems we all just wanted to create a magical machine that is already in its prime adulthood the moment it is born. We want it to become an expert “from the start”, forgetting what it really means to be intelligent. To be logical. To know cause and effect. To reason about the dynamic world around us without constantly being fed a hideous amount of data. We ask ourselves questions. We can arrive at new conclusions without being handheld. And we forget that none of us were intelligent at birth; our intelligence and behaviour stem from millions of years of evolution, coded in our genes. Having a machine do just one task with no knowledge of the physical world is no different from teaching a man to do repetitive factory work. He doesn’t even have to think. In this sense, some people are no different from a machine. The main difference is that the machine’s computational speed and accuracy far exceed a human’s, which is why it does so well in this curve-fitting task. But in the end, it is still just statistical software, albeit an extremely powerful one, with no real sense of intelligence.
I will definitely write more about Neuro-Symbolic AI in the future. But for now, I just want to introduce Turing’s idea of a child machine (which happens to align with my ideology of artificial general intelligence) and let the reader sit and think about what artificial intelligence today is really capable of, compared to a young child. Would you call it intelligent?
Part 2 of this article will focus on abstraction and formal logic: the cognitive characteristics found in older children, and how the same concepts can be applied to AI.
But what I find very interesting about GPT-3 is how it learns how to learn. Not in a human sense, with purpose, but it seems to arrive at the conclusion that it must learn or figure out certain things in order to obtain the best accuracy. I find this process of learning as a by-product of optimising for results very special, and I cannot deny that it is a major step towards a very powerful AI. I would like to elaborate on these “side products” in my future series on AI safety.
Again, thank you for reading.