Now I know what you might be thinking. How can a machine teach me?! Let alone, how can it benefit me in the confinement of COVID-19?! Here is my punchline for you. If you do not have enough humility and curiosity, this article is not for you. But if you aspire to the likes of Einstein and Confucius, you may continue reading.
First of all, let’s turn the how questions into what calibrations.
Humans are full of qualities, all of which can fall on either the positive or the negative end of the behavioral spectrum. The same goes for machines. After all, they are the creation of humans. Their brains, in other words their neural networks, are wired in a way that mimics ours.
Reduced to its bits and pieces for a second, a machine is nothing but a system of neurons passing information into its cortex, after which actions are taken. A reward or a penalty then follows. Human information processing and behavioral output manifest in a similar way.
That is why many of us perform at our best with a fixed routine or a clear schedule. At the end of the day, the little Mario in each of us fears penalization under uncertain conditions. That brings to mind the tale of COVID-19. It surely caught many by surprise. Imagine a Ferrari driving at 300 km/h that suddenly comes to a halt! That is what has happened to the state of the world.
But then what can we learn from the automatic duck of Descartes and Vaucanson? The thing is, conditions have changed, hence the duck is no longer operating under its status quo.
A machine learns by virtue of different algorithms. Reinforcement Learning (RL) is among the most highly regarded approaches in the realm of Machine Learning (ML), and it is the one discussed here.
Take a mouse, for instance. The little fellow enters an environment that has a number of states [S] (contexts) and actions [A] (moves). Based on S and A, the mouse’s life mission revolves around maximizing her rewards by gaining more points and minimizing the penalties that can result in her death.
At a higher level, the mouse will take her actions based on two analytical frames of thinking: exploration and exploitation. She will be exploring the environment around her, hence adding new states [S0 to S1] and so on; she will also be exploiting her recent knowledge from the previous states. Thus, her prior knowledge from S1 will serve her as she proceeds to the unknowns of S2. Perhaps the cat is much closer there than it was during her encounter in S1.
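To make the exploration and exploitation trade-off concrete, here is a minimal sketch in Python. It assumes one common rule, usually called epsilon-greedy, which the article itself does not name: with a small probability the mouse tries a random move, otherwise she takes the move she currently values most. The function name `choose_action` and the sample values are made up for illustration.

```python
import random

def choose_action(action_values, epsilon, rng=random):
    """Return the index of the action to take.

    With probability `epsilon` the agent explores (random action);
    otherwise it exploits (action with the highest known value).
    This epsilon-greedy rule is an assumption, not the article's method.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))   # explore
    best = max(action_values)
    return action_values.index(best)               # exploit (first best on ties)

# Hypothetical values the mouse has learned for three moves in one state.
values = [0.1, 0.9, 0.3]
print(choose_action(values, epsilon=0.0))  # never explores, so always picks move 1
```

A higher `epsilon` means a more adventurous mouse; a value of zero means she only ever repeats what she already knows.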
So, how is the mouse able to get the most rewards and the fewest penalties under such ambiguity?
The answer lies in a function the mouse has in her neural network, called the Q-function. The Q-function takes the form of a matrix that tells the agent, the mouse, which steps give her the maximum rewards. By constantly exploring and exploiting her environment, she reinforces her learning by moving in the direction with the highest values in the matrix. The Q-Matrix becomes her learning loop, which iterates and improves for as long as the mouse is alive.
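For the curious, the Q-Matrix idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a tiny corridor of five cells with a cat at one end and cheese at the other, the standard tabular Q-learning update, and made-up settings for the learning rate (`alpha`), discount (`gamma`) and exploration rate (`epsilon`). None of these specifics come from the mouse story above.

```python
import random

N_STATES = 5          # corridor cells 0..4 (hypothetical toy world)
CAT, CHEESE = 0, 4    # terminal cells: penalty at 0, reward at 4
LEFT, RIGHT = 0, 1    # the two available actions

def step(state, action):
    """Move one cell and return (next_state, reward, done)."""
    next_state = state - 1 if action == LEFT else state + 1
    if next_state == CAT:
        return next_state, -1.0, True
    if next_state == CHEESE:
        return next_state, 1.0, True
    return next_state, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Fill in a Q-Matrix (rows = states, columns = actions) by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 2, False                      # the mouse starts in the middle
        while not done:
            if rng.random() < epsilon:              # explore
                action = rng.choice([LEFT, RIGHT])
            else:                                   # exploit, breaking ties randomly
                best = max(q[state])
                action = rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
for s in range(1, 4):
    print(s, [round(v, 2) for v in q_table[s]])
```

After training, the values for moving right (toward the cheese) should come out higher than those for moving left (toward the cat), which is exactly the "move in the direction with the highest values" behavior described above.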
So, one thing we can take away from RL and the mouse with a positive mindset, especially in the times of COVID-19, is that while we are not certain of our future state [St+1], we have to keep exploring and exploiting our environment: the house and the room where we live. Surely, we can create a new habit or routine that does not just substitute an old one but enriches our lives in both holistic and specific terms.
Next time someone tells you something to the effect of, “I do not know where we are heading”, you can respond by saying, “At least we know where we are now”.
The good thing is that we are past state zero [S0] of COVID-19. Perhaps each one of us is now at a different stage. One might be at S1 while another is ahead in S2. What matters is that we utilize the power of the Q-Matrix when reshaping our lifestyle for the time being. Not only will that induce a conscious change, it will also reinforce our learning mechanism as we proceed into the next state and the one after [S+N].
An exercise I recommend is to think of a habit, routine, or behavior that you want to adopt or get rid of and jot it down in a Q-Matrix. Assign different figures to it, say from 0 to 5. The ones that sum up to the highest score should inform your next course of action. You might be pondering the idea of working out 2 hours a day, or exploring a different path every time you go down to buy groceries, or even an urge to communicate more effectively and compassionately during the 50% of your time that, on average, you spend with your spouse in these days of confinement.
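If you would rather do this exercise on a screen than on paper, here is one possible way to lay it out in Python. The three criteria (health, enjoyment, feasibility) and every score below are invented purely for illustration; pick your own columns and your own numbers from 0 to 5.

```python
# Rows are candidate habits, columns are criteria scored from 0 to 5.
# All criteria names and scores are hypothetical examples.
habit_scores = {
    "work out 2 hours a day": {"health": 5, "enjoyment": 2, "feasibility": 2},
    "new route to the grocery store": {"health": 2, "enjoyment": 4, "feasibility": 5},
    "communicate more compassionately": {"health": 3, "enjoyment": 4, "feasibility": 5},
}

def rank_habits(scores):
    """Sum each habit's scores and return (habit, total) pairs, highest first."""
    for habit, criteria in scores.items():
        for name, value in criteria.items():
            if not 0 <= value <= 5:
                raise ValueError(f"{habit!r}: score for {name!r} must be between 0 and 5")
    totals = {habit: sum(criteria.values()) for habit, criteria in scores.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)

for habit, total in rank_habits(habit_scores):
    print(total, habit)
```

The habit at the top of the list is the one your own scores suggest trying first. Revisit the numbers after a week or two, just as the mouse keeps updating her matrix.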
Stay humble, curious, and positive; befriend the mouse and remember that nothing is constant but change.
Credit: Google News