AlphaGo Part III
What is the problem in reaching AGI? Haven’t machines already proved powerful enough to solve all the hard problems we throw at them: locomotion, playing games, driving cars, translation, even cooking? So we are already there, aren’t we? No.
The problem I see can be stated in one phrase: these machines do not have “free will”. In other words, they don’t know what to do unless a human tells them the goal, or, in ML terms, the objective function (or loss function) to be minimized.
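To make that concrete, here is a minimal sketch (plain NumPy, with illustrative names and data) of what “a human telling the machine the objective function” looks like in practice: the human hand-writes the loss, and the machine does nothing but minimize it.

```python
import numpy as np

# The human defines the goal: fit y = w * x by minimizing squared error.
# The machine has no say in what "good" means; it only minimizes this loss.
def loss(w, x, y):
    return np.mean((w * x - y) ** 2)

def grad(w, x, y):
    return np.mean(2 * (w * x - y) * x)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                       # "ground truth" supplied by the human teacher

w = 0.0                           # the machine's initial guess
for _ in range(100):
    w -= 0.01 * grad(w, x, y)     # gradient descent on the given objective

print(w, loss(w, x, y))           # w converges to roughly 2.0
```

Everything that makes this work, the data, the target, and above all the loss itself, comes from outside the machine.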
We learn mostly by unsupervised learning, not supervised learning as in Go. Reaching a point where a machine automatically finds which goals to pursue and optimize is a long shot from where we are today. So, what is the problem?
The problem is finding what “pseudo-objective function” we should give the machine to learn. Reward optimization, as proposed by Schmidhuber, is fine when the task is well defined and you have rewards, frequent or infrequent. However, for higher-level cognition we don’t have obvious rewards. What is the reward of curiosity, of art, of creativity?
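For contrast, here is a toy example of the well-defined case (a standard epsilon-greedy bandit, not Schmidhuber’s formulation; the numbers are made up): when the reward signal is given explicitly, optimizing it is the easy part.

```python
import random

# A well-defined task: pick the arm with the highest expected payout.
# Reward optimization works here precisely because the reward is handed to the agent.
true_means = [0.2, 0.5, 0.8]       # hidden payout probability of each arm
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]

for t in range(1000):
    if random.random() < 0.1:                              # explore occasionally
        arm = random.randrange(3)
    else:                                                  # otherwise exploit the best estimate
        arm = max(range(3), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print([round(e, 2) for e in estimates])   # estimates approach the true means
```

There is no analogous reward table for curiosity, art, or creativity.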
Furthermore, you always need an external observer or teacher to interpret the machine’s results and to choose which reward to pick among an infinite number of possibilities. That’s the problem: the machine is isolated from the world, and it ALWAYS needs an outside conscious agent to give meaning to its inputs and outputs and to set its goals.
My take on that is simple: the pseudo-objective function is “meaning”. The machine will try to minimize the surprise, the mismatch between what it thinks the world is and what it actually observes.
Given its computational capabilities, it will start creating models of the world as it observes data (a model is simply a set of abstractions it learns in order to represent the invariant properties it finds in observations).
Based on these models, the agent will generalize the patterns it has learned to unseen observations. If it finds what it expected, the model is confirmed. If it sees something unexpected, it will revise its model based on what it observed in the past and what it is seeing now.
The agent tries to create a story, a narrative, that gives a sense of unity and cohesion to all the elements it is observing. In other words, the objective is to maximize its state of internal coherence: the meaning. Observations play only a secondary role and are conditioned by the biases of the model the system already has. The agent is conditioned by the world, but its learning is completely independent of the world. That’s the challenge.
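Here is a toy sketch of that loop (my own illustration, with an assumed surprise threshold and update rule, not a real implementation): the agent keeps an internal model of a signal, treats prediction error as surprise, confirms the model when observations match it, and revises it only when they don’t.

```python
import random

class CoherenceSeekingAgent:
    """Toy agent that keeps an internal model of a scalar 'world' signal
    and revises it only when an observation is surprising enough."""

    def __init__(self, learning_rate=0.1, surprise_threshold=0.5):
        self.belief = 0.0                    # the agent's current model of the world
        self.lr = learning_rate
        self.threshold = surprise_threshold

    def step(self, observation):
        prediction = self.belief             # what the agent expects to see
        surprise = abs(observation - prediction)
        if surprise > self.threshold:
            # Unexpected: revise the model toward the new evidence.
            self.belief += self.lr * (observation - prediction)
        # Expected: the model is confirmed and left untouched.
        return surprise

# The "world": a noisy signal whose true value shifts halfway through.
agent = CoherenceSeekingAgent()
for t in range(200):
    true_value = 1.0 if t < 100 else 3.0
    observation = true_value + random.gauss(0, 0.2)
    surprise = agent.step(observation)
    if t % 50 == 0:
        print(f"t={t:3d}  belief={agent.belief:.2f}  surprise={surprise:.2f}")
```

The point is that nothing outside the agent labels the observations; the only signal driving learning is the gap between what the agent expects and what it sees.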
So, unless we create curiosity-driven machines capable of incorporating some level of subjectivity about the world, we will not solve the last missing piece of the puzzle needed to achieve AGI.