Not every expert is convinced that AGI is a realistic goal, or even a possible one. But this week DeepMind, the Alphabet-backed research lab, released an artificial intelligence system called Gato that it positions as a step toward that goal. Gato is what DeepMind describes as a “generalist” system, one that can be taught to perform many different kinds of tasks. DeepMind researchers trained Gato on 604 tasks, including captioning images, engaging in dialogue, stacking blocks with a real robotic arm, and playing Atari games.
Jack Hessel, a research scientist at the Allen Institute for Artificial Intelligence, noted that a single AI system capable of solving many tasks isn’t new. For example, Google recently began using a system in Google Search called the Multitask Unified Model, or MUM, which can handle text, images, and video to perform tasks ranging from finding cross-lingual variations in the spelling of a word to relating a search query to an image. Like all AI systems, Gato learns by example, ingesting billions of words, images from real-world and simulated environments, button presses, joint torques, and more, all in the form of tokens. These tokens represent data in a way Gato can understand, enabling the system to, say, tease out the mechanics of Breakout or which combination of words in a sentence might make grammatical sense.
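To make the token idea concrete, here is a minimal sketch, assuming a hypothetical tokenization scheme (DeepMind has not released Gato’s tokenizer, and every function name and constant below is illustrative), of how text and continuous sensor readings could be flattened into one integer sequence:

```python
# Hypothetical sketch: flattening multimodal data into one token sequence.
# None of these names or constants come from DeepMind's implementation.

def tokenize_text(text, vocab):
    # Map each word to an integer ID from a fixed vocabulary.
    return [vocab[word] for word in text.split()]

def tokenize_continuous(values, num_bins=1024, low=-1.0, high=1.0, offset=32000):
    # Discretize continuous readings (e.g., joint torques) into bins,
    # shifted into their own region of the token space so they don't
    # collide with text tokens.
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)
        bin_id = int((clipped - low) / (high - low) * (num_bins - 1))
        tokens.append(offset + bin_id)
    return tokens

vocab = {"stack": 1, "the": 2, "red": 3, "block": 4}
episode = (tokenize_text("stack the red block", vocab)
           + tokenize_continuous([0.12, -0.4, 0.9]))  # three joint torques
print(episode)  # one flat sequence a single model can be trained on
```

Once everything is a token, one sequence model can be trained across text, images, and robot actions alike, which is the core trick behind a “generalist” system.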
Gato isn’t necessarily good at all of these tasks. When chatting with a person, for example, the system often responds with a superficial or factually incorrect reply, such as answering “Marseille” when asked for the capital of France. When captioning images, Gato sometimes misgenders the people in them. And the system correctly stacks blocks using a real-world robot only 60 percent of the time. Still, on 450 of the aforementioned 604 tasks, DeepMind claims, Gato performs better than an expert more than half the time. Oddly enough, from an architectural standpoint, Gato isn’t dramatically different from many AI systems in production today. It shares a key characteristic with OpenAI’s GPT-3: both are “Transformers.” Dating back to 2017, the Transformer has become the architecture of choice for complex reasoning tasks, demonstrating an aptitude for summarizing documents, generating music, classifying objects in images, and analyzing protein sequences.
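For readers curious what that shared machinery looks like, here is a stripped-down sketch of the scaled dot-product attention at the Transformer’s core; real systems such as Gato and GPT-3 add multiple attention heads, learned projections, and dozens of stacked layers on top of this:

```python
# Minimal self-attention, the building block of the Transformer architecture.
import numpy as np

def attention(Q, K, V):
    # Compare each token's query against every token's key; the softmax
    # weights decide how much of each token's value to mix into the output.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

tokens = np.random.randn(5, 16)          # 5 tokens, 16-dimensional embeddings
out = attention(tokens, tokens, tokens)  # self-attention: Q, K, V share a source
print(out.shape)                         # (5, 16): one context-aware vector per token
```

Because attention is indifferent to what the tokens originally encoded, the same mechanism works whether the sequence represents a sentence, an Atari frame, or a robot’s joint angles.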
Perhaps more notably, Gato is orders of magnitude smaller than single-task systems, including GPT-3, in terms of parameter count. Parameters are the parts of the system learned from training data that essentially define its skill at a problem, such as generating text. Gato has just 1.2 billion parameters, while GPT-3 has more than 175 billion. The DeepMind researchers deliberately kept Gato small so that the system could respond in real time, for example when controlling the robot arm.
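A back-of-the-envelope calculation shows how width and depth drive those parameter counts. The layer dimensions below are illustrative round numbers chosen to land near the published sizes, not either model’s exact configuration:

```python
# Rough parameter count for a stack of Transformer blocks (illustrative only;
# these dimensions are assumptions, not Gato's or GPT-3's actual specs).

def transformer_block_params(d_model, d_ff):
    attn = 4 * d_model * d_model  # query, key, value, and output projections
    ffn = 2 * d_model * d_ff      # two feed-forward weight matrices
    return attn + ffn

small = 24 * transformer_block_params(2048, 8192)    # Gato-scale guess
large = 96 * transformer_block_params(12288, 49152)  # GPT-3-scale guess
print(f"small: {small / 1e9:.1f}B params, large: {large / 1e9:.1f}B params")
# small: 1.2B params, large: 173.9B params
```

Fewer parameters mean less compute per inference step, which is what lets a model run fast enough to drive physical hardware.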