Google LIMoE: A Step Towards A Single AI

Google announced a breakthrough that it describes as a step toward the goal of a single AI model that can handle multiple tasks.
SIA Team
June 18, 2022

Google announced a new technology called LIMoE, which it claims is a step toward Google’s goal of developing an AI architecture called Pathways.

LIMoE stands for Learning Multiple Modalities with a Sparse Mixture of Experts Model. It’s a model that combines vision and text processing.

Pathways is an AI architecture that consists of a single model that can learn to perform multiple tasks that are currently performed by multiple algorithms.

While other architectures can perform similar tasks, the new model distinguishes itself by employing a neural network technique known as a Sparse Model.

The sparse model is described in a 2017 research paper titled Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, which introduced the Mixture-of-Experts layer (MoE) approach.

In 2021, Google announced GLaM: Efficient Scaling of Language Models with Mixture-of-Experts, an MoE model trained solely on text.

The difference with LIMoE is that it works on both text and images at the same time.

The LIMoE model routes problems to “experts” who specialize in a specific task, achieving similar or better results than current problem-solving approaches.

An intriguing aspect of the model is how some experts specialize primarily in image processing, while others specialize primarily in text processing, and still, others specialize in both.

According to Google’s explanation of how LIMoE works, there are experts for eyes, wheels, striped textures, solid textures, words, door handles, food & fruits, sea & sky, and plant images.

The sparse model differs from the “dense” models in that, rather than devoting every part of the model to completing a task, the sparse model assigns the task to various “experts” who specialize in a subset of the task.

This reduces the computational cost, making the model more efficient.

Similarly on how a brain sees a dog and knows it’s a dog, that it’s a pug, and that the pug has a silver fawn color coat, this model can view an image and similarly accomplish the task, by delegating computational tasks to different experts who specialize in recognizing a dog, its breed, color, and so on.

Experts who specialize in various aspects of the problem provide the ability to scale and accurately complete a wide range of tasks at a lower computational cost.