Canada AI

Reinforcement learning (RL) is suffering from what I call the “big baby problem.”

RL is a category of machine learning that uses rewards and penalties to achieve a desired goal. But the benchmark tasks used to measure how RL algorithms are performing—like Atari video games and simulation environments—don’t reflect the complexity of the natural world.

As a result, the algorithms have grown more sophisticated without confronting real-world problems—leaving them too fragile to operate beyond deterministic and narrowly defined environments. (Big baby, see what I mean?)

This defeats the purpose of RL, which is to eventually develop robots that can adapt to changing physical surroundings. If you train a robot to pour a glass of water, for example, you want it to be able to do that with any given sink. But benchmarking the RL algorithms on Atari is like “training, testing, and evaluating on a single sink,” says Amy Zhang, a PhD student at McGill University and part-time research engineer on the Facebook AI Research team.

That’s why Zhang, along with two other collaborators, proposed three new families of benchmark tasks to better reflect the natural world. Two of them are focused on visual reasoning, in which the algorithm must train to navigate inside a natural image either to classify that image or to find a target object to settle on. The third modifies the existing Atari benchmark by swapping out the video game’s black background for randomly selected video clips.

“In the original Atari, a model can just learn to memorize every screen,” says Zhang. “In this setting, where we have video, every screen is going to be different, so it actually needs to learn to visually comprehend the scene to understand what's going on.”

“That seems a lot closer to real-world robotics,” she says.

When the researchers tested current RL algorithms on the new benchmarks, the algorithms tripped up significantly. “So that means that there's more work to be done to figure out how we can learn more generalizable, more robust models in RL,” Zhang says.

Until the algorithms can cope with a small dose of complexity, they certainly won’t be able to fare in more dynamic environments.

This article originally appeared in Technology Review

Karen Hao