“Reinforcement Learning is used on many physical systems, but will it work in scenarios where it is hard to press the reset button?”
Alphabet Inc.’s Loon wants to provide internet connectivity to people in remote areas. To do that, the company flies balloons steered by AI, specifically by deep reinforcement learning (RL) algorithms. Loon claims to be the first company in the world to use reinforcement learning in a production aerospace system. A few years ago, the Loon team kicked off its deep RL navigation effort, codenamed Project Sleepwalk, in collaboration with the Google AI team in Montréal.
Loon’s deep RL solutions are an alternative to the conventional approach, in which automated systems follow fixed procedures hand-crafted by engineers. Today, Loon balloons rely on a new distributed training system that applies distributional Q-learning to tens of millions of simulated flight hours.
Overview Of Deep Reinforcement Learning
Ideally, reinforcement learning agents learn directly from raw inputs, without hand-engineered features or domain heuristics. To achieve this, researchers turn to deep learning. Alphabet Inc.’s DeepMind is one of the pioneers of deep reinforcement learning. Its researchers created some of the first artificial agents to reach human-level performance across a range of challenging tasks.
Deep RL combines deep learning with reinforcement learning, using the representational power of deep networks to tackle the reinforcement learning problem. Deep RL can build on existing toolkits and provide models of how representations can be shaped by rewards and by task demands. RL agents continually make value judgements in order to select good actions. What an agent has learned is represented by a Q-network, which estimates the total reward the agent can expect to receive after taking a particular action in a given state. The Deep Q-Network (DQN) algorithm stores the agent’s experiences and then randomly samples and replays them to provide diverse, decorrelated training data. Since their introduction, DeepMind’s DQN algorithms have reached human-level performance in many games.
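The replay idea described above can be sketched in a few lines of Python. This is a minimal illustration, not DeepMind’s or Loon’s implementation: the names ReplayBuffer and td_target, the fixed buffer capacity and the discount factor gamma are all assumptions made for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) transitions.

    Assumption: a fixed capacity, so the oldest transitions are dropped
    once the buffer is full.
    """

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive time steps in the training data.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max Q(s', a').

    At the end of an episode there is no future reward, so the
    target is just the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full DQN, each sampled transition would be turned into a target like this, and the Q-network would be trained to bring its estimate for the stored state and action closer to that target.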
How Loon Sails On Deep RL
The Loon team wrote on its blog that although reinforcement learning looked promising, they were unsure whether deep RL would be practical for high-altitude platforms such as balloons that drift autonomously through the stratosphere for long periods. To make the right turn at the right time, the system on Loon’s balloons must respond accurately to variables such as uncertain winds and partial visibility, and must even cope with an inconsistent power supply.
“Additional challenges such as low-level coordination of a constellation of flight systems, navigating new high altitude platforms and adapting current tactics to handle new types of navigation goals add complexity to the mission,” wrote Salvatore Candido, CTO of Loon.
A super-pressure balloon in the stratosphere has just two options: go up or go down. Even so, navigating that balloon skillfully is complex. Before it could even begin with RL, the team at Loon had to prove that a machine could learn a drop-in replacement for its navigation controllers.
Reinforcement learning, wrote Candido, shifts most of the expensive computation to training the agents. Most of the heavy compute is done before the flight begins, and during flight the fleet control system only needs to run a “cheap” function, a pass through a deep neural network, once every minute.
At such heights, power is a scarce commodity. Loon balloons are solar-powered, and that power runs the navigation and communications equipment. The less power spent steering the balloon, the more is available to connect people to the Internet, information, and other people.
Instead of going the traditional route, the team at Loon is using RL to build navigation systems whose quality surpasses what an engineer can create by hand. This approach, says the CTO, lets Loon’s systems scale well with limited manpower.
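To show why the in-flight step is cheap, here is a minimal Python sketch of a single forward pass that picks an action. Everything in it is hypothetical: the article does not describe Loon’s network architecture, inputs or weights, so the two-layer network, the PARAMS values and the two-element observation are placeholders, and the action set simply mirrors the “go up or go down” choice mentioned above.

```python
ACTIONS = ("up", "down")  # the two options the article attributes to the balloon


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def choose_action(observation, params):
    """Run one forward pass and return the highest-valued action."""
    hidden = relu(dense(observation, params["w1"], params["b1"]))
    q_values = dense(hidden, params["w2"], params["b2"])
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best], q_values


# Hypothetical parameters standing in for weights learned before flight.
PARAMS = {
    "w1": [[1.0, 0.0], [0.0, 1.0]],
    "b1": [0.0, 0.0],
    "w2": [[1.0, -1.0], [-1.0, 1.0]],
    "b2": [0.0, 0.0],
}

action, q_values = choose_action([2.0, 1.0], PARAMS)
```

The costly part, finding good values for the parameters, happens during training. At decision time the controller only performs a handful of multiplications and additions, which is the sense in which the function is cheap.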
The engineers at Loon have built navigation systems in which computers make the decisions in a data-driven manner. However well the AI steers the balloon in a complex setting, Candido brushed off any suggestion that the balloon might take on a mind of its own. “… there is no chance that a super-pressure balloon drifting efficiently through the stratosphere will become sentient,” quipped Candido.