Hands-on Guide to AI Habitat: A Platform For Embodied AI Research

AI Habitat

AI Habitat is a simulation platform developed to advance research in the domain of Embodied AI. It trains embodied agents (such as virtual robots and egocentric assistants) in highly photo-realistic 3D environments. It equips the trained agents with crucial capabilities such as active perception, long-term planning and interactive learning, which distinguish them from ‘internet AI’ models that learn from static data available on the internet.

The AI Habitat project was released by Facebook AI Research (FAIR) and Facebook Reality Labs, in collaboration with Georgia Tech Computing, Simon Fraser University, Intel AI and UC Berkeley, in January 2019.

The platform is named ‘Habitat’ since it is the place where the trained agents live!


Before moving on to the details of Habitat, let us understand the concept of Embodied AI in order to have a greater insight of Habitat’s practical applications.

Overview of Embodied AI

Embodied AI is an advanced type of Artificial Intelligence which enables learning through interaction with the surrounding environment. It does not rely on static datasets (such as ImageNet, COCO, VQA) comprising images, videos, and text. In simple terms, it is the study of intelligent systems which involve physical or virtual embodiment (such as robots or trained assistants). 

The concept of embodied AI is based on the embodiment hypothesis according to which, “intelligence emerges in the interaction of an agent with an environment and as a result of sensorimotor activity”.

Conventional AI practices face certain difficulties when dealing with real-world situations. For instance, a computer vision model may work well in controlled conditions such as a closed room, but its performance may deteriorate in realistic situations involving varying distances and orientations. Unlike the strictly algorithm-led approach of such traditional AI practices, embodied AI practitioners first try to understand how biological systems work, then derive general principles of intelligent behaviour, and finally apply that knowledge to build artificial intelligent systems.

Why use a simulation?

If we deploy an embodied agent directly in the real world, it cannot run faster than real-time, and a malfunction can harm people or the agent itself. The physical environments it needs are also highly expensive to create and often difficult to reproduce.

On the contrary, simulations can run way faster than real-time and can be parallelized too. The process of training and testing the agent in simulation is safe, cheap and systematic. Once the development and testing phases are completed in simulation, it can be transferred to physical platforms.

Components of Habitat

Habitat specifically consists of:

  1. Habitat-Sim
  2. Habitat-Lab

Are you curious to know about these components? Proceed further and explore!

Habitat-Sim is a flexible, high-performance 3D simulator. It comes with configurable agents and multiple sensors. It has built-in support for several generic 3D datasets such as MatterPort3D, Gibson, Replica, and more. When rendering a scene from the MatterPort3D dataset, Habitat-Sim achieves an excellent performance of thousands of frames per second (FPS) running single-threaded. It also reaches 10,000+ FPS multi-process on a single GPU!

Habitat-Lab is a modular high-level library for end-to-end development of embodied AI agents. It facilitates defining embodied AI tasks to be carried out by the trained assistant e.g. navigation, user-interaction, following manual instructions and so on. It also enables configuring the agents (such as its physical structure and embedded sensors), training those agents (through imitation, reinforcement learning or no learning at all as in the classical Simultaneous Localization And Mapping (SLAM) problem) as well as comparing their performance using standard metrics.

GitHub repositories: Habitat-Lab  Habitat-Sim

Practical implementation

Here is an example code to set up a PointNav task using Habitat, in which the embodied agent’s task is to move from a source location to a target location. In this demo, the user plays the role of the agent, moving it with key controls.

Prerequisites for the implementation:

  • Install Habitat-Lab and Habitat-Sim (steps to install these are given in the GitHub links provided above)
  • Install the cv2 Python library (pip install opencv-python)
  • Download the .zip file of test scenes data

All the information required to handle embodied tasks with a simulator is abstracted inside the Env class of the habitat.core.env package. It acts as a base class for other derived environment classes. It consists of three major components — a dataset (episodes), a simulator (sim) and a task — and connects them together.
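As a rough mental model of that design, here is a purely illustrative toy sketch — not the real habitat.core.env API, and all names are hypothetical — of how an Env-style class ties episodes (the dataset), simulator state, and a task goal together behind reset()/step():

```python
# Toy, purely illustrative Env-style class on a 1D line world.
# Not the real habitat API; all names here are hypothetical.
class ToyEnv:
    def __init__(self, episodes, max_steps=500):
        self._episodes = list(episodes)  # dataset: (start, goal) pairs
        self._max_steps = max_steps
        self._pos = None                 # simulator state: agent position
        self._goal = None                # task: where the agent must go
        self._steps = 0
        self.episode_over = False

    def reset(self):
        # Load the next episode and return the initial observations
        start, self._goal = self._episodes.pop(0)
        self._pos, self._steps, self.episode_over = start, 0, False
        return {"distance_to_goal": abs(self._goal - self._pos)}

    def step(self, action):
        # "forward" moves one unit toward the goal; "stop" ends the episode
        if action == "forward":
            self._pos += 1 if self._goal > self._pos else -1
        self._steps += 1
        if action == "stop" or self._steps >= self._max_steps:
            self.episode_over = True
        return {"distance_to_goal": abs(self._goal - self._pos)}

env = ToyEnv(episodes=[(0, 3)])
obs = env.reset()
while not env.episode_over:
    action = "forward" if obs["distance_to_goal"] > 0 else "stop"
    obs = env.step(action)
print(obs["distance_to_goal"])  # 0
```

The real Env follows the same reset/step/episode_over lifecycle, which is exactly the loop the demo code below drives with keyboard input.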

Import the required libraries

 import habitat
 from habitat.sims.habitat_simulator.actions import HabitatSimActions
 import cv2 

Define the key controls and a helper function that converts Habitat’s RGB frames to the BGR channel order expected by OpenCV

 FORWARD_KEY = "f"
 LEFT_KEY = "l"
 RIGHT_KEY = "r"
 FINISH = "q"

 def transform_rgb_bgr(image):
     return image[:, :, [2, 1, 0]]
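To see what this channel swap does, here is a small standalone NumPy check (NumPy used here only for illustration): a pure-red pixel in RGB order comes out with its 255 moved to the last channel, as BGR expects:

```python
import numpy as np

def transform_rgb_bgr(image):
    # Reorder the last axis from (R, G, B) to (B, G, R)
    return image[:, :, [2, 1, 0]]

# A 1x1 "image" whose single pixel is pure red in RGB
rgb = np.array([[[255, 0, 0]]], dtype=np.uint8)
bgr = transform_rgb_bgr(rgb)
print(bgr[0, 0])  # [  0   0 255]
```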

Configure an environment

 def example():
     env = habitat.Env(
         config=habitat.get_config("configs/tasks/pointnav.yaml")
     )
     print("Environment created successfully")

Reset the environment and obtain the initial observations

     observations = env.reset()

Print out the destination point’s distance and angle

     print("Destination, distance: {:3f}, theta(radians): {:.2f}".format(
         observations["pointgoal_with_gps_compass"][0],
         observations["pointgoal_with_gps_compass"][1]))

Display the output simulation

     cv2.imshow("RGB", transform_rgb_bgr(observations["rgb"]))
     print("Agent is stepping around inside the environment.")
     count_steps = 0  # variable to keep track of the number of steps
     while not env.episode_over:  # loop until the episode ends
         keystroke = cv2.waitKey(0)
         # Move forward if the user presses the 'f' key
         if keystroke == ord(FORWARD_KEY):
             action = HabitatSimActions.MOVE_FORWARD
             print("action: FORWARD")
         # Turn left if the user presses the 'l' key
         elif keystroke == ord(LEFT_KEY):
             action = HabitatSimActions.TURN_LEFT
             print("action: LEFT")
         # Turn right if the user presses the 'r' key
         elif keystroke == ord(RIGHT_KEY):
             action = HabitatSimActions.TURN_RIGHT
             print("action: RIGHT")
         # Stop the agent if the user presses the 'q' key
         elif keystroke == ord(FINISH):
             action = HabitatSimActions.STOP
             print("action: FINISH")
         # Display a proper message for any key other than 'f', 'l', 'r' or 'q'
         else:
             print("INVALID KEY")
             continue

Perform the chosen action in the environment and obtain new observations

         observations = env.step(action)

Increase the step count once the action is performed

         count_steps += 1

Print out the updated distance and angle to the destination

         print("Destination, distance: {:3f}, theta(radians): {:.2f}".format(
             observations["pointgoal_with_gps_compass"][0],
             observations["pointgoal_with_gps_compass"][1]))

Display the output simulation

         cv2.imshow("RGB", transform_rgb_bgr(observations["rgb"]))

Print out the total number of steps covered once the episode ends

     print("Episode finished after {} steps.".format(count_steps))

Report success if the agent stopped within 0.3 m of the goal

     if (
         action == HabitatSimActions.STOP
         and observations["pointgoal_with_gps_compass"][0] < 0.3
     ):
         print("you successfully navigated to destination point")
     else:
         print("your navigation was unsuccessful")

Execute the main method

 if __name__ == "__main__":
     example()

Running the above code will initialize an agent inside an environment which can be moved using the ‘f’, ‘l’, ‘r’ and ‘q’ keys. The terminal prints the destination vector in polar form: the distance and the angle (in radians) required to reach the goal. Once we are within 0.3 m of the goal, we can press the ‘q’ key to stop and finish the episode successfully. If our finishing distance to the goal exceeds 0.3 m, or if we spend more than 500 steps in the environment, the episode is counted as unsuccessful.
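For intuition, the polar destination vector printed on the terminal can be reproduced with plain NumPy. The function below is only an illustration with hypothetical 2D positions, not the actual pointgoal_with_gps_compass sensor:

```python
import numpy as np

def polar_goal(agent_pos, agent_heading, goal_pos):
    # Vector from the agent to the goal on a 2D plane
    delta = np.asarray(goal_pos, dtype=float) - np.asarray(agent_pos, dtype=float)
    distance = float(np.linalg.norm(delta))
    # Angle to the goal, relative to the agent's current heading
    theta = float(np.arctan2(delta[1], delta[0]) - agent_heading)
    # Wrap the angle into (-pi, pi]
    theta = (theta + np.pi) % (2 * np.pi) - np.pi
    return distance, theta

# Hypothetical example: agent at the origin facing along +x, goal at (3, 4)
d, t = polar_goal((0.0, 0.0), 0.0, (3.0, 4.0))
print(f"distance: {d:.2f}, theta(radians): {t:.2f}")  # distance: 5.00, theta(radians): 0.93
```

As the agent steps forward and turns, the distance shrinks and theta shifts, which is exactly the feedback one watches while steering toward the goal.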

In addition to the habitat.core.env package used in the above demo code, the Habitat community provides several other packages; their associated classes, methods and properties are documented in the official Habitat API reference.

Visit this page to get an advanced level illustration of training a dragon embodied agent using Habitat along with a video-based explanation as well as Colab code notebooks.

Refer to the official Habitat documentation and the research paper ‘Habitat: A Platform for Embodied AI Research’ to dig deeper into the concepts of AI Habitat.


Nikita Shiledarbaxi
A zealous learner aspiring to advance in the domain of AI/ML. Eager to grasp emerging techniques to get insights from data and hence explore realistic Data Science applications as well.
