Sourangshu Bhattacharya, assistant professor at the Department of Computer Science and Engineering, IIT Kharagpur, spoke about “Scalable AI in Autonomous Driving” at SkillUp 2021, the virtual education fair organised by Analytics India Magazine.
During the presentation, he talked about his recent projects, carried out jointly by IIT Kharagpur and HP:
- Multi-criteria Online Frame-subset Selection for Autonomous Vehicle Videos.
- Convex Online Video Frame-subset Selection Using Multiple Criteria for Data-efficient Autonomous Driving.
His presentation touched upon the CARLA simulator, state-of-the-art methods for AI problems, the requirements of scalability, and MCOSS (multi-criteria online frame-subset selection).
Bhattacharya suggested the use of an open-source driving simulator such as CARLA, developed by Intel. CARLA's functions are:
- Supports development, training, and validation of autonomous urban driving systems.
- Presents open-source code, training, and testing protocols.
- Provides maps of urban layouts, buildings, and vehicles under different weather conditions.
- Supports flexible specification of sensor suites and environmental conditions. Given the position of the vehicle, the software generates the corresponding sensor outputs.
He stated, “The sensors of CARLA include RGB cameras available in different angles and also the depth map and semantic segmentation.” The depth map indicates how far each object in the rendered scene is from the camera. Semantic segmentation, on the other hand, classifies every object by displaying it in a different color according to its class.
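The two sensor outputs he described can be decoded with a few lines of NumPy. The depth decoding below follows the scheme CARLA documents (depth packed across three 8-bit channels with a 1,000 m far plane); the segmentation palette is purely illustrative, not CARLA's official label colors.

```python
import numpy as np

def depth_to_meters(img):
    """Decode a CARLA-style encoded depth image (H, W, 3 in R, G, B order)
    into per-pixel distance in meters. Depth is packed across the three
    8-bit channels and scaled to a 1000 m far plane."""
    r = img[..., 0].astype(np.float64)
    g = img[..., 1].astype(np.float64)
    b = img[..., 2].astype(np.float64)
    normalized = (r + g * 256.0 + b * 256.0 ** 2) / (256 ** 3 - 1)
    return 1000.0 * normalized

# Illustrative palette: class id -> display color (one color per object class).
PALETTE = {0: (0, 0, 0), 7: (128, 64, 128), 10: (0, 0, 142)}  # unlabeled, road, vehicle

def colorize_segmentation(class_ids):
    """Map a (H, W) array of per-pixel class ids to an (H, W, 3) color image,
    so that every object class appears in its own color."""
    out = np.zeros(class_ids.shape + (3,), dtype=np.uint8)
    for cid, color in PALETTE.items():
        out[class_ids == cid] = color
    return out
```

A pixel of `(255, 255, 255)` decodes to the 1,000 m far plane, and a black pixel to zero distance.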
The maps of CARLA contain waypoints, identified by red dots. A cluster of waypoints makes a pose, and a sequence of poses makes an episode. Bhattacharya presented a demo video showing how CARLA is used. “The driving model of CARLA is based on the ground route, which makes it more accurate than other simulators,” he said.
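The waypoint-pose-episode hierarchy he described can be sketched with simple data structures. The grouping rule below (fixed-size consecutive clusters) is an illustrative assumption; CARLA's own clustering may differ.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Waypoint:          # a single map location (the "red dots" on the CARLA map)
    x: float
    y: float

@dataclass
class Pose:              # a cluster of nearby waypoints
    waypoints: List[Waypoint]

@dataclass
class Episode:           # an ordered sequence of poses driven by the agent
    poses: List[Pose]

def build_episode(points, cluster_size=3):
    """Group consecutive (x, y) waypoints into poses, then string the poses
    into one episode. Illustrative grouping only."""
    poses = [Pose([Waypoint(x, y) for x, y in points[i:i + cluster_size]])
             for i in range(0, len(points), cluster_size)]
    return Episode(poses)
```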
Bhattacharya further explained semantic segmentation as a component of AI applications. He mentioned popular semantic segmentation models such as DeepLabV2 and DeepLabV3+.
In an autonomous driving model, the simulator agent provides data to the perception module, which takes in information from the surroundings. The route planning module then issues a directional command. This combined information is fed to the neural network model, which, through the steps of affordances and the control module, produces the steer, throttle, and brake outputs.
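The flow of data through that pipeline can be sketched with stand-in functions. Every function body below is a toy placeholder (the real modules are learned networks); only the ordering, perception, then route planning, then the driving model, then the controller, reflects the description above.

```python
import numpy as np

def perception(sensor_frame):
    """Stand-in perception module: summarize the surroundings into features."""
    return {"features": float(np.asarray(sensor_frame, dtype=float).mean())}

def route_planner(position, goal):
    """Stand-in route planner: emit a coarse directional command."""
    return "left" if goal[0] < position[0] else "right"

def driving_model(features, command):
    """Stand-in neural model: map features plus command to affordances."""
    bias = -0.5 if command == "left" else 0.5
    return {"centerline_offset": features["features"] * 0.1 + bias,
            "hazard_stop": False}

def controller(affordances):
    """Control module: map affordances to steer, throttle, and brake."""
    steer = float(np.clip(-affordances["centerline_offset"], -1.0, 1.0))
    throttle = 0.0 if affordances["hazard_stop"] else 0.5
    brake = 1.0 if affordances["hazard_stop"] else 0.0
    return steer, throttle, brake
```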
In older paradigms, people used modular pipelines with a separate perception stack and driving model, but later, they moved to end-to-end learning, which trains the perception and driving modules jointly. PilotNet, by NVIDIA, is such an end-to-end learning model: instead of the modular approach, it maps sensor input directly to controller actions.
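The end-to-end idea, raw pixels in, control value out, with no hand-built perception stack in between, can be shown with a deliberately tiny model. PilotNet itself is a convolutional network; the linear regressor below, fit on synthetic data, is only a minimal stand-in for "one model trained end to end on expert steering labels".

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 flattened "frames" of 64 pixels each, with expert steering
# labels generated by a hidden linear rule (synthetic, for illustration only).
X = rng.normal(size=(200, 64))
true_w = rng.normal(size=64) / 8.0
y = X @ true_w

# End-to-end learning: fit one model from raw pixels to steering by gradient
# descent on mean squared error, with no separate perception or planning stage.
w = np.zeros(64)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(X)   # MSE gradient
    w -= 0.1 * grad

mse = float(np.mean((X @ w - y) ** 2))  # training error after fitting
```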
He also talked about another current paradigm called conditional imitation learning, which learns by imitating an expert while occasionally exploring, conditioning the driving decision on a high-level command. Affordance learning is yet another paradigm: it predicts affordances, simple explainable quantities that govern driving decisions, from camera observations.
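Conditional imitation learning is typically implemented with one output branch per high-level command, and the planner's command selects which branch drives the vehicle. In the sketch below the branch weights are random stand-ins for trained networks, so only the command-conditioned branch selection is meaningful.

```python
import numpy as np

# One branch of weights per high-level command; in a real conditional
# imitation learning model each branch is a trained sub-network.
rng = np.random.default_rng(1)
COMMANDS = ("follow_lane", "left", "right", "straight")
branches = {cmd: rng.normal(size=8) for cmd in COMMANDS}

def conditional_policy(features, command):
    """Pick the branch indexed by the command and emit a steering value."""
    w = branches[command]
    return float(np.tanh(features @ w))   # bounded steering in (-1, 1)
```

The same observation thus yields different driving decisions depending on the command, which is the core of the paradigm.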
Affordances can be discrete or continuous. The discrete affordances are hazard stop, red traffic light, and speed sign; the continuous affordances are the distance to a vehicle ahead and the distance to the centerline.
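Because these affordances are simple, explainable quantities, driving decisions can be read off them with plain rules. The thresholds and gains below are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass

@dataclass
class Affordances:
    hazard_stop: bool             # discrete
    red_traffic_light: bool       # discrete
    speed_sign: float             # discrete reading: posted limit in km/h
    distance_to_vehicle: float    # continuous: meters to the car ahead
    distance_to_centerline: float # continuous: signed lateral offset in meters

def control_from_affordances(a: Affordances):
    """Explainable rules mapping affordances to a driving decision.
    All thresholds (5 m gap, 0.2 steering gain) are illustrative."""
    if a.hazard_stop or a.red_traffic_light or a.distance_to_vehicle < 5.0:
        return {"throttle": 0.0, "brake": 1.0, "steer": 0.0}
    steer = max(-1.0, min(1.0, -0.2 * a.distance_to_centerline))
    throttle = 0.4 if a.speed_sign >= 30 else 0.2
    return {"throttle": throttle, "brake": 0.0, "steer": steer}
```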
Scalability in autonomous driving
Bhattacharya pointed out the necessity of scalability in autonomous driving: it is a data-intensive field, and the big companies that invest in it collect huge amounts of data. He said an autonomous vehicle collects around 1 TB of data in one hour.
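The 1 TB per hour figure compounds quickly at fleet scale. The fleet size and duty cycle below are illustrative assumptions, not numbers from the talk.

```python
# Back-of-the-envelope data volume using the talk's 1 TB/hour figure.
tb_per_hour = 1
vehicles = 100                                   # hypothetical fleet size
hours_per_day = 8                                # hypothetical duty cycle
daily_tb = tb_per_hour * vehicles * hours_per_day  # 800 TB per day
yearly_pb = daily_tb * 365 / 1000                  # 292 PB per year
```

Even a modest fleet thus generates hundreds of petabytes a year, which is why a principled way to keep only a reduced subset of frames matters.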
His subset-selection framework has two sets, a cumulative set and a reduced set. He said, “The cumulative set will keep on growing, but the reduced set will accumulate data from the selected subset of the current dataset. Here, the selected subset is identified as Xsbt.”
Bhattacharya talked about the convex optimization formulation for achieving online subset selection. The formulation involves several criteria, such as pairwise and pointwise terms, along with selection variables. The selected subsets are later evaluated on semantic segmentation.
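The interplay of pointwise and pairwise criteria can be illustrated with a greedy stand-in. The actual MCOSS objective optimizes selection variables via convex optimization, which is not reproduced here; this sketch only shows the idea of scoring each frame individually while penalizing redundancy against frames already kept, using a feature-norm proxy for informativeness.

```python
import numpy as np

def select_subset(features, k, lam=1.0):
    """Greedy stand-in for convex multi-criteria frame-subset selection:
    pointwise score (feature norm as an informativeness proxy) minus a
    pairwise redundancy penalty against already-selected frames.
    All scoring choices here are illustrative assumptions."""
    F = np.asarray(features, dtype=float)
    pointwise = np.linalg.norm(F, axis=1)
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(len(F)):
            if i in selected:
                continue
            # Pairwise criterion: similarity to the most similar kept frame.
            redundancy = max((float(F[i] @ F[j]) for j in selected), default=0.0)
            score = pointwise[i] - lam * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

On near-duplicate frames the pairwise term dominates, so the selection prefers a diverse pair over two copies of the same view.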