Abstract

Existing object-search approaches enable robots to search through free pathways. However, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to suit their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real world, demonstrating that HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.


How Does It Work?

Figure: Schematic overview of HIMOS. A semantic map M_t serves as the central memory component and is used and updated by both low- and high-level modules. The map is extended to a partial panoptic map containing instance IDs of relevant objects. Given the remaining target objects g_t to find, the robot state s_robot,t, and the derived valid actions v_t, the high-level policy acts in an abstract action space. Low-level actions comprise local and global exploration, navigation to previously mapped object instances, and a mobile manipulation policy.
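
The following is a minimal sketch of the high-level interface described in the figure. All names (Skill, HighLevelObservation, and the field names) are illustrative assumptions, not the released implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto

import numpy as np


class Skill(Enum):
    """Abstract high-level action space composed of low-level subpolicies."""
    EXPLORE_LOCAL = auto()         # exploration in the robot's vicinity
    EXPLORE_GLOBAL = auto()        # exploration of far-away unexplored space
    NAVIGATE_TO_INSTANCE = auto()  # move to a previously mapped object instance
    MANIPULATE = auto()            # open a door, cabinet, or drawer


@dataclass
class HighLevelObservation:
    """Inputs to the high-level policy at decision step t (see figure)."""
    semantic_map: np.ndarray   # M_t: semantic map, extended with instance IDs
    remaining_targets: list    # g_t: target objects still to be found
    robot_state: np.ndarray    # s_robot,t: current robot state
    valid_actions: np.ndarray  # v_t: boolean mask over the discrete actions
```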


Efficient high-level decision making requires the right level of abstraction for states and actions. We hypothesize that object- and instance-level decision making is such an efficient level of abstraction for embodied search tasks, and we design a high-level policy around this idea. In particular: (i) We propose an instance navigation subpolicy that leverages the agent's accumulated knowledge about the environment. It provides a prior on important places and on the granularity of navigation points, making it data-efficient and easy to optimize. (ii) As objects are discrete instances, the resulting full action space remains discrete, avoiding the complexities of mixed action spaces. At the same time, all subpolicies still act directly in continuous action spaces, allowing for direct transfer to real robotic systems. (iii) We abstract away reasoning about exact robot placements in the real world by shifting the responsibility for mobility into the subpolicies. This ensures that the subpolicies can start from a large set of initial positions, resolving the "hand-off" problem of naive skill chaining and greatly simplifying the learning process for the high-level policy. Furthermore, it enables us to swap in unseen subpolicies in the real world. (iv) We incorporate subpolicy failures into the training process, enabling the high-level policy to learn retrial behavior if execution fails.
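
A minimal sketch of the decision loop implied by points (i)-(iv), under assumed interfaces: env, policy, subpolicies, and their methods are hypothetical stand-ins, not the released API:

```python
def run_search_episode(env, policy, subpolicies, max_decisions=50):
    """One interactive multi-object search episode with retrial on failure."""
    obs = env.reset()
    for _ in range(max_decisions):
        # (ii) Discrete action space: one entry per skill plus one per mapped
        # object instance, masked by the currently valid actions v_t.
        action = policy.act(obs, action_mask=obs.valid_actions)

        # (iii) The selected subpolicy handles its own mobility, so it can be
        # invoked from a wide range of initial robot poses, avoiding the
        # "hand-off" problem between chained skills.
        result = subpolicies[action.skill].execute(env, target=action.instance_id)

        # (iv) Failures are observable: the high-level policy can learn to
        # retry the same action or switch to a different one.
        obs = env.get_observation(last_action_failed=not result.success)

        if not obs.remaining_targets:
            return True   # all target objects found
    return False
```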

Videos

Code and Models

A software implementation of this project based on PyTorch, including trained model checkpoints, can be found in our GitHub repository and is released under the GPLv3 license. For any usage outside this license, please contact the authors.

Publications

Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold and Abhinav Valada,
Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation


Acknowledgements

This work was funded by the European Union’s Horizon 2020 research and innovation program under grant agreement No 871449-OpenDR. Toyota Motor Europe supported this project with an HSR robot for experiments.

People