HIMOS: Hierarchical Interactive Multi-Object Search

Abstract

Existing object-search approaches enable robots to search through free pathways. However, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real world that demonstrate that HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.

How Does It Work?

Figure: Schematic overview of HIMOS. A semantic map M_t serves as a central memory component and is used and updated across low- as well as high-level modules. This map is extended to a partial panoptic map with instance IDs of relevant objects. Given the remaining target objects g_t to find, the robot state s_robot,t, and the derived valid actions v_t, the high-level policy acts in an abstract action space. Low-level actions comprise local and global exploration, navigation to previously mapped object instances, and a mobile manipulation policy.
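To make the role of the valid actions v_t concrete, the following is a minimal Python sketch of how a discrete high-level action space could be masked using the instances currently in the map. The action layout, the `MappedInstance` type, the `MAX_INSTANCES` cap, and the validity rules (exploration is always allowed; an instance can be navigated to once it is mapped; it can be manipulated only while it is still closed) are illustrative assumptions, not details taken from HIMOS.

```python
import math
from dataclasses import dataclass

# Hypothetical action layout: two exploration actions, followed by one
# navigation slot and one manipulation slot per mapped instance.
EXPLORE_LOCAL, EXPLORE_GLOBAL = 0, 1
NUM_EXPLORE = 2
MAX_INSTANCES = 4  # assumed cap on tracked instances


@dataclass
class MappedInstance:
    instance_id: int       # instance ID as stored in the semantic map
    category: str          # e.g. "door", "cabinet", "drawer"
    is_open: bool = False  # assumed to be tracked by the map


def valid_action_mask(instances):
    """Derive a boolean mask over the discrete high-level actions."""
    mask = [False] * (NUM_EXPLORE + 2 * MAX_INSTANCES)
    # Assumption: exploration is always a valid choice.
    mask[EXPLORE_LOCAL] = mask[EXPLORE_GLOBAL] = True
    for inst in instances:
        if not 0 <= inst.instance_id < MAX_INSTANCES:
            continue  # ignore instances beyond the assumed cap
        # Navigation to a mapped instance is valid.
        mask[NUM_EXPLORE + inst.instance_id] = True
        # Manipulation is valid only while the instance is still closed.
        mask[NUM_EXPLORE + MAX_INSTANCES + inst.instance_id] = not inst.is_open
    return mask


def masked_softmax(logits, mask):
    """Turn policy logits into probabilities, giving invalid actions zero mass."""
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)  # finite, because exploration is always valid
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


instances = [MappedInstance(0, "door"), MappedInstance(1, "cabinet", is_open=True)]
mask = valid_action_mask(instances)
probs = masked_softmax([0.0] * len(mask), mask)
```

Because every action refers either to exploration or to a discrete object instance, the mask stays a flat boolean vector, and the policy never has to assign probability to an instance that has not been mapped yet.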

Efficient high-level decision making requires the right level of abstraction of states and actions. We hypothesize that object- and instance-level decision making is such an efficient level of abstraction for embodied search tasks. We design a high-level policy around this idea. In particular: (i) We propose an instance navigation subpolicy that leverages the agent's accumulated knowledge about the environment. It provides a prior on important places and on the granularity of navigation points, making it data-efficient and well-optimizable. (ii) As objects are discrete instances, the resulting full action space remains discrete, avoiding the complexities of mixed action spaces. At the same time, all the subpolicies still act directly in continuous action spaces, allowing for direct transfer to real robotic systems. (iii) We abstract from reasoning about exact robot placements in the real world by shifting the responsibility for mobility into the subpolicies. This ensures that the subpolicies can start from a large set of initial positions, resolving the "hand-off" problem of naive skill-chaining and strongly simplifying the learning process for the high-level policy. Furthermore, it enables us to replace the subpolicies with unseen ones in the real world. (iv) We incorporate subpolicy failures into the training process, enabling the high-level policy to learn to retry if execution fails.
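Point (iv) can be illustrated with a small Python sketch in which subpolicy executions fail at random and the high-level loop re-issues the failed action. The `rollout` function, the `failure_prob` parameter, the action names, and the scripted retry rule are hypothetical stand-ins. In HIMOS the retry behavior is learned by the high-level policy rather than hard-coded, and the failure model used during training may differ.

```python
import random


def rollout(plan, failure_prob, max_steps, seed=0):
    """Execute a list of high-level actions with randomly injected failures.

    Each step calls a (simulated) subpolicy that fails with probability
    `failure_prob`. A failed action is issued again on the next step,
    standing in for the retry behavior a trained policy might acquire.
    """
    rng = random.Random(seed)
    history = []  # (action, success) pairs in execution order
    idx = 0
    while idx < len(plan) and len(history) < max_steps:
        action = plan[idx]
        # Simulated subpolicy outcome: success unless a failure is injected.
        success = rng.random() >= failure_prob
        history.append((action, success))
        if success:
            idx += 1  # move on only after the subpolicy succeeded
    return history


# Hypothetical action names, for illustration only.
plan = ["explore_global", "navigate_to_door_0", "open_door_0"]
clean_run = rollout(plan, failure_prob=0.0, max_steps=10)
noisy_run = rollout(plan, failure_prob=0.3, max_steps=50, seed=1)
```

Exposing the high-level policy to such failures during training means that a failed door opening or navigation attempt in deployment is a situation the policy has already encountered, rather than an out-of-distribution state.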