Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration

Stanford University

Human2Sim2Robot trains dexterous manipulation policies from one human RGB-D video.

We use the demonstrated object trajectory and pre-manipulation hand pose to guide RL in simulation, bridging the human-robot embodiment gap and achieving zero-shot sim-to-real transfer.

Abstract

Teaching robots dexterous manipulation skills often requires collecting hundreds of demonstrations using wearables or teleoperation, which is challenging to scale. Videos of human-object interactions are easier to collect and scale, but leveraging them directly for robot learning is difficult due to the lack of explicit action labels and morphological differences between robot and human hands.

We propose Human2Sim2Robot, a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Our method uses reinforcement learning (RL) in simulation to cross the human-robot embodiment gap without relying on the wearables, teleoperation, or large-scale data collection typically required by imitation learning methods. From the demonstration, we extract two task-specific components: (1) the object pose trajectory, which defines an object-centric, embodiment-agnostic reward function, and (2) the pre-manipulation hand pose, which initializes and guides exploration during RL training. We find that these two components are highly effective for learning the desired task, eliminating the need for task-specific reward shaping and tuning. We demonstrate that Human2Sim2Robot significantly outperforms trajectory retargeting and one-shot imitation learning across a wide range of tasks, including grasping, non-prehensile manipulation, and extrinsic manipulation.

Method
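To make the two extracted components concrete, here is a minimal sketch of one plausible form of the object-centric, embodiment-agnostic tracking reward: it scores the simulated object's pose against the demonstrated object pose at the current timestep, so any embodiment that moves the object along the demonstrated trajectory is rewarded. The function name, weights, and error terms below are our own illustrative assumptions, not the released implementation.

import numpy as np

def object_tracking_reward(obj_pos, obj_quat, demo_traj, t,
                           pos_weight=1.0, rot_weight=0.5):
    """Reward the simulated object for tracking the demonstrated trajectory.

    demo_traj is a list of (position, quaternion) object poses extracted
    from the human RGB-D video. All weights and scales are assumptions.
    """
    target_pos, target_quat = demo_traj[min(t, len(demo_traj) - 1)]

    # Position error: Euclidean distance to the demonstrated position.
    pos_err = np.linalg.norm(obj_pos - target_pos)

    # Orientation error: rotation angle between current and target quaternions.
    dot = np.clip(abs(np.dot(obj_quat, target_quat)), 0.0, 1.0)
    rot_err = 2.0 * np.arccos(dot)

    # Bounded, dense shaping terms. No hand-specific quantities appear,
    # which is what makes the reward embodiment-agnostic.
    return pos_weight * np.exp(-5.0 * pos_err) + rot_weight * np.exp(-2.0 * rot_err)

Because the reward depends only on the object, the same signal applies whether the object is moved by the human hand in the video or the robot hand in simulation, which is the sense in which it crosses the embodiment gap.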

Real-World Robot Performance

All Human2Sim2Robot videos are played at 1x speed.

Grasping Tasks

  • Plate Lift Rack (success rate: 60%)

  • Pitcher Pour (success rate: 100%)

Non-Prehensile Tasks

  • Snackbox Push (success rate: 100%)

  • Snackbox Pivot (success rate: 100%)

  • Snackbox Push Pivot (success rate: 100%)

  • Plate Push (success rate: 100%)

Multi-Step Task

  • Plate Pivot Lift Rack (success rate: 86.6%)

Simulation Rollouts

Robustness & Failure Recovery

Out-of-Distribution Object Positions

Distractor Objects, Perturbations, and Background Changes

  • Background Changes

  • Lighting Changes

  • Distractors

  • Perturbations (Plate)

  • Human Interference

  • Perturbations (Snackbox)

  • Table Color / Friction

  • Obstructions / Friction

  • Obstructing Objects

Baselines

All baseline videos are played at 2x speed.

Replay

Object-Aware Replay

Behavior Cloning (Diffusion Policy)

Robustness Comparison

  • Baseline

  • Ours

Ablation Tests

Importance of Object Trajectory Tracking Rewards

  • Fixed Target

  • Interpolated Target

  • Downsampled Trajectory

  • Ours

Importance of Pre-Manipulation Pose Initialization

  • Default Initialization

  • Overhead Initialization

  • Pre-Manipulation Far

  • Ours
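The ablated component above can be illustrated with a short sketch of what resetting RL episodes to the retargeted pre-manipulation pose might look like. The environment API (set_joint_positions, randomize_object_pose) and the noise scale are hypothetical placeholders, not the paper's actual code.

import numpy as np

def reset_to_pre_manipulation(env, retargeted_qpos, noise_scale=0.01):
    """Reset an RL episode with the arm and hand at the pre-manipulation
    pose retargeted from the single human demonstration.

    env and its methods are hypothetical stand-ins for a simulator API.
    """
    # Small Gaussian noise on the initial joint positions keeps the policy
    # from overfitting to one exact starting configuration.
    qpos = retargeted_qpos + np.random.normal(0.0, noise_scale, size=retargeted_qpos.shape)
    env.set_joint_positions(qpos)
    # Randomizing the object's initial pose encourages the generalization
    # shown in the out-of-distribution experiments above.
    env.randomize_object_pose()
    return env.get_observation()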

Sufficiency of Single Pre-Manipulation Hand Pose

  • Hand-Trajectory Tracking Rewards

  • Residual Policy

  • Ours

Video

Acknowledgements

This work is supported by Stanford Human-Centered Artificial Intelligence (HAI), the National Science Foundation (NSF) under Grant Numbers 2153854 and 2342246, and the Natural Sciences and Engineering Research Council of Canada (NSERC) under Award Number 526541680.

BibTeX

@article{TODO,
  author  = {TODO},
  title   = {TODO},
  journal = {TODO},
  year    = {TODO},
}