Decoupled Generative Modeling for
Human-Object Interaction Synthesis

Hwanhee Jung¹, Seunggwan Lee¹, Jeongyoon Yoon¹, SeungHyeon Kim¹,
Giljoo Nam², Qixing Huang³, Sangpil Kim¹,*
¹Korea University, ²Meta, ³The University of Texas at Austin

Abstract

Synthesizing realistic human-object interaction (HOI) is essential for 3D computer vision and robotics, underpinning animation and embodied control. Existing approaches often require manually specified intermediate waypoints and place all optimization objectives on a single network, which increases complexity, reduces flexibility, and leads to errors such as unsynchronized human and object motion or penetration. To address these issues, we propose Decoupled Generative Modeling for Human-Object Interaction Synthesis (DecHOI), which separates path planning from action synthesis. A trajectory generator first produces human and object trajectories without prescribed waypoints, and an action generator then conditions on these paths to synthesize detailed motions. To further improve contact realism, we employ adversarial training with a discriminator that focuses on the dynamics of distal joints. The framework also models a moving counterpart and supports responsive, dynamic long-sequence planning while preserving plan consistency. Across two benchmarks, FullBodyManipulation and 3D-FUTURE, DecHOI surpasses prior methods on most quantitative metrics and in qualitative evaluations, and a perceptual study likewise prefers our results.
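To make "dynamics of distal joints" concrete, the following is a minimal sketch of one plausible input a discriminator could consume: finite-difference velocities of a few selected joints. It is an illustration under stated assumptions, not the DecHOI implementation. The joint indices in `DISTAL_JOINTS`, the function name `distal_velocities`, and the frame rate are all hypothetical, since the abstract does not specify which joints or features are used.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical joint indices standing in for distal joints (e.g. hands, feet).
# The real skeleton layout is not given in the source.
DISTAL_JOINTS = (2, 3)


def distal_velocities(
    motion: Sequence[Sequence[Point]],
    joints: Sequence[int] = DISTAL_JOINTS,
    fps: float = 30.0,
) -> List[List[Point]]:
    """Finite-difference velocities of the selected joints.

    motion[t][j] is the 3D position of joint j at frame t.
    Returns one entry per consecutive frame pair, so len(motion) - 1 entries.
    """
    feats = []
    for prev, curr in zip(motion, motion[1:]):
        frame_feats = []
        for j in joints:
            # Per-axis displacement between frames, scaled to units per second.
            vel = tuple((c - p) * fps for p, c in zip(prev[j], curr[j]))
            frame_feats.append(vel)
        feats.append(frame_feats)
    return feats
```

In an adversarial setup, features like these would be computed for both real and generated motions and passed to the discriminator, so that its signal concentrates on the joints most involved in contact.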

Key Contributions

Overview

Main architecture figure
Overview of DecHOI. The architecture decouples trajectory generation from action generation. Conditioned on the text instruction, geometry, current human and object poses, and a goal point, the trajectory generator plans paths, and the action generator then produces joint motions along these paths to yield synchronized, contact-aware interactions. The right panels detail the Trajectory and Action Generators.
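The two-stage data flow described in this caption can be sketched as follows. This is a toy illustration of the decoupling only: both stages are replaced by trivial stand-ins (linear interpolation for planning, fixed joint offsets for action synthesis), and every name here (`Condition`, `plan_trajectories`, `synthesize_action`, `generate`) is hypothetical rather than taken from the DecHOI code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Condition:
    text: str            # text instruction
    human_start: Point   # current human root position
    object_start: Point  # current object position
    goal: Point          # goal point


def _interpolate(start: Point, goal: Point, num_frames: int) -> List[Point]:
    """Straight line from start to goal over num_frames frames."""
    return [
        tuple(s + (t / (num_frames - 1)) * (g - s) for s, g in zip(start, goal))
        for t in range(num_frames)
    ]


def plan_trajectories(cond: Condition, num_frames: int):
    """Stage 1 stand-in: plan human and object paths with no waypoints given."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    human_path = _interpolate(cond.human_start, cond.goal, num_frames)
    object_path = _interpolate(cond.object_start, cond.goal, num_frames)
    return human_path, object_path


def synthesize_action(human_path: List[Point], num_joints: int = 3):
    """Stage 2 stand-in: place joints at fixed offsets above the root path."""
    offsets = [(0.0, 0.0, 0.1 * j) for j in range(num_joints)]
    return [
        [(x + ox, y + oy, z + oz) for ox, oy, oz in offsets]
        for x, y, z in human_path
    ]


def generate(cond: Condition, num_frames: int):
    """Run planning first, then condition action synthesis on the plan."""
    human_path, object_path = plan_trajectories(cond, num_frames)
    motion = synthesize_action(human_path)
    return human_path, object_path, motion
```

The point of the structure is that `synthesize_action` never sees the goal directly; it only consumes the planned path, so either stage can be swapped or retrained without touching the other.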

Video Results

DynaPlan

*For visualization, we render the obstacle (green) using a pre-trained action generation model, which can introduce slight jitter in the obstacle.

BibTeX

@misc{jung2025decoupledgenerativemodelinghumanobject,
  title         = {Decoupled Generative Modeling for Human-Object Interaction Synthesis},
  author        = {Hwanhee Jung and Seunggwan Lee and Jeongyoon Yoon and SeungHyeon Kim and Giljoo Nam and Qixing Huang and Sangpil Kim},
  year          = {2025},
  eprint        = {2512.19049},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2512.19049}
}