PhysHO: Physics-Based Dynamic 3D Gaussian Human and Object from Monocular Video

CVPR 2026


Department of Computer Science, National University of Singapore

Abstract

Physically plausible reconstruction of human–object dynamics from a single video remains under-explored in physics-based methods. Most prior approaches omit human-generated internal actuation, assuming that motion is driven solely by gravity and simple contacts. They also rely on idealized constitutive laws that underfit heterogeneous and anisotropic materials. To address these gaps, we introduce PhysHO, which tightly couples SMPL-driven Linear Blend Skinning (LBS) with a Material Point Method (MPM) simulator. Our key insight is to use LBS as an interpretable actuation prior and MPM to propagate the resulting forces through contact under physical constraints. Concretely, we derive targeted actuation with a PD controller guided by LBS trajectories and gate it per particle via a learnable LBS-impact factor, so that only particles inside the SMPL volume are directly actuated. We model real materials with residual neural constitutive laws layered on expert elastic–plastic models and conditioned per particle to capture heterogeneity and anisotropy. We stabilize monocular learning with structure-preserving 3D flow supervision and a progressive, loss-balanced training schedule. PhysHO reconstructs observed dynamics with high fidelity, predicts future motion, and simulates outcomes under novel human actions. Experimental results demonstrate robust human-driven interactions beyond gravity-only scenes.
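The LBS-gated actuation described in the abstract can be illustrated with a minimal Python sketch. It assumes a simple per-particle PD law, f_i = s_i [k_p (x_hat_i - x_i) + k_d (v_hat_i - v_i)], where x_hat_i is the LBS target computed from SMPL bone transforms, v_hat_i is a finite difference of consecutive LBS targets, and the gate s_i is a sigmoid of a learnable logit that is forced to zero outside the SMPL volume. The gains, the sigmoid parameterization, and every function name here are hypothetical choices for illustration, not the authors' implementation; in an MPM step such a force would presumably be added to particle momentum before the particle-to-grid transfer.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
RigidTransform = Tuple[Mat3, Vec3]  # (rotation R_j, translation t_j) of one SMPL bone

IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def mat_vec(R: Mat3, x: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return (
        R[0][0] * x[0] + R[0][1] * x[1] + R[0][2] * x[2],
        R[1][0] * x[0] + R[1][1] * x[1] + R[1][2] * x[2],
        R[2][0] * x[0] + R[2][1] * x[1] + R[2][2] * x[2],
    )


def lbs_position(x_rest: Vec3, weights: Sequence[float],
                 bones: Sequence[RigidTransform]) -> Vec3:
    """Linear blend skinning: x = sum_j w_j (R_j x_rest + t_j)."""
    out = [0.0, 0.0, 0.0]
    for w, (R, t) in zip(weights, bones):
        y = mat_vec(R, x_rest)
        for k in range(3):
            out[k] += w * (y[k] + t[k])
    return (out[0], out[1], out[2])


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_pd_force(x: Vec3, v: Vec3, x_target: Vec3, v_target: Vec3,
                   impact_logit: float, inside_smpl: bool,
                   kp: float = 100.0, kd: float = 10.0) -> Vec3:
    """PD force pulling a particle toward its LBS target, gated per particle.

    Particles outside the SMPL volume get no direct actuation; in this sketch
    they would move only through contact and internal stress in the simulator.
    """
    if not inside_smpl:
        return (0.0, 0.0, 0.0)
    gate = sigmoid(impact_logit)  # hypothetical learnable LBS-impact factor in (0, 1)
    f = [gate * (kp * (x_target[k] - x[k]) + kd * (v_target[k] - v[k]))
         for k in range(3)]
    return (f[0], f[1], f[2])


if __name__ == "__main__":
    dt = 1.0 / 30.0
    rest = (0.0, 0.0, 0.0)
    # LBS targets at two consecutive frames (one bone, translated along x).
    target_t0 = lbs_position(rest, [1.0], [(IDENTITY, (0.0, 0.0, 0.0))])
    target_t1 = lbs_position(rest, [1.0], [(IDENTITY, (0.03, 0.0, 0.0))])
    v_target = tuple((b - a) / dt for a, b in zip(target_t0, target_t1))
    print(gated_pd_force(target_t0, (0.0, 0.0, 0.0), target_t1, v_target,
                         impact_logit=0.0, inside_smpl=True))
```

Because particles outside the SMPL volume receive zero direct force in this sketch, any motion of a held object has to come from contact and stress propagated by the simulator, which mirrors the division of labor between LBS and MPM described above.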

Method Overview


PhysHO framework. We couple SMPL-driven LBS with an MPM simulator, where LBS provides a localized actuation prior inside the human and MPM propagates forces through contact to objects under physical constraints. Residual neural constitutive laws model heterogeneous and anisotropic materials. Training uses a structure-preserving 3D flow prior and progressive loss-balanced optimization.
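The residual constitutive law in the caption can be sketched in the same hedged way. Here the expert model is assumed to be compressible Neo-Hookean elasticity, P = mu (F - F^{-T}) + lam ln(J) F^{-T}, and the residual is assumed to be a tiny MLP that takes the strain F - I concatenated with a per-particle latent code z_i and returns a 3x3 stress correction. Subtracting the MLP's output at F = I keeps the rest state stress-free. The Neo-Hookean choice, the MLP shape, the latent code, and all names are illustrative assumptions rather than the paper's actual models, and the plastic return mapping of an elastic–plastic model is omitted.

```python
import math
import random
from typing import List, Sequence

Mat3 = List[List[float]]


def det3(F: Mat3) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (F[0][0] * (F[1][1] * F[2][2] - F[1][2] * F[2][1])
            - F[0][1] * (F[1][0] * F[2][2] - F[1][2] * F[2][0])
            + F[0][2] * (F[1][0] * F[2][1] - F[1][1] * F[2][0]))


def inv_transpose3(F: Mat3) -> Mat3:
    """F^{-T} = cof(F) / det(F), where cof(F) is the cofactor matrix."""
    J = det3(F)
    out = [[0.0] * 3 for _ in range(3)]
    for r in range(3):
        r1, r2 = [i for i in range(3) if i != r]
        for c in range(3):
            c1, c2 = [j for j in range(3) if j != c]
            minor = F[r1][c1] * F[r2][c2] - F[r1][c2] * F[r2][c1]
            out[r][c] = (-1) ** (r + c) * minor / J
    return out


def neo_hookean_pk1(F: Mat3, mu: float, lam: float) -> Mat3:
    """Assumed expert model: P = mu (F - F^{-T}) + lam ln(J) F^{-T}; requires J > 0."""
    J = det3(F)
    FinvT = inv_transpose3(F)
    return [[mu * (F[r][c] - FinvT[r][c]) + lam * math.log(J) * FinvT[r][c]
             for c in range(3)] for r in range(3)]


class ResidualStressMLP:
    """Hypothetical residual: a 2-layer tanh MLP from (F - I, z) to a 3x3 correction."""

    def __init__(self, latent_dim: int, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        in_dim = 9 + latent_dim
        self.W1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.W2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(9)]

    def _forward(self, strain: List[float], z: Sequence[float]) -> List[float]:
        inp = strain + list(z)
        h = [math.tanh(sum(w * x for w, x in zip(row, inp)) + b)
             for row, b in zip(self.W1, self.b1)]
        return [sum(w * hj for w, hj in zip(row, h)) for row in self.W2]

    def __call__(self, F: Mat3, z: Sequence[float]) -> Mat3:
        strain = [F[r][c] - (1.0 if r == c else 0.0) for r in range(3) for c in range(3)]
        out = self._forward(strain, z)
        rest = self._forward([0.0] * 9, z)  # subtract rest output so the residual vanishes at F = I
        return [[out[3 * r + c] - rest[3 * r + c] for c in range(3)] for r in range(3)]


def particle_stress(F: Mat3, mu: float, lam: float, z: Sequence[float],
                    residual: ResidualStressMLP) -> Mat3:
    """First Piola-Kirchhoff stress of one particle: expert model plus learned residual."""
    P_exp = neo_hookean_pk1(F, mu, lam)
    dP = residual(F, z)
    return [[P_exp[r][c] + dP[r][c] for c in range(3)] for r in range(3)]
```

Keeping the expert term means the network only has to learn a correction, which is one common motivation for residual designs; per-particle mu, lam, and z are what would let neighboring particles behave differently.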

Comparison

We compare our method against the state-of-the-art monocular human reconstruction method GART and the monocular dynamic reconstruction method 4D-Gaus, on both reconstruction and future prediction.

Reconstruction Part
Future Prediction Part

Ablation Study

We evaluate our method in terms of both rendering quality and physical reconstruction accuracy.

Rendering Quality
Reconstruction Accuracy

More Results

Reconstruction

Future Prediction

Application

Simulated Animations

Acknowledgments

This research / project is supported by the National Research Foundation (NRF) Singapore, under its NRF-Investigatorship Programme (Award ID. NRFNRFI09-0008).

BibTeX

@inproceedings{jiang2026physho,
  title={PhysHO: Physics-Based Dynamic 3D Gaussian Human and Object from Monocular Video},
  author={Jiang, Suyi and Lee, Gim Hee},
  booktitle={CVPR},
  year={2026}
}