Physically plausible reconstruction of human–object dynamics from a single video remains under-explored in physics-based methods. Most prior approaches assume that motion is driven solely by gravity and simple contacts, and therefore omit human-generated internal actuation. They also rely on idealized constitutive laws that underfit heterogeneous and anisotropic materials. We introduce PhysHO, which tightly couples SMPL-driven Linear Blend Skinning (LBS) with a Material Point Method (MPM) simulator to address these gaps. Our key insight is to use LBS as an interpretable actuation prior and MPM to propagate the resulting forces through contact under physical constraints. Concretely, we derive targeted actuation with a PD controller guided by LBS trajectories and gate it per particle via a learnable LBS-impact factor, so that only particles inside the SMPL volume are directly actuated. We model real materials with residual neural constitutive laws that are layered on expert elastic–plastic models and conditioned per particle to capture heterogeneity and anisotropy. We stabilize monocular learning with structure-preserving 3D flow supervision and a progressive, loss-balanced training schedule. PhysHO reconstructs observed dynamics with high fidelity, predicts future motion, and simulates outcomes under novel human actions. Experimental results demonstrate robust human-driven interactions beyond gravity-only scenes.
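The gated PD actuation described above can be illustrated with a minimal NumPy sketch. It is not the PhysHO implementation: the function name `gated_pd_force`, the gains `kp` and `kd`, and the choice of a sigmoid over a learnable logit as the LBS-impact factor are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_pd_force(x, v, x_target, v_target, gate_logit, kp=50.0, kd=5.0):
    """Per-particle PD force toward LBS targets, scaled by a soft gate.

    x, v:               (N, 3) simulated particle positions and velocities
    x_target, v_target: (N, 3) positions and velocities from the LBS trajectory
    gate_logit:         (N,) hypothetical learnable logit; sigmoid(gate_logit)
                        stands in for the per-particle LBS-impact factor
    kp, kd:             assumed proportional and derivative gains
    """
    gate = sigmoid(gate_logit)[:, None]               # (N, 1), values in (0, 1)
    pd = kp * (x_target - x) + kd * (v_target - v)    # standard PD control law
    return gate * pd                                  # gate near 0 leaves a particle unactuated

# Usage: particle 0 is treated as inside the body, particle 1 as an object particle.
x = np.zeros((2, 3))
v = np.zeros((2, 3))
x_target = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
v_target = np.zeros((2, 3))
force = gated_pd_force(x, v, x_target, v_target, np.array([20.0, -20.0]))
```

In this sketch the returned force would be added as an external force in the simulator's particle update. Particles whose gate is close to zero then move only through contact and internal stress.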
Method Overview
PhysHO framework. We couple SMPL-driven LBS with an MPM simulator, where LBS provides a localized actuation prior inside the human and MPM propagates forces through contact to objects under physical constraints. Residual neural constitutive laws model heterogeneous and anisotropic materials. Training uses a structure-preserving 3D flow prior and progressive loss-balanced optimization.
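One way to picture a residual constitutive law is an analytic "expert" stress plus a small learned correction conditioned on a per-particle feature. The sketch below rests on stated assumptions and is not the model used in PhysHO. It uses a compressible Neo-Hookean elastic model as the expert and omits plasticity. The class `ResidualStress`, its layer sizes, and the feature vector `feat` are hypothetical, and the residual is not constrained to be frame-invariant.

```python
import numpy as np

def neo_hookean_stress(F, mu=1.0, lam=1.0):
    """First Piola-Kirchhoff stress of a compressible Neo-Hookean solid.

    F: (N, 3, 3) deformation gradients with positive determinant.
    P = mu * (F - F^{-T}) + lam * log(J) * F^{-T}, with J = det(F).
    """
    F_inv_T = np.transpose(np.linalg.inv(F), (0, 2, 1))
    log_J = np.log(np.linalg.det(F))[:, None, None]
    return mu * (F - F_inv_T) + lam * log_J * F_inv_T

class ResidualStress:
    """Tiny MLP mapping (F, per-particle feature) to a 3x3 stress correction."""

    def __init__(self, feat_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(9 + feat_dim, hidden))
        self.b1 = np.zeros(hidden)
        # Zero-initialised output layer: the residual starts at exactly zero,
        # so the combined model initially equals the expert model.
        self.W2 = np.zeros((hidden, 9))
        self.b2 = np.zeros(9)

    def __call__(self, F, feat):
        n = F.shape[0]
        inp = np.concatenate([F.reshape(n, 9), feat], axis=1)
        h = np.tanh(inp @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).reshape(n, 3, 3)

def total_stress(F, feat, residual, mu=1.0, lam=1.0):
    """Expert stress plus learned per-particle residual."""
    return neo_hookean_stress(F, mu, lam) + residual(F, feat)

# Usage: four particles with a small stretch along x and a 2-D feature each.
F = np.tile(np.eye(3), (4, 1, 1))
F[:, 0, 0] = 1.1
feat = np.zeros((4, 2))
P = total_stress(F, feat, ResidualStress(feat_dim=2))
```

Initialising the output layer to zero is a common design choice for residual models: training starts from the expert prediction, and the network only has to learn the deviation needed for each particle.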
Comparison
We compare our method against GART, a state-of-the-art monocular human reconstruction method, and 4D-Gaus, a state-of-the-art monocular dynamic reconstruction method, on both reconstruction and future prediction.
Reconstruction Part
Future Prediction Part
Ablation Study
We evaluate our method in terms of both rendering quality and physical reconstruction accuracy.
Rendering Quality
Reconstruction Accuracy
More Results
Reconstruction
Future Prediction
Application
Simulated Animations
Acknowledgments
This research/project is supported by the National Research Foundation (NRF) Singapore, under its NRF-Investigatorship Programme (Award ID. NRFNRFI09-0008).
BibTeX
@inproceedings{jiang2026physho,
title={PhysHO: Physics-Based Dynamic 3D Gaussian Human and Object from Monocular Video},
author={Jiang, Suyi and Lee, Gim Hee},
booktitle={CVPR},
year={2026}
}