Abstract
Real-time tracking of previously unseen, highly dynamic objects in contact-rich environments, such as during dexterous in-hand manipulation, remains a significant challenge. Purely vision-based tracking often suffers from heavy occlusions caused by frequent contact interactions and from motion blur caused by abrupt motion at contact impacts. We propose TwinTrack, a physics-aware visual tracking framework that enables robust, real-time 6-DoF pose tracking of unknown dynamic objects in contact-rich scenes by leveraging the contact physics of the observed scene. At the core of TwinTrack is an integration of Real2Sim and Sim2Real. In Real2Sim, we combine the complementary strengths of vision and contact physics to estimate the object's collision geometry and physical properties: the geometry is first reconstructed from vision, then refined together with the other physical parameters using contact dynamics to make it physically accurate. In Sim2Real, robust pose estimation is achieved by adaptively fusing visual tracking with the predictions of the learned contact physics. TwinTrack is built on a GPU-accelerated, deeply customized physics engine to ensure real-time performance. We evaluate our method on two contact-rich scenarios: an object falling with rich contact impacts against the environment, and contact-rich in-hand manipulation. Experimental results demonstrate that, compared to baseline methods, TwinTrack achieves significantly more robust and accurate 6-DoF tracking in these challenging scenarios while running in real time, with tracking speed exceeding 20 Hz.
Overview of TwinTrack

TwinTrack has two main components: Real2Sim, where we learn contact physics by combining the complementary strengths of vision and contact dynamics, and Sim2Real, where we achieve robust pose estimation of the object by adaptively fusing visual tracking with the predictions of the learned contact dynamics.
Real2Sim: learning contact physics from RGB-D vision
Real2Sim focuses on learning both contact collision geometry and contact dynamics from RGB-D observations, creating a twin simulation environment that mimics real-world physics.
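To make the idea of learning a physical parameter from observed contact dynamics concrete, here is a toy sketch, not TwinTrack's actual method or engine. It assumes a 1-D point mass dropped onto a floor and recovers a single coefficient of restitution by comparing simulated height trajectories against an observed one. The function names `simulate_drop` and `fit_restitution`, the grid search, and every numeric value are illustrative assumptions.

```python
import numpy as np


def simulate_drop(restitution, h0=1.0, dt=0.001, steps=2000, g=9.81):
    """Toy simulator: height of a point mass dropped from h0 onto a floor at z = 0."""
    z, v = h0, 0.0
    traj = np.empty(steps)
    for i in range(steps):
        v -= g * dt              # gravity
        z += v * dt              # semi-implicit Euler position update
        if z < 0.0 and v < 0.0:  # impact: clamp to the floor, reverse and damp velocity
            z = 0.0
            v = -restitution * v
        traj[i] = z
    return traj


def fit_restitution(observed, candidates):
    """Pick the candidate whose simulated trajectory best matches the observation (grid search)."""
    errors = [
        np.mean((simulate_drop(e, steps=len(observed)) - observed) ** 2)
        for e in candidates
    ]
    return float(candidates[int(np.argmin(errors))])


# Hypothetical "observation": here it is generated by the same toy simulator.
observed_heights = simulate_drop(0.6)
estimate = fit_restitution(observed_heights, np.linspace(0.1, 0.9, 9))
```

In a real pipeline the observed trajectory would come from RGB-D tracking rather than from the simulator itself, and the search would cover geometry and several physical parameters at once. The grid search is used here only because it is easy to follow.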
Impact-rich free falling
Sim2Real: physics-aware robust tracking
Sim2Real adaptively fuses the prediction of the learned contact physics and feature-based tracking, achieving robustness to motion blur, occlusions, and partial observability.
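One simple way to picture adaptive fusion is a confidence-weighted blend of two pose estimates. The sketch below is an illustrative assumption, not TwinTrack's actual fusion rule: it takes a hypothetical visual confidence `vis_conf` in [0, 1] (for example, low under occlusion or motion blur) and interpolates between the physics-predicted pose and the visually tracked pose. Positions are blended linearly and orientations with quaternion slerp. The names `slerp` and `fuse_pose` and the `(w, x, y, z)` quaternion layout are choices made for this example.

```python
import numpy as np


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions in (w, x, y, z) order."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:        # q and -q are the same rotation; take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:     # nearly identical: normalized linear interpolation is stable
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)


def fuse_pose(p_phys, q_phys, p_vis, q_vis, vis_conf):
    """Blend physics-predicted and visually tracked poses by visual confidence."""
    w = float(np.clip(vis_conf, 0.0, 1.0))   # assumed confidence score, clipped to [0, 1]
    p = (1.0 - w) * p_phys + w * p_vis       # linear blend of positions
    q = slerp(q_phys, q_vis, w)              # rotation blend on the unit sphere
    return p, q


# Example: physics predicts the identity pose, vision reports a shifted pose
# rotated 90 degrees about z, and vision is trusted at 50%.
p_phys, q_phys = np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0])
p_vis = np.array([0.2, 0.0, 0.0])
q_vis = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
p_fused, q_fused = fuse_pose(p_phys, q_phys, p_vis, q_vis, vis_conf=0.5)
```

With `vis_conf` near 0 the result follows the physics prediction, which is the behaviour one would want when the object is occluded. With `vis_conf` near 1 it follows the visual tracker.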
Contact-dynamics-guided tracking (left) versus naive vision-only tracking (right)
Our concurrent projects using TwinTrack
The website template was borrowed and adapted from Ref-NeRF.