Publications | Embodied Vision - Max Planck Institute for Intelligent Systems

7 results

2024

Event-based Non-Rigid Reconstruction of Low-Rank Parametrized Deformations from Contours

Xue, Y., Li, H., Leutenegger, S., Stueckler, J.

International Journal of Computer Vision (IJCV), 2024 (article)

Abstract

Visual reconstruction of fast non-rigid object deformations over time is a challenge for conventional frame-based cameras. In recent years, event cameras have gained significant attention due to their bio-inspired properties, such as high temporal resolution and high dynamic range. In this paper, we propose a novel approach for reconstructing such deformations using event measurements. Under the assumption of a static background, where all events are generated by the motion, our approach estimates the deformation of objects from events generated at the object contour in a probabilistic optimization framework. It associates events to mesh faces on the contour and maximizes the alignment of the line of sight through the event pixel with the associated face. In experiments on synthetic and real data of human body motion, we demonstrate the advantages of our method over state-of-the-art optimization and learning-based approaches for reconstructing the motion of human arms and hands. In addition, we propose an efficient event stream simulator to synthesize realistic event data for human motion.

DOI [BibTex]

2023

Object-Level Dynamic Scene Reconstruction With Physical Plausibility From RGB-D Images

Strecke, M. F.

Eberhard Karls Universität Tübingen, Tübingen, 2023 (phdthesis)

Abstract

Humans have the remarkable ability to perceive and interact with objects in the world around them. They can easily segment objects from visual data and have an intuitive understanding of how physics influences objects. By contrast, robots are so far often constrained to tailored environments for a specific task, due to their inability to reconstruct a versatile and accurate scene representation. In this thesis, we combine RGB-D video data with background knowledge of real-world physics to develop such a representation for robots.

Our contributions can be separated into two main parts: a dynamic object tracking tool and optimization frameworks that allow for improving shape reconstructions based on physical plausibility. The dynamic object tracking tool "EM-Fusion" detects, segments, reconstructs, and tracks objects from RGB-D video data. We propose a probabilistic data association approach for attributing the image pixels to the different moving objects in the scene. This allows us to track and reconstruct moving objects and the background scene with state-of-the-art accuracy and robustness towards occlusions.

We investigate two ways of further optimizing the reconstructed shapes of moving objects based on physical plausibility. The first of these, "Co-Section", includes physical plausibility by reasoning about the empty space around an object. We observe that no two objects can occupy the same space at the same time and that the depth images in the input video provide an estimate of observed empty space. Based on these observations, we propose intersection and hull constraints, which we combine with the observed surfaces in a global optimization approach. Compared to EM-Fusion, which only reconstructs the observed surface, Co-Section optimizes watertight shapes. These watertight shapes provide a rough estimate of unseen surfaces and could be useful as initialization for further refinement, e.g., by interactive perception. In the second optimization approach, "DiffSDFSim", we reason about object shapes based on physically plausible object motion. We observe that object trajectories after collisions depend on the object's shape, and extend a differentiable physics simulation for optimizing object shapes together with other physical properties (e.g., forces, masses, friction) based on the motion of the objects and their interactions. Our key contributions are using signed distance function models for representing shapes and a novel method for computing gradients that models the dependency of the time of contact on object shapes. We demonstrate that our approach recovers target shapes well by fitting to target trajectories and depth observations. Further, the ground-truth trajectories are recovered well in simulation using the resulting shape and physical properties. This enables predictions about the future motion of objects by physical simulation.

We anticipate that our contributions can be useful building blocks in the development of 3D environment perception for robots. The reconstruction of individual objects as in EM-Fusion is a key ingredient required for interactions with objects. Completed shapes such as the ones provided by Co-Section provide useful cues for planning interactions like grasping objects. Finally, the recovery of shape and other physical parameters using differentiable simulation as in DiffSDFSim allows simulating objects and thus predicting the effects of interactions. Future work might extend the presented works for interactive perception of dynamic environments by comparing these predictions with observed real-world interactions to further improve the reconstructions and physical parameter estimations.

link (url) DOI [BibTex]

2022

Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors

Elich, C., Oswald, M. R., Pollefeys, M., Stueckler, J.

Computer Vision and Image Understanding (CVIU), 2022 (article) Accepted

Abstract

Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose PriSMONet, a novel approach based on Prior Shape knowledge for learning Multi-Object 3D scene decomposition and representations from single images. Our approach learns to decompose images of synthetic scenes with multiple objects on a planar surface into their constituent scene objects and to infer their 3D properties from a single view. A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image. By differentiable rendering, we train our model to decompose scenes from RGB-D images in a self-supervised way. The 3D shapes are represented continuously in function space as signed distance functions, which we pre-train from example shapes in a supervised way. These shape priors provide weak supervision signals to better condition the challenging overall learning task. We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.

Link Preprint link (url) DOI Project Page [BibTex]

Visual-Inertial Odometry with Online Calibration of Velocity-Control Based Kinematic Motion Models

Li, H., Stueckler, J.

IEEE Robotics and Automation Letters (RA-L), 2022, Accepted for oral presentation at IEEE ICRA 2023 (article) Accepted

Abstract

Visual-inertial odometry (VIO) is an important technology for autonomous robots with power and payload constraints. In this paper, we propose a novel approach for VIO with stereo cameras which integrates and calibrates the velocity-control based kinematic motion model of wheeled mobile robots online. Including such a motion model can help to improve the accuracy of VIO. Compared to several previous approaches proposed to integrate wheel odometer measurements for this purpose, our method does not require wheel encoders and can be applied when the robot motion can be modeled with a velocity-control based kinematic motion model. We use radial basis function (RBF) kernels to compensate for the time delay and deviations between control commands and actual robot motion. The motion model is calibrated online by the VIO system and can be used as a forward model for motion control and planning. We evaluate our approach with data obtained in variously sized indoor environments, demonstrate improvements over a pure VIO method, and evaluate the prediction accuracy of the online calibrated model.

preprint [BibTex]

2021

Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics

Kandukuri, R., Achterhold, J., Moeller, M., Stueckler, J.

International Journal of Computer Vision, 2021 (article)

link (url) DOI Project Page [BibTex]

2020

Numerical Quadrature for Probabilistic Policy Search

Vinogradska, J., Bischoff, B., Achterhold, J., Koller, T., Peters, J.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(1):164-175, 2020 (article)

DOI [BibTex]

Visual-Inertial Mapping with Non-Linear Factor Recovery

Usenko, V., Demmel, N., Schubert, D., Stückler, J., Cremers, D.

IEEE Robotics and Automation Letters (RA-L), 5, 2020, presented at IEEE International Conference on Robotics and Automation (ICRA) 2020, preprint arXiv:1904.06504 (article)

Abstract

Cameras and inertial measurement units are complementary sensors for ego-motion estimation and environment mapping. Their combination makes visual-inertial odometry (VIO) systems more accurate and robust. For globally consistent mapping, however, combining visual and inertial information is not straightforward. To estimate the motion and geometry with a set of images, large baselines are required. Because of that, most systems operate on keyframes that have large time intervals between each other. Inertial data, on the other hand, quickly degrades with the duration of the intervals, and after several seconds of integration it typically contains little useful information. In this paper, we propose to extract relevant information for visual-inertial mapping from visual-inertial odometry using non-linear factor recovery. We reconstruct a set of non-linear factors that make an optimal approximation of the information on the trajectory accumulated by VIO. To obtain a globally consistent map, we combine these factors with loop-closing constraints using bundle adjustment. The VIO factors make the roll and pitch angles of the global map observable, and improve the robustness and the accuracy of the mapping. In experiments on a public benchmark, we demonstrate superior performance of our method over the state-of-the-art approaches.

Code Preprint link (url) Project Page [BibTex]
