Revisiting Structure from Motion with 3D Reconstruction Priors

Abstract

Structure from Motion (SfM) has been a cornerstone of computer vision for decades, aiming to reconstruct 3D scene structure and camera parameters from 2D images. Traditionally, this problem is broken into several subproblems (image matching, triangulation, and global optimization), typically relying on 2D keypoints and reprojection error. Recently, 3D reconstruction models like DUSt3R have proven highly effective for a variety of 3D vision tasks. These models, based on feed-forward neural networks, regress dense 3D pointmaps from pairs of images in a shared coordinate system. In this work, we integrate modern 3D reconstruction priors into the incremental SfM pipeline. We propose a novel optimization method that aligns 3D pointmaps with scene structure and incorporates them as additional constraints in the optimization process. This allows us to enhance global optimization by combining both 2D keypoints and 3D pointmaps, resulting in improved robustness. We evaluate our approach on indoor scenes and demonstrate that it outperforms the baseline pipeline that relies solely on 2D constraints from reprojection error.

Method

Starting with an image collection, we match images to extract pairwise keypoint constraints. We pass every image pair through our 3D reconstruction prior to extract pointmaps. Using the provided keypoint matches (green), we extract the corresponding pointmap matches (orange). Next, we estimate a rigid alignment from the pointmap matches to their corresponding scene structure (purple). This alignment serves as the initialization for our global optimization. Note: Gray points are for visualization purposes only and are not used in the global optimization.
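
To make the rigid alignment step concrete, here is a minimal sketch that assumes the standard Kabsch (SVD-based) least-squares solution. The description above does not say which solver or outlier handling is actually used, so this is an illustration and not the authors' implementation. The function name `rigid_align` and its two inputs are hypothetical: `source` stands for pointmap points sampled at matched keypoints, and `target` for the corresponding 3D points of the current scene structure.

```python
import numpy as np


def rigid_align(source: np.ndarray, target: np.ndarray):
    """Least-squares rigid transform (R, t) such that R @ source[i] + t ~= target[i].

    source: (N, 3) pointmap points at matched keypoints (hypothetical input).
    target: (N, 3) corresponding scene points (hypothetical input).
    """
    if source.ndim != 2 or source.shape[1] != 3 or source.shape != target.shape:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    if source.shape[0] < 3:
        raise ValueError("need at least 3 correspondences")

    # Center both point sets on their centroids.
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    src_c = source - src_mean
    tgt_c = target - tgt_mean

    # 3x3 cross-covariance matrix and its SVD (Kabsch algorithm).
    H = src_c.T @ tgt_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t


if __name__ == "__main__":
    # Synthetic check: recover a known rotation about the z-axis and a translation.
    rng = np.random.default_rng(0)
    points = rng.normal(size=(50, 3))
    c, s = np.cos(0.3), np.sin(0.3)
    R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t_true = np.array([0.5, -1.0, 2.0])
    R_est, t_est = rigid_align(points, points @ R_true.T + t_true)
    print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

In practice, keypoint matches contain outliers, so a sketch like this would typically be wrapped in a robust scheme such as RANSAC before its result is used to initialize the global optimization. Whether a scale factor is also estimated is not stated above, so none is included here.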

BibTeX

@article{korth2025revisiting,
  author    = {Korth, Daniel and Niessner, Matthias},
  title     = {Revisiting Structure from Motion with 3D Reconstruction Priors},
  year      = {2025},
}

tl;dr We enhance global optimization (typically Bundle Adjustment) with 3D constraints from DUSt3R.
