Hi folks,

I've read your GTSfM paper - nice work, and thanks for pushing it to arXiv. I enjoyed reading it and appreciate the huge effort that went into building it. I am very surprised by the conclusion that SuperPoint+SuperGlue/LightGlue is not as good as SIFT - we've consistently observed the exact opposite with incremental SfM (COLMAP) on a range of easy and difficult datasets (ETH3D, IMC 2020/21/22/23). I went through the code but didn't find anything obvious.
The point clouds of SP+SG/LG look pretty sparse on several datasets, as do the matches in Fig. 3.
> the shorter image side is resized to at most 760 pixels in length
So that'd give a 1351×760 px image for a 1920×1080 input - this seems fine.
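For concreteness, here is a minimal sketch of that resizing rule as I read it (`resized_shape` is a hypothetical helper, not GTSfM's actual code):

```python
# Minimal sketch of the resizing rule quoted above: the shorter image side is
# capped at 760 px, keeping the aspect ratio.
def resized_shape(width: int, height: int, max_short_side: int = 760) -> tuple[int, int]:
    scale = min(1.0, max_short_side / min(width, height))
    return round(width * scale), round(height * scale)

print(resized_shape(1920, 1080))  # -> (1351, 760)
```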
> A maximum of 5000 keypoints are used for each of the following front-ends
Do you know how many points are actually extracted by SuperPoint per image? How often is the 5k limit hit, compared to SIFT?
Do I understand correctly that you use the default settings (`gtsfm/gtsfm/frontend/detector_descriptor/superpoint.py`, lines 45 to 46 at `1b55b76`)? Did you try to tweak them? With the defaults, SuperPoint cannot return 5k keypoints on images of this size, unlike SIFT. I recommend the following:
- decrease the detection threshold: `keypoint_threshold=0.001`
- decrease the NMS radius: `nms_radius=3`
- if images are smaller than the limit (760 px on the shorter side), upsample them
This should make SuperPoint competitive with SIFT in terms of keypoint detection.
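Concretely, the suggested settings would look something like this, following the config-dict convention of the reference SuperPoint implementation (GTSfM's wrapper may expose them differently):

```python
# Suggested SuperPoint settings (keys follow the reference implementation in
# magicleap/SuperGluePretrainedNetwork; GTSfM's wrapper may name them differently).
superpoint_config = {
    "keypoint_threshold": 0.001,  # default is 0.005; lower -> more detections
    "nms_radius": 3,              # default is 4; smaller radius -> denser keypoints
    "max_keypoints": 5000,        # match the keypoint budget given to SIFT
}
```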
We do know that these deep matchers are more easily tricked by symmetries, as you point out in Fig. 3. Table 3 seems to confirm this: compared to SIFT, the mean front-end errors are much higher than the medians, and there are many more VG outliers, especially on South Building and Crane.
Did you try tuning the filtering thresholds (minimum number of inliers, cycle consistency) for each front-end? 15 inliers and 7° seem pretty loose for front-ends with high recall.
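For reference, a sketch of the rotation cycle-consistency test I have in mind (my reading, not necessarily how GTSfM implements it):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Compose the relative rotations around a triplet (i, j, k) and measure how
# far the loop deviates from the identity, in degrees.
def cycle_error_deg(R_ij: np.ndarray, R_jk: np.ndarray, R_ki: np.ndarray) -> float:
    loop = R_ki @ R_jk @ R_ij  # identity for perfectly consistent edges
    return np.degrees(R.from_matrix(loop).magnitude())

# Edges in triplets with cycle_error_deg(...) above the threshold (7 deg here)
# would be filtered out; tightening it trades recall for precision.
```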
Did you try running the averaging+BA on edges that are inliers according to the GT poses?
It seems that the motion averaging does not have any robustness built in. Zhang et al. (ICCV 2023) show that using a robust cost function is critical (their Table 5) and that weighting by inlier count or two-view covariance often helps. Did you try this? That paper also shows that SuperPoint+SuperGlue can work perfectly fine for global SfM.
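To illustrate the kind of robustness meant here, a rough sketch of an IRLS-style edge weight with a normalized Geman-McClure kernel, optionally scaled by the two-view inlier count (an illustration of the idea in Zhang et al., not their algorithm or GTSfM's code):

```python
import numpy as np

# IRLS-style weight for a relative-rotation residual (angular error in
# degrees): the normalized Geman-McClure kernel downweights gross outliers,
# and the per-edge weight can additionally be scaled by the inlier count.
def edge_weight(residual_deg: float, sigma: float = 5.0, num_inliers: int = 1) -> float:
    gm = (sigma**2 / (sigma**2 + residual_deg**2)) ** 2  # normalized GM weight
    return gm * np.sqrt(num_inliers)

print(edge_weight(1.0), edge_weight(30.0))  # ~0.92 vs ~0.0007: outliers vanish
```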
Thanks!

cc @Phil26AT @ducha-aiki

Thanks for your comments! We'll certainly discuss and try some of the things you suggest. We were not rooting for SIFT in any way :-) We appreciate the advice and hope to simply get the best possible performance.