Toward Precise Ambiguity-Aware Cross-Modality Global Self-Localization

There are significant advances in GNSS-free cross-modality self-localization of self-driving vehicles. Recent methods focus on learnable features for both cross-modal global localization via place recognition (PR) and local pose tracking, however they lack means of combining them in a complete localization pipeline. That is, a pose retrieved from PR has to be validated if it actually represents the true pose. Performing this validation without GNSS measurements makes the localization problem significantly more challenging. In this contribution, we propose a method to precisely localize the ego-vehicle in a high resolution map without GNSS prior. Furthermore, sensor and map data may be of different dimensions (2D / 3D) and modality, i.e. radar, lidar or aerial imagery. We initialize our system with multiple hypotheses retrieved from a PR method and infer the correct hypothesis over time. This multi-hypothesis approach is realized using a Gaussian sum filter which enables an efficient tracking of a low number of hypotheses and further facilitates the inference of our deep sensor-to-map matching network at arbitrarily distant regions simultaneously. We further propose a method to estimate the probability that none of the currently tracked hypotheses is correct. We achieve successful global localization in extensive experiments on the MulRan dataset, outperforming comparative methods even if none of the initial poses from PR was close to the true pose. Due to the flexibility of the approach, we can show state-of-the-art accuracy in lidar-to-aerial-imagery localization on a custom dataset using our pipeline with only minor modifications of the matching model.

View this article on IEEE Xplore