Black-box variational inference (BBVI) is a technique to approximate the posterior of Bayesian models by optimization. As with MCMC, the user only needs to specify the model; the inference procedure then runs automatically. In contrast to MCMC, BBVI scales to many observations, is faster for some applications, and can take advantage of highly optimized deep learning frameworks since it can be formulated as a minimization task. For complex posteriors, however, existing state-of-the-art BBVI approaches often yield unsatisfactory posterior approximations. This paper presents Bernstein flow variational inference (BF-VI), a robust and easy-to-use method that is flexible enough to approximate complex multivariate posteriors. BF-VI combines ideas from normalizing flows and Bernstein polynomial-based transformation models. In benchmark experiments, we compare BF-VI solutions with exact posteriors, MCMC solutions, and state-of-the-art BBVI methods, including normalizing-flow-based BBVI. We show for low-dimensional models that BF-VI accurately approximates the true posterior; in higher-dimensional models, BF-VI compares favorably against other BBVI methods. Further, using BF-VI, we develop a Bayesian model for the semi-structured melanoma challenge data, combining a CNN model part for image data with an interpretable model part for tabular data, and demonstrate, for the first time, the use of BBVI in semi-structured models.
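The building block shared by Bernstein polynomial-based transformation models and normalizing flows is a learnable monotone map from a simple base distribution to the variational posterior. The following NumPy sketch illustrates one plausible construction; the softplus-cumsum coefficient trick, the sigmoid squashing of the Gaussian base sample, and the chosen degree are illustrative assumptions, not the exact parameterization used in the paper.

```python
import numpy as np
from scipy.special import comb, expit  # binomial coefficients, stable sigmoid

def monotone_bernstein_flow(z, theta_unconstrained):
    """Map base samples z ~ N(0, 1) to approximate posterior samples via a
    monotone Bernstein polynomial (hypothetical parameterization)."""
    M = len(theta_unconstrained) - 1
    # Increasing coefficients make the polynomial monotone, hence invertible.
    increments = np.concatenate([theta_unconstrained[:1],
                                 np.log1p(np.exp(theta_unconstrained[1:]))])  # softplus
    beta = np.cumsum(increments)
    # Squash the Gaussian base sample into [0, 1], the Bernstein domain.
    u = expit(z)[..., None]
    k = np.arange(M + 1)
    basis = comb(M, k) * u ** k * (1.0 - u) ** (M - k)
    return basis @ beta

# Example: draw approximate posterior samples for one model parameter.
rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)
samples = monotone_bernstein_flow(z, rng.standard_normal(11))  # degree M = 10
```

In a full BBVI loop, the unconstrained coefficients would be tuned by gradient ascent on the evidence lower bound rather than drawn at random as in this toy example.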
The main challenge in Bayesian models is to determine the posterior for the model parameters. Even in models with only one or a few parameters, the analytical posterior can be determined only in special settings. In Bayesian neural networks, variational inference is widely used to approximate difficult-to-compute posteriors by variational distributions. Usually, Gaussians are used as variational distributions (Gaussian-VI), which limits the quality of the approximation due to their limited flexibility. Transformation models, on the other hand, are flexible enough to fit any distribution. Here we present transformation model-based variational inference (TM-VI) and demonstrate that it accurately approximates complex posteriors in models with a single parameter and also works in a mean-field fashion for multi-parameter models such as neural networks.
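For concreteness, a mean-field transformation-model variational family of the kind described above would be optimized roughly as follows (the notation is ours, not the paper's): each parameter $\theta_j$ is obtained by pushing a standard normal variable through its own monotone transformation $h_{\phi_j}$,

$$
q_\phi(\boldsymbol\theta) = \prod_j q_{\phi_j}(\theta_j), \qquad
\theta_j = h_{\phi_j}(z_j),\; z_j \sim \mathcal N(0,1), \qquad
q_{\phi_j}(\theta_j) = \mathcal N\!\big(h_{\phi_j}^{-1}(\theta_j);\,0,1\big)\,
\Big|\tfrac{\partial h_{\phi_j}^{-1}(\theta_j)}{\partial \theta_j}\Big|,
$$

and the variational parameters $\phi$ are chosen to maximize the evidence lower bound

$$
\mathrm{ELBO}(\phi) = \mathbb E_{q_\phi(\boldsymbol\theta)}\big[\log p(\mathcal D \mid \boldsymbol\theta) + \log p(\boldsymbol\theta) - \log q_\phi(\boldsymbol\theta)\big].
$$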
Targetless Lidar-camera registration is a recurring task in many computer vision and robotics applications and requires computing the extrinsic pose of a point cloud with respect to a camera, or vice versa. Existing methods based on learning or optimization lack either generalization capability or accuracy. Here, we propose a combination of pre-training and optimization using a neural network-based mutual information estimation technique (MINE [1]). This construction allows the gradient to be back-propagated to the calibration parameters and enables stochastic gradient descent. To enforce the orthogonality constraint on the rotation matrix, we incorporate Lie-group techniques. Furthermore, instead of optimizing on entire images, we operate on local patches extracted from the temporally synchronized projected Lidar points and camera frames. Our experiments show that this technique not only improves over existing techniques in terms of accuracy but also generalizes considerably better to new Lidar-camera configurations.
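A minimal PyTorch sketch of the two ingredients named above: the Donsker-Varadhan mutual-information lower bound estimated by a small statistics network (the idea behind MINE), and an so(3) parameterization of the rotation so that gradient steps always yield a valid rotation matrix. The network size and the way image and projected-Lidar patches would be fed in are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

def so3_exp(omega):
    """Map an axis-angle vector (Lie algebra so(3)) to a rotation matrix, so
    that updating `omega` by gradient descent never leaves the rotation group."""
    zero = omega.new_zeros(())
    wx, wy, wz = omega
    K = torch.stack([torch.stack([zero, -wz,  wy]),
                     torch.stack([ wz, zero, -wx]),
                     torch.stack([-wy,  wx, zero])])
    return torch.linalg.matrix_exp(K)

class StatisticsNetwork(nn.Module):
    """Small network T(x, y) that scores paired vs. shuffled patch features."""
    def __init__(self, dim_x, dim_y):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_x + dim_y, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))

def mine_lower_bound(T, x, y):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)].
    x: camera patch features, y: projected-Lidar patch features."""
    joint = T(x, y).mean()
    y_shuffled = y[torch.randperm(y.shape[0])]          # break the pairing
    scores = T(x, y_shuffled).squeeze(-1)
    marginal = torch.logsumexp(scores, dim=0) - math.log(scores.shape[0])
    return joint - marginal
```

During calibration, the rotation R = so3_exp(omega) and a translation vector t would project the Lidar points into the image; maximizing the bound jointly with respect to omega, t, and the statistics network aligns the two modalities.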
Three-dimensional ship localization with only one camera is a challenging task due to the loss of depth information caused by perspective projection. In this paper, we propose a method to measure distances based on the assumption that ships lie on a flat surface. This assumption makes it possible to recover depth from a single image using the principle of inverse perspective. For the 3D ship detection task, we use a hybrid approach that combines image detection with a convolutional neural network, camera geometry, and inverse perspective. Furthermore, a novel calculation of object height is introduced. Experiments show that the monocular distance computation compares well against a Velodyne lidar. Due to its robustness, the method could serve as an easy-to-use baseline for detection tasks in navigation systems.
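The inverse-perspective step amounts to intersecting the viewing ray through a detected pixel (e.g. on the ship's waterline) with the assumed flat water surface. A small NumPy sketch under the additional simplifying assumption of a level, forward-looking camera; the intrinsics and mounting height are made-up numbers for illustration only.

```python
import numpy as np

def inverse_perspective_distance(u, v, K, cam_height):
    """Distance to the point where the ray through pixel (u, v) hits the flat
    water surface. Camera convention: x right, y down, z forward, so the
    water plane lies at y = +cam_height in camera coordinates."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])      # back-projected viewing ray
    if ray[1] <= 0:                                      # pixel at or above the horizon
        return None, np.inf
    s = cam_height / ray[1]                              # scale so the ray reaches the plane
    point = s * ray                                      # 3D point on the water surface
    return point, float(np.hypot(point[0], point[2]))    # point and ground-range distance

# Illustrative numbers: 1000 px focal length, 1920x1080 image, camera 10 m above water.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
point, dist = inverse_perspective_distance(u=1200, v=700, K=K, cam_height=10.0)
```

A tilted camera would additionally rotate the ray by the known mounting orientation before intersecting it with the plane.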
Random matrices are used to filter the center of gravity (CoG) and the covariance matrix of measurements. However, these quantities do not always correspond directly to the position and extent of the object, e.g. when a lidar sensor is used. In this paper, we propose a Gaussian process regression model (GPRM) to predict the position and extension of the object from the filtered CoG and covariance matrix of the measurements. Training data for the GPRM are generated by a sampling method and a virtual measurement model (VMM). The VMM is a function that generates artificial measurements using ray tracing and allows us to obtain the CoG and covariance matrix that any object would cause. This enables the GPRM to be trained without real data and still be applied to real data, owing to the precise modeling in the VMM. The results show accurate extension estimation as long as reality matches the modeling assumptions, e.g. that lidar measurements occur only on the side of the object facing the sensor.
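A simplified reading of the training pipeline: the virtual measurement model simulates where a lidar would hit the object, the resulting CoG and covariance are the regression inputs, and the true center and extent are the targets of a Gaussian process. The sketch below is a crude 2D stand-in; the box-shaped object, the sampling of only the near edge instead of proper ray tracing, and the use of scikit-learn's GaussianProcessRegressor are our assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def virtual_measurements(center, length, width, n=100, rng=None):
    """Stand-in for the VMM: the sensor at the origin only sees the near side,
    so artificial points are placed on the edge facing the sensor."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(center[0] - length / 2, center[0] + length / 2, n)
    y = np.full(n, center[1] - width / 2) + rng.normal(0, 0.02, n)  # sensor noise
    return np.column_stack([x, y])

def features(points):
    """What a random-matrix filter delivers: CoG and measurement covariance."""
    cog = points.mean(axis=0)
    cov = np.cov(points.T)
    return np.concatenate([cog, [cov[0, 0], cov[0, 1], cov[1, 1]]])

# Synthetic training set: (CoG, covariance) -> true center and extent.
rng = np.random.default_rng(0)
X, Y = [], []
for _ in range(500):
    center = rng.uniform([-20.0, 10.0], [20.0, 60.0])
    length, width = rng.uniform(5, 15), rng.uniform(2, 5)
    X.append(features(virtual_measurements(center, length, width, rng=rng)))
    Y.append(np.concatenate([center, [length, width]]))

gprm = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gprm.fit(np.array(X), np.array(Y))
# At run time, the filtered CoG/covariance from real lidar data would be passed
# through gprm.predict to obtain position and extension estimates.
```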