Refine
Year of publication
Document Type
- Conference Proceeding (19)
- Article (9)
- Part of a Book (3)
- Doctoral Thesis (3)
- Master's Thesis (2)
- Bachelor Thesis (1)
- Book (1)
- Preprint (1)
- Report (1)
Keywords
Institute
- Institut für Optische Systeme - IOS (40) (remove)
Using multi-camera matching techniques for 3d reconstruction there is usually the trade-off between the quality of the computed depth map and the speed of the computations. Whereas high quality matching methods take several seconds to several minutes to compute a depth map for one set of images, real-time methods achieve only low quality results. In this paper we present a multi-camera matching method that runs in real-time and yields high resolution depth maps. Our method is based on a novel multi-level combination of normalized cross correlation, deformed matching windows based on the multi-level depth map information, and sub-pixel precise disparity maps. The whole process is implemented completely on the GPU. With this approach we can process four 0.7 megapixel images in 129 milliseconds to a full resolution 3d depth map. Our technique is tailored for the recognition of non-technical shapes, because our target application is face recognition.
Black-box variational inference (BBVI) is a technique to approximate the posterior of Bayesian models by optimization. Similar to MCMC, the user only needs to specify the model; then, the inference procedure is done automatically. In contrast to MCMC, BBVI scales to many observations, is faster for some applications, and can take advantage of highly optimized deep learning frameworks since it can be formulated as a minimization task. In the case of complex posteriors, however, other state-of-the-art BBVI approaches often yield unsatisfactory posterior approximations. This paper presents Bernstein flow variational inference (BF-VI), a robust and easy-to-use method flexible enough to approximate complex multivariate posteriors. BF-VI combines ideas from normalizing flows and Bernstein polynomial-based transformation models. In benchmark experiments, we compare BF-VI solutions with exact posteriors, MCMC solutions, and state-of-the-art BBVI methods, including normalizing flow-based BBVI. We show for low-dimensional models that BF-VI accurately approximates the true posterior; in higher-dimensional models, BF-VI compares favorably against other BBVI methods. Further, using BF-VI, we develop a Bayesian model for the semi-structured melanoma challenge data, combining a CNN model part for image data with an interpretable model part for tabular data, and demonstrate, for the first time, the use of BBVI in semi-structured models.
Die Frage „Wozu braucht man das?“ vonseiten der Studierenden oder Aussagen wie „Das habe ich im Beruf später nie mehr benötigt.“ von ehemaligen Studierenden ist den meisten Mathematikdozierenden sehr vertraut. Im Projekt BiLeSA wird dem Wunsch nach Integration von Praxisnähe im Mathematikunterricht mithilfe einer Smartphone-App, welche ausgewählte Themen in der Mathematik anhand von digitalen Bildern sichtbar macht, umgesetzt. Bei den ausgewählten Themen handelt es sich um (affin) lineare Abbildungen, Ableitungen in höheren Raumdimensionen und Potenzen von Komplexen Zahlen. Die Konzeptionierung des Lernobjekts erfolgte mit dem Design Based Research (DBR) Ansatz, welches im Basisprojekt des IBH-Labs „Seamless Learning“ konzipiert und entwickelt wurde.
Rheumatoid arthritis is an autoimmune disease that causes chronic inflammation of synovial joints, often resulting in irreversible structural damage. The activity of the disease is evaluated by clinical examinations, laboratory tests, and patient self-assessment. The long-term course of the disease is assessed with radiographs of hands and feet. The evaluation of the X-ray images performed by trained medical staff requires several minutes per patient. We demonstrate that deep convolutional neural networks can be leveraged for a fully automated, fast, and reproducible scoring of X-ray images of patients with rheumatoid arthritis. A comparison of the predictions of different human experts and our deep learning system shows that there is no significant difference in the performance of human experts and our deep learning model.
Deep neural networks have become a veritable alternative to classic speaker recognition and clustering methods in recent years. However, while the speech signal clearly is a time series, and despite the body of literature on the benefits of prosodic (suprasegmental) features, identifying voices has usually not been approached with sequence learning methods. Only recently has a recurrent neural network (RNN) been successfully applied to this task, while the use of convolutional neural networks (CNNs) (that are not able to capture arbitrary time dependencies, unlike RNNs) still prevails. In this paper, we show the effectiveness of RNNs for speaker recognition by improving state of the art speaker clustering performance and robustness on the classic TIMIT benchmark. We provide arguments why RNNs are superior by experimentally showing a “sweet spot” of the segment length for successfully capturing prosodic information that has been theoretically predicted in previous work.
Three-dimensional ship localization with only one camera is a challenging task due to the loss of depth information caused by perspective projection. In this paper, we propose a method to measure distances based on the assumption that ships lie on a flat surface. This assumption allows to recover depth from a single image using the principle of inverse perspective. For the 3D ship detection task, we use a hybrid approach that combines image detection with a convolutional neural network, camera geometry and inverse perspective. Furthermore, a novel calculation of object height is introduced. Experiments show that the monocular distance computation works well in comparison to a Velodyne lidar. Due to its robustness, this could be an easy-to-use baseline method for detection tasks in navigation systems.
Wer schon einmal dicht gedrängt vor der Konzertbühne stand kann sich die aussichtslose Lage, wenn die Stimmung kippt und Panik aufkommt, gut vorstellen. Es ist sehr wichtig, Räume und Events, die zeitweise von sehr vielen Menschen aufgesucht werden, so zu gestalten und zu planen, dass maximale Sicherheit gewährleistet ist. Damit eine öffentliche Veranstaltung reibungslos verläuft ist eine gründliche Planung, also ein qualitativ hochwertiges Crowd Management unabdingbar.
In this paper we present a method using deep learning to compute parametrizations for B-spline curve approximation. Existing methods consider the computation of parametric values and a knot vector as separate problems. We propose to train interdependent deep neural networks to predict parametric values and knots. We show that it is possible to include B-spline curve approximation directly into the neural network architecture. The resulting parametrizations yield tight approximations and are able to outperform state-of-the-art methods.