Refine
Year of publication
Document Type
- Conference Proceeding (19)
- Article (9)
- Part of a Book (3)
- Doctoral Thesis (3)
- Master's Thesis (2)
- Bachelor Thesis (1)
- Book (1)
- Preprint (1)
- Report (1)
Keywords
Institute
- Institut für Optische Systeme - IOS (40) (remove)
Visualization-Assisted Development of Deep Learning Models in Offline Handwriting Recognition
(2018)
Deep learning is a field of machine learning that has been the focus of active research and successful applications in recent years. Offline handwriting recognition is one of the research fields and applications were deep neural networks have shown high accuracy. Deep learning models and their training pipeline show a large amount of hyper-parameters in their data selection, transformation, network topology and training process that are sometimes interdependent. This increases the overall difficulty and time necessary for building and training a model for a specific data set and task at hand. This work proposes a novel visualization-assisted workflow that guides the model developer through the hyper-parameter search in order to identify relevant parameters and modify them in a meaningful way. This decreases the overall time necessary for building and training a model. The contributions of this work are a workflow for hyper-parameter search in offline handwriting recognition and a heat map based visualization technique for deep neural networks in multi-line offline handwriting recognition. This work applies to offline handwriting recognition, but the general workflow can possibly be adapted to other tasks as well.
Motion estimation is an essential element for autonomous vessels. It is used e.g. for lidar motion compensation as well as mapping and detection tasks in a maritime environment. Because the use of gyroscopes is not reliable and a high performance inertial measurement unit is quite expensive, we present an approach for visual pitch and roll estimation that utilizes a convolutional neural network for water segmentation, a stereo system for reconstruction and simple geometry to estimate pitch and roll. The algorithm is validated on a novel, publicly available dataset recorded at Lake Constance. Our experiments show that the pitch and roll estimator provides accurate results in comparison to an Xsens IMU sensor. We can further improve the pitch and roll estimation by sensor fusion with a gyroscope. The algorithm is available in its implementation as a ROS node.
Das Projekt eFlow, an dem unter anderem die HTWG Konstanz seit 2012 forscht, simuliert mit Hilfe einer mathematischen Simulation wie sich Menschenmassen verhalten, wenn sie ein vorgegebenes Gelände verlassen sollen. Die Simulation baut auf einen Ansatz der Finite Elemente Methode auf, in der mehrere gekoppelte Differenzialgleichungen berechnet werden müssen. Diese Berechnungen erweisen sich gerade bei komplexen Szenarien mit großem Gelände und vielen Personen als sehr rechenintensiv. Ziel dieser Bachelorarbeit ist es ein Surrogate Modell zu erstellen, welches basierend auf machine-learning Ansätzen im spezifischen auf Regressionsmethoden Ergebnisse der Simulation vorhersagen soll. Somit müssen Datensätze generiert werden. Diese entstehen durch wiederholte Durchläufe der Simulation, in der jeweils die Eingabeparameter, die in das Regressionsmodell einfließen sollen variiert werden und mit dem entsprechenden Ergebnis der Simulation verknüpft werden. Die Regressionsansätze werden dabei pro Durchlauf komplexer, in dem jeweils zusätzliche Eingabeparameter mit in die Datengenerierung aufgenommen werden. Es soll überprüft werden, ob diese Simulation mittels machine-learning Ansätzen reproduzierbar ist. Basierend auf diesen Surrogate Modellen soll es möglich gemacht werden, Situationen in Echtzeit zu überprüfen, ohne dabei den Weg der rechenaufwendigen Simulation zu gehen. Die Ergebnisse bestätigen, dass die mathematische Simulation mittels Regression reproduzierbar ist. Es erweist sich jedoch als sehr rechenaufwendig, Daten zu sammeln, um genügend Eingabeparameter mit in die Regressionsmethode einfließen zu lassen. Diese Arbeit gestaltet somit eine Vorstudie zur Umsetzung eines ausgereiften Surrogate Modells, welches jegliche Eingabeparameter der Simulation berücksichtigen kann.
The main challenge in Bayesian models is to determine the posterior for the model parameters. Already, in models with only one or few parameters, the analytical posterior can only be determined in special settings. In Bayesian neural networks, variational inference is widely used to approximate difficult-to-compute posteriors by variational distributions. Usually, Gaussians are used as variational distributions (Gaussian-VI) which limits the quality of the approximation due to their limited flexibility. Transformation models on the other hand are flexible enough to fit any distribution. Here we present transformation model-based variational inference (TM-VI) and demonstrate that it allows to accurately approximate complex posteriors in models with one parameter and also works in a mean-field fashion for multi-parameter models like neural networks.
Targetless Lidar-camera registration is a repeating task in many computer vision and robotics applications and requires computing the extrinsic pose of a point cloud with respect to a camera or vice-versa. Existing methods based on learning or optimization lack either generalization capabilities or accuracy. Here, we propose a combination of pre-training and optimization using a neural network-based mutual information estimation technique (MINE [1]). This construction allows back-propagating the gradient to the calibration parameters and enables stochastic gradient descent. To ensure orthogonality constraints with respect to the rotation matrix we incorporate Lie-group techniques. Furthermore, instead of optimizing on entire images, we operate on local patches that are extracted from the temporally synchronized projected Lidar points and camera frames. Our experiments show that this technique not only improves over existing techniques in terms of accuracy, but also shows considerable generalization capabilities towards new Lidar-camera configurations.
Deep neural networks (DNNs) are known for their high prediction performance, especially in perceptual tasks such as object recognition or autonomous driving. Still, DNNs are prone to yield unreliable predictions when encountering completely new situations without indicating their uncertainty. Bayesian variants of DNNs (BDNNs), such as MC dropout BDNNs, do provide uncertainty measures. However, BDNNs are slow during test time because they rely on a sampling approach. Here we present a single shot MC dropout approximation that preserves the advantages of BDNNs without being slower than a DNN. Our approach is to analytically approximate for each layer in a fully connected network the expected value and the variance of the MC dropout signal. We evaluate our approach on different benchmark datasets and a simulated toy example. We demonstrate that our single shot MC dropout approximation resembles the point estimate and the uncertainty estimate of the predictive distribution that is achieved with an MC approach, while being fast enough for real-time deployments of BDNNs.
Particularly for manufactured products subject to aesthetic evaluation, the industrial manufacturing process must be monitored, and visual defects detected. For this purpose, more and more computer vision-integrated inspection systems are being used. In optical inspection based on cameras or range scanners, only a few examples are typically known before novel examples are inspected. Consequently, no large data set of non-defective and defective examples could be used to train a classifier, and methods that work with limited or weak supervision must be applied. For such scenarios, I propose new data-efficient machine learning approaches based on one-class learning that reduce the need for supervision in industrial computer vision tasks. The developed novelty detection model automatically extracts features from the input images and is trained only on available non-defective reference data. On top of the feature extractor, a one-class classifier based on recent developments in deep learning is placed. I evaluate the novelty detector in an industrial inspection scenario and state-of-the-art benchmarks from the machine learning community. In the second part of this work, the model gets improved by using a small number of novel defective examples, and hence, another source of supervision gets incorporated. The targeted real-world inspection unit is based on a camera array and a flashing light illumination, allowing inline capturing of multichannel images at a high rate. Optionally, the integration of range data, such as laser or Lidar signals, is possible by using the developed targetless data fusion method.
Forecasting is crucial for both system planning and operations in the energy sector. With increasing penetration of renewable energy sources, increasing fluctuations in the power generation need to be taken into account. Probabilistic load forecasting is a young, but emerging research topic focusing on the prediction of future uncertainties. However, the majority of publications so far focus on techniques like quantile regression, ensemble, or scenario-based methods, which generate discrete quantiles or sets of possible load curves. The conditioned probability distribution remains unknown and can only be estimated when the output is post-processed using a statistical method like kernel density estimation.
Instead, the proposed probabilistic deep learning model uses a cascade of transformation functions, known as normalizing flow, to model the conditioned density function from a smart meter dataset containing electricity demand information for over 4,000 buildings in Ireland. Since the whole probability density function is tractable, the parameters of the model can be obtained by minimizing the negative loglikelihood through the state of the art gradient descent. This leads to the model with the best representation of the data distribution.
Two different deep learning models have been compared, a simple three-layer fully connected neural network and a more advanced convolutional neural network for sequential data processing inspired by the WaveNet architecture. These models have been used to parametrize three different probabilistic models, a simple normal distribution, a Gaussian mixture model, and the normalizing flow model. The prediction horizon is set to one day with a resolution of 30 minutes, hence the models predict 48 conditioned probability distributions.
The normalizing flow model outperforms the two other variants for both architectures and proves its ability to capture the complex structures and dependencies causing the variations in the data. Understanding the stochastic nature of the task in such detail makes the methodology applicable for other use cases apart from forecasting. It is shown how it can be used to detect anomalies in the power grid or generate synthetic scenarios for grid planning.
Probabilistic Deep Learning
(2020)
Probabilistic Deep Learning is a hands-on guide to the principles that support neural networks. Learn to improve network performance with the right distribution for different data types, and discover Bayesian variants that can state their own uncertainty to increase accuracy. This book provides easy-to-apply code and uses popular frameworks to keep you focused on practical applications.
Interpretability and uncertainty modeling are important key factors for medical applications. Moreover, data in medicine are often available as a combination of unstructured data like images and structured predictors like patient’s metadata. While deep learning models are state-of-the-art for image classification, the models are often referred to as ’black-box’, caused by the lack of interpretability. Moreover, DL models are often yielding point predictions and are too confident about the parameter estimation and outcome predictions.
On the other side with statistical regression models, it is possible to obtain interpretable predictor effects and capture parameter and model uncertainty based on the Bayesian approach. In this thesis, a publicly available melanoma dataset, consisting of skin lesions and patient’s age, is used to predict the melanoma types by using a semi-structured model, while interpretable components and model uncertainty is quantified. For Bayesian models, transformation model-based variational inference (TM-VI) method is used to determine the posterior distribution of the parameter. Several model constellations consisting of patient’s age and/or skin lesion were implemented and evaluated. Predictive performance was shown to be best by using a combined model of image and patient’s age, while providing the interpretable posterior distribution of the regression coefficient is possible. In addition, integrating uncertainty in image and tabular parts results in larger variability of the outputs corresponding to high uncertainty of the single model components.
Optical surface inspection: A novelty detection approach based on CNN-encoded texture features
(2018)
In inspection systems for textured surfaces, a reference texture is typically known before novel examples are inspected. Mostly, the reference is only available in a digital format. As a consequence, there is no dataset of defective examples available that could be used to train a classifier. We propose a texture model approach to novelty detection. The texture model uses features encoded by a convolutional neural network (CNN) trained on natural image data. The CNN activations represent the specific characteristics of the digital reference texture which are learned by a one-class classifier. We evaluate our novelty detector in a digital print inspection scenario. The inspection unit is based on a camera array and a flashing light illumination which allows for inline capturing of multichannel images at a high rate. In order to compare our results to manual inspection, we integrated our inspection unit into an industrial single-pass printing system.
Offline handwriting recognition systems often use LSTM networks, trained with line- or word-images. Multi-line text makes it necessary to use segmentation to explicitly obtain these images. Skewed, curved, overlapping, incorrectly written text, or noise can lead to errors during segmentation of multi-line text and reduces the overall recognition capacity of the system. Last year has seen the introduction of deep learning methods capable of segmentation-free recognition of whole paragraphs. Our method uses Conditional Random Fields to represent text and align it with the network output to calculate a loss function for training. Experiments are promising and show that the technique is capable of training a LSTM multi-line text recognition system.
Pascal Laube presents machine learning approaches for three key problems of reverse engineering of defective structured surfaces: parametrization of curves and surfaces, geometric primitive classification and inpainting of high-resolution textures. The proposed methods aim to improve the reconstruction quality while further automating the process. The contributions demonstrate that machine learning can be a viable part of the CAD reverse engineering pipeline.
Algorithms for calculating the string edit distance are used in e.g. information retrieval and document analysis systems or for evaluation of text recognizers. Text recognition based on CTC-trained LSTM networks includes a decoding step to produce a string, possibly using a language model, and evaluation using the string edit distance. The decoded string can further be used as a query for database search, e.g. in document retrieval. We propose to closely integrate dictionary search with text recognition to train both combined in a continuous fashion. This work shows that LSTM networks are capable of calculating the string edit distance while allowing for an exchangeable dictionary to separate learned algorithm from data. This could be a step towards integrating text recognition and dictionary search in one deep network.
Knot placement for curve approximation is a well known and yet open problem in geometric modeling. Selecting knot values that yield good approximations is a challenging task, based largely on heuristics and user experience. More advanced approaches range from parametric averaging to genetic algorithms.
In this paper, we propose to use Support Vector Machines (SVMs) to determine suitable knot vectors for B-spline curve approximation. The SVMs are trained to identify locations in a sequential point cloud where knot placement will improve the approximation error. After the training phase, the SVM can assign, to each point set location, a so-called score. This score is based on geometric and differential geometric features of points. It measures the quality of each location to be used as knots in the subsequent approximation. From these scores, the final knot vector can be constructed exploring the topography of the score-vector without the need for iteration or optimization in the approximation process. Knot vectors computed with our approach outperform state of the art methods and yield tighter approximations.
We propose a novel end-to-end neural network architecture that, once trained, directly outputs a probabilistic clustering of a batch of input examples in one pass. It estimates a distribution over the number of clusters k, and for each 1≤k≤kmax, a distribution over the individual cluster assignment for each data point. The network is trained in advance in a supervised fashion on separate data to learn grouping by any perceptual similarity criterion based on pairwise labels (same/different group). It can then be applied to different data containing different groups. We demonstrate promising performance on high-dimensional data like images (COIL-100) and speech (TIMIT). We call this “learning to cluster” and show its conceptual difference to deep metric learning, semi-supervise clustering and other related approaches while having the advantage of performing learnable clustering fully end-to-end.
Know when you don't know
(2018)
Deep convolutional neural networks show outstanding performance in image-based phenotype classification given that all existing phenotypes are presented during the training of the network. However, in real-world high-content screening (HCS) experiments, it is often impossible to know all phenotypes in advance. Moreover, novel phenotype discovery itself can be an HCS outcome of interest. This aspect of HCS is not yet covered by classical deep learning approaches. When presenting an image with a novel phenotype to a trained network, it fails to indicate a novelty discovery but assigns the image to a wrong phenotype. To tackle this problem and address the need for novelty detection, we use a recently developed Bayesian approach for deep neural networks called Monte Carlo (MC) dropout to define different uncertainty measures for each phenotype prediction. With real HCS data, we show that these uncertainty measures allow us to identify novel or unclear phenotypes. In addition, we also found that the MC dropout method results in a significant improvement of classification accuracy. The proposed procedure used in our HCS case study can be easily transferred to any existing network architecture and will be beneficial in terms of accuracy and novelty detection.
At present, the majority of the proposed Deep Learning (DL) methods provide point predictions without quantifying the model's uncertainty. However, a quantification of the reliability of automated image analysis is essential, in particular in medicine when physicians rely on the results for making critical treatment decisions. In this work, we provide an entire framework to diagnose ischemic stroke patients incorporating Bayesian uncertainty into the analysis procedure. We present a Bayesian Convolutional Neural Network (CNN) yielding a probability for a stroke lesion on 2D Magnetic Resonance (MR) images with corresponding uncertainty information about the reliability of the prediction. For patient-level diagnoses, different aggregation methods are proposed and evaluated, which combine the individual image-level predictions. Those methods take advantage of the uncertainty in the image predictions and report model uncertainty at the patient-level. In a cohort of 511 patients, our Bayesian CNN achieved an accuracy of 95.33% at the image-level representing a significant improvement of 2% over a non-Bayesian counterpart. The best patient aggregation method yielded 95.89% of accuracy. Integrating uncertainty information about image predictions in aggregation models resulted in higher uncertainty measures to false patient classifications, which enabled to filter critical patient diagnoses that are supposed to be closer examined by a medical doctor. We therefore recommend using Bayesian approaches not only for improved image-level prediction and uncertainty estimation but also for the detection of uncertain aggregations at the patient-level.
Incremental one-class learning using regularized null-space training for industrial defect detection
(2024)
One-class incremental learning is a special case of class-incremental learning, where only a single novel class is incrementally added to an existing classifier instead of multiple classes. This case is relevant in industrial defect detection scenarios, where novel defects usually appear during operation. Existing rolled-out classifiers must be updated incrementally in this scenario with only a few novel examples. In addition, it is often required that the base classifier must not be altered due to approval and warranty restrictions. While simple finetuning often gives the best performance across old and new classes, it comes with the drawback of potentially losing performance on the base classes (catastrophic forgetting [1]). Simple prototype approaches [2] work without changing existing weights and perform very well when the classes are well separated but fail dramatically when not. In theory, null-space training (NSCL) [3] should retain the basis classifier entirely, as parameter updates are restricted to the null space of the network with respect to existing classes. However, as we show, this technique promotes overfitting in the case of one-class incremental learning. In our experiments, we found that unconstrained weight growth in null space is the underlying issue, leading us to propose a regularization term (R-NSCL) that penalizes the magnitude of amplification. The regularization term is added to the standard classification loss and stabilizes null-space training in the one-class scenario by counteracting overfitting. We test the method’s capabilities on two industrial datasets, namely AITEX and MVTec, and compare the performance to state-of-the-art algorithms for class-incremental learning.
Deep neural networks have been successfully applied to problems such as image segmentation, image super-resolution, coloration and image inpainting. In this work we propose the use of convolutional neural networks (CNN) for image inpainting of large regions in high-resolution textures. Due to limited computational resources processing high-resolution images with neural networks is still an open problem. Existing methods separate inpainting of global structure and the transfer of details, which leads to blurry results and loss of global coherence in the detail transfer step. Based on advances in texture synthesis using CNNs we propose patch-based image inpainting by a single network topology that is able to optimize for global as well as detail texture statistics. Our method is capable of filling large inpainting regions, oftentimes exceeding quality of comparable methods for images of high-resolution (2048x2048px). For reference patch look-up we propose to use the same summary statistics that are used in the inpainting process.