Increasing robustness of handwriting recognition using character N-Gram decoding on large lexica
(2016)
Offline handwriting recognition systems often include a decoding step, that is retrieving the most likely character sequence from the underlying machine learning algorithm. Decoding is sensitive to ranges of weakly predicted characters, caused e.g. by obstructions in the scanned document. We present a new algorithm for robust decoding of handwriting recognizer outputs using character n-grams. Multidimensional hierarchical subsampling artificial neural networks with Long-Short-Term-Memory cells have been successfully applied to offline handwriting recognition. Output activations from such networks, trained with Connectionist Temporal Classification, can be decoded with several different algorithms in order to retrieve the most likely literal string that it represents. We present a new algorithm for decoding the network output while restricting the possible strings to a large lexicon. The index used for this work is an n-gram index with tri-grams used for experimental comparisons. N-grams are extracted from the network output using a backtracking algorithm and each n-gram assigned a mean probability. The decoding result is obtained by intersecting the n-gram hit lists while calculating the total probability for each matched lexicon entry. We conclude with an experimental comparison of different decoding algorithms on a large lexicon.
Offline handwriting recognition systems often use LSTM networks, trained with line- or word-images. Multi-line text makes it necessary to use segmentation to explicitly obtain these images. Skewed, curved, overlapping, incorrectly written text, or noise can lead to errors during segmentation of multi-line text and reduces the overall recognition capacity of the system. Last year has seen the introduction of deep learning methods capable of segmentation-free recognition of whole paragraphs. Our method uses Conditional Random Fields to represent text and align it with the network output to calculate a loss function for training. Experiments are promising and show that the technique is capable of training a LSTM multi-line text recognition system.
Multi-Dimensional Connectionist Classification is amethod for weakly supervised training of Deep Neural Networksfor segmentation-free multi-line offline handwriting recognition.MDCC applies Conditional Random Fields as an alignmentfunction for this task. We discuss the structure and patterns ofhandwritten text that can be used for building a CRF. Since CRFsare cyclic graphical models, we have to resort to approximateinference when calculating the alignment of multi-line text duringtraining, here in the form of Loopy Belief Propagation. This workconcludes with experimental results for transcribing small multi-line samples from the IAM Offline Handwriting DB which showthat MDCC is a competitive methodology.
Recent years have seen the proposal of several different gradient-based optimization methods for training artificial neural networks. Traditional methods include steepest descent with momentum, newer methods are based on per-parameter learning rates and some approximate Newton-step updates. This work contains the result of several experiments comparing different optimization methods. The experiments were targeted at offline handwriting recognition using hierarchical subsampling networks with recurrent LSTM layers. We present an overview of the used optimization methods, the results that were achieved and a discussion of why the methods lead to different results.
Visualization-Assisted Development of Deep Learning Models in Offline Handwriting Recognition
(2018)
Deep learning is a field of machine learning that has been the focus of active research and successful applications in recent years. Offline handwriting recognition is one of the research fields and applications were deep neural networks have shown high accuracy. Deep learning models and their training pipeline show a large amount of hyper-parameters in their data selection, transformation, network topology and training process that are sometimes interdependent. This increases the overall difficulty and time necessary for building and training a model for a specific data set and task at hand. This work proposes a novel visualization-assisted workflow that guides the model developer through the hyper-parameter search in order to identify relevant parameters and modify them in a meaningful way. This decreases the overall time necessary for building and training a model. The contributions of this work are a workflow for hyper-parameter search in offline handwriting recognition and a heat map based visualization technique for deep neural networks in multi-line offline handwriting recognition. This work applies to offline handwriting recognition, but the general workflow can possibly be adapted to other tasks as well.
Digital cameras are subject to physical, electronic and optic effects that result in errors and noise in the image. These effects include for example a temperature dependent dark current, read noise, optical vignetting or different sensitivities of individual pixels. The task of a radiometric calibration is to reduce these errors in the image and thus improve the quality of the overall application. In this work we present an algorithm for radiometric calibration based on Gaussian processes. Gaussian processes are a regression method widely used in machine learning that is particularly useful in our context. Then Gaussian process regression is used to learn a temperature and exposure time dependent mapping from observed gray-scale values to true light intensities for each pixel. Regression models based on the characteristics of single pixels suffer from excessively high runtime and thus are unsuitable for many practical applications. In contrast, a single regression model for an entire image with high spatial resolution leads to a low quality radiometric calibration, which also limits its practical use. The proposed algorithm is predicated on a partitioning of the pixels such that each pixel partition can be represented by one single regression model without quality loss. Partitioning is done by extracting features from the characteristic of each pixel and using them for lexicographic sorting. Splitting the sorted data into partitions with equal size yields the final partitions, each of which is represented by the partition centers. An individual Gaussian process regression and model selection is done for each partition. Calibration is performed by interpolating the gray-scale value of each pixel with the regression model of the respective partition. The experimental comparison of the proposed approach to classical flat field calibration shows a consistently higher reconstruction quality for the same overall number of calibration frames.
Algorithms for calculating the string edit distance are used in e.g. information retrieval and document analysis systems or for evaluation of text recognizers. Text recognition based on CTC-trained LSTM networks includes a decoding step to produce a string, possibly using a language model, and evaluation using the string edit distance. The decoded string can further be used as a query for database search, e.g. in document retrieval. We propose to closely integrate dictionary search with text recognition to train both combined in a continuous fashion. This work shows that LSTM networks are capable of calculating the string edit distance while allowing for an exchangeable dictionary to separate learned algorithm from data. This could be a step towards integrating text recognition and dictionary search in one deep network.
In this paper we present a method using deep learning to compute parametrizations for B-spline curve approximation. Existing methods consider the computation of parametric values and a knot vector as separate problems. We propose to train interdependent deep neural networks to predict parametric values and knots. We show that it is possible to include B-spline curve approximation directly into the neural network architecture. The resulting parametrizations yield tight approximations and are able to outperform state-of-the-art methods.
Deep 3D
(2017)
In the reverse engineering process one has to classify parts of point clouds with the correct type of geometric primitive. Features based on different geometric properties like point relations, normals, and curvature information can be used, to train classifiers like Support Vector Machines (SVM). These geometric features are estimated in the local neighborhood of a point of the point cloud. The multitude of different features makes an in-depth comparison necessary. In this work we evaluate 23 features for the classification of geometric primitives in point clouds. Their performance is evaluated on SVMs when used to classify geometric primitives in simulated and real laser scanned point clouds. We also introduce a normalization of point cloud density to improve classification generalization.
Deep neural networks have been successfully applied to problems such as image segmentation, image super-resolution, coloration and image inpainting. In this work we propose the use of convolutional neural networks (CNN) for image inpainting of large regions in high-resolution textures. Due to limited computational resources processing high-resolution images with neural networks is still an open problem. Existing methods separate inpainting of global structure and the transfer of details, which leads to blurry results and loss of global coherence in the detail transfer step. Based on advances in texture synthesis using CNNs we propose patch-based image inpainting by a single network topology that is able to optimize for global as well as detail texture statistics. Our method is capable of filling large inpainting regions, oftentimes exceeding quality of comparable methods for images of high-resolution (2048x2048px). For reference patch look-up we propose to use the same summary statistics that are used in the inpainting process.
Incremental one-class learning using regularized null-space training for industrial defect detection
(2024)
One-class incremental learning is a special case of class-incremental learning, where only a single novel class is incrementally added to an existing classifier instead of multiple classes. This case is relevant in industrial defect detection scenarios, where novel defects usually appear during operation. Existing rolled-out classifiers must be updated incrementally in this scenario with only a few novel examples. In addition, it is often required that the base classifier must not be altered due to approval and warranty restrictions. While simple finetuning often gives the best performance across old and new classes, it comes with the drawback of potentially losing performance on the base classes (catastrophic forgetting [1]). Simple prototype approaches [2] work without changing existing weights and perform very well when the classes are well separated but fail dramatically when not. In theory, null-space training (NSCL) [3] should retain the basis classifier entirely, as parameter updates are restricted to the null space of the network with respect to existing classes. However, as we show, this technique promotes overfitting in the case of one-class incremental learning. In our experiments, we found that unconstrained weight growth in null space is the underlying issue, leading us to propose a regularization term (R-NSCL) that penalizes the magnitude of amplification. The regularization term is added to the standard classification loss and stabilizes null-space training in the one-class scenario by counteracting overfitting. We test the method’s capabilities on two industrial datasets, namely AITEX and MVTec, and compare the performance to state-of-the-art algorithms for class-incremental learning.
We are interested in computing a mini-batch-capable end-to-end algorithm to identify statistically independent components (ICA) in large scale and high-dimensional datasets. Current algorithms typically rely on pre-whitened data and do not integrate the two procedures of whitening and ICA estimation. Our online approach estimates a whitening and a rotation matrix with stochastic gradient descent on centered or uncentered data. We show that this can be done efficiently by combining Batch Karhunen-Löwe-Transformation [1] with Lie group techniques. Our algorithm is recursion-free and can be organized as feed-forward neural network which makes the use of GPU acceleration straight-forward. Because of the very fast convergence of Batch KLT, the gradient descent in the Lie group of orthogonal matrices stabilizes quickly. The optimization is further enhanced by integrating ADAM [2], an improved stochastic gradient descent (SGD) technique from the field of deep learning. We test the scaling capabilities by computing the independent components of the well-known ImageNet challenge (144 GB). Due to its robustness with respect to batch and step size, our approach can be used as a drop-in replacement for standard ICA algorithms where memory is a limiting factor.
Targetless Lidar-camera registration is a repeating task in many computer vision and robotics applications and requires computing the extrinsic pose of a point cloud with respect to a camera or vice-versa. Existing methods based on learning or optimization lack either generalization capabilities or accuracy. Here, we propose a combination of pre-training and optimization using a neural network-based mutual information estimation technique (MINE [1]). This construction allows back-propagating the gradient to the calibration parameters and enables stochastic gradient descent. To ensure orthogonality constraints with respect to the rotation matrix we incorporate Lie-group techniques. Furthermore, instead of optimizing on entire images, we operate on local patches that are extracted from the temporally synchronized projected Lidar points and camera frames. Our experiments show that this technique not only improves over existing techniques in terms of accuracy, but also shows considerable generalization capabilities towards new Lidar-camera configurations.
The detection of anomalous or novel images given a training dataset of only clean reference data (inliers) is an important task in computer vision. We propose a new shallow approach that represents both inlier and outlier images as ensembles of patches, which allows us to effectively detect novelties as mean shifts between reference data and outliers with the Hotelling T2 test. Since mean-shift can only be detected when the outlier ensemble is sufficiently separate from the typical set of the inlier distribution, this typical set acts as a blind spot for novelty detection. We therefore minimize its estimated size as our selection rule for critical hyperparameters, such as, e.g., the size of the patches is crucial. To showcase the capabilities of our approach, we compare results with classical and deep learning methods on the popular datasets MNIST and CIFAR-10, and demonstrate its real-world applicability in a large-scale industrial inspection scenario.
The detection of differences between images of a printed reference and a reprinted wood decor often requires an initial image registration step. Depending on the digitalization method, the reprint will be displaced and rotated with respect to the reference. The aim of registration is to match the images as precisely as possible. In our approach, images are first matched globally by extracting feature points from both images and finding corresponding point pairs using the RANSAC algorithm. From these correspondences, we compute a global projective transformation between both images. In order to get a pixel-wise registration, we train a learning machine on the point correspondences found by RANSAC. The learning algorithm (in our case Gaussian process regression) is used to nonlinearly interpolate between the feature points which results in a high precision image registration method on wood decors.
Optical surface inspection: A novelty detection approach based on CNN-encoded texture features
(2018)
In inspection systems for textured surfaces, a reference texture is typically known before novel examples are inspected. Mostly, the reference is only available in a digital format. As a consequence, there is no dataset of defective examples available that could be used to train a classifier. We propose a texture model approach to novelty detection. The texture model uses features encoded by a convolutional neural network (CNN) trained on natural image data. The CNN activations represent the specific characteristics of the digital reference texture which are learned by a one-class classifier. We evaluate our novelty detector in a digital print inspection scenario. The inspection unit is based on a camera array and a flashing light illumination which allows for inline capturing of multichannel images at a high rate. In order to compare our results to manual inspection, we integrated our inspection unit into an industrial single-pass printing system.
Digital cameras are used in a large variety of scientific and industrial applications. For most applications the acquired data should represent the real light intensity per pixel as accurately as possible. However, digital cameras are subject to different sources of noise which distort the resulting image. Noise includes photon noise, fixed pattern noise and read noise. The aim of the radiometric calibration is to improve the quality of the resulting images by reducing the influence of the different types of noise on the measured data. In this paper, a new approach for the radiometric calibration of digital cameras using sparse Gaussian process regression is presented. Gaussian process regression is a kernel based supervised machine learning technique. It is used to learn the response of a camera system from a set of training images to allow for the calibration of new images. Compared to the standard Gaussian process method or flat field correction our sparse approach allows for faster calibration and higher reconstruction quality.
Digital bedruckte Oberflächen müssen strengen funktionalen und ästhetischen Anforderungen genügen. Diese Eigenschaften werden im Rahmen der Qualitätsprüfung kontrolliert. Hierbei wirken sich Oberflächendefekte oftmals erst dann aus, wenn diese auch vom Menschen wahrgenommen werden. Aufgrund der hohen Produktionsgeschwindigkeit kann eine solche Bewertung der Sichtbarkeit von Defekten bisher nur außerhalb des Produktionsflusses durch manuelle - subjektiv geprägte - Inspektion erfolgen. Ziel des Projektes ist (1) die Modellierung von Texturen in einer Form, die an das menschliche visuelle System angepasst ist und (2) die automatisierte Beurteilung der Wahrnehmung von Texturfehlern. Im Rahmen des Projekts wurde ein prototypisches System zur Inline-Erfassung von texturierten Oberflächen entwickelt. Auf Basis von realen Aufnahmen industriell produzierter Holzdekore wurde eine repräsentative Texturdatenbank erstellt. Gezeigt werden erste Resultate im Bereich der Defektdetektion auf Basis von statistischen Merkmalen. Diese Ergebnisse dienen als Grundlage für die spätere wahrnehmungsorientierte Bewertung. Letztlich sollen die im Rahmen des Projekts erlangten Ergebnisse in einen prototypischen Aufbau zur Inspektion von digital bedruckten Dekoren einfließen.
Motion estimation is an essential element for autonomous vessels. It is used e.g. for lidar motion compensation as well as mapping and detection tasks in a maritime environment. Because the use of gyroscopes is not reliable and a high performance inertial measurement unit is quite expensive, we present an approach for visual pitch and roll estimation that utilizes a convolutional neural network for water segmentation, a stereo system for reconstruction and simple geometry to estimate pitch and roll. The algorithm is validated on a novel, publicly available dataset recorded at Lake Constance. Our experiments show that the pitch and roll estimator provides accurate results in comparison to an Xsens IMU sensor. We can further improve the pitch and roll estimation by sensor fusion with a gyroscope. The algorithm is available in its implementation as a ROS node.
Three-dimensional ship localization with only one camera is a challenging task due to the loss of depth information caused by perspective projection. In this paper, we propose a method to measure distances based on the assumption that ships lie on a flat surface. This assumption allows to recover depth from a single image using the principle of inverse perspective. For the 3D ship detection task, we use a hybrid approach that combines image detection with a convolutional neural network, camera geometry and inverse perspective. Furthermore, a novel calculation of object height is introduced. Experiments show that the monocular distance computation works well in comparison to a Velodyne lidar. Due to its robustness, this could be an easy-to-use baseline method for detection tasks in navigation systems.
Classification of point clouds by different types of geometric primitives is an essential part in the reconstruction process of CAD geometry. We use support vector machines (SVM) to label patches in point clouds with the class labels tori, ellipsoids, spheres, cones, cylinders or planes. For the classification features based on different geometric properties like point normals, angles, and principal curvatures are used. These geometric features are estimated in the local neighborhood of a point of the point cloud. Computing these geometric features for a random subset of the point cloud yields a feature distribution. Different features are combined for achieving best classification results. To minimize the time consuming training phase of SVMs, the geometric features are first evaluated using linear discriminant analysis (LDA).
LDA and SVM are machine learning approaches that require an initial training phase to allow for a subsequent automatic classification of a new data set. For the training phase point clouds are generated using a simulation of a laser scanning device. Additional noise based on an laser scanner error model is added to the point clouds. The resulting LDA and SVM classifiers are then used to classify geometric primitives in simulated and real laser scanned point clouds.
Compared to other approaches, where all known features are used for classification, we explicitly compare novel against known geometric features to prove their effectiveness.
Linear and nonlinear response functions (RF) are extracted for the climate system and the carbon cycle represented by the MPI-ESM and cGENIE models, respectively. Appropriately designed simulations are run for this purpose. Joining these RFs, we have a climate emulator with carbon emissions as the forcing and any desired observable quantity (provided the data is saved), such as the surface air temperature or precipitation, as the predictand. Like e.g. for atmospheric CO2 concentration, we also have RFs for the solar constant as a forcing — mimicking solar radiation management (SRM) geoengineering. We consider two application cases. 1. One is based on the Paris 2015 agreement, determining the necessary least amount of SRM geoengineering needed to keep the global mean surface air temperature below a certain threshold, e.g. 1.5 or 2 [oC], given a certain amount of carbon emission abatement (ABA) and carbon dioxide removal (CDR) geoengineering. 2. The other application considers the conservation of the Greenland ice sheet (GrIS). Using a zero-dimensional simplification of a complex ice sheet model, we determine (a) if we need SRM given some ABA and CDR, and, if possible, (b) the required least amount of SRM to avoid the collapse of the GrIS. Keeping temperatures below 2 [oC] even is hardly possible without sustained SRM (1.); however, the collapse of the GrIS can be avoided applying SRM even for moderate levels of CDR and ABA, an overshoot being affordable (2.).