Increasing robustness of handwriting recognition using character N-Gram decoding on large lexica
(2016)
Offline handwriting recognition systems often include a decoding step, that is, retrieving the most likely character sequence from the output of the underlying machine learning algorithm. Decoding is sensitive to ranges of weakly predicted characters, caused, for example, by obstructions in the scanned document. We present a new algorithm for robust decoding of handwriting recognizer outputs using character n-grams. Multidimensional hierarchical subsampling artificial neural networks with Long Short-Term Memory cells have been successfully applied to offline handwriting recognition. Output activations from such networks, trained with Connectionist Temporal Classification, can be decoded with several different algorithms in order to retrieve the most likely literal string they represent. We present a new algorithm for decoding the network output while restricting the possible strings to a large lexicon. The index used in this work is an n-gram index; tri-grams are used for the experimental comparison. N-grams are extracted from the network output using a backtracking algorithm, and each n-gram is assigned a mean probability. The decoding result is obtained by intersecting the n-gram hit lists while calculating the total probability for each matched lexicon entry. We conclude with an experimental comparison of different decoding algorithms on a large lexicon.
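A minimal sketch of the hit-list intersection and scoring step described in the abstract, in plain Python. The helper names (build_trigram_index, score_lexicon) and the example tri-gram probabilities are illustrative assumptions, not the authors' implementation:

```python
# Sketch of n-gram-based lexicon decoding: look up each extracted tri-gram
# in an inverted index over the lexicon and accumulate its mean probability
# for every lexicon entry that contains it. A strict intersection would
# additionally require that every extracted tri-gram hits the entry.

from collections import defaultdict


def build_trigram_index(lexicon):
    """Map each tri-gram to the set of lexicon entries containing it."""
    index = defaultdict(set)
    for word in lexicon:
        for i in range(len(word) - 2):
            index[word[i:i + 3]].add(word)
    return index


def score_lexicon(extracted_trigrams, index):
    """Accumulate the mean probabilities of matched tri-grams per entry
    and rank the lexicon entries by their total score."""
    scores = defaultdict(float)
    for trigram, mean_prob in extracted_trigrams:
        for word in index.get(trigram, ()):
            scores[word] += mean_prob
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    lexicon = ["handwriting", "handwritten", "understanding"]
    index = build_trigram_index(lexicon)
    # Tri-grams with mean probabilities as they might come from the
    # backtracking step over the network output (made-up values).
    extracted = [("han", 0.9), ("and", 0.8), ("wri", 0.7), ("ing", 0.6)]
    for word, score in score_lexicon(extracted, index):
        print(word, round(score, 2))
```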
Recent years have seen the proposal of several different gradient-based optimization methods for training artificial neural networks. Traditional methods include steepest descent with momentum; newer methods are based on per-parameter learning rates, and some approximate Newton-step updates. This work contains the results of several experiments comparing different optimization methods. The experiments were targeted at offline handwriting recognition using hierarchical subsampling networks with recurrent LSTM layers. We present an overview of the optimization methods used, the results that were achieved, and a discussion of why the methods lead to different results.
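To illustrate two of the optimizer families mentioned above, the following sketch contrasts steepest descent with momentum against an Adagrad-style per-parameter learning rate on a one-dimensional quadratic loss. The loss, step sizes and iteration count are illustrative and unrelated to the paper's handwriting experiments; approximate Newton-step methods are not covered here.

```python
# Toy comparison of two update rules on f(w) = 0.5 * (w - 3)^2,
# whose minimizer is w = 3.

def grad(w):
    """Gradient of the toy loss f(w) = 0.5 * (w - 3)^2."""
    return w - 3.0


def sgd_momentum(w=0.0, lr=0.1, mu=0.9, steps=50):
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)   # velocity accumulates past gradients
        w += v
    return w


def adagrad(w=0.0, lr=1.0, eps=1e-8, steps=50):
    cache = 0.0
    for _ in range(steps):
        g = grad(w)
        cache += g * g              # accumulated squared gradient per parameter
        w -= lr * g / (cache ** 0.5 + eps)
    return w


if __name__ == "__main__":
    print("momentum:", round(sgd_momentum(), 4))
    print("adagrad: ", round(adagrad(), 4))
```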
Digital cameras are subject to physical, electronic and optical effects that result in errors and noise in the image. These effects include, for example, a temperature-dependent dark current, read noise, optical vignetting and different sensitivities of individual pixels. The task of radiometric calibration is to reduce these errors in the image and thus improve the quality of the overall application. In this work we present an algorithm for radiometric calibration based on Gaussian processes. Gaussian processes are a regression method widely used in machine learning that is particularly useful in our context. Gaussian process regression is used to learn a temperature- and exposure-time-dependent mapping from observed gray-scale values to true light intensities for each pixel. Regression models based on the characteristics of single pixels suffer from excessively high runtime and are thus unsuitable for many practical applications. In contrast, a single regression model for an entire image with high spatial resolution leads to a low-quality radiometric calibration, which also limits its practical use. The proposed algorithm is based on a partitioning of the pixels such that each partition can be represented by a single regression model without quality loss. Partitioning is done by extracting features from the characteristics of each pixel and using them for lexicographic sorting. Splitting the sorted data into partitions of equal size yields the final partitions, each of which is represented by its partition center. An individual Gaussian process regression and model selection is performed for each partition. Calibration is performed by interpolating the gray-scale value of each pixel with the regression model of the respective partition. The experimental comparison of the proposed approach to classical flat-field calibration shows a consistently higher reconstruction quality for the same overall number of calibration frames.
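The partition-then-regress idea can be sketched with scikit-learn's GaussianProcessRegressor. The synthetic pixel characteristics, the choice of features, and the partition count below are assumptions made for illustration, not the calibration setup from the paper:

```python
# Sketch: partition pixels by lexicographically sorted feature vectors,
# then fit one Gaussian process per partition mapping
# (observed gray value, temperature, exposure time) -> true intensity.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_pixels, n_frames, n_partitions = 64, 10, 4

# Synthetic per-pixel characteristics: gain and a dark-current-like offset.
gain = rng.normal(1.0, 0.05, n_pixels)
offset = rng.normal(0.02, 0.01, n_pixels)

# Calibration frames: exposure time, temperature and true intensity per frame.
exposure = rng.uniform(0.01, 0.1, n_frames)
temperature = rng.uniform(20.0, 40.0, n_frames)
true_intensity = rng.uniform(0.0, 1.0, n_frames)

# Observed gray values per pixel and frame (with noise).
observed = (gain[:, None] * true_intensity[None, :]
            + offset[:, None] * temperature[None, :] * exposure[None, :]
            + rng.normal(0.0, 0.005, (n_pixels, n_frames)))

# Partition pixels by lexicographically sorting their feature vectors
# and splitting the sorted order into equally sized groups.
features = np.stack([gain, offset], axis=1)
order = np.lexsort((features[:, 1], features[:, 0]))
partitions = np.array_split(order, n_partitions)

models = []
for part in partitions:
    X = np.column_stack([
        observed[part].ravel(),
        np.tile(temperature, len(part)),
        np.tile(exposure, len(part)),
    ])
    y = np.tile(true_intensity, len(part))
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X, y)
    models.append(gp)

# Calibrate one pixel of the first partition at a known operating point.
pixel = partitions[0][0]
query = np.array([[observed[pixel, 0], temperature[0], exposure[0]]])
print("reconstructed:", models[0].predict(query)[0], "true:", true_intensity[0])
```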
The detection of differences between images of a printed reference and a reprinted wood decor often requires an initial image registration step. Depending on the digitization method, the reprint will be displaced and rotated with respect to the reference. The aim of registration is to match the images as precisely as possible. In our approach, the images are first matched globally by extracting feature points from both images and finding corresponding point pairs using the RANSAC algorithm. From these correspondences, we compute a global projective transformation between the two images. In order to obtain a pixel-wise registration, we train a learning machine on the point correspondences found by RANSAC. The learning algorithm (in our case Gaussian process regression) is used to interpolate nonlinearly between the feature points, which results in a high-precision image registration method for wood decors.
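A rough sketch of this two-stage registration, assuming OpenCV for feature matching and RANSAC and scikit-learn for the Gaussian process refinement. The synthetic test image and all parameter values are illustrative, not those used for the wood-decor data:

```python
# Stage 1: global projective transform from RANSAC on ORB correspondences.
# Stage 2: Gaussian process regression on the inlier residuals as a
# nonlinear, pixel-wise refinement of the global transform.

import cv2
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Synthetic "reference" texture with strong corners, and a slightly
# rotated/translated "reprint".
reference = np.zeros((400, 400), np.uint8)
for _ in range(80):
    x, y = rng.integers(0, 360, 2)
    w, h = rng.integers(10, 40, 2)
    cv2.rectangle(reference, (int(x), int(y)), (int(x + w), int(y + h)),
                  int(rng.integers(60, 255)), -1)
M = cv2.getRotationMatrix2D((200, 200), 2.0, 1.0)   # 2 degree rotation
M[:, 2] += (5.0, -3.0)                              # small translation
reprint = cv2.warpAffine(reference, M, (400, 400))

# 1) Feature points and correspondences.
orb = cv2.ORB_create(500)
kp1, des1 = orb.detectAndCompute(reference, None)
kp2, des2 = orb.detectAndCompute(reprint, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
src = np.float32([kp1[m.queryIdx].pt for m in matches])
dst = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2) Global projective transformation via RANSAC.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
inliers = inlier_mask.ravel().astype(bool)

# 3) Gaussian process regression on the inlier residuals.
src_in, dst_in = src[inliers], dst[inliers]
proj = cv2.perspectiveTransform(src_in.reshape(-1, 1, 2), H).reshape(-1, 2)
gp = GaussianProcessRegressor(kernel=RBF(50.0) + WhiteKernel(), normalize_y=True)
gp.fit(src_in, dst_in - proj)

# Registered position of an arbitrary reference pixel.
point = np.float32([[100.0, 150.0]])
global_part = cv2.perspectiveTransform(point.reshape(-1, 1, 2), H).reshape(-1, 2)
print("registered position:", global_part + gp.predict(point))
```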
Visualization-Assisted Development of Deep Learning Models in Offline Handwriting Recognition
(2018)
Deep learning is a field of machine learning that has been the focus of active research and successful applications in recent years. Offline handwriting recognition is one of the research fields and applications where deep neural networks have shown high accuracy. Deep learning models and their training pipelines have a large number of hyper-parameters in their data selection, transformation, network topology and training process that are sometimes interdependent. This increases the overall difficulty and the time necessary for building and training a model for a specific data set and the task at hand. This work proposes a novel visualization-assisted workflow that guides the model developer through the hyper-parameter search in order to identify relevant parameters and modify them in a meaningful way. This decreases the overall time necessary for building and training a model. The contributions of this work are a workflow for hyper-parameter search in offline handwriting recognition and a heat-map-based visualization technique for deep neural networks in multi-line offline handwriting recognition. This work applies to offline handwriting recognition, but the general workflow can possibly be adapted to other tasks as well.
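A minimal sketch of a heat-map view over network output activations, in the spirit of the visualization technique described above. The random activation matrix and the character set are placeholders; the paper's actual heat maps are computed from trained multi-line recognition models.

```python
# Render per-timestep character probabilities as a heat map with matplotlib.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

characters = list(" abcdefghijklmnopqrstuvwxyz") + ["<blank>"]
timesteps = 120

# Placeholder for per-timestep character probabilities (e.g. a CTC softmax output).
logits = rng.normal(size=(timesteps, len(characters)))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

fig, ax = plt.subplots(figsize=(10, 4))
im = ax.imshow(probs.T, aspect="auto", cmap="viridis", origin="lower")
ax.set_xlabel("timestep")
ax.set_ylabel("character class")
ax.set_yticks(range(len(characters)))
ax.set_yticklabels(characters, fontsize=6)
fig.colorbar(im, ax=ax, label="activation")
plt.tight_layout()
plt.show()
```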