Digital cameras are subject to physical, electronic and optical effects that result in errors and noise in the image. These effects include, for example, a temperature-dependent dark current, read noise, optical vignetting and differing sensitivities of individual pixels. The task of radiometric calibration is to reduce these errors in the image and thus improve the quality of the overall application. In this work we present an algorithm for radiometric calibration based on Gaussian processes, a regression method widely used in machine learning that is particularly useful in our context. Gaussian process regression is used to learn a temperature- and exposure-time-dependent mapping from observed gray-scale values to true light intensities for each pixel. Regression models based on the characteristics of single pixels suffer from excessively high runtime and are thus unsuitable for many practical applications. In contrast, a single regression model for an entire image with high spatial resolution leads to a low-quality radiometric calibration, which also limits its practical use. The proposed algorithm is based on a partitioning of the pixels such that each partition can be represented by a single regression model without loss of quality. Partitioning is done by extracting features from the characteristic of each pixel and using them for lexicographic sorting. Splitting the sorted data into partitions of equal size yields the final partitions, each of which is represented by its partition center. An individual Gaussian process regression and model selection is performed for each partition. Calibration is performed by interpolating the gray-scale value of each pixel with the regression model of the respective partition. An experimental comparison of the proposed approach with classical flat-field calibration shows a consistently higher reconstruction quality for the same overall number of calibration frames.
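The partitioning step described in the abstract above (lexicographic sorting of per-pixel features, then splitting into equal-size partitions) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the two-dimensional example features, and the use of the per-dimension mean as the "partition center" are all assumptions, since the abstract does not define them.

```python
from statistics import fmean


def partition_pixels(features, num_partitions):
    """Return lists of pixel indices, one list per partition.

    `features` is a sequence of per-pixel feature tuples (hypothetical
    features; the abstract does not say which ones are extracted).
    Assumes num_partitions <= len(features).
    """
    # Python compares tuples lexicographically, so sorting by the
    # feature tuple gives the lexicographic order of the pixels.
    order = sorted(range(len(features)), key=lambda i: tuple(features[i]))
    n = len(order)
    partitions = []
    for k in range(num_partitions):
        start = k * n // num_partitions
        stop = (k + 1) * n // num_partitions
        partitions.append(order[start:stop])
    return partitions


def partition_centers(features, partitions):
    """Assumed definition of a center: the per-dimension mean feature."""
    centers = []
    for members in partitions:
        dims = zip(*(features[i] for i in members))
        centers.append(tuple(fmean(d) for d in dims))
    return centers


# Four pixels with two made-up features each.
features = [(2.0, 1.0), (1.0, 5.0), (1.0, 2.0), (3.0, 0.0)]
parts = partition_pixels(features, 2)
centers = partition_centers(features, parts)
```

In the approach described above, one regression model would then be fitted per partition rather than per pixel, which is where the runtime saving comes from.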
Multi-Dimensional Connectionist Classification is a method for weakly supervised training of Deep Neural Networks for segmentation-free multi-line offline handwriting recognition. MDCC applies Conditional Random Fields as an alignment function for this task. We discuss the structure and patterns of handwritten text that can be used for building a CRF. Since CRFs are cyclic graphical models, we have to resort to approximate inference when calculating the alignment of multi-line text during training, here in the form of Loopy Belief Propagation. This work concludes with experimental results for transcribing small multi-line samples from the IAM Offline Handwriting DB which show that MDCC is a competitive methodology.
The detection of differences between images of a printed reference and a reprinted wood decor often requires an initial image registration step. Depending on the digitization method, the reprint will be displaced and rotated with respect to the reference. The aim of registration is to match the images as precisely as possible. In our approach, images are first matched globally by extracting feature points from both images and finding corresponding point pairs using the RANSAC algorithm. From these correspondences, we compute a global projective transformation between both images. In order to obtain a pixel-wise registration, we train a learning machine on the point correspondences found by RANSAC. The learning algorithm (in our case Gaussian process regression) is used to interpolate nonlinearly between the feature points, which results in a high-precision image registration method for wood decors.
Offline handwriting recognition systems often use LSTM networks trained with line or word images. Multi-line text makes it necessary to use segmentation to obtain these images explicitly. Skewed, curved, overlapping or incorrectly written text, as well as noise, can lead to errors during segmentation of multi-line text and reduce the overall recognition capacity of the system. The last year has seen the introduction of deep learning methods capable of segmentation-free recognition of whole paragraphs. Our method uses Conditional Random Fields to represent text and align it with the network output in order to calculate a loss function for training. Experiments are promising and show that the technique is capable of training an LSTM multi-line text recognition system.
Increasing robustness of handwriting recognition using character N-Gram decoding on large lexica
(2016)
Offline handwriting recognition systems often include a decoding step, that is, retrieving the most likely character sequence from the underlying machine learning algorithm. Decoding is sensitive to ranges of weakly predicted characters, caused e.g. by obstructions in the scanned document. We present a new algorithm for robust decoding of handwriting recognizer outputs using character n-grams. Multidimensional hierarchical subsampling artificial neural networks with Long Short-Term Memory cells have been successfully applied to offline handwriting recognition. Output activations from such networks, trained with Connectionist Temporal Classification, can be decoded with several different algorithms in order to retrieve the most likely literal string that they represent. We present a new algorithm for decoding the network output while restricting the possible strings to a large lexicon. The index used for this work is an n-gram index, with tri-grams used for experimental comparisons. N-grams are extracted from the network output using a backtracking algorithm, and each n-gram is assigned a mean probability. The decoding result is obtained by intersecting the n-gram hit lists while calculating the total probability for each matched lexicon entry. We conclude with an experimental comparison of different decoding algorithms on a large lexicon.
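The lexicon lookup described in the abstract above can be illustrated with a small n-gram index. This is a hedged sketch, not the published algorithm: the backtracking extraction of n-grams from the network output is omitted, so the (n-gram, probability) pairs are assumed to be given; the function names are hypothetical; and summing the n-gram probabilities per lexicon entry is an assumption, because the abstract does not state how the total probability is combined. Entries found in every hit list (the strict intersection) are preferred, with a fallback to partial matches.

```python
from collections import defaultdict


def build_ngram_index(lexicon, n=3):
    """Map each character n-gram to the set of lexicon entries containing it."""
    index = defaultdict(set)
    for word in lexicon:
        for i in range(len(word) - n + 1):
            index[word[i:i + n]].add(word)
    return index


def decode(ngram_probs, index):
    """Pick the best lexicon entry for a list of (ngram, probability) pairs."""
    scores = defaultdict(float)  # assumed scoring: sum of n-gram probabilities
    hits = defaultdict(int)      # number of hit lists that contain the entry
    for ngram, prob in ngram_probs:
        for word in index.get(ngram, ()):
            scores[word] += prob
            hits[word] += 1
    # Strict intersection: entries that appear in every hit list.
    full = [w for w in scores if hits[w] == len(ngram_probs)]
    # Fall back to partial matches if one weak n-gram empties the intersection.
    pool = full if full else list(scores)
    return max(pool, key=lambda w: scores[w]) if pool else None


index = build_ngram_index(["hello", "help", "hollow"])
best = decode([("hel", 0.9), ("ell", 0.8), ("llo", 0.7)], index)
```

The fallback to partial matches is one possible way to tolerate a range of weakly predicted characters; whether the published method handles such ranges this way is not stated in the abstract.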
Algorithms for calculating the string edit distance are used, e.g., in information retrieval and document analysis systems or for the evaluation of text recognizers. Text recognition based on CTC-trained LSTM networks includes a decoding step to produce a string, possibly using a language model, and evaluation using the string edit distance. The decoded string can further be used as a query for database search, e.g. in document retrieval. We propose to closely integrate dictionary search with text recognition so that both are trained together in a continuous fashion. This work shows that LSTM networks are capable of calculating the string edit distance while allowing for an exchangeable dictionary to separate the learned algorithm from the data. This could be a step towards integrating text recognition and dictionary search in one deep network.
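For reference, the string edit distance mentioned in the abstract above is classically computed by dynamic programming. The sketch below shows that textbook (Levenshtein) algorithm together with a naive dictionary search; it is not the LSTM-based approach of the paper, and the function names and example words are chosen purely for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_entry(query, dictionary):
    """Naive dictionary search: the entry with the smallest edit distance."""
    return min(dictionary, key=lambda word: edit_distance(query, word))


distance = edit_distance("kitten", "sitting")
match = nearest_entry("recogniton", ["recognition", "regression"])
```

Because the dictionary is only an argument to the search, it can be exchanged without touching the distance computation, which mirrors the separation of algorithm and data that the abstract describes.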
Recent years have seen the proposal of several different gradient-based optimization methods for training artificial neural networks. Traditional methods include steepest descent with momentum; newer methods are based on per-parameter learning rates, and some approximate Newton-step updates. This work contains the results of several experiments comparing different optimization methods. The experiments were targeted at offline handwriting recognition using hierarchical subsampling networks with recurrent LSTM layers. We present an overview of the optimization methods used, the results that were achieved, and a discussion of why the methods lead to different results.
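To make the two families of update rules named in the abstract above concrete, the following sketch contrasts steepest descent with momentum and one per-parameter learning-rate method. AdaGrad is used here only as a well-known representative of the second family; the abstract does not say which methods were compared, and the function names and hyperparameter values are illustrative assumptions.

```python
import math


def momentum_step(params, grads, velocity, lr=0.01, mu=0.9):
    """Steepest descent with (classical) momentum: one global learning rate."""
    new_velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """AdaGrad: each parameter's step is scaled by its own gradient history."""
    new_accum = [a + g * g for a, g in zip(accum, grads)]
    new_params = [p - lr * g / (math.sqrt(a) + eps)
                  for p, g, a in zip(params, grads, new_accum)]
    return new_params, new_accum


def grad(params):
    """Gradient of the toy objective f(x, y) = x**2 + 10 * y**2."""
    x, y = params
    return [2.0 * x, 20.0 * y]


# One AdaGrad step from (1, 1): both coordinates move by about lr,
# even though their gradients differ by a factor of ten.
adagrad_params, _ = adagrad_step([1.0, 1.0], grad([1.0, 1.0]), [0.0, 0.0])
```

On this badly scaled toy objective the momentum rule applies the same learning rate to both coordinates, while the per-parameter rule normalizes each coordinate separately, which is one reason such methods can behave differently on the same network.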