Algorithms for calculating the string edit distance are used, for example, in information retrieval and document analysis systems and for evaluating text recognizers. Text recognition based on CTC-trained LSTM networks includes a decoding step that produces a string, possibly using a language model, and is evaluated using the string edit distance. The decoded string can further be used as a query for a database search, e.g. in document retrieval. We propose to closely integrate dictionary search with text recognition so that both can be trained jointly in a continuous fashion. This work shows that LSTM networks are capable of calculating the string edit distance while allowing for an exchangeable dictionary that separates the learned algorithm from the data. This could be a step towards integrating text recognition and dictionary search in one deep network.
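For orientation, the conventional pipeline that the abstract describes (decode, then search a dictionary by edit distance) can be sketched as follows. This is a minimal illustration using classic best-path CTC decoding and dynamic-programming Levenshtein distance, not the LSTM-based approach proposed in the paper. The function names, the blank index `BLANK = 0`, and the toy alphabet and dictionary are assumptions made for this example.

```python
from typing import Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank label (hypothetical convention)


def ctc_greedy_decode(frame_labels: Sequence[int], alphabet: str) -> str:
    """Best-path decoding: collapse repeated labels, then drop blanks.

    frame_labels holds the most likely label per time step; label k > 0
    maps to alphabet[k - 1].
    """
    chars = []
    prev = BLANK
    for label in frame_labels:
        if label != prev and label != BLANK:
            chars.append(alphabet[label - 1])
        prev = label
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, using two rows of the DP table."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if ca == cb else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def nearest_word(query: str, dictionary: Sequence[str]) -> Tuple[str, int]:
    """Return the dictionary entry closest to the query and its distance."""
    scored = ((word, edit_distance(query, word)) for word in dictionary)
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    decoded = ctc_greedy_decode([1, 1, 0, 1, 2, 2], alphabet="ab")
    print(decoded)
    print(nearest_word("hanbwriting", ["handwriting", "handwritten", "typewriting"]))
```

Because the dictionary is passed in as an argument, it can be exchanged without touching the distance computation, which loosely mirrors the separation of algorithm and data mentioned in the abstract.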
Visualization-Assisted Development of Deep Learning Models in Offline Handwriting Recognition
(2018)
Deep learning is a field of machine learning that has been the focus of active research and successful applications in recent years. Offline handwriting recognition is one of the research fields and applications where deep neural networks have shown high accuracy. Deep learning models and their training pipelines have a large number of hyper-parameters in their data selection, transformation, network topology and training process, and these are sometimes interdependent. This increases the overall difficulty and time necessary for building and training a model for a specific data set and task at hand. This work proposes a novel visualization-assisted workflow that guides the model developer through the hyper-parameter search in order to identify relevant parameters and modify them in a meaningful way. This decreases the overall time necessary for building and training a model. The contributions of this work are a workflow for hyper-parameter search in offline handwriting recognition and a heat-map-based visualization technique for deep neural networks in multi-line offline handwriting recognition. This work applies to offline handwriting recognition, but the general workflow can possibly be adapted to other tasks as well.
Offline handwriting recognition systems often use LSTM networks trained with line or word images. Multi-line text makes it necessary to use segmentation to explicitly obtain these images. Skewed, curved, overlapping or incorrectly written text, as well as noise, can lead to errors during the segmentation of multi-line text and reduce the overall recognition capacity of the system. The last year has seen the introduction of deep learning methods capable of segmentation-free recognition of whole paragraphs. Our method uses Conditional Random Fields to represent text and align it with the network output in order to calculate a loss function for training. Experiments are promising and show that the technique is capable of training an LSTM multi-line text recognition system.