Increasing robustness of handwriting recognition using character N-Gram decoding on large lexica

  • Offline handwriting recognition systems often include a decoding step, which retrieves the most likely character sequence from the underlying machine learning algorithm. Decoding is sensitive to ranges of weakly predicted characters, caused, for example, by obstructions in the scanned document. We present a new algorithm for robust decoding of handwriting recognizer outputs using character n-grams. Multidimensional hierarchical subsampling artificial neural networks with Long Short-Term Memory cells have been successfully applied to offline handwriting recognition. Output activations from such networks, trained with Connectionist Temporal Classification, can be decoded with several different algorithms in order to retrieve the most likely literal string that they represent. We present a new algorithm for decoding the network output while restricting the possible strings to a large lexicon. The index used for this work is an n-gram index, with trigrams used for the experimental comparisons. N-grams are extracted from the network output using a backtracking algorithm, and each n-gram is assigned a mean probability. The decoding result is obtained by intersecting the n-gram hit lists while calculating the total probability for each matched lexicon entry. We conclude with an experimental comparison of different decoding algorithms on a large lexicon.
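
The lexicon lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the n-grams and their mean probabilities have already been extracted from the network output, so the backtracking step is omitted. It also relaxes the hit-list intersection into a score accumulation, so that an entry missing one n-gram can still match, and it uses a simple sum as the combined score. The names `char_ngrams`, `build_index`, and `decode` are hypothetical.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Return the set of character n-grams contained in `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_index(lexicon, n=3):
    """Map every n-gram to the set of lexicon entries containing it (its hit list)."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in char_ngrams(word, n):
            index[gram].add(word)
    return index


def decode(scored_ngrams, index):
    """Return the lexicon entry best supported by the scored n-grams.

    `scored_ngrams` maps each n-gram to a mean probability. It is assumed
    to come from an earlier extraction step that is not shown here.
    The score of an entry is the sum of the probabilities of all n-grams
    whose hit list contains it. This is an illustrative choice, not the
    scoring formula of the paper.
    """
    scores = defaultdict(float)
    for gram, prob in scored_ngrams.items():
        # Walk the hit list of this n-gram and credit every entry on it.
        for word in index.get(gram, ()):
            scores[word] += prob
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy lexicon and made-up n-gram probabilities, for illustration only.
index = build_index(["santorini", "santiago", "sardinia"])
word, score = decode({"san": 0.9, "ant": 0.8, "tor": 0.7, "ini": 0.6}, index)
print(word)
```

With this toy input, "santorini" is selected because all four scored trigrams occur in it, whereas "santiago" and "sardinia" each match fewer. A strict intersection of hit lists would be more selective but would discard an entry as soon as a single n-gram is misrecognized.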


Author: Martin Schall, Marc-Peter Schambach, Matthias O. Franz
Parent Title (English): 12th IAPR Workshop on Document Analysis Systems (DAS), 11-14 April 2016, Santorini, Greece
Document Type: Conference Proceeding
Year of Publication: 2016
Release Date: 2018/11/20
Tag: Probability; Computational linguistics; Document image processing; Handwriting recognition; Learning (artificial intelligence)
First Page: 1
Last Page: 6
Open Access?: Yes
Licence (German): Urheberrechtlich geschützt (protected by copyright)