Refine
Year of publication
Document Type
- Conference Proceeding (19)
- Article (8)
- Part of a Book (3)
- Doctoral Thesis (3)
- Master's Thesis (2)
- Bachelor Thesis (1)
- Book (1)
- Preprint (1)
- Report (1)
Keywords
- 3D ship detection (1)
- Bayesian convolutional neural networks (1)
- Calibration procedure (1)
- Classification (1)
- Computer vision (1)
- Convolutional networks (1)
- Crowdmanagement (1)
- Deep Transformation Model (1)
- Deep learning (4)
- Defect detection (1)
Institute
- Institut für Optische Systeme - IOS (39) (remove)
Algorithms for calculating the string edit distance are used in e.g. information retrieval and document analysis systems or for evaluation of text recognizers. Text recognition based on CTC-trained LSTM networks includes a decoding step to produce a string, possibly using a language model, and evaluation using the string edit distance. The decoded string can further be used as a query for database search, e.g. in document retrieval. We propose to closely integrate dictionary search with text recognition to train both combined in a continuous fashion. This work shows that LSTM networks are capable of calculating the string edit distance while allowing for an exchangeable dictionary to separate learned algorithm from data. This could be a step towards integrating text recognition and dictionary search in one deep network.
Deep neural networks have become a veritable alternative to classic speaker recognition and clustering methods in recent years. However, while the speech signal clearly is a time series, and despite the body of literature on the benefits of prosodic (suprasegmental) features, identifying voices has usually not been approached with sequence learning methods. Only recently has a recurrent neural network (RNN) been successfully applied to this task, while the use of convolutional neural networks (CNNs) (that are not able to capture arbitrary time dependencies, unlike RNNs) still prevails. In this paper, we show the effectiveness of RNNs for speaker recognition by improving state of the art speaker clustering performance and robustness on the classic TIMIT benchmark. We provide arguments why RNNs are superior by experimentally showing a “sweet spot” of the segment length for successfully capturing prosodic information that has been theoretically predicted in previous work.
Visualization-Assisted Development of Deep Learning Models in Offline Handwriting Recognition
(2018)
Deep learning is a field of machine learning that has been the focus of active research and successful applications in recent years. Offline handwriting recognition is one of the research fields and applications were deep neural networks have shown high accuracy. Deep learning models and their training pipeline show a large amount of hyper-parameters in their data selection, transformation, network topology and training process that are sometimes interdependent. This increases the overall difficulty and time necessary for building and training a model for a specific data set and task at hand. This work proposes a novel visualization-assisted workflow that guides the model developer through the hyper-parameter search in order to identify relevant parameters and modify them in a meaningful way. This decreases the overall time necessary for building and training a model. The contributions of this work are a workflow for hyper-parameter search in offline handwriting recognition and a heat map based visualization technique for deep neural networks in multi-line offline handwriting recognition. This work applies to offline handwriting recognition, but the general workflow can possibly be adapted to other tasks as well.
Rheumatoid arthritis is an autoimmune disease that causes chronic inflammation of synovial joints, often resulting in irreversible structural damage. The activity of the disease is evaluated by clinical examinations, laboratory tests, and patient self-assessment. The long-term course of the disease is assessed with radiographs of hands and feet. The evaluation of the X-ray images performed by trained medical staff requires several minutes per patient. We demonstrate that deep convolutional neural networks can be leveraged for a fully automated, fast, and reproducible scoring of X-ray images of patients with rheumatoid arthritis. A comparison of the predictions of different human experts and our deep learning system shows that there is no significant difference in the performance of human experts and our deep learning model.
Fast and reliable acquisition of truth data for document analysis using cyclic suggest algorithms
(2019)
In document analysis the availability of ground truth data plays a crucial role for the success of a project. This is even more true at the rise of new deep learning methods which heavily rely on the availability of training data. But even for traditional, hand crafted algorithms that are not trained on data, reliable test data is important for the improvement and evaluation of the methods. Because ground truth acquisition is expensive and time consuming, semi-automatic methods are introduced which make use of suggestions coming from document analysis systems. The interaction between the human operator and the automatic analysis algorithms is the key to speed up the process while improving the quality of the data. The final confirmation of data may always be done by the human operator. This paper demonstrates a use case for acquisition of truth data in a mail processing system. It shows why a new, extended view on truth data is necessary in development and engineering of such systems. An overview over the tool and the data handling is given, the advantages in the workflow are shown, and consequences for the construction of analysis algorithms are discussed. It can be shown that the interplay between suggest algorithms and human operator leads to very fast truth data capturing. The surprising finding is the fact that if multiple suggest algorithms circularly depend on data, they are especially effective in terms of speed and accuracy.
Pascal Laube presents machine learning approaches for three key problems of reverse engineering of defective structured surfaces: parametrization of curves and surfaces, geometric primitive classification and inpainting of high-resolution textures. The proposed methods aim to improve the reconstruction quality while further automating the process. The contributions demonstrate that machine learning can be a viable part of the CAD reverse engineering pipeline.