Institute
- Institut für Optische Systeme - IOS (39)
Deep neural networks have become a veritable alternative to classic speaker recognition and clustering methods in recent years. However, while the speech signal clearly is a time series, and despite the body of literature on the benefits of prosodic (suprasegmental) features, identifying voices has usually not been approached with sequence learning methods. Only recently has a recurrent neural network (RNN) been successfully applied to this task, while the use of convolutional neural networks (CNNs) (that are not able to capture arbitrary time dependencies, unlike RNNs) still prevails. In this paper, we show the effectiveness of RNNs for speaker recognition by improving state of the art speaker clustering performance and robustness on the classic TIMIT benchmark. We provide arguments why RNNs are superior by experimentally showing a “sweet spot” of the segment length for successfully capturing prosodic information that has been theoretically predicted in previous work.
The eFlow project, on which HTWG Konstanz among others has been conducting research since 2012, uses a mathematical simulation to model how crowds behave when they have to leave a given site. The simulation builds on a finite element method approach in which several coupled differential equations must be solved. These computations prove to be very expensive, especially for complex scenarios with large sites and many people. The aim of this bachelor thesis is to create a surrogate model that predicts the results of the simulation using machine learning approaches, specifically regression methods. To this end, data sets must be generated. They are produced by repeated runs of the simulation, in each of which the input parameters that are to enter the regression model are varied and linked to the corresponding simulation result. The regression approaches become more complex with each run, as additional input parameters are included in the data generation. The thesis examines whether the simulation can be reproduced with machine learning approaches. Based on these surrogate models, it should become possible to assess situations in real time without resorting to the computationally expensive simulation. The results confirm that the mathematical simulation can be reproduced by regression. However, collecting enough data to include a sufficient number of input parameters in the regression method proves to be very computationally expensive. This thesis therefore serves as a preliminary study for the implementation of a mature surrogate model that can take all input parameters of the simulation into account.
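The surrogate idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function `expensive_simulation`, its linear form, and the single input parameter (number of people) are hypothetical stand-ins chosen only to show the workflow of sampling the simulation, fitting a regression, and then querying the cheap model.

```python
def expensive_simulation(num_people: float) -> float:
    """Hypothetical stand-in for the costly crowd simulation.

    Returns an evacuation time in seconds. The linear form is an
    assumption made purely for illustration.
    """
    return 30.0 + 0.5 * num_people


def fit_linear_surrogate(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x: float) -> float:
    """Evaluate the fitted surrogate at input x."""
    intercept, slope = model
    return intercept + slope * x


# Step 1: generate a data set by running the simulation with varied inputs.
inputs = [100.0, 200.0, 400.0, 800.0, 1600.0]
outputs = [expensive_simulation(x) for x in inputs]

# Step 2: fit the surrogate once, then query it instead of the simulation.
surrogate = fit_linear_surrogate(inputs, outputs)
print(predict(surrogate, 1000.0))
```

With more input parameters the same pattern applies, but the number of simulation runs needed to cover the input space grows quickly, which matches the data collection cost reported in the abstract.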
Offline handwriting recognition systems often use LSTM networks trained with line or word images. Multi-line text makes it necessary to use segmentation to explicitly obtain these images. Skewed, curved, overlapping, or incorrectly written text, as well as noise, can lead to errors during segmentation of multi-line text and reduce the overall recognition capacity of the system. The past year has seen the introduction of deep learning methods capable of segmentation-free recognition of whole paragraphs. Our method uses Conditional Random Fields to represent text and align it with the network output to calculate a loss function for training. Experiments are promising and show that the technique is capable of training an LSTM multi-line text recognition system.
Multi-Dimensional Connectionist Classification is a method for weakly supervised training of Deep Neural Networks for segmentation-free multi-line offline handwriting recognition. MDCC applies Conditional Random Fields as an alignment function for this task. We discuss the structure and patterns of handwritten text that can be used for building a CRF. Since CRFs are cyclic graphical models, we have to resort to approximate inference when calculating the alignment of multi-line text during training, here in the form of Loopy Belief Propagation. This work concludes with experimental results for transcribing small multi-line samples from the IAM Offline Handwriting DB which show that MDCC is a competitive methodology.
Visualization-Assisted Development of Deep Learning Models in Offline Handwriting Recognition
(2018)
Deep learning is a field of machine learning that has been the focus of active research and successful applications in recent years. Offline handwriting recognition is one of the research fields and applications where deep neural networks have shown high accuracy. Deep learning models and their training pipelines have a large number of hyper-parameters in their data selection, transformation, network topology and training process, and these are sometimes interdependent. This increases the overall difficulty and time necessary for building and training a model for a specific data set and task at hand. This work proposes a novel visualization-assisted workflow that guides the model developer through the hyper-parameter search in order to identify relevant parameters and modify them in a meaningful way. This decreases the overall time necessary for building and training a model. The contributions of this work are a workflow for hyper-parameter search in offline handwriting recognition and a heat-map-based visualization technique for deep neural networks in multi-line offline handwriting recognition. This work applies to offline handwriting recognition, but the general workflow can possibly be adapted to other tasks as well.
Algorithms for calculating the string edit distance are used, for example, in information retrieval and document analysis systems and for the evaluation of text recognizers. Text recognition based on CTC-trained LSTM networks includes a decoding step to produce a string, possibly using a language model, and evaluation using the string edit distance. The decoded string can further be used as a query for database search, e.g. in document retrieval. We propose to closely integrate dictionary search with text recognition to train both combined in a continuous fashion. This work shows that LSTM networks are capable of calculating the string edit distance while allowing for an exchangeable dictionary to separate the learned algorithm from the data. This could be a step towards integrating text recognition and dictionary search in one deep network.
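For reference, the string edit distance mentioned in this abstract is commonly computed with the classic dynamic-programming (Levenshtein) algorithm. The sketch below shows that conventional algorithm together with a simple dictionary lookup; it is not the LSTM-based approach the paper proposes, and the helper `nearest_entry` and the example dictionary are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,                      # delete ch_a
                current[j - 1] + 1,                   # insert ch_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def nearest_entry(query: str, dictionary: list[str]) -> str:
    """Hypothetical helper: return the dictionary entry closest to the query."""
    return min(dictionary, key=lambda word: edit_distance(query, word))


# Example: map a misrecognized string to an exchangeable word list.
words = ["recognition", "retrieval", "network"]
print(nearest_entry("recogniton", words))
```

Because the dictionary is passed in as an argument, it can be exchanged without touching the distance computation, which mirrors the separation of algorithm and data that the abstract describes.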
Rheumatoid arthritis is an autoimmune disease that causes chronic inflammation of synovial joints, often resulting in irreversible structural damage. The activity of the disease is evaluated by clinical examinations, laboratory tests, and patient self-assessment. The long-term course of the disease is assessed with radiographs of hands and feet. The evaluation of the X-ray images performed by trained medical staff requires several minutes per patient. We demonstrate that deep convolutional neural networks can be leveraged for a fully automated, fast, and reproducible scoring of X-ray images of patients with rheumatoid arthritis. A comparison of the predictions of different human experts and our deep learning system shows that there is no significant difference in the performance of human experts and our deep learning model.
Mapping of tree seedlings is useful for tasks ranging from monitoring natural succession and regeneration to effective silvicultural management. Development of methods that are both accurate and cost-effective is especially important considering the dramatic increase in tree planting that is required globally to mitigate the impacts of climate change. The combination of high-resolution imagery from unmanned aerial vehicles and object detection by convolutional neural networks (CNNs) is one promising approach. However, unbiased assessments of these models and methods to integrate them into geospatial workflows are lacking. In this study, we present a method for rapid, large-scale mapping of young conifer seedlings using CNNs applied to RGB orthomosaic imagery. Importantly, we provide an unbiased assessment of model performance by using two well-characterised trial sites together containing over 30,000 seedlings to assemble datasets with a high level of completeness. Our results showed CNN-based models trained on two sites detected seedlings with sensitivities of 99.5% and 98.8%. False positives due to tall weeds at one site and naturally regenerating seedlings of the same species led to slightly lower precision of 98.5% and 96.7%. A model trained on examples from both sites had 99.4% sensitivity and precision of 97%, showing applicability across sites. Additional testing showed that the CNN model was able to detect 68.7% of obscured seedlings missed during the initial annotation of the imagery but present in the field data. Finally, we demonstrate the potential to use a form of weakly supervised training and a tile-based processing chain to enhance the accuracy and efficiency of CNNs applied to large, high-resolution orthomosaics.
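The sensitivity and precision figures quoted in this abstract follow the standard detection-metric definitions. A minimal sketch of those definitions is given below; the counts used in the example are hypothetical and are not taken from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real seedlings that the detector found (recall)."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detections that correspond to real seedlings."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts for illustration only (not from the study).
tp, fn, fp = 995, 5, 15
print(f"sensitivity: {sensitivity(tp, fn):.3f}")
print(f"precision:   {precision(tp, fp):.3f}")
```

False positives such as tall weeds lower precision without affecting sensitivity, which is why the two figures can diverge as reported.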