Refine
Document Type
- Conference Proceeding (14) (remove)
Language
- English (14)
Has Fulltext
- no (14) (remove)
Keywords
- 3D ship detection (1)
- Extended object tracking (1)
- Extension estimation (1)
- Finite-element (1)
- Gaussian processes (1)
- Inverse perspective (1)
- Lidar (1)
- Lidar-camera registration (1)
- Mask R-CNN (1)
- Modelling (1)
Institute
- Institut für Optische Systeme - IOS (14) (remove)
Multi-Dimensional Connectionist Classification is amethod for weakly supervised training of Deep Neural Networksfor segmentation-free multi-line offline handwriting recognition.MDCC applies Conditional Random Fields as an alignmentfunction for this task. We discuss the structure and patterns ofhandwritten text that can be used for building a CRF. Since CRFsare cyclic graphical models, we have to resort to approximateinference when calculating the alignment of multi-line text duringtraining, here in the form of Loopy Belief Propagation. This workconcludes with experimental results for transcribing small multi-line samples from the IAM Offline Handwriting DB which showthat MDCC is a competitive methodology.
Visualization-Assisted Development of Deep Learning Models in Offline Handwriting Recognition
(2018)
Deep learning is a field of machine learning that has been the focus of active research and successful applications in recent years. Offline handwriting recognition is one of the research fields and applications were deep neural networks have shown high accuracy. Deep learning models and their training pipeline show a large amount of hyper-parameters in their data selection, transformation, network topology and training process that are sometimes interdependent. This increases the overall difficulty and time necessary for building and training a model for a specific data set and task at hand. This work proposes a novel visualization-assisted workflow that guides the model developer through the hyper-parameter search in order to identify relevant parameters and modify them in a meaningful way. This decreases the overall time necessary for building and training a model. The contributions of this work are a workflow for hyper-parameter search in offline handwriting recognition and a heat map based visualization technique for deep neural networks in multi-line offline handwriting recognition. This work applies to offline handwriting recognition, but the general workflow can possibly be adapted to other tasks as well.
In this paper we present a method using deep learning to compute parametrizations for B-spline curve approximation. Existing methods consider the computation of parametric values and a knot vector as separate problems. We propose to train interdependent deep neural networks to predict parametric values and knots. We show that it is possible to include B-spline curve approximation directly into the neural network architecture. The resulting parametrizations yield tight approximations and are able to outperform state-of-the-art methods.
Deep neural networks have been successfully applied to problems such as image segmentation, image super-resolution, coloration and image inpainting. In this work we propose the use of convolutional neural networks (CNN) for image inpainting of large regions in high-resolution textures. Due to limited computational resources processing high-resolution images with neural networks is still an open problem. Existing methods separate inpainting of global structure and the transfer of details, which leads to blurry results and loss of global coherence in the detail transfer step. Based on advances in texture synthesis using CNNs we propose patch-based image inpainting by a single network topology that is able to optimize for global as well as detail texture statistics. Our method is capable of filling large inpainting regions, oftentimes exceeding quality of comparable methods for images of high-resolution (2048x2048px). For reference patch look-up we propose to use the same summary statistics that are used in the inpainting process.
Random matrices are used to filter the center of gravity (CoG) and the covariance matrix of measurements. However, these quantities do not always correspond directly to the position and the extent of the object, e.g. when a lidar sensor is used.In this paper, we propose a Gaussian processes regression model (GPRM) to predict the position and extension of the object from the filtered CoG and covariance matrix of the measurements. Training data for the GPRM are generated by a sampling method and a virtual measurement model (VMM). The VMM is a function that generates artificial measurements using ray tracing and allows us to obtain the CoG and covariance matrix that any object would cause. This enables the GPRM to be trained without real data but still be applied to real data due to the precise modeling in the VMM. The results show an accurate extension estimation as long as the reality behaves like the modeling and e.g. lidar measurements only occur on the side facing the sensor.
Incremental one-class learning using regularized null-space training for industrial defect detection
(2024)
One-class incremental learning is a special case of class-incremental learning, where only a single novel class is incrementally added to an existing classifier instead of multiple classes. This case is relevant in industrial defect detection scenarios, where novel defects usually appear during operation. Existing rolled-out classifiers must be updated incrementally in this scenario with only a few novel examples. In addition, it is often required that the base classifier must not be altered due to approval and warranty restrictions. While simple finetuning often gives the best performance across old and new classes, it comes with the drawback of potentially losing performance on the base classes (catastrophic forgetting [1]). Simple prototype approaches [2] work without changing existing weights and perform very well when the classes are well separated but fail dramatically when not. In theory, null-space training (NSCL) [3] should retain the basis classifier entirely, as parameter updates are restricted to the null space of the network with respect to existing classes. However, as we show, this technique promotes overfitting in the case of one-class incremental learning. In our experiments, we found that unconstrained weight growth in null space is the underlying issue, leading us to propose a regularization term (R-NSCL) that penalizes the magnitude of amplification. The regularization term is added to the standard classification loss and stabilizes null-space training in the one-class scenario by counteracting overfitting. We test the method’s capabilities on two industrial datasets, namely AITEX and MVTec, and compare the performance to state-of-the-art algorithms for class-incremental learning.
Targetless Lidar-camera registration is a repeating task in many computer vision and robotics applications and requires computing the extrinsic pose of a point cloud with respect to a camera or vice-versa. Existing methods based on learning or optimization lack either generalization capabilities or accuracy. Here, we propose a combination of pre-training and optimization using a neural network-based mutual information estimation technique (MINE [1]). This construction allows back-propagating the gradient to the calibration parameters and enables stochastic gradient descent. To ensure orthogonality constraints with respect to the rotation matrix we incorporate Lie-group techniques. Furthermore, instead of optimizing on entire images, we operate on local patches that are extracted from the temporally synchronized projected Lidar points and camera frames. Our experiments show that this technique not only improves over existing techniques in terms of accuracy, but also shows considerable generalization capabilities towards new Lidar-camera configurations.
Optical surface inspection: A novelty detection approach based on CNN-encoded texture features
(2018)
In inspection systems for textured surfaces, a reference texture is typically known before novel examples are inspected. Mostly, the reference is only available in a digital format. As a consequence, there is no dataset of defective examples available that could be used to train a classifier. We propose a texture model approach to novelty detection. The texture model uses features encoded by a convolutional neural network (CNN) trained on natural image data. The CNN activations represent the specific characteristics of the digital reference texture which are learned by a one-class classifier. We evaluate our novelty detector in a digital print inspection scenario. The inspection unit is based on a camera array and a flashing light illumination which allows for inline capturing of multichannel images at a high rate. In order to compare our results to manual inspection, we integrated our inspection unit into an industrial single-pass printing system.
Motion estimation is an essential element for autonomous vessels. It is used e.g. for lidar motion compensation as well as mapping and detection tasks in a maritime environment. Because the use of gyroscopes is not reliable and a high performance inertial measurement unit is quite expensive, we present an approach for visual pitch and roll estimation that utilizes a convolutional neural network for water segmentation, a stereo system for reconstruction and simple geometry to estimate pitch and roll. The algorithm is validated on a novel, publicly available dataset recorded at Lake Constance. Our experiments show that the pitch and roll estimator provides accurate results in comparison to an Xsens IMU sensor. We can further improve the pitch and roll estimation by sensor fusion with a gyroscope. The algorithm is available in its implementation as a ROS node.
Three-dimensional ship localization with only one camera is a challenging task due to the loss of depth information caused by perspective projection. In this paper, we propose a method to measure distances based on the assumption that ships lie on a flat surface. This assumption allows to recover depth from a single image using the principle of inverse perspective. For the 3D ship detection task, we use a hybrid approach that combines image detection with a convolutional neural network, camera geometry and inverse perspective. Furthermore, a novel calculation of object height is introduced. Experiments show that the monocular distance computation works well in comparison to a Velodyne lidar. Due to its robustness, this could be an easy-to-use baseline method for detection tasks in navigation systems.