Refine
Document Type
- Article (6)
- Conference Proceeding (2)
- Master's Thesis (1)
- Preprint (1)
Language
- English (10)
Keywords
- Deep learning (10) (remove)
Institute
Interpretability and uncertainty modeling are important key factors for medical applications. Moreover, data in medicine are often available as a combination of unstructured data like images and structured predictors like patient’s metadata. While deep learning models are state-of-the-art for image classification, the models are often referred to as ’black-box’, caused by the lack of interpretability. Moreover, DL models are often yielding point predictions and are too confident about the parameter estimation and outcome predictions.
On the other side with statistical regression models, it is possible to obtain interpretable predictor effects and capture parameter and model uncertainty based on the Bayesian approach. In this thesis, a publicly available melanoma dataset, consisting of skin lesions and patient’s age, is used to predict the melanoma types by using a semi-structured model, while interpretable components and model uncertainty is quantified. For Bayesian models, transformation model-based variational inference (TM-VI) method is used to determine the posterior distribution of the parameter. Several model constellations consisting of patient’s age and/or skin lesion were implemented and evaluated. Predictive performance was shown to be best by using a combined model of image and patient’s age, while providing the interpretable posterior distribution of the regression coefficient is possible. In addition, integrating uncertainty in image and tabular parts results in larger variability of the outputs corresponding to high uncertainty of the single model components.
Background: Polysomnography (PSG) is the gold standard for detecting obstructive sleep apnea (OSA). However, this technique has many disadvantages when using it outside the hospital or for daily use. Portable monitors (PMs) aim to streamline the OSA detection process through deep learning (DL).
Materials and methods: We studied how to detect OSA events and calculate the apnea-hypopnea index (AHI) by using deep learning models that aim to be implemented on PMs. Several deep learning models are presented after being trained on polysomnography data from the National Sleep Research Resource (NSRR) repository. The best hyperparameters for the DL architecture are presented. In addition, emphasis is focused on model explainability techniques, concretely on Gradient-weighted Class Activation Mapping (Grad-CAM).
Results: The results for the best DL model are presented and analyzed. The interpretability of the DL model is also analyzed by studying the regions of the signals that are most relevant for the model to make the decision. The model that yields the best result is a one-dimensional convolutional neural network (1D-CNN) with 84.3% accuracy.
Conclusion: The use of PMs using machine learning techniques for detecting OSA events still has a long way to go. However, our method for developing explainable DL models demonstrates that PMs appear to be a promising alternative to PSG in the future for the detection of obstructive apnea events and the automatic calculation of AHI.
Study design:
Retrospective, mono-centric cohort research study.
Objectives:
The purpose of this study is to validate a novel artificial intelligence (AI)-based algorithm against human-generated ground truth for radiographic parameters of adolescent idiopathic scoliosis (AIS).
Methods:
An AI-algorithm was developed that is capable of detecting anatomical structures of interest (clavicles, cervical, thoracic, lumbar spine and sacrum) and calculate essential radiographic parameters in AP spine X-rays fully automatically. The evaluated parameters included T1-tilt, clavicle angle (CA), coronal balance (CB), lumbar modifier, and Cobb angles in the proximal thoracic (C-PT), thoracic, and thoracolumbar regions. Measurements from 2 experienced physicians on 100 preoperative AP full spine X-rays of AIS patients were used as ground truth and to evaluate inter-rater and intra-rater reliability. The agreement between human raters and AI was compared by means of single measure Intra-class Correlation Coefficients (ICC; absolute agreement; .75 rated as excellent), mean error and additional statistical metrics.
Results:
The comparison between human raters resulted in excellent ICC values for intra- (range: .97-1) and inter-rater (.85-.99) reliability. The algorithm was able to determine all parameters in 100% of images with excellent ICC values (.78-.98). Consistently with the human raters, ICC values were typically smallest for C-PT (eg, rater 1A vs AI: .78, mean error: 4.7°) and largest for CB (.96, -.5 mm) as well as CA (.98, .2°).
Conclusions:
The AI-algorithm shows excellent reliability and agreement with human raters for coronal parameters in preoperative full spine images. The reliability and speed offered by the AI-algorithm could contribute to the efficient analysis of large datasets (eg, registry studies) and measurements in clinical practice.
The use of deep learning models with medical data is becoming more widespread. However, although numerous models have shown high accuracy in medical-related tasks, such as medical image recognition (e.g. radiographs), there are still many problems with seeing these models operating in a real healthcare environment. This article presents a series of basic requirements that must be taken into account when developing deep learning models for biomedical time series classification tasks, with the aim of facilitating the subsequent production of the models in healthcare. These requirements range from the correct collection of data, to the existing techniques for a correct explanation of the results obtained by the models. This is due to the fact that one of the main reasons why the use of deep learning models is not more widespread in healthcare settings is their lack of clarity when it comes to explaining decision making.
Know when you don't know
(2018)
Deep convolutional neural networks show outstanding performance in image-based phenotype classification given that all existing phenotypes are presented during the training of the network. However, in real-world high-content screening (HCS) experiments, it is often impossible to know all phenotypes in advance. Moreover, novel phenotype discovery itself can be an HCS outcome of interest. This aspect of HCS is not yet covered by classical deep learning approaches. When presenting an image with a novel phenotype to a trained network, it fails to indicate a novelty discovery but assigns the image to a wrong phenotype. To tackle this problem and address the need for novelty detection, we use a recently developed Bayesian approach for deep neural networks called Monte Carlo (MC) dropout to define different uncertainty measures for each phenotype prediction. With real HCS data, we show that these uncertainty measures allow us to identify novel or unclear phenotypes. In addition, we also found that the MC dropout method results in a significant improvement of classification accuracy. The proposed procedure used in our HCS case study can be easily transferred to any existing network architecture and will be beneficial in terms of accuracy and novelty detection.
Image novelty detection is a repeating task in computer vision and describes the detection of anomalous images based on a training dataset consisting solely of normal reference data. It has been found that, in particular, neural networks are well-suited for the task. Our approach first transforms the training and test images into ensembles of patches, which enables the assessment of mean-shifts between normal data and outliers. As mean-shifts are only detectable when the outlier ensemble and inlier distribution are spatially separate from each other, a rich feature space, such as a pre-trained neural network, needs to be chosen to represent the extracted patches. For mean-shift estimation, the Hotelling T2 test is used. The size of the patches turned out to be a crucial hyperparameter that needs additional domain knowledge about the spatial size of the expected anomalies (local vs. global). This also affects model selection and the chosen feature space, as commonly used Convolutional Neural Networks or Vision Image Transformers have very different receptive field sizes. To showcase the state-of-the-art capabilities of our approach, we compare results with classical and deep learning methods on the popular dataset CIFAR-10, and demonstrate its real-world applicability in a large-scale industrial inspection scenario using the MVTec dataset. Because of the inexpensive design, our method can be implemented by a single additional 2D-convolution and pooling layer and allows particularly fast prediction times while being very data-efficient.
Contemporary empirical applications frequently require flexible regression models for complex response types and large tabular or non-tabular, including image or text, data. Classical regression models either break down under the computational load of processing such data or require additional manual feature extraction to make these problems tractable. Here, we present deeptrafo, a package for fitting flexible regression models for conditional distributions using a tensorflow backend with numerous additional processors, such as neural networks, penalties, and smoothing splines. Package deeptrafo implements deep conditional transformation models (DCTMs) for binary, ordinal, count, survival, continuous, and time series responses, potentially with uninformative censoring. Unlike other available methods, DCTMs do not assume a parametric family of distributions for the response. Further, the data analyst may trade off interpretability and flexibility by supplying custom neural network architectures and smoothers for each term in an intuitive formula interface. We demonstrate how to set up, fit, and work with DCTMs for several response types. We further showcase how to construct ensembles of these models, evaluate models using inbuilt cross-validation, and use other convenience functions for DCTMs in several applications. Lastly, we discuss DCTMs in light of other approaches to regression with non-tabular data.
Mapping of tree seedlings is useful for tasks ranging from monitoring natural succession and regeneration to effective silvicultural management. Development of methods that are both accurate and cost-effective is especially important considering the dramatic increase in tree planting that is required globally to mitigate the impacts of climate change. The combination of high-resolution imagery from unmanned aerial vehicles and object detection by convolutional neural networks (CNNs) is one promising approach. However, unbiased assessments of these models and methods to integrate them into geospatial workflows are lacking. In this study, we present a method for rapid, large-scale mapping of young conifer seedlings using CNNs applied to RGB orthomosaic imagery. Importantly, we provide an unbiased assessment of model performance by using two well-characterised trial sites together containing over 30,000 seedlings to assemble datasets with a high level of completeness. Our results showed CNN-based models trained on two sites detected seedlings with sensitivities of 99.5% and 98.8%. False positives due to tall weeds at one site and naturally regenerating seedlings of the same species led to slightly lower precision of 98.5% and 96.7%. A model trained on examples from both sites had 99.4% sensitivity and precision of 97%, showing applicability across sites. Additional testing showed that the CNN model was able to detect 68.7% of obscured seedlings missed during the initial annotation of the imagery but present in the field data. Finally, we demonstrate the potential to use a form of weakly supervised training and a tile-based processing chain to enhance the accuracy and efficiency of CNNs applied to large, high-resolution orthomosaics.
Outcomes with a natural order commonly occur in prediction problems and often the available input data are a mixture of complex data like images and tabular predictors. Deep Learning (DL) models are state-of-the-art for image classification tasks but frequently treat ordinal outcomes as unordered and lack interpretability. In contrast, classical ordinal regression models consider the outcome’s order and yield interpretable predictor effects but are limited to tabular data. We present ordinal neural network transformation models (ontrams), which unite DL with classical ordinal regression approaches. ontrams are a special case of transformation models and trade off flexibility and interpretability by additively decomposing the transformation function into terms for image and tabular data using jointly trained neural networks. The performance of the most flexible ontram is by definition equivalent to a standard multi-class DL model trained with cross-entropy while being faster in training when facing ordinal outcomes. Lastly, we discuss how to interpret model components for both tabular and image data on two publicly available datasets.
This paper presents the implementation of deep learning methods for sleep stage detection by using three signals that can be measured in a non-invasive way: heartbeat signal, respiratory signal, and movement signal. Since signals are measurements taken during the time, the problem is seen as time-series data classification. Deep learning methods are chosen to solve the problem are convolutional neural network and long-short term memory network. Input data is structured as a time-series sequence of mentioned signals that represent 30 seconds epoch, which is a standard interval for sleep analysis. The records used belong to the overall 23 subjects, which are divided into two subsets. Records from 18 subjects were used for training the data and from 5 subjects for testing the data. For detecting four sleep stages: REM (Rapid Eye Movement), Wake, Light sleep (Stage 1 and Stage 2), and Deep sleep (Stage 3 and Stage 4), the accuracy of the model is 55%, and F1 score is 44%. For five stages: REM, Stage 1, Stage 2, Deep sleep (Stage 3 and 4), and Wake, the model gives an accuracy of 40% and F1 score of 37%.