OPUS 4 | Search

4 search hits

Capturing suprasegmental features of a voice with RNNs for improved speaker clustering (2018)

Stadelmann, Thilo ; Glinski-Haefeli, Sebastian ; Gerber, Patrick ; Dürr, Oliver

Deep neural networks have become a veritable alternative to classic speaker recognition and clustering methods in recent years. However, while the speech signal clearly is a time series, and despite the body of literature on the benefits of prosodic (suprasegmental) features, identifying voices has usually not been approached with sequence learning methods. Only recently has a recurrent neural network (RNN) been successfully applied to this task, while the use of convolutional neural networks (CNNs), which unlike RNNs are not able to capture arbitrary time dependencies, still prevails. In this paper, we show the effectiveness of RNNs for speaker recognition by improving state-of-the-art speaker clustering performance and robustness on the classic TIMIT benchmark. We provide arguments for why RNNs are superior by experimentally showing a “sweet spot” of the segment length for successfully capturing prosodic information that has been theoretically predicted in previous work.

Learning neural models for end-to-end clustering (2018)

Meier, Benjamin Bruno ; Elezi, Ismail ; Amirian, Mohammadreza ; Dürr, Oliver ; Stadelmann, Thilo

We propose a novel end-to-end neural network architecture that, once trained, directly outputs a probabilistic clustering of a batch of input examples in one pass. It estimates a distribution over the number of clusters k, and for each 1 ≤ k ≤ k_max, a distribution over the individual cluster assignment for each data point. The network is trained in advance in a supervised fashion on separate data to learn grouping by any perceptual similarity criterion based on pairwise labels (same/different group). It can then be applied to different data containing different groups. We demonstrate promising performance on high-dimensional data like images (COIL-100) and speech (TIMIT). We call this “learning to cluster” and show its conceptual difference to deep metric learning, semi-supervised clustering and other related approaches while having the advantage of performing learnable clustering fully end-to-end.

Automatic classification of non-small cell lung cancer histologic sub-types by deep learning (2018)

Casanova, R. ; Murina, Elvis ; Haberecker, M. ; Honcharova-Biletska, H. ; Vrugt, B. ; Dürr, Oliver ; Sick, Beate ; Soltermann, A.

Single Shot MC Dropout Approximation (2020)

Brach, Kai ; Sick, Beate ; Dürr, Oliver

Deep neural networks (DNNs) are known for their high prediction performance, especially in perceptual tasks such as object recognition or autonomous driving. Still, DNNs are prone to yield unreliable predictions when encountering completely new situations without indicating their uncertainty. Bayesian variants of DNNs (BDNNs), such as MC dropout BDNNs, do provide uncertainty measures. However, BDNNs are slow during test time because they rely on a sampling approach. Here we present a single shot MC dropout approximation that preserves the advantages of BDNNs without being slower than a DNN. Our approach is to analytically approximate for each layer in a fully connected network the expected value and the variance of the MC dropout signal. We evaluate our approach on different benchmark datasets and a simulated toy example. We demonstrate that our single shot MC dropout approximation resembles the point estimate and the uncertainty estimate of the predictive distribution that is achieved with an MC approach, while being fast enough for real-time deployments of BDNNs.
