Refine
Year of publication
- 2016 (3) (remove)
Document Type
- Conference Proceeding (2)
- Article (1)
Language
- English (3)
Has Fulltext
- no (3)
Keywords
- Sprachakustik (1)
In this paper we propose a method to determine the active speaker for each time-frequency point in the noisy signals of a microphone array. This detection is based on a statistical model where the speech signals as well as noise signals are assumed to be multivariate Gaussian random variables in the Fourier domain. Based on this model we derive a maximum-likelihood detector for the active speaker. The decision is based on the a posteriori signal to noise ratio (SNR) of a speaker dependent max-SNR beamformer.
This paper studies suitable models for the identification of nonlinear acoustic systems. A cascaded structure of nonlinear filters is proposed that contains several parallel branches, consisting of polynomial functions followed by a linear filter for each order of nonlinearity. The second order of nonlinearity is additionally modelled with a parallel branch, containing a Volterra filter. These are followed by a long linear FIR filter that is able to model the room acoustics. The model is applied to the identification of a tube power amplifier feeding a guitar loudspeaker cabinet in an acoustic room. The adaptive identification is performed by the normalized least mean square (NLMS) algorithm. Compared with a generalized polynomial Hammerstein (GPH) model, the accuracy in modelling the dedicated real world system can be improved to a greater extend than increasing the order of nonlinearity in the GPH model.
The multichannel Wiener filter (MWF) is a well-established noise reduction technique for speech processing. Most commonly, the speech component in a selected reference microphone is estimated. The choice of this reference microphone influences the broadband output signal-to-noise ratio (SNR) as well as the speech distortion. Recently, a generalized formulation for the MWF (G-MWF) was proposed that uses a weighted sum of the individual transfer functions from the speaker to the microphones to form a better speech reference resulting in an improved broadband output SNR. For the MWF, the influence of the phase reference is often neglected, because it has no impact on the narrow-band output SNR. The G-MWF allows an arbitrary choice of the phase reference especially in the context of spatially distributed microphones.
In this work, we demonstrate that the phase reference determines the overall transfer function and hence has an impact on both the speech distortion and the broadband output SNR. We propose two speech references that achieve a better signal-to-reverberation ratio (SRR) and an improvement in the broadband output SNR. Both proposed references are based on the phase of a delay-and-sum beamformer. Hence, the time-difference-of-arrival (TDOA) of the speech source is required to align the signals. The different techniques are compared in terms of SRR and SNR performance.