Blind Source Separation (BSS) of convolutive speech mixtures

 

A short list of sound sources
A collection of .wav files useful for testing BSS algorithms. All sources are sampled at 16 kHz.
SPEECH SIGNALS

Source | Duration (s) | Description
-------|--------------|----------------------------
s1     | 24           | Male, list of English words
s2     | 30           | Male, theater
s3     | 30           | Male, hypno
s4     | 30           | Female, sentences
s5     | 29           | Female, numbers
s6     | 30           | Male, numbers
s7     | 30           | Male, poem
s8     | 28           | Female, sentences
s9     | 29           | Male, sentences

MUSIC SIGNALS

Source | Duration (s) | Description
-------|--------------|----------------------------
s10    | 30           | Mandolin
s11    | 30           | Piano
s12    | 30           | Piano
s13    | 30           | Trumpet
s14    | 30           | Guitar
s15    | 30           | Vivaldi
s16    | 30           | Loopbass
s17    | 30           | Herbalizer
s18    | 30           | Hooverphonic


Creation of Multichannel Mixtures
Find real-world measured Room Impulse Responses (RIRs) or artificial RIRs and convolve the original sources with these RIRs.
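As an illustration of this step, the sketch below convolves each source with the RIR to each microphone and sums the contributions. It is a minimal NumPy example, not the code used for the experiments on this page; the function name make_mixtures, and the synthetic signals and decaying-noise RIRs in the usage lines, are assumptions made for the example.

```python
import numpy as np

def make_mixtures(sources, rirs):
    """Convolve each source with its RIR to every microphone and sum.

    sources: list of I 1-D arrays (source signals at the same sampling rate)
    rirs:    nested list, rirs[j][i] is the RIR from source i to microphone j
    returns: array of shape (J, N) holding the J microphone signals
    """
    n_mics = len(rirs)
    # Length of the longest convolved signal, so every contribution fits.
    n_out = max(len(s) + len(rirs[j][i]) - 1
                for j in range(n_mics) for i, s in enumerate(sources))
    mixtures = np.zeros((n_mics, n_out))
    for j in range(n_mics):
        for i, s in enumerate(sources):
            contribution = np.convolve(s, rirs[j][i])
            mixtures[j, :len(contribution)] += contribution
    return mixtures

# Toy usage: synthetic data standing in for the .wav sources and real RIRs.
rng = np.random.default_rng(0)
toy_sources = [rng.standard_normal(1000), rng.standard_normal(800)]
toy_rirs = [[rng.standard_normal(64) * np.exp(-np.arange(64) / 10.0)
             for _ in toy_sources] for _ in range(3)]
x = make_mixtures(toy_sources, toy_rirs)  # 3 microphone signals
```

For 30-second sources at 16 kHz and RIRs of several thousand taps, an FFT-based convolution such as scipy.signal.fftconvolve is a faster replacement for np.convolve.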

Artificial RIRs (examples of room acoustics simulators available online)

Code index | Link
-----------|-----
(A1)       | here
(A2)       | here
(A3)       | here

These codes implement the acoustic model described in the paper:
J. Allen and D. Berkley, Image method for efficiently simulating small-room acoustics, Journal of the Acoustical Society of America, vol. 65, no. 4, April 1979.
These acoustic simulators let the user set many parameters (room dimensions, source and microphone positions, reverberation time, etc.).

Measured RIRs

Index | Link | Description
------|------|------------
(R1)  | here | Measurements done by A. Westner. Mixtures are created with the roommix.m Matlab function. One can choose between 8 preset positions for the sources and 8 positions for the microphones. Note: the original roommix.m code has to be slightly modified to downsample the RIRs so that they match the sampling rate of the sources (16 kHz for the sources given on this webpage).
(R2)  | here | RIR measurements conducted at McMaster University in the context of hearing aid design. Data here. Note: these measurements were conducted with 2 microphones. See paper: L. Trainor, R. Sonnadara, K. Wiklund, J. Bondy, S. Gupta, S. Becker, I.-C. Bruce and S. Haykin, Development of a flexible, realistic hearing in noise test environment (R-HINT-E), Signal Processing, vol. 84, no. 2, pp. 299-309, Feb. 2004.
(R3)  | here | Another database of real-world RIRs measured with 2 microphones in different types of rooms.



A (non-exhaustive) list of BSS algorithms for convolutive speech mixtures

Description: Links to algorithms available online that exploit the non-stationary nature of speech signals and solve the BSS problem in the frequency domain by Joint Approximate Diagonalization (JAD) of a set of autocorrelation matrices for each frequency bin (under the assumption of uncorrelated source signals).
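To make the joint-diagonalization idea concrete, here is a small NumPy sketch for a single frequency bin. It assumes a noise-free, square model R_p = A D_p A^H with diagonal D_p (uncorrelated sources whose powers change from epoch to epoch). It is not any of the algorithms listed below: instead of an approximate joint diagonalization over all matrices, it uses an exact eigenvalue shortcut on the first two matrices, which only works under these idealized assumptions. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_src, n_epochs = 3, 5

# Hypothetical mixing matrix for one frequency bin (square case, J = I).
A = rng.standard_normal((n_src, n_src)) + 1j * rng.standard_normal((n_src, n_src))

# Source powers vary from epoch to epoch (non-stationarity); uncorrelated
# sources give diagonal source covariance matrices D_p.
powers = rng.uniform(0.5, 2.0, size=(n_epochs, n_src))
R = np.array([A @ np.diag(d) @ A.conj().T for d in powers])

# Exact joint diagonalizer from the first two matrices:
# R[0] @ inv(R[1]) = A diag(d0 / d1) inv(A), so its eigenvectors are the
# columns of A up to scaling and permutation.
_, V = np.linalg.eig(R[0] @ np.linalg.inv(R[1]))
W = np.linalg.inv(V)  # demixing matrix for this bin

def off_diagonal_energy(M):
    """Sum of squared magnitudes of the off-diagonal entries of M."""
    return np.sum(np.abs(M - np.diag(np.diag(M))) ** 2)

# W diagonalizes every R_p, not only the two used to compute it.
residuals = [off_diagonal_energy(W @ Rp @ W.conj().T) for Rp in R]
```

With estimated (noisy) autocorrelation matrices no single W diagonalizes all of them exactly, which is why the methods below minimize an off-diagonal criterion over the whole set. The scaling and permutation ambiguity of W at each bin is also visible here.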

Code 1:
  L. Parra and C. Spence, Convolutive blind separation of non-stationary sources, IEEE Trans. on Speech and Audio Processing, vol. 8, no. 3, pp. 320-327, March 2000. Code downloadable here.

Code 2: 
K. Rahbar and J.-P. Reilly, A frequency domain method for blind source separation of convolutive audio mixtures, IEEE Trans. on Speech and Audio Processing, vol. 13, no. 5, pp. 832-844, 2005. Code downloadable here.

Code 3:
  D.-T. Pham, C. Servière and H. Boumaraf, Blind separation of speech mixtures based on nonstationarity, in Proc. ISSPA'03, vol. 2, pp. 73-76, 2003. Code downloadable here  (code proposed for the 2 by 2 case only).



Our PARAFAC-based algorithm for BSS of convolutive speech mixtures
Description and Results of experiments

Paper: D. Nion, K. N. Mokios, N. D. Sidiropoulos and A. Potamianos, Batch and adaptive PARAFAC-based blind separation of convolutive speech mixtures, IEEE Trans. on Audio, Speech and Language Processing, June 2008, submitted.

Description:
We exploit the non-stationary nature of speech signals and show that the classical JAD problem at each frequency is equivalent to a PARAFAC decomposition at that frequency. After the PARAFAC-based separation stage, we use a novel and efficient permutation-correction scheme based on k-means clustering of the source power profiles. Two key features of our algorithm are:
i) Better separation performance than codes 1, 2 and 3, at a lower complexity.
ii) The uniqueness properties of PARAFAC make it possible to handle some under-determined cases, i.e., the mixing matrix can be estimated uniquely even with more sources than microphones. The demixing matrix is then built by Capon beamforming, which substantially reduces cross-talk.
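The Capon beamforming step can be sketched with the textbook Capon (MVDR) formula w = R^{-1} a / (a^H R^{-1} a), applied to each estimated column a of the mixing matrix at one frequency bin. This is a generic illustration, not the authors' code: the function name capon_weights, the diagonal loading term and the toy data are assumptions of the example.

```python
import numpy as np

def capon_weights(R_x, A_hat, loading=1e-6):
    """Capon (MVDR) weights for every estimated steering vector.

    R_x:   (J, J) covariance of the microphone signals at one frequency bin
    A_hat: (J, I) estimated mixing matrix at that bin, one column per source
    returns W of shape (I, J); source i is extracted as W[i] @ x
    """
    J = R_x.shape[0]
    # Small diagonal loading (an assumption of this sketch) keeps the
    # inversion stable when R_x is poorly conditioned.
    R_inv = np.linalg.inv(R_x + loading * np.trace(R_x).real / J * np.eye(J))
    W = np.empty(A_hat.T.shape, dtype=complex)
    for i in range(A_hat.shape[1]):
        a = A_hat[:, i]
        R_inv_a = R_inv @ a
        # w = R^{-1} a / (a^H R^{-1} a): unit gain toward a, minimum output power.
        w = R_inv_a / np.vdot(a, R_inv_a)
        W[i] = w.conj()  # store w^H as a row
    return W

# Toy under-determined bin: 4 sources, 3 microphones (hypothetical values).
rng = np.random.default_rng(2)
n_mics, n_src = 3, 4
A_toy = rng.standard_normal((n_mics, n_src)) + 1j * rng.standard_normal((n_mics, n_src))
source_powers = rng.uniform(0.5, 2.0, n_src)
R_toy = A_toy @ np.diag(source_powers) @ A_toy.conj().T + 1e-2 * np.eye(n_mics)
W_toy = capon_weights(R_toy, A_toy)
gains = np.diag(W_toy @ A_toy)  # response of each beamformer to its own source
```

Each row of W_toy passes its own source with unit gain while minimizing the total output power, so the contributions of the other sources are attenuated as far as the J degrees of freedom allow. No pseudo-inverse of the mixing matrix is required, which is why the construction remains usable when there are more sources than microphones.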


Parameters Description

Parameter | Description
----------|------------
Tp (s)    | Duration of each epoch, i.e., the recordings are segmented into P successive epochs (sub-blocks) of duration Tp.
F         | FFT length. Within each epoch p, we compute the FFT of consecutive overlapping frames of F samples (we used an overlap coefficient of 0.75), each weighted by a Hanning window of F samples. We then compute the sample-mean estimate of the autocorrelation array of the recorded signals for each epoch.
T60 (s)   | Reverberation time.
J         | Number of microphones.
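The roles of Tp, F and the overlap coefficient can be illustrated with the following NumPy sketch, which estimates one J x J autocorrelation matrix per frequency bin and per epoch. It is a reading of the description above, not the authors' implementation; in particular, the function name and the choice to keep every frame strictly inside its epoch are assumptions.

```python
import numpy as np

def autocorrelation_array(x, fs, Tp, F, overlap=0.75):
    """Estimate R[f, p] (J x J) for each frequency bin f and epoch p.

    x:  (J, N) array of microphone signals
    fs: sampling rate in Hz;  Tp: epoch duration in seconds;  F: FFT length
    returns: complex array of shape (F // 2 + 1, P, J, J)
    """
    J, N = x.shape
    epoch_len = int(round(Tp * fs))
    hop = int(round(F * (1.0 - overlap)))
    P = N // epoch_len
    window = np.hanning(F)
    R = np.zeros((F // 2 + 1, P, J, J), dtype=complex)
    for p in range(P):
        epoch = x[:, p * epoch_len:(p + 1) * epoch_len]
        starts = range(0, epoch_len - F + 1, hop)
        if len(starts) == 0:
            raise ValueError("epoch shorter than one FFT frame")
        for s in starts:
            X = np.fft.rfft(epoch[:, s:s + F] * window, axis=1)  # (J, F//2+1)
            # Outer product X(f) X(f)^H for every bin at once.
            R[:, p] += np.einsum('jf,kf->fjk', X, X.conj())
        R[:, p] /= len(starts)  # sample mean over the frames of the epoch
    return R

# Toy usage: 1 second of synthetic 2-microphone noise at 16 kHz.
rng = np.random.default_rng(3)
x_toy = rng.standard_normal((2, 16000))
R_toy = autocorrelation_array(x_toy, fs=16000, Tp=0.4, F=2048)
```

With Tp = 0.4 s and F = 2048 at 16 kHz, each epoch holds 6400 samples and the hop is 512 samples, so only 9 frames are averaged per epoch. Longer epochs give more reliable estimates but fewer epochs P over which the source powers can vary.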

In the following experiments, we use RIRs generated by (A1), with the following dimensions of the room and locations of sources and microphones.

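Coordinates (meters)

Item | x  | y   | z
-----|----|-----|----
Room | 12 | 9   | 3
s1   | 2  | 8   | 1.6
s2   | 2  | 1   | 1.6
s3   | 2  | 5   | 1.6
s4   | 2  | 3   | 1.6
s5   | 2  | 7   | 1.6
s6   | 2  | 4   | 1.6
s7   | 2  | 6   | 1.6
mic1 | 11 | 5.2 | 1.6
mic2 | 11 | 5.6 | 1.6
mic3 | 11 | 6   | 1.6
mic4 | 11 | 6.4 | 1.6
mic5 | 11 | 6.8 | 1.6
mic6 | 11 | 7.2 | 1.6
mic7 | 11 | 7.6 | 1.6
mic8 | 11 | 8   | 1.6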

Here are typical RIRs generated between source 1 and microphone 1. Left: T60 = 200 ms. Right: T60 = 300 ms.
[Figure: RIR for T60 = 200 ms (left) and RIR for T60 = 300 ms (right)]


EXPERIMENT 1:         Mixture of 2 sources

Type of sources   | Speech only  | Speech only  | Speech only  | Speech only  | Speech and Music | Music only
------------------|--------------|--------------|--------------|--------------|------------------|-------------
J                 | 2            | 8            | 2            | 8            | 4                | 8
T60 (s)           | 0.2          | 0.2          | 0.4          | 0.4          | 0.2              | 0.2
F                 | 2048         | 2048         | 2048         | 2048         | 2048             | 2048
Tp (s)            | 0.4          | 0.4          | 0.4          | 0.4          | 0.2              | 0.2
Original sources  | s1, s2       | s1, s2       | s1, s2       | s1, s2       | s1, s2           | s1, s2
Mixtures          | mic1, mic2   | mic1 to mic8 | mic1, mic2   | mic1 to mic8 | mic1 to mic4     | mic1 to mic8
Separated sources | y1, y2       | y1, y2       | y1, y2       | y1, y2       | y1, y2           | y1, y2
Output SIR [dB]   | 19.1, 17.9   | 28.9, 24.4   | 9.4, 9.6     | 16.7, 14.8   | 24.6, 16.9       | 18.4, 17
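This page does not state how the output SIR values are computed. One common definition, given here only as an assumption, is the ratio between the power of the target source and the power of all other sources at a given output. It can be evaluated by passing each source alone through the mixing and demixing chain. The function and variable names below are hypothetical.

```python
import numpy as np

def output_sir_db(contrib):
    """Output signal-to-interference ratio, in dB, for each output.

    contrib: array of shape (I, I, N); contrib[i, k] is the signal that
             source k alone contributes to output i. Output i is assumed
             to target source i (permutation already resolved).
    """
    power = np.sum(np.abs(contrib) ** 2, axis=-1)   # (I, I) power matrix
    target = np.diag(power)                          # wanted source at each output
    interference = power.sum(axis=1) - target        # all other sources
    return 10.0 * np.log10(target / interference)

# Toy check: interference 10 times smaller in amplitude than the target.
toy = np.zeros((2, 2, 100))
toy[0, 0] = 1.0
toy[1, 1] = 1.0
toy[0, 1] = 0.1
toy[1, 0] = 0.1
sir = output_sir_db(toy)
```

An amplitude ratio of 10 corresponds to a power ratio of 100, i.e. 20 dB at both outputs in this toy case.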


EXPERIMENT 2:         Mixture of 3 sources


Type of sources   | Speech only      | Speech only      | Speech only      | Speech and Music | Music only       | Music only
------------------|------------------|------------------|------------------|------------------|------------------|-----------------
J                 | 3                | 3                | 8                | 8                | 8                | 8
T60 (s)           | 0.2              | 0.3              | 0.3              | 0.2              | 0.2              | 0.2
F                 | 4096             | 4096             | 4096             | 2048             | 2048             | 2048
Tp (s)            | 0.5              | 0.5              | 0.5              | 0.2              | 0.2              | 0.2
Original sources  | s1, s2, s3       | s1, s2, s3       | s1, s2, s3       | s1, s2, s3       | s1, s2, s3       | s1, s2, s3
Mixtures          | mic1 to mic3     | mic1 to mic3     | mic1 to mic8     | mic1 to mic8     | mic1 to mic8     | mic1 to mic8
Separated sources | y1, y2, y3       | y1, y2, y3       | y1, y2, y3       | y1, y2, y3       | y1, y2, y3       | y1, y2, y3
Output SIR [dB]   | 19.8, 18.8, 16   | 12.7, 12.5, 7.7  | 18.3, 14.6, 15.4 | 20.4, 14.7, 18.6 | 11.9, 11.8, 9    | 13.1, 16.2, 14.3


EXPERIMENT 3:         Mixture of 4 sources


Type of sources   | Speech only            | Speech only            | Speech and Music
------------------|------------------------|------------------------|-----------------------
J                 | 8                      | 8                      | 8
T60 (s)           | 0.2                    | 0.3                    | 0.2
F                 | 4096                   | 4096                   | 2048
Tp (s)            | 0.5                    | 0.5                    | 0.2
Original sources  | s1, s2, s3, s4         | s1, s2, s3, s4         | s1, s2, s3, s4
Mixtures          | mic1 to mic8           | mic1 to mic8           | mic1 to mic8
Separated sources | y1, y2, y3, y4         | y1, y2, y3, y4         | y1, y2, y3, y4
Output SIR [dB]   | 18.5, 17.7, 16.2, 18.7 | 14.1, 12.5, 11.5, 12.9 | 15.3, 8.9, 13.3, 11.8


EXPERIMENT 4:         Mixture of 5 sources


Type of sources   | Speech only                  | Speech and Music
------------------|------------------------------|-----------------------------
J                 | 8                            | 8
T60 (s)           | 0.2                          | 0.2
F                 | 4096                         | 2048
Tp (s)            | 0.5                          | 0.2
Original sources  | s1, s2, s3, s4, s5           | s1, s2, s3, s4, s5
Mixtures          | mic1 to mic8                 | mic1 to mic8
Separated sources | y1, y2, y3, y4, y5           | y1, y2, y3, y4, y5
Output SIR [dB]   | 14.9, 13.9, 12.8, 13.6, 13.5 | 12.9, 10.6, 13, 10.7, 9.5


EXPERIMENT 5:         Mixture of 6 sources


Type of sources   | Speech only
------------------|----------------------------------
J                 | 8
T60 (s)           | 0.2
F                 | 4096
Tp (s)            | 0.5
Original sources  | s1, s2, s3, s4, s5, s6
Mixtures          | mic1 to mic8
Separated sources | y1, y2, y3, y4, y5, y6
Output SIR [dB]   | 12.2, 10.9, 8.4, 8.3, 12.3, 8.9


EXPERIMENT 6:         Mixture of 7 sources


Type of sources   | Speech only
------------------|----------------------------------------
J                 | 8
T60 (s)           | 0.2
F                 | 4096
Tp (s)            | 0.5
Original sources  | s1, s2, s3, s4, s5, s6, s7
Mixtures          | mic1 to mic8
Separated sources | y1, y2, y3, y4, y5, y6, y7
Output SIR [dB]   | 10.4, 10.7, 7.8, 9.7, 11.3, 8.4, 7.8


EXPERIMENT 8:         Under-determined cases


Type of sources           | Speech only      | Speech only        | Speech only
--------------------------|------------------|--------------------|-----------------------
Number of sources         | 4                | 5                  | 6
Number of microphones (J) | 3                | 4                  | 4
T60 (s)                   | 0.2              | 0.2                | 0.2
F                         | 2048             | 1024               | 1024
Tp (s)                    | 1                | 1                  | 1
Original sources          | s1, s2, s3, s4   | s1, s2, s3, s4, s5 | s1, s2, s3, s4, s5, s6
Mixtures                  | mic1 to mic3     | mic1 to mic4       | mic1 to mic4
Separated sources         | y1, y2, y3, y4   | y1, y2, y3, y4, y5 | y1, y2, y3, y4, y5, y6



