A short list of sound sources
Collection of .wav files, useful for testing BSS algorithms. All sources are sampled at 16 kHz.
SPEECH SIGNALS
Sources | Duration (in seconds) | Description |
s1 | 24 | Male, list of English words |
s2 | 30 | Male, theater |
s3 | 30 | Male, hypno |
s4 | 30 | Female, sentences |
s5 | 29 | Female, numbers |
s6 | 30 | Male, numbers |
s7 | 30 | Male, poem |
s8 | 28 | Female, sentences |
s9 | 29 | Male, sentences |
MUSIC SIGNALS
Sources | Duration (in seconds) | Description |
s10 | 30 | Mandolin |
s11 | 30 | Piano |
s12 | 30 | Piano |
s13 | 30 | Trumpet |
s14 | 30 | Guitar |
s15 | 30 | Vivaldi |
s16 | 30 | Loopbass |
s17 | 30 | Herbalizer |
s18 | 30 | Hooverphonic |
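Creation of Multichannel Mixtures
Find real-world measured Room Impulse Responses (RIRs) or artificial RIRs and convolve the original sources with these RIRs. Each microphone signal is then the sum, over all sources, of the source convolved with the RIR from that source to that microphone.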
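The mixing step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code behind the links on this page. The names `convolve`, `mix` and the layout `rirs[j][i]` (response from source i to microphone j) are assumptions made for the example, and the sources and RIRs are assumed to share the same sampling rate.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution of one source with one impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for n, s in enumerate(signal):
        if s == 0.0:
            continue  # silent samples contribute nothing
        for k, h in enumerate(rir):
            out[n + k] += s * h
    return out


def mix(sources: Sequence[Sequence[float]],
        rirs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Return one mixture per microphone.

    rirs[j][i] is the (hypothetical) impulse response from source i to
    microphone j.
    """
    n_out = max(len(s) + len(h) - 1
                for row in rirs for s, h in zip(sources, row))
    mixtures = []
    for row in rirs:                      # one row per microphone
        x = [0.0] * n_out
        for s, h in zip(sources, row):    # sum over the sources
            for n, v in enumerate(convolve(s, h)):
                x[n] += v
        mixtures.append(x)
    return mixtures


# Toy example: 2 sources, 2 microphones, 2-tap responses.
s1 = [1.0, 0.0, 0.0]
s2 = [0.0, 2.0, 0.0]
rirs = [[[1.0, 0.5], [0.0, 1.0]],   # microphone 1
        [[0.0, 1.0], [1.0, 0.0]]]   # microphone 2
x = mix([s1, s2], rirs)
```

The direct double loop is only practical for short signals. For 30-second sources and RIRs of several thousand taps, an FFT-based routine such as `scipy.signal.fftconvolve` is a more realistic choice.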
Index on code | Link |
(A1) | here |
(A2) | here |
(A3) | here |
Index | Link | Description |
(R1) | here | Measurements done by A. Westner. Uses the roommix.m Matlab function to create mixtures. One can choose between 8 preset positions for the sources and 8 positions for the microphones. Note: the original roommix.m code has to be slightly modified to downsample the RIRs so that they match the sampling rate of the sources (16 kHz for the sources given on this webpage). |
(R2) | here | RIR measurements conducted at McMaster University in the context of hearing aid design. Data here. Note: these measurements were conducted with 2 microphones. See paper: L. Trainor, R. Sonnadara, K. Wiklund, J. Bondy, S. Gupta, S. Becker, I.-C. Bruce and S. Haykin, Development of a flexible, realistic hearing in noise test environment (R-HINT-E), Signal Processing, vol. 84, no. 2, pp. 299-309, Feb. 2004. |
(R3) | here | Another database of real-world RIRs measured with 2 microphones, in different types of rooms. |
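The note for (R1) says the RIRs must be downsampled to the 16 kHz rate of the sources. The sketch below shows one way this could be done in plain Python, by band-limited (windowed-sinc) interpolation. It is not the modification made to roommix.m. The function name `resample_rir`, the `zeros` parameter and the 22050 Hz input rate in the usage line are placeholders, so check the actual sampling rate of the RIR data before reusing it.

```python
import math
from typing import List, Sequence


def resample_rir(rir: Sequence[float], fs_in: int, fs_out: int,
                 zeros: int = 8) -> List[float]:
    """Downsample `rir` from fs_in to fs_out by windowed-sinc interpolation.

    `zeros` is the number of sinc zero crossings kept on each side of the
    kernel (an illustrative quality/speed trade-off).
    """
    if fs_out > fs_in:
        raise ValueError("this sketch only handles downsampling")
    c = fs_out / fs_in            # cutoff at the new Nyquist frequency
    half = zeros / c              # kernel half-width, in input samples
    n_out = len(rir) * fs_out // fs_in
    out = []
    for m in range(n_out):
        t = m / c                 # position of output sample m on the input grid
        lo = max(0, math.ceil(t - half))
        hi = min(len(rir) - 1, math.floor(t + half))
        acc = 0.0
        for n in range(lo, hi + 1):
            u = t - n
            arg = math.pi * c * u
            sinc = 1.0 if u == 0 else math.sin(arg) / arg
            hann = 0.5 * (1.0 + math.cos(math.pi * u / half))
            acc += rir[n] * c * sinc * hann   # low-pass and interpolate
        out.append(acc)
    return out


# Placeholder rates: a constant input should stay (almost) constant.
y = resample_rir([1.0] * 200, fs_in=22050, fs_out=16000)
```

This version preserves sample values, so the overall gain of the resulting filter is scaled by fs_out/fs_in; a global scaling of this kind usually does not matter when testing BSS algorithms. In practice a polyphase routine such as `scipy.signal.resample_poly` would be faster.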
A (non-exhaustive) list of BSS algorithms for convolutive speech mixtures
Parameters | Description |
Tp (in seconds) | Duration of each epoch, i.e. the recordings are segmented into P successive epochs (sub-blocks) of duration Tp. |
F | FFT length. Within each sub-block p, we take the FFT of consecutive overlapping frames of F samples (overlap coefficient of 0.75), each weighted by a Hanning window of F samples. We then compute the sample mean estimate of the autocorrelation array of the recorded signals for each sub-block. |
T60 (in seconds) | Reverberation time. |
J | Number of microphones |
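The role of Tp and F can be illustrated with a small sketch. It reads the "autocorrelation array" as a J x J matrix per frequency bin, averaged over the frames of each epoch, and it assumes that frames do not cross epoch boundaries. Both points are interpretations, not statements from this page. The function names are hypothetical and the code uses only the standard library, so it is slow for realistic sizes.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def epoch_covariances(x, fs, Tp, F, overlap=0.75):
    """x is a list of J channels of equal length.

    Returns R with R[p][f][j][k] = mean over the frames of epoch p of
    X_j(f) * conj(X_k(f)).
    """
    J = len(x)
    hop = int(F * (1.0 - overlap))        # overlap 0.75 gives a hop of F/4
    epoch_len = int(round(Tp * fs))       # samples per epoch
    if epoch_len < F or hop < 1:
        raise ValueError("each epoch must hold at least one frame")
    P = len(x[0]) // epoch_len            # number of complete epochs
    n_frames = (epoch_len - F) // hop + 1
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (F - 1))
              for n in range(F)]          # Hann (Hanning) window
    R = []
    for p in range(P):
        acc = [[[0j] * J for _ in range(J)] for _ in range(F)]
        for m in range(n_frames):
            a = p * epoch_len + m * hop
            X = [fft([w * v for w, v in zip(window, ch[a:a + F])])
                 for ch in x]
            for f in range(F):
                for j in range(J):
                    for k in range(J):
                        acc[f][j][k] += X[j][f] * X[k][f].conjugate()
        R.append([[[v / n_frames for v in row] for row in mat]
                  for mat in acc])
    return R


# Toy sizes so that the pure-Python FFT stays fast.
x1 = [math.sin(0.7 * n) for n in range(32)]
x2 = [2.0 * v for v in x1]
R = epoch_covariances([x1, x2], fs=16, Tp=1.0, F=8)
```

Under these assumptions, Tp = 0.4 s at 16 kHz with F = 2048 gives epochs of 6400 samples, a hop of 512 samples and 9 full frames per epoch. For real recordings, `numpy.fft.fft` would replace the hand-written FFT.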
Coordinates (meters) | Room | s1 | s2 | s3 | s4 | s5 | s6 | s7 | mic1 | mic2 | mic3 | mic4 | mic5 | mic6 | mic7 | mic8 |
x | 12 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
y | 9 | 8 | 1 | 5 | 3 | 7 | 4 | 6 | 5.2 | 5.6 | 6 | 6.4 | 6.8 | 7.2 | 7.6 | 8 |
z | 3 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 |
Type of sources | Speech only | Speech only | Speech only | Speech only | Speech and Music | Music only |
J | 2 | 8 | 2 | 8 | 4 | 8 |
T60 | 0.2 | 0.2 | 0.4 | 0.4 | 0.2 | 0.2 |
F | 2048 | 2048 | 2048 | 2048 | 2048 | 2048 |
Tp | 0.4 | 0.4 | 0.4 | 0.4 | 0.2 | 0.2 |
Original sources | s1 s2 | s1 s2 | s1 s2 | s1 s2 | s1 s2 | s1 s2 |
Mixtures | mic1 mic2 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 | mic1 mic2 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 | mic1 mic2 mic3 mic4 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 |
Separated sources | y1 y2 | y1 y2 | y1 y2 | y1 y2 | y1 y2 | y1 y2 |
Output SIR [dB] | 19.1 17.9 | 28.9 24.4 | 9.4 9.6 | 16.7 14.8 | 24.6 16.9 | 18.4 17 |
Type of sources | Speech only | Speech only | Speech only | Speech and Music | Music only | Music only |
J | 3 | 3 | 8 | 8 | 8 | 8 |
T60 | 0.2 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 |
F | 4096 | 4096 | 4096 | 2048 | 2048 | 2048 |
Tp | 0.5 | 0.5 | 0.5 | 0.2 | 0.2 | 0.2 |
Original sources | s1 s2 s3 | s1 s2 s3 | s1 s2 s3 | s1 s2 s3 | s1 s2 s3 | s1 s2 s3 |
Mixtures | mic1 mic2 mic3 | mic1 mic2 mic3 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 |
Separated sources | y1 y2 y3 | y1 y2 y3 | y1 y2 y3 | y1 y2 y3 | y1 y2 y3 | y1 y2 y3 |
Output SIR [dB] | 19.8 18.8 16 | 12.7 12.5 7.7 | 18.3 14.6 15.4 | 20.4 14.7 18.6 | 11.9 11.8 9 | 13.1 16.2 14.3 |
Type of sources | Speech only | Speech only | Speech and Music |
J | 8 | 8 | 8 |
T60 | 0.2 | 0.3 | 0.2 |
F | 4096 | 4096 | 2048 |
Tp | 0.5 | 0.5 | 0.2 |
Original sources | s1 s2 s3 s4 | s1 s2 s3 s4 | s1 s2 s3 s4 |
Mixtures | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 |
Separated sources | y1 y2 y3 y4 | y1 y2 y3 y4 | y1 y2 y3 y4 |
Output SIR [dB] | 18.5 17.7 16.2 18.7 | 14.1 12.5 11.5 12.9 | 15.3 8.9 13.3 11.8 |
Type of sources | Speech only | Speech and Music |
J | 8 | 8 |
T60 | 0.2 | 0.2 |
F | 4096 | 2048 |
Tp | 0.5 | 0.2 |
Original sources | s1 s2 s3 s4 s5 | s1 s2 s3 s4 s5 |
Mixtures | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 |
Separated sources | y1 y2 y3 y4 y5 | y1 y2 y3 y4 y5 |
Output SIR [dB] | 14.9 13.9 12.8 13.6 13.5 | 12.9 10.6 13 10.7 9.5 |
Type of sources | Speech only |
J | 8 |
T60 | 0.2 |
F | 4096 |
Tp | 0.5 |
Original sources | s1 s2 s3 s4 s5 s6 |
Mixtures | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 |
Separated sources | y1 y2 y3 y4 y5 y6 |
Output SIR [dB] | 12.2 10.9 8.4 8.3 12.3 8.9 |
Type of sources | Speech only |
J | 8 |
T60 | 0.2 |
F | 4096 |
Tp | 0.5 |
Original sources | s1 s2 s3 s4 s5 s6 s7 |
Mixtures | mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8 |
Separated sources | y1 y2 y3 y4 y5 y6 y7 |
Output SIR [dB] | 10.4 10.7 7.8 9.7 11.3 8.4 7.8 |
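This page does not state how the Output SIR values were computed. A common definition, used here purely as an illustration, is the ratio in dB between the power of the target source's contribution to a separated output and the power of the contributions of all other sources. The sketch assumes those per-source contributions are available, for example by passing each source's microphone images separately through the estimated separation filters. The function names are hypothetical.

```python
import math
from typing import Sequence


def power(x: Sequence[float]) -> float:
    """Mean squared value of a signal."""
    return sum(v * v for v in x) / len(x)


def output_sir_db(contributions: Sequence[Sequence[float]],
                  target: int) -> float:
    """Signal-to-interference ratio of one separated output, in dB.

    contributions[i] is the part of the output caused by source i alone;
    `target` is the index of the source this output is meant to recover.
    """
    n = len(contributions[target])
    # Interference is the sum of all non-target contributions.
    interference = [sum(c[t] for i, c in enumerate(contributions)
                        if i != target)
                    for t in range(n)]
    return 10.0 * math.log10(power(contributions[target])
                             / power(interference))


# Toy output: target with power 1, interference with power 0.01.
parts = [[1.0, -1.0, 1.0, -1.0],
         [0.1, 0.1, -0.1, -0.1]]
sir = output_sir_db(parts, target=0)
```

With this definition a power ratio of 100 corresponds to 20 dB, so the toy example gives an SIR of about 20 dB.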
Type of sources | Speech only | Speech only | Speech only |
Number of sources | 4 | 5 | 6 |
Number of microphones (J) | 3 | 4 | 4 |
T60 | 0.2 | 0.2 | 0.2 |
F | 2048 | 1024 | 1024 |
Tp | 1 | 1 | 1 |
Original sources | s1 s2 s3 s4 | s1 s2 s3 s4 s5 | s1 s2 s3 s4 s5 s6 |
Mixtures | mic1 mic2 mic3 | mic1 mic2 mic3 mic4 | mic1 mic2 mic3 mic4 |
Separated sources | y1 y2 y3 y4 | y1 y2 y3 y4 y5 | y1 y2 y3 y4 y5 y6 |