BSS demo

Blind Source Separation (BSS) of convolutive speech mixtures

A short list of sound sources

A collection of .wav files useful for testing BSS algorithms. All sources are sampled at 16 kHz.

SPEECH SIGNALS
Source	Duration (in seconds)	Description
s1	24	Male, list of English words
s2	30	Male, theater
s3	30	Male, hypno
s4	30	Female, sentences
s5	29	Female, numbers
s6	30	Male, numbers
s7	30	Male, poem
s8	28	Female, sentences
s9	29	Male, sentences

MUSIC SIGNALS
Source	Duration (in seconds)	Description
s10	30	Mandolin
s11	30	Piano
s12	30	Piano
s13	30	Trumpet
s14	30	Guitar
s15	30	Vivaldi
s16	30	Loopbass
s17	30	Herbalizer
s18	30	Hooverphonic

Creation of Multichannel Mixtures

Find real-world measured Room Impulse Responses (RIRs) or artificial RIRs and convolve the original sources with these RIRs.
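The mixing step can be sketched as follows. This is a minimal illustration, not code from this page: the function name make_mixtures and the layout rirs[j][i] (impulse response from source i to microphone j) are assumptions chosen for the example.

```python
import numpy as np

def make_mixtures(sources, rirs):
    """Build convolutive mixtures: x_j = sum_i (h_ji * s_i).

    sources: list of I one-dimensional arrays, one per source signal
    rirs:    rirs[j][i] is the impulse response from source i to microphone j
             (hypothetical layout, chosen for this sketch)
    returns: array of shape (J, n_out), one row per microphone
    """
    n_mic = len(rirs)
    n_src = len(sources)
    longest_src = max(len(s) for s in sources)
    longest_rir = max(len(h) for row in rirs for h in row)
    x = np.zeros((n_mic, longest_src + longest_rir - 1))
    for j in range(n_mic):
        for i in range(n_src):
            y = np.convolve(sources[i], rirs[j][i])  # full linear convolution
            x[j, :len(y)] += y                       # sum contributions at mic j
    return x
```

For 30-second signals at 16 kHz and RIRs of several thousand taps, np.convolve is slow; scipy.signal.fftconvolve computes the same full linear convolution much faster.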

Artificial RIRs (examples of room acoustics simulators available online)

Index	Link
(A1)	here
(A2)	here
(A3)	here

These codes implement the acoustic model described in the paper:
J. Allen and D. Berkley, Image method for efficiently simulating small-room acoustics, Journal of the Acoustical Society of America, vol. 65, no. 4, April 1979.
These simulators let the user set many parameters (room dimensions, source and microphone positions, reverberation time, etc.).

Measured RIRs

Index	Link	Description
(R1)	here	Measurements made by A. Westner. Uses the roommix.m Matlab function to create mixtures. One can choose among 8 preset positions for the sources and 8 positions for the microphones. Note: the original roommix.m code has to be slightly modified to downsample the RIRs so that they match the sampling rate of the sources (16 kHz for the sources given on this page).
(R2)	here	RIR measurements conducted at McMaster University in the context of hearing-aid design. Data here. Note: these measurements were made with 2 microphones. See the paper: L. Trainor, R. Sonnadara, K. Wiklund, J. Bondy, S. Gupta, S. Becker, I.-C. Bruce and S. Haykin, Development of a flexible, realistic hearing in noise test environment (R-HINT-E), Signal Processing, vol. 84, no. 2, pp. 299-309, Feb. 2004.
(R3)	here	Another database of real-world RIRs measured with 2 microphones in different types of rooms.
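When measured RIRs were recorded at a higher sampling rate than the sources, they must be resampled before convolution. The sketch below is in Python rather than Matlab and is not the modification of roommix.m itself. The helper name resample_rir is hypothetical, and the original sampling rate fs_in is an input you must supply, since it is not given on this page.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def resample_rir(rir, fs_in, fs_out=16000):
    """Resample an impulse response from fs_in to fs_out (both in Hz).

    resample_poly applies an anti-aliasing low-pass filter before
    decimating, which plain sample dropping would not do.
    """
    g = gcd(int(fs_in), int(fs_out))
    up, down = int(fs_out) // g, int(fs_in) // g
    return resample_poly(np.asarray(rir, dtype=float), up, down)
```

For example, resample_rir(h, 48000) reduces a 48 kHz response to one third of its original length.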

A (non-exhaustive) list of BSS algorithms for convolutive speech mixtures

Description: Links to algorithms available online that exploit the non-stationary nature of speech signals and solve the BSS problem in the frequency domain by Joint Approximate Diagonalization (JAD) of a set of autocorrelation matrices in each frequency bin (under the assumption of uncorrelated source signals).
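The structure these methods rely on can be written compactly. The notation is an assumption made for illustration and is not taken from the cited papers: H(f) is the J x I mixing matrix at frequency f, s(f,t) and x(f,t) are the short-time Fourier coefficients of the sources and of the microphone signals, and p = 1, ..., P indexes time segments over which the statistics are treated as stationary.

```latex
% Assumed notation: J microphones, I sources, P time segments.
\begin{align}
\mathbf{x}(f,t) &\approx \mathbf{H}(f)\,\mathbf{s}(f,t), \\
\mathbf{R}_x(f,p) &= \mathbf{H}(f)\,\mathbf{D}_s(f,p)\,\mathbf{H}(f)^{H},
  \qquad p = 1,\dots,P, \\
\mathbf{D}_s(f,p) &= \operatorname{diag}\!\big(\sigma_1^2(f,p),\dots,\sigma_I^2(f,p)\big).
\end{align}
```

Because the sources are uncorrelated, each D_s(f,p) is diagonal, and because speech is non-stationary, the diagonal entries change with p. JAD then looks for a single matrix W(f) such that W(f) R_x(f,p) W(f)^H is as close to diagonal as possible for all p at once.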

Code 1: L. Parra and C. Spence, Convolutive blind separation of non-stationary sources, IEEE Trans. on Speech and Audio Processing, vol. 8, no. 3, pp. 320-327, March 2000. Code downloadable here.

Code 2: K. Rahbar and J.-P. Reilly, A frequency domain method for blind source separation of convolutive audio mixtures, IEEE Trans. on Speech and Audio Processing, vol. 13, no. 5, pp. 832-844, 2005. Code downloadable here.

Code 3: D.-T. Pham, C. Servière and H. Boumaraf, Blind separation of speech mixtures based on nonstationarity, in Proc. ISSPA'03, vol. 2, pp. 73-76, 2003. Code downloadable here (code proposed for the 2 by 2 case only).

Our PARAFAC-based algorithm for BSS of convolutive speech mixtures
Description and Results of experiments

Paper: D. Nion, K. N. Mokios, N. D. Sidiropoulos and A. Potamianos, Batch and adaptive PARAFAC-based blind separation of convolutive speech mixtures, IEEE Trans. on Audio, Speech and Language Processing, June 2008, submitted.

Description: We exploit the non-stationary nature of speech signals and show that the classical JAD problem at each frequency is equivalent to a PARAFAC decomposition at that frequency. After the PARAFAC-based separation stage, we use a novel and efficient permutation-correction scheme based on k-means clustering of the source power profiles. Two key features of our algorithm are:
i) Better separation performance than codes 1, 2 and 3, at a lower complexity.
ii) The uniqueness properties of PARAFAC make it possible to handle some under-determined cases, i.e., the mixing matrix can be estimated uniquely even with more sources than microphones. The demixing matrix is then built by Capon beamforming, which substantially reduces cross-talk.
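The beamforming step in ii) can be illustrated with the textbook Capon (minimum-variance distortionless-response) formula, w_i = R^{-1} a_i / (a_i^H R^{-1} a_i), applied at a single frequency. This is a minimal sketch and not the implementation from the paper: the function name capon_demixing and the diagonal loading term are assumptions added for numerical stability.

```python
import numpy as np

def capon_demixing(A, R, loading=1e-6):
    """Capon demixing matrix at one frequency bin.

    A: (J, I) estimated mixing matrix, one column a_i per source
    R: (J, J) Hermitian covariance matrix of the microphone signals
    loading: relative diagonal loading (an assumption of this sketch)
    returns W of shape (I, J); row i is w_i^H, so y = W @ x
    """
    J = R.shape[0]
    R_reg = R + loading * np.trace(R).real / J * np.eye(J)
    Rinv_A = np.linalg.solve(R_reg, A)                # columns R^{-1} a_i
    denom = np.einsum('ji,ji->i', A.conj(), Rinv_A)   # a_i^H R^{-1} a_i
    return (Rinv_A / denom).conj().T
```

Each row passes its own source with unit gain (w_i^H a_i = 1) while minimizing the total output power, and the formula stays well defined when there are more sources than microphones.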

Parameters	Description
Tp (in seconds)	Duration of each epoch, i.e. the recordings are segmented into P successive epochs of duration Tp.
F	FFT length. Within each epoch p, we compute the FFT of consecutive overlapping frames of F samples (overlap coefficient of 0.75), each weighted by a Hanning window of F samples. We then compute the sample-mean estimate of the autocorrelation array of the recorded signals for each epoch.
T60 (in seconds)	Reverberation time.
J	Number of microphones
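The roles of Tp and F can be made concrete with a short sketch of the estimation step. It is an illustration under assumptions rather than the code used for the experiments: the function name epoch_covariances and the output layout (frequency, epoch, J, J) are choices made here, and incomplete trailing epochs are simply dropped.

```python
import numpy as np

def epoch_covariances(x, fs=16000, Tp=0.4, F=2048, overlap=0.75):
    """Per-epoch, per-frequency sample covariance of the recordings.

    x: (J, N) array of microphone signals
    returns R of shape (F // 2 + 1, P, J, J)
    """
    J, N = x.shape
    hop = int(F * (1 - overlap))       # frame shift in samples
    epoch_len = int(round(Tp * fs))    # samples per epoch
    P = N // epoch_len                 # number of complete epochs
    win = np.hanning(F)
    R = np.zeros((F // 2 + 1, P, J, J), dtype=complex)
    for p in range(P):
        seg = x[:, p * epoch_len:(p + 1) * epoch_len]
        starts = range(0, epoch_len - F + 1, hop)
        # Windowed frames, shape (T, J, F), then one-sided FFT per frame.
        frames = np.stack([seg[:, s:s + F] * win for s in starts])
        X = np.fft.rfft(frames, axis=-1)
        # Average the outer products x x^H over the frames of this epoch.
        R[:, p] = np.einsum('tjf,tkf->fjk', X, X.conj()) / len(starts)
    return R
```

With Tp = 0.4 s and F = 2048 at 16 kHz, each epoch holds 6400 samples and yields 9 overlapping frames, so longer epochs give smoother estimates at the cost of fewer epochs P.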

In the following experiments, we use RIRs generated by (A1), with the following dimensions of the room and locations of sources and microphones.

Coordinates (meters)	Room	s1	s2	s3	s4	s5	s6	s7	mic1	mic2	mic3	mic4	mic5	mic6	mic7	mic8
x	12	2	2	2	2	2	2	2	11	11	11	11	11	11	11	11
y	9	8	1	5	3	7	4	6	5.2	5.6	6	6.4	6.8	7.2	7.6	8
z	3	1.6	1.6	1.6	1.6	1.6	1.6	1.6	1.6	1.6	1.6	1.6	1.6	1.6	1.6	1.6
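The geometry in this table can be checked numerically. The sketch below computes source-to-microphone distances and direct-path delays from the coordinates above. The speed of sound (343 m/s) is an assumption, since it is not stated on this page.

```python
import numpy as np

C_SOUND = 343.0   # assumed speed of sound in m/s (not stated on the page)
FS = 16000        # sampling rate of the sources in Hz

# Coordinates (x, y, z) in meters, copied from the table above.
sources = np.array([[2, y, 1.6] for y in (8, 1, 5, 3, 7, 4, 6)], dtype=float)
mics = np.array([[11, y, 1.6]
                 for y in (5.2, 5.6, 6, 6.4, 6.8, 7.2, 7.6, 8)], dtype=float)

# dist[i, j] is the distance from source i+1 to microphone j+1.
dist = np.linalg.norm(sources[:, None, :] - mics[None, :, :], axis=-1)
delay_samples = dist / C_SOUND * FS   # direct-path delay in samples
```

The microphones form a uniform linear array with 0.4 m spacing along y, all sources lie on the line x = 2 m, and everything is at the same height of 1.6 m.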

Typical RIRs generated between source s1 and microphone mic1 are shown here. Left: T60 = 200 ms. Right: T60 = 300 ms.
[Figure: RIR between s1 and mic1 for T60 = 200 ms and T60 = 300 ms]

EXPERIMENT 1: Mixture of 2 sources

Type of sources	Speech only	Speech only	Speech only	Speech only	Speech and Music	Music only
J	2	8	2	8	4	8
T60	0.2	0.2	0.4	0.4	0.2	0.2
F	2048	2048	2048	2048	2048	2048
Tp	0.4	0.4	0.4	0.4	0.2	0.2
Original sources	s1 s2	s1 s2	s1 s2	s1 s2	s1 s2	s1 s2
Mixtures	mic1 mic2	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8	mic1 mic2	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8	mic1 mic2 mic3 mic4	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8
Separated sources	y1 y2	y1 y2	y1 y2	y1 y2	y1 y2	y1 y2
Output SIR [dB]	19.1 17.9	28.9 24.4	9.4 9.6	16.7 14.8	24.6 16.9	18.4 17
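This page does not state how the output SIR is computed. A common definition, given here only as an assumption, is the ratio of the power of the target source to the total power of the other sources in each separated output. The sketch supposes that the contribution of each source to each output is available, for instance by passing each source through the mixing and demixing filters on its own. The function name output_sir_db and the array layout are hypothetical.

```python
import numpy as np

def output_sir_db(contrib):
    """Signal-to-interference ratio per output, in dB.

    contrib: (I, I, N) array where contrib[i, k] is the part of separated
             output i that originates from source k (assumed layout, with
             outputs already ordered to match the sources)
    returns: array of I SIR values in dB
    """
    power = np.sum(np.abs(contrib) ** 2, axis=-1)   # (I, I) power table
    target = np.diag(power)                         # source i in output i
    interference = power.sum(axis=1) - target       # all other sources
    return 10 * np.log10(target / interference)
```

Under this definition, an interferer whose amplitude is one tenth of the target's gives an SIR of 20 dB.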

EXPERIMENT 2: Mixture of 3 sources

Type of sources	Speech only	Speech only	Speech only	Speech and Music	Music only	Music only
J	3	3	8	8	8	8
T60	0.2	0.3	0.3	0.2	0.2	0.2
F	4096	4096	4096	2048	2048	2048
Tp	0.5	0.5	0.5	0.2	0.2	0.2
Original sources	s1 s2 s3	s1 s2 s3	s1 s2 s3	s1 s2 s3	s1 s2 s3	s1 s2 s3
Mixtures	mic1 mic2 mic3	mic1 mic2 mic3	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8
Separated sources	y1 y2 y3	y1 y2 y3	y1 y2 y3	y1 y2 y3	y1 y2 y3	y1 y2 y3
Output SIR [dB]	19.8 18.8 16	12.7 12.5 7.7	18.3 14.6 15.4	20.4 14.7 18.6	11.9 11.8 9	13.1 16.2 14.3

EXPERIMENT 3: Mixture of 4 sources

Type of sources	Speech only	Speech only	Speech and Music
J	8	8	8
T60	0.2	0.3	0.2
F	4096	4096	2048
Tp	0.5	0.5	0.2
Original sources	s1 s2 s3 s4	s1 s2 s3 s4	s1 s2 s3 s4
Mixtures	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8
Separated sources	y1 y2 y3 y4	y1 y2 y3 y4	y1 y2 y3 y4
Output SIR [dB]	18.5 17.7 16.2 18.7	14.1 12.5 11.5 12.9	15.3 8.9 13.3 11.8

EXPERIMENT 4: Mixture of 5 sources

Type of sources	Speech only	Speech and Music
J	8	8
T60	0.2	0.2
F	4096	2048
Tp	0.5	0.2
Original sources	s1 s2 s3 s4 s5	s1 s2 s3 s4 s5
Mixtures	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8
Separated sources	y1 y2 y3 y4 y5	y1 y2 y3 y4 y5
Output SIR [dB]	14.9 13.9 12.8 13.6 13.5	12.9 10.6 13 10.7 9.5

EXPERIMENT 5: Mixture of 6 sources

Type of sources	Speech only
J	8
T60	0.2
F	4096
Tp	0.5
Original sources	s1 s2 s3 s4 s5 s6
Mixtures	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8
Separated sources	y1 y2 y3 y4 y5 y6
Output SIR [dB]	12.2 10.9 8.4 8.3 12.3 8.9

EXPERIMENT 6: Mixture of 7 sources

Type of sources	Speech only
J	8
T60	0.2
F	4096
Tp	0.5
Original sources	s1 s2 s3 s4 s5 s6 s7
Mixtures	mic1 mic2 mic3 mic4 mic5 mic6 mic7 mic8
Separated sources	y1 y2 y3 y4 y5 y6 y7
Output SIR [dB]	10.4 10.7 7.8 9.7 11.3 8.4 7.8

EXPERIMENT 8: Under-determined cases

Type of sources	Speech only	Speech only	Speech only
Number of sources	4	5	6
Number of microphones (J)	3	4	4
T60	0.2	0.2	0.2
F	2048	1024	1024
Tp	1	1	1
Original sources	s1 s2 s3 s4	s1 s2 s3 s4 s5	s1 s2 s3 s4 s5 s6
Mixtures	mic1 mic2 mic3	mic1 mic2 mic3 mic4	mic1 mic2 mic3 mic4
Separated sources	y1 y2 y3 y4	y1 y2 y3 y4 y5	y1 y2 y3 y4 y5 y6
