Machine Learning for Signal Processing (ENGR-E 511) Homework 3


P1: Instantaneous Source Separation [4 points]

1. As you might have noticed from my long hair, I’ve got a rock spirit. However, for this

homework I dabbled in composing jazz music. The title of the song is boring: Homework 3.

2. The files x_ica_1.wav through x_ica_20.wav are 20 recordings of my song, Homework 3. Each recording

has N time-domain samples. In this music, an unknown number K of musical sources are

played at the same time. In other words, it simulates the situation in which 20 of my students

come to my gig and record my band’s performance from 20 different locations (sounds unethical, so I

wouldn’t invite you guys, no worries). This can be seen as a situation where the K sources are

mixed by a 20 × K mixing matrix A to create the 20-channel mixture:

$$
\begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_{20}(t) \end{bmatrix}
= A
\begin{bmatrix} s_1(t) \\ s_2(t) \\ \vdots \\ s_K(t) \end{bmatrix}
\tag{1}
$$

3. As you’ve learned how to do source separation using ICA, you should be able to separate

them out into K clean musical sources.


4. First, you don’t like the fact that there are too many recordings for this separation problem,

because you have a feeling that the number of sources is a lot smaller than 20. So, you decided

to do dimension reduction first, before you actually go ahead and do ICA. For this, you

choose to perform PCA with the whitening option. Apply your PCA algorithm on your data

matrix X, a 20 × N matrix. Don’t forget to whiten the data. Make a decision as to how

many dimensions to keep, which will correspond to your K. Hint: take a very close look at

your eigenvalues.

5. On your whitened/dimension reduced data matrix Z (K × N), apply ICA. At every iteration

of the ICA algorithm, use these as your update rules:

$$
\Delta W \leftarrow \left( N I - g(Y)\, f(Y)^\top \right) W
$$

$$
W \leftarrow W + \rho \, \Delta W
$$

$$
Y \leftarrow W Z
$$

where

W : The ICA unmixing matrix you’re estimating

Y : The K × N source matrix you’re estimating

Z : Whitened/dim reduced version of your input (using PCA)

g(x) : tanh(x)

f(x) : x^3

ρ : learning rate

N : number of samples

6. Enjoy your separated music. Submit your separated .wav files, source code, and the convergence graph.

7. Implementation notes: The convergence of the ICA algorithm varies with the choice of the

learning rate, but I always see convergence within 5 to 90 seconds on my iMac.
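
The PCA-whitening and ICA steps above can be sketched in NumPy as follows. This is a minimal sketch, not a reference solution: the function names `pca_whiten` and `ica`, the identity initialization of W, the fixed iteration count, and the use of the Frobenius norm of ∆W as a convergence measure are all assumptions made for illustration. Only the three update rules come from the text.

```python
import numpy as np


def pca_whiten(X, K):
    """Reduce X (channels x N) to K dimensions and whiten it.

    Returns Z (K x N) and all eigenvalues sorted in descending order,
    so they can be inspected when choosing K.
    """
    Xc = X - X.mean(axis=1, keepdims=True)      # center each channel
    cov = Xc @ Xc.T / Xc.shape[1]               # channels x channels covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each kept eigenvector by 1/sqrt(eigenvalue) so Z has unit covariance.
    Z = (eigvecs[:, :K] / np.sqrt(eigvals[:K])).T @ Xc
    return Z, eigvals


def ica(Z, rho, n_iter=200):
    """Run the update rules from the text on whitened data Z (K x N)."""
    K, N = Z.shape
    W = np.eye(K)                               # assumed initialization
    Y = W @ Z
    history = []                                # ||dW|| per iteration (assumed metric)
    for _ in range(n_iter):
        dW = (N * np.eye(K) - np.tanh(Y) @ (Y ** 3).T) @ W   # g = tanh, f = x^3
        W = W + rho * dW
        Y = W @ Z
        history.append(np.linalg.norm(dW))
    return W, Y, history
```

Calling `Z, eigvals = pca_whiten(X, K)` and then `W, Y, history = ica(Z, rho)` leaves the estimated sources in the rows of `Y`, and plotting `history` gives one possible convergence graph. Because ∆W is a sum over N samples rather than an average, a stable ρ probably needs to shrink as N grows; treat any particular value as something to tune.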

P2: Ideal Masks [3 points]

1. piano.wav and ocean.wav are two sources you’re interested in. Load them separately and

apply STFT with 1024 point frames and 50% overlap. Use Hann windows. Let’s call these

two spectrograms S and N, respectively. Discard the complex conjugate part, so that each

ends up as a 513×158 matrix. Later on in this problem when you recover the time domain

signal out of this, you can easily recover the discarded half from the existing half so that

you can do inverse-DFT on the column vector of full 1024 points. Hint: Why 513, not 512?

Create a very short random signal with 16 samples, and take its DFT to convert it

into a spectrum of 16 complex values. Inspect the complex coefficients to see why you

need N/2 + 1, not N/2. [1]

2. Now you build a mixture spectrogram by simply adding the two source spectrograms: X =

S + N.
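
The 16-sample experiment suggested in the hint of step 1 can be tried directly. The sketch below uses NumPy's FFT; the variable names and the random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed
x = rng.standard_normal(16)           # very short real-valued random signal
X = np.fft.fft(x)                     # spectrum of 16 complex values

# For a real signal, bin k and bin 16 - k are complex conjugates,
# so bins 9..15 repeat the information in bins 7..1.
mirrored = np.allclose(X[1:8], np.conj(X[:8:-1]))

# Bins 0 (DC) and 8 (Nyquist) have no distinct partner, so both must be kept:
# 7 mirrored bins + 2 unpaired bins = 16 / 2 + 1 = 9 unique coefficients.
half = X[:9]

# Rebuild the discarded half from the kept half, then invert.
full = np.concatenate([half, np.conj(half[-2:0:-1])])
x_rec = np.fft.ifft(full).real
```

The same reasoning applied to 1024-point frames gives 1024/2 + 1 = 513 rows per spectrogram.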

[1] I’ll allow you to use a toolbox for STFT, but I encourage you to use your own implementation.
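
For readers who take up the encouragement to write their own STFT, one possible sketch follows. The function name `stft`, the periodic form of the Hann window, and the choice to drop any trailing partial frame are assumptions; the text does not say how the signal edges are padded, so the frame count produced here may differ from 158.

```python
import numpy as np


def stft(x, frame_len=1024, hop=512):
    """Return the (frame_len // 2 + 1) x n_frames complex spectrogram of x."""
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)   # periodic Hann
    n_frames = 1 + (len(x) - frame_len) // hop               # no padding
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    # rfft keeps only the non-redundant bins 0 .. frame_len / 2.
    return np.fft.rfft(frames, axis=0)
```

Because this transform is linear, adding two spectrograms computed this way gives the same result as transforming the sum of the two time-domain signals, provided both signals have the same length.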
