sound processing - extract a repeated sample

category: music [glöplog]

Hi everybody,

Let's say I have a soundtrack. In it there is a sample that is repeated several times (a voice, for example). The repeated sound is exactly the same each time, but with a different musical background (instruments or noise).

If I want to extract that sound (in this case the voice without the instruments), is there any technique or tool that lets me do that, based on the fact that the sample is repeated several times?

The technique I was thinking of:

1) find where the pattern is repeated and move each occurrence into a separate sample
2) synchronise these samples (so they start at exactly the same time and have the same length)
3) run some algorithm that extracts only what is common to all samples. I don't know whether the algorithm would work on the waveform directly or on frequencies (after an FFT).

Maybe this is a well-known technique that can be done with basic sound processing programs (like GoldWave or Audacity). There are lots of musicians here on pouet, so maybe some of them can help.

added on the 2010-09-09 13:28:28 by Tigrou

Sounds like a variant of http://en.wikipedia.org/wiki/Longest_common_substring_problem

added on the 2010-09-09 13:38:58 by rasmus/loonies

Maybe a set of fourier transformations of the repeated parts can be processed in some way in order to extract the sample?

added on the 2010-09-09 13:44:03 by Lord Graga

Sounds like a typical case for noise reduction, doesn't it? Use the repeating sample as a noiseprint and remove it from the original, then invert the result against the original and keep what is left. Basically like when people create a cappella tracks.

Of course, the final quality will suck.

added on the 2010-09-09 13:50:51 by gloom

If you did 1) and 2) and the sample is *really* identical (this is important!), you could try this. I'm assuming you have two samples for now, but it might also work for more.

3) Convert both samples to mono.
4) Subtract one sample from the other. In a wave editor, this can be done by inverting the phase of one sample and mixpasting it with the other. You should hear both samples WITHOUT the common part now.
5) Now, mixpaste both original mono samples (not inverted). Now subtract the result from 4) from this mix and you should more or less get the original sample you want.

added on the 2010-09-09 13:52:03 by Saga Musix
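
A quick numeric check of steps 4 and 5 as literally written, under an assumed model where each clip is the common sample plus its own background (the names v, x, y, a, b and all values are made up for illustration). Under that model step 4 does cancel the common part, but subtracting its result from the plain mix gives back clip B at double amplitude rather than the isolated sample, so the recipe seems to need a further ingredient:

```python
# Assumed model: clip = common sample + its own background.
v = [0.5, -0.25, 0.75, 0.0]       # common sample (hypothetical values)
x = [0.125, 0.25, -0.5, 0.375]    # background of clip A
y = [-0.25, 0.5, 0.125, -0.125]   # background of clip B

a = [vi + xi for vi, xi in zip(v, x)]
b = [vi + yi for vi, yi in zip(v, y)]

# Step 4: invert b and mix-paste it onto a, i.e. a - b. The common part cancels.
diff = [ai - bi for ai, bi in zip(a, b)]

# Step 5: mix-paste the two originals, then subtract the step-4 result.
mix = [ai + bi for ai, bi in zip(a, b)]
result = [mi - di for mi, di in zip(mix, diff)]

print(diff == [xi - yi for xi, yi in zip(x, y)])  # True: only x - y is left
print(result == [2 * bi for bi in b])             # True: (a + b) - (a - b) = 2b
```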

Oh, and I actually think you have to one invert the sample which you inverted in step 4 also in step 5 again, before mixpasting them.

added on the 2010-09-09 13:54:31 by Saga Musix

-one

added on the 2010-09-09 13:54:41 by Saga Musix

@Saga: In theory that works perfectly, but you have a problem here with the framerate: the chance that you can _exactly_ align the two samples is low, so they'll always be a bit out of phase.
I tried it out once, and it does work, but you'll still hear the uncommon parts of the song faintly through it. If you can get those quiet enough (be sure to put a gate on it) you'll be fine.

added on the 2010-09-09 14:43:28 by whynot2000

saga: That's what I said (badly, apparently). The idea is that the noiseprint would have to work as one of the phase opposites.

added on the 2010-09-09 14:46:04 by gloom

Thanks all for your answers. I was just thinking the same as whynot2000: aligning the two sound samples perfectly would be almost impossible.
I should try Lord Graga's noise reduction technique, but it won't work for more than 2 samples.

added on the 2010-09-09 14:53:47 by Tigrou

check the concepts of cross-correlation and auto-correlation, in time or frequency domain.

added on the 2010-09-09 14:54:36 by Navis

I was thinking roughly the same as saga, but with FFT data instead of the actual samples. Might help reduce the need to get them perfectly aligned.

added on the 2010-09-09 14:54:44 by psonice

psonice: That's what I have been trying to get across too, but apparently I suck at explaining things today. :)

added on the 2010-09-09 15:12:18 by gloom

Without any idea of what you want to achieve, another thing that might cause problems is if there's processing applied on the mix as a whole (aka "mastering").
Even if the sample used is identical, the small variations of [everything else] would result in the samples not being technically identical. This means that the result after the phase flip will not be totally clean.
If it weren't for this, getting a cappellas from singles with an instrumental version would be a breeze (but it isn't).

added on the 2010-09-09 15:27:34 by lug00ber

gloom: re-reading what you said, yeah I see it now.. well, I do almost zero audio stuff, so I've no idea how noise cancellation is normally done, or what a noiseprint is. I'm guessing that would have made some sense to anyone doing audio stuff :)

added on the 2010-09-09 15:29:26 by psonice

FFT doesn't really help you much in reducing alignment constraints; subtraction in the frequency domain is exactly the same as subtraction in the time domain (DFT/FFT is a linear transformation).

As Navis says, the best bet for alignment is probably finding the point with maximum cross-correlation between the two samples. And of course, what lug00ber said.

There are also techniques for lead voice/instrument extraction that use the background music as a cue — basically the algorithm “learns” (via PCA and some sort of supervised learning) what components the instruments sound like, and then can use that to filter them out from the final mix. I'd guess this is overkill for what you're trying to achieve, though :-)

added on the 2010-09-09 16:48:10 by Sesse
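
A minimal sketch of the alignment idea, assuming the two clips are already cut out roughly and only a small shift is unknown. It tries every shift in a window and keeps the one with the largest cross-correlation. The function name best_lag, the synthetic "voice" and the offsets are all made up for illustration, and it only finds whole-sample shifts:

```python
import random

def best_lag(a, b, max_lag):
    """Return the shift d (in samples) that maximises sum(a[n] * b[n + d])."""
    def score(d):
        return sum(a[n] * b[n + d] for n in range(len(a)) if 0 <= n + d < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

rng = random.Random(0)
voice = [rng.uniform(-1, 1) for _ in range(200)]  # stand-in for the repeated sample

def clip(offset, length=300):
    """Voice placed at `offset` inside a clip with its own quieter background."""
    out = [rng.uniform(-0.3, 0.3) for _ in range(length)]
    for i, s in enumerate(voice):
        out[offset + i] += s
    return out

a = clip(offset=20)
b = clip(offset=37)
lag = best_lag(a, b, max_lag=50)
print(lag)  # 17, the difference between the two offsets
```

The brute-force loop is quadratic, which is fine for short clips; for long ones the same correlation is usually computed via FFT.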

Auto-correlation might get you the length of the sample (a peak). Or a look at the waveform.

If the sampler plays the sample exactly the same way (a) then split the soundtrack based on this length and _add_ these parts.

Ideally the voice will get a 6dB boost and the uncorrelated rest approximately a 3dB boost.

Note that the precondition (a) is unlikely to happen.

added on the 2010-09-09 16:55:01 by ponce
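
The 6 dB versus 3 dB figures can be checked with a small sketch, assuming two perfectly aligned copies and backgrounds that are independent noise of equal power (a sine stands in for the voice; all names and values are illustrative):

```python
import math
import random

def rms(xs):
    return math.sqrt(sum(s * s for s in xs) / len(xs))

def gain_db(after, before):
    return 20 * math.log10(rms(after) / rms(before))

rng = random.Random(1)
n = 20000
voice = [math.sin(2 * math.pi * 440 * t / 44100) for t in range(n)]
bg1 = [rng.gauss(0, 1) for _ in range(n)]  # background of occurrence 1
bg2 = [rng.gauss(0, 1) for _ in range(n)]  # background of occurrence 2

# Adding the two occurrences: the identical voice doubles, the backgrounds just mix.
voice_sum = [2 * s for s in voice]
bg_sum = [p + q for p, q in zip(bg1, bg2)]

voice_gain = gain_db(voice_sum, voice)  # 20*log10(2), about 6.02 dB
bg_gain = gain_db(bg_sum, bg1)          # close to 10*log10(2), about 3.01 dB
print(round(voice_gain, 2), round(bg_gain, 2))
```

Under the same assumptions, adding N occurrences should raise the voice by 20*log10(N) dB and the background by roughly 10*log10(N) dB, so the net gain is only about 10*log10(N) dB; real musical backgrounds are rarely that uncorrelated.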

#ponce: Are you confusing autocorrelation with crosscorrelation? Autocorrelation is correlating one sample with itself, and by definition the peak will always be at A(0).

added on the 2010-09-09 18:01:07 by Sesse

Sorry, the correct symbol is R_xx(0).

added on the 2010-09-09 18:14:37 by Sesse

Nope.

I used it once for BPM detection and you have to ignore the base peak (ie, assuming a maximum possible BPM). Same problem here.

added on the 2010-09-09 18:15:27 by ponce

Ok, so the autocorrelation of the entire song — fair enough, if the two samples are indeed in the same track.

added on the 2010-09-09 18:16:08 by Sesse
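
A rough sketch of autocorrelating the whole track while ignoring the base peak, assuming the sample occurs twice and that a minimum distance between occurrences is known (min_lag, a hypothetical parameter playing the role of the "maximum possible BPM" assumption). The track here is synthetic:

```python
import random

def autocorr(x, lag):
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def repeat_lag(x, min_lag):
    """Lag >= min_lag with the largest autocorrelation (skips the peak at lag 0)."""
    return max(range(min_lag, len(x)), key=lambda lag: autocorr(x, lag))

rng = random.Random(2)
sample = [rng.uniform(-1, 1) for _ in range(200)]     # stand-in for the repeated sample
track = [rng.uniform(-0.3, 0.3) for _ in range(1000)]  # quieter, non-repeating background
for start in (100, 600):
    for i, s in enumerate(sample):
        track[start + i] += s

lag = repeat_lag(track, min_lag=200)
print(lag)  # 500, the distance between the two occurrences
```

This gives the distance between occurrences, not where they start; min_lag has to be smaller than that distance or the true peak is skipped.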

On second thought, it might be more straightforward to just hand-pick the samples and add them by hand.

added on the 2010-09-09 18:19:31 by ponce

This looks nice: http://www.unmixingstation.com/

This one is free: http://www.elevayta.net/azuifgeh.htm

added on the 2010-09-09 18:21:34 by bdk

Melodyne can separate sources too with quite impressive results.

added on the 2010-09-09 18:23:11 by ponce

#ponce: Is it really as impressive as in the demo videos? I sort of always thought it was suspicious that they only did demos with one instrument at a time — and an acoustic guitar at that, about the simplest possible instrument to work with audio-wise.

added on the 2010-09-09 18:33:17 by Sesse
