How do you remove noise to detect just the human voice?

A lot depends on your specific data. But if the noise is well separated from the voice in the frequency domain, there is a simple brute-force trick: cut out the "bad" frequencies using wavelets. Let's import a sample recording:

voice = ExampleData[{"Sound", "Apollo11ReturnSafely"}]

WaveletScalogram is great for visualizing voice versus noise features:

cwt = ContinuousWaveletTransform[voice, GaborWavelet[6]];
WaveletScalogram[cwt, ColorFunction -> "AvocadoColors", ColorFunctionScaling -> False]

Voice has a richer, more irregular structure, while noise is more monotonic and repetitive. Based on the scalogram, we can formulate a logical condition to cut out the noisy octaves (the numbers on the vertical axis):

cwtCUT = WaveletMapIndexed[#1 0.0 &, cwt, {u_ /; u >= 6 && u < 9, _}];
WaveletScalogram[cwtCUT, ColorFunction -> "AvocadoColors", ColorFunctionScaling -> False]

This is pretty brutal, like surgery that cuts out good stuff too: in this case some voice frequencies overlap with the noise, and we lose them. But it roughly works, and the signal is cleaner. You can hear that many background noises are suppressed, though a few remain (use headphones or good speakers). If in your case the noise is even further from the voice in the frequency domain, it will work much better.

InverseContinuousWaveletTransform[cwtCUT]
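
Zeroing whole octaves is the bluntest option. A gentler variation, sketched here, scales the same octaves down instead of removing them, which may keep some of the voice content that overlaps the noise. The factor 0.2 and the name cwtSoft are illustrative assumptions, not values tuned for this recording.

```mathematica
(* Same setup as above *)
voice = ExampleData[{"Sound", "Apollo11ReturnSafely"}];
cwt = ContinuousWaveletTransform[voice, GaborWavelet[6]];

(* Attenuate octaves 6 to 8 instead of zeroing them.
   0.2 is an arbitrary illustrative factor (assumption). *)
cwtSoft = WaveletMapIndexed[0.2 #1 &, cwt, {u_ /; u >= 6 && u < 9, _}];

WaveletScalogram[cwtSoft, ColorFunction -> "AvocadoColors", ColorFunctionScaling -> False]
InverseContinuousWaveletTransform[cwtSoft]
```

A factor of 0 reproduces the hard cut, and a factor of 1 leaves the signal unchanged, so this gives one knob for trading noise suppression against lost voice content.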


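What you need is BandpassFilter, which was introduced in version 9. Assuming your audio is sampled at 22400 Hz, you can do:

BandpassFilter[data, {2 π 60, 2 π 180}, SampleRate -> 22400]

to keep the band between 60 and 180 Hz. When SampleRate is given, the cutoffs are angular frequencies in radians per second, so a frequency f in Hz is entered as 2 π f.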
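
As a minimal sketch of how this could be applied to the example recording used elsewhere in this thread: the helper hz2rad is a hypothetical name, and the sketch assumes the cutoffs are angular frequencies (2 π f radians per second) when SampleRate is set, and that the recording is a Sound wrapping a SampledSoundList.

```mathematica
voice = ExampleData[{"Sound", "Apollo11ReturnSafely"}];
samples = voice[[1, 1, 1]];   (* raw samples of the first channel *)
sr = voice[[1, 2]];           (* sample rate stored with the recording *)

(* hz2rad is a hypothetical helper: frequency in Hz -> angular frequency in rad/s *)
hz2rad[f_] := 2 Pi f;

(* Keep the 60-180 Hz band mentioned above *)
filtered = BandpassFilter[samples, hz2rad /@ {60, 180}, SampleRate -> sr];

Sound[SampledSoundList[filtered, sr]]
```

Reading the sample rate from the recording avoids hard-coding a value that may not match the data.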


About a year ago, I saw a demo in LabVIEW that could detect the voice of a killer whale against a background of seawater sounds.

I found this image on the Internet, because I forgot where to find the demo.

I want to try a similar thing in Mathematica, based on Vitaliy Kaurov's approach:

voice = ExampleData[{"Sound", "Apollo11ReturnSafely"}];
data = voice[[1, 1, 1]]; r = voice[[1, 2]];
cwt = ContinuousWaveletTransform[data, GaborWavelet[6]];
(* Setting cwt = ContinuousWaveletTransform[data, GaborWavelet[6], {Automatic, 8}]
   gives a more accurate result, but then you must re-extract the region of interest. *)
WaveletScalogram[cwt, ColorFunction -> "AvocadoColors", ColorFunctionScaling -> False]

This gives the scalogram of the signal.

Then you can use Mathematica's drawing tools to trace the outline of the region of interest:

First, press Ctrl+D to open the drawing tools.

Second, press the button in the lower right corner of the palette.

Third, click along the outline and copy the resulting coordinates (use Ctrl+C and Ctrl+V).

My coordinates are the following:

test = {{7183, 40.14}, {7309, 39.89}, {7771, 39.77}, {7939, 39.64}, {8065, 39.39}, {8863, 38.64}, {9913, 37.15}, {1.067*^4, 35.9}, {1.096*^4, 35.65}, {1.13*^4, 35.27}, {1.163*^4, 35.15}, {1.201*^4, 35.02}, {1.247*^4, 35.02}, {1.306*^4, 35.02}, {1.369*^4, 35.27}, {1.428*^4, 35.65}, {1.47*^4, 35.9}, {1.495*^4, 36.15}, {1.52*^4, 36.4}, {1.541*^4, 36.77}, {1.587*^4, 37.27}, {1.629*^4, 37.64}, {1.671*^4, 38.02}, {1.726*^4, 38.14}, {1.789*^4, 38.27}, {1.873*^4, 38.27}, {1.957*^4, 38.27}, {2.02*^4, 38.02}, {2.062*^4, 38.02}, {2.083*^4, 38.02}, {2.104*^4, 38.14}, {2.129*^4, 38.14}, {2.184*^4, 38.14}, {2.238*^4, 37.77}, {2.276*^4, 37.39}, {2.314*^4, 37.02}, {2.343*^4, 36.65}, {2.373*^4, 36.4}, {2.41*^4, 35.77}, {2.427*^4, 35.52}, {2.431*^4, 35.02}, {2.415*^4, 33.53}, {2.402*^4, 32.9}, {2.373*^4, 32.53}, {2.322*^4, 32.28}, {2.314*^4, 32.15}, {2.259*^4, 32.15}, {2.217*^4, 32.15}, {2.196*^4, 32.03}, {2.179*^4, 32.15}, {2.163*^4, 31.91}, {2.121*^4, 31.53}, {2.104*^4,31.16}, {2.066*^4, 30.91}, {2.033*^4, 30.66}, {1.999*^4, 30.53}, {1.961*^4, 30.53}, {1.911*^4, 30.53}, {1.852*^4, 30.53}, {1.81*^4, 30.53}, {1.751*^4, 30.66}, {1.709*^4, 30.91}, {1.671*^4, 30.91}, {1.646*^4, 31.03}, {1.625*^4, 31.03}, {1.6*^4, 31.03}, {1.579*^4, 30.91}, {1.562*^4, 30.66}, {1.537*^4, 30.16}, {1.512*^4, 29.91}, {1.495*^4, 29.66}, {1.478*^4, 29.53}, {1.449*^4, 29.41}, {1.394*^4, 29.28}, {1.357*^4, 29.28}, {1.31*^4, 29.41}, {1.256*^4, 29.66}, {1.214*^4, 29.78}, {1.176*^4, 29.78}, {1.142*^4, 29.78}, {1.088*^4, 29.78}, {1.046*^4, 29.91}, {1.021*^4, 29.91}, {9913, 29.78}, {9745, 29.53}, {9493, 29.28}, {9241, 28.91}, {8947, 28.66}, {8737, 28.29}, {8527, 27.91}, {8317,     27.54}, {8065, 27.16}, {7855, 26.66}, {7603, 26.04}, {7267, 25.54}, {6973, 24.92}, {6806, 24.42}, {6554, 24.04}, {6344, 23.67}, {6176, 23.42}, {6050, 23.3}, {5840, 23.17}, {5714, 23.17}, {5672, 23.67}, {5588, 24.42}, {5546, 25.29}, {5546,     26.54}, {5546, 27.66}, {5546, 28.41}, {5546, 29.16}, {5588, 30.16}, {5630, 31.16}, {5630, 31.91}, 
{5672, 32.78}, {5672, 33.53}, {5672, 34.15}, {5672, 34.77}, {5672, 35.52}, {5714, 36.27}, {5714, 36.77}, {5714, 37.39}, {5714, 37.77}, {5756, 38.02}, {5756, 38.39}, {5840, 38.77}, {6008, 39.39}, {6092,     39.52}, {6176, 39.64}, {6302, 39.89}, {6428, 39.89}, {6554, 40.14}, {6638, 40.14}, {6764, 40.14}, {6848, 40.14}};
ListPlot@test

Then I define a function that converts these plot coordinates to WaveletScalogram indices ({octave, voice}):

g[{x_, y_}] := 
 Module[{a = Floor[(cwt["Octaves"] + 1) - y/cwt["Voices"], 1./cwt["Voices"]]},
  {x, {Floor[a], Floor[(a - Floor[a])*cwt["Voices"]] + 1}}];

In addition, I define a function to smooth the coordinates:

smooth[lis_] := 1/3*(Total /@ Partition[RotateRight@lis, 3, 1, 1])
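
To see what smooth does, here is a small sketch on toy inputs (the input lists are made up for illustration). It behaves as a cyclic three-point moving average: each element is replaced by the mean of itself and its two neighbours, with the list treated as closed, which suits a closed outline.

```mathematica
smooth[lis_] := 1/3*(Total /@ Partition[RotateRight@lis, 3, 1, 1])

(* On a plain list: the first entry averages 6, 1, 2 because the list wraps around *)
smooth[{1, 2, 3, 6}]
(* {3, 2, 11/3, 10/3} *)

(* On {x, y} pairs, each coordinate is averaged separately *)
smooth[{{0, 0}, {3, 0}, {3, 3}, {0, 3}}]
(* {{1, 1}, {2, 1}, {2, 2}, {1, 2}} *)
```

The output has the same length as the input, so the smoothed outline keeps one point per original point.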

Then I find the points where the smoothed outline crosses each integer level of the vertical axis, and convert them with g:

smoothtestdata = smooth@test;
{ymin, ymax} = Through[{Ceiling@Min@# &, Floor@Max@# &}[smoothtestdata[[All, 2]]]];
WaveletCoordinate = g /@ (Round@
    Module[{gra},
     gra = ListLinePlot[Append[smoothtestdata, smoothtestdata[[1]]],
       MeshFunctions -> Function[{x, y}, y],
       Mesh -> {Range[ymin, ymax, 1]}];
     Cases[Normal@gra, Point[ptlist_] :> ptlist, Infinity] // SortBy[#, Last] &])

I get the result:

{{5638, {8, 1}}, {6541, {8, 1}}, {7035, {7, 4}}, {5578, {7, 4}}, {5552, {7, 3}}, {7534, {7, 3}}, {5546, {7, 2}}, {8022, {7, 2}}, {8576, {7, 1}}, {5546, {7, 1}}, {5556, {6, 4}}, {9273, {6, 4}}, {5583, {6, 3}}, {15207, {6, 3}}, {16069, {6, 2}}, {16414, {6, 2}}, {20768, {6, 2}}, {5614, {6, 2}}, {5645, {6, 1}}, {21748, {6, 1}}, {23970, {5, 4}}, {5663, {5, 4}}, {24177, {5, 3}}, {5672, {5, 3}}, {5676, {5, 2}}, {24236, {5, 2}}, {5696, {5, 1}}, {23957, {5, 1}}, {14766, {5, 1}}, {10686, {5, 1}}, {5714, {4, 4}}, {15657, {4, 4}}, {23130, {4, 4}}, {9977, {4, 4}}, {9241, {4, 3}}, {5739, {4, 3}}, {16923, {4, 3}}, {21869, {4, 3}}, {8466, {4, 2}}, {5913, {4, 2}}, {6464, {4, 1}}, {7255, {4, 1}}}  

The first element of each sublist is the time as a sample index (I have not taken the SampleRate into account yet); the second is the wavelet index {octave, voice}.
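
If times in seconds are wanted, the sample indices can be divided by the sample rate. A minimal sketch, where toSeconds is a hypothetical helper and coords holds only the first few entries of the result above:

```mathematica
voice = ExampleData[{"Sound", "Apollo11ReturnSafely"}];
r = voice[[1, 2]];  (* sample rate, in samples per second *)

(* First few entries of the result above *)
coords = {{5638, {8, 1}}, {6541, {8, 1}}, {7035, {7, 4}}};

(* toSeconds is a hypothetical helper: sample index -> time in seconds *)
toSeconds[{n_, idx_}, rate_] := {N[n/rate], idx};

toSeconds[#, r] & /@ coords
```

The rest of the procedure works on sample indices, so this conversion is only needed for reporting or labelling.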

To isolate the region of interest, I define a function:

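f[lis_, pos_] := 
 Module[{poslen = Length@pos, temp},
  (* zero everything before the first boundary *)
  temp = ReplacePart[lis, i_ /; (i < pos[[1]]) -> 0];
  (* zero the gaps between consecutive intervals of interest *)
  Do[temp = ReplacePart[temp, 
     i_ /; (pos[[index]] < i < pos[[index + 1]]) -> 0], {index, 2, poslen - 2, 2}];
  (* zero everything after the last boundary *)
  temp = ReplacePart[temp, i_ /; (i > pos[[poslen]]) -> 0];
  temp]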

Finally, set the irrelevant region to zero:

Module[{temp, tempwavelet = cwt},
 Do[
  temp = Transpose[GatherBy[WaveletCoordinate, Last][[i]]];
  tempwavelet = WaveletMapIndexed[f[#, Sort@temp[[1]]] &, tempwavelet, temp[[2, 1]]],
  {i, 1, Length@GatherBy[WaveletCoordinate, Last]}];
 tempwavelet = WaveletMapIndexed[0.*# &, tempwavelet, 
   Except[Alternatives @@ WaveletCoordinate[[All, 2]]]];
 tempwavelet]
WaveletScalogram[%, ColorFunction -> "AvocadoColors", ColorFunctionScaling -> False]

If we transform the data back to sound:

SampledSoundList[InverseContinuousWaveletTransform[%%], r] // Sound

the result is a Sound object that you can play back. You can hear the human voice more clearly! :)

About the function f: suppose the orange region of a graph is the data region and the green region is the region of interest. To extract octave 2, voice 1, I can use the code:

f[Range[10], {1, 6}] (* because my times of interest are 1-6 *)
(* result: {1,2,3,4,5,6,0,0,0,0} *)

To extract octave 3, voice 3:

f[Range[10], {4, 8}] (* because my times of interest are 4-8 *)
(* result: {0,0,0,4,5,6,7,8,0,0} *)

To extract octave 3, voice 4:

f[Range[10], {5, 7}] (* because my times of interest are 5-7 *)
(* result: {0,0,0,0,5,6,7,0,0,0} *)

So if we combine this function with WaveletMapIndexed, we can extract the data.
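
The examples above each use a single interval. When the outline crosses the same octave and voice more than twice, pos holds several pairs of boundaries, and f keeps each interval while zeroing the gaps between them. A small sketch with made-up boundaries:

```mathematica
f[lis_, pos_] := 
 Module[{poslen = Length@pos, temp},
  temp = ReplacePart[lis, i_ /; (i < pos[[1]]) -> 0];
  Do[temp = ReplacePart[temp, 
     i_ /; (pos[[index]] < i < pos[[index + 1]]) -> 0], {index, 2, poslen - 2, 2}];
  temp = ReplacePart[temp, i_ /; (i > pos[[poslen]]) -> 0];
  temp]

(* Two intervals of interest: times 2-4 and 8-10 (illustrative values) *)
f[Range[12], {2, 4, 8, 10}]
(* {0, 2, 3, 4, 0, 0, 0, 8, 9, 10, 0, 0} *)
```

Note that the conditions in ReplacePart test positions in the list, not the values stored there, and that pos is expected to be sorted and to contain an even number of boundaries.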

One more thought: what if, instead of extracting the outline by hand, we used image processing to find the outline and separate the noise color from the voice color? What would that look like?