How can I align similar TimeSeries, the way ImageAlign aligns images?

One way to approach this is with dynamic time warping (DTW). First, preprocess your data: compute the MFCCs of each recording, then extract the values of the first coefficient from the resulting time series:

human = Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/human.wav"];
hus = Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/hus.wav"];
{humMFCC, husMFCC} = AudioLocalMeasurements[#, "MFCC"] & /@ {human, hus};
x = TimeSeriesMap[First, humMFCC]["Values"]; nX = Length[x];
y = TimeSeriesMap[First, husMFCC]["Values"]; nY = Length[y];

Now define the DTW method:

(*distance function*)
dist[s_, t_] := Abs[s - t];
(*boundary conditions*)
Clear[dtw];
dtw[1, 1] = dist[x[[1]], y[[1]]];
dtw[1, j_] := dtw[1, j] = dist[x[[1]], y[[j]]] + dtw[1, j - 1];
dtw[i_, 1] := dtw[i, 1] = dist[x[[i]], y[[1]]] + dtw[i - 1, 1];
(*main recursion*)   
dtw[i_, j_] := dtw[i, j] = dist[x[[i]], y[[j]]] + 
                           Min[dtw[i - 1, j - 1], dtw[i - 1, j], dtw[i, j - 1]];
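(*backtracking step: from {i, j}, move to the neighboring cell with the smallest cumulative cost in dtwMat; {i, j} itself is excluded at the edges so the walk cannot get stuck on ties*)
pathFind[{i_, j_}] := Module[{nbhd}, 
   nbhd = DeleteCases[{{i, Max[j - 1, 1]}, {Max[i - 1, 1], j}, {Max[i - 1, 1], Max[j - 1, 1]}}, {i, j}]; 
   nbhd[[First[Ordering[Map[dtwMat[[#[[1]], #[[2]]]] &, nbhd]]]]]];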
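The memoized dtw above depends on the global x and y, so its cached values must be cleared and redefined whenever the data change. A minimal iterative sketch of the same recurrence is shown below; the name dtwMatrix and its optional distance argument are hypothetical, not part of this answer. Cell {i, j} holds the smallest cumulative distance of any warping path from {1, 1} to {i, j}:

```mathematica
(* Hypothetical helper, not from the original answer: iterative DTW.
   Same recurrence as dtw[i, j]: local distance plus the minimum of the
   three predecessors {i-1, j-1}, {i-1, j}, {i, j-1}. *)
dtwMatrix[a_List, b_List, f_: (Abs[#1 - #2] &)] :=
 Module[{na = Length[a], nb = Length[b], cost},
  cost = ConstantArray[0, {na, nb}];
  Do[
   cost[[i, j]] = f[a[[i]], b[[j]]] + Which[
      i == 1 && j == 1, 0,                (* start cell *)
      i == 1, cost[[1, j - 1]],           (* first row: predecessor {1, j-1} *)
      j == 1, cost[[i - 1, 1]],           (* first column: predecessor {i-1, 1} *)
      True, Min[cost[[i - 1, j - 1]], cost[[i - 1, j]], cost[[i, j - 1]]]],
   {i, na}, {j, nb}];                     (* row by row, so predecessors exist *)
  cost]
```

Since the recurrence is identical, dtwMatrix[x, y][[nX, nY]] should equal dtwMat[[nX, nY]] as computed below. Passing a vector distance, e.g. dtwMatrix[humMFCC["Values"], husMFCC["Values"], EuclideanDistance], would align all MFCC coefficients instead of only the first; that variant is an untested suggestion.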

Finally, apply the DTW to your data:

distMat = Outer[dist, x, y];
dtwMat = dtwPath = Outer[dtw, Range[nX], Range[nY]];
bestPath = NestWhileList[pathFind, {nX, nY}, (#[[1]] > 1) || (#[[2]] > 1) &];
(*cells on the best path become 0 (white), all other cells 1 (black)*)
ArrayPlot[Reverse@ReplacePart[ConstantArray[1, {nX, nY}], bestPath -> 0], Frame -> False]


The resulting plot has the frames of y (the first MFCC of hus) along the horizontal axis and the frames of x (the first MFCC of human) along the vertical axis. The best path is the jagged, roughly diagonal line along which the two time series are best aligned; it lines up the MFCCs of the two audio streams. The bestPath variable contains pairs {indexX, indexY} that give the optimal correspondence between the original sequences x and y, so they can be used to index into x and y to demonstrate the alignment. Because NestWhileList builds bestPath backwards, from {nX, nY} to {1, 1}, it is reversed before use. For example, here is a plot of the aligned first MFCC coefficients:

indX = Reverse[Transpose[bestPath][[1]]];
indY = Reverse[Transpose[bestPath][[2]]];
ListLinePlot[{x[[indX]], y[[indY]]}, PlotStyle -> {Blue, Green}]

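The pairs in bestPath should form a valid warping path, which gives a quick sanity check. A minimal sketch, with warpingPathQ as a hypothetical helper name: read from start to end, a DTW path begins at {1, 1}, ends at {nX, nY}, and advances by {1, 0}, {0, 1} or {1, 1} at every step:

```mathematica
(* Hypothetical helper (not from the answer): checks the basic DTW path
   constraints on a list of {indexX, indexY} pairs ordered start to end. *)
warpingPathQ[path_List, {na_Integer, nb_Integer}] :=
 First[path] == {1, 1} &&                 (* starts at the first frames *)
  Last[path] == {na, nb} &&               (* ends at the last frames *)
  SubsetQ[{{1, 0}, {0, 1}, {1, 1}},       (* every step is monotone and continuous *)
   Differences[path]]
```

For the path computed above, warpingPathQ[Reverse[bestPath], {nX, nY}] should return True.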

Realigning the audio itself then requires turning this frame-index mapping into a mapping between times and resampling the audio accordingly.
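A first step toward that is converting index pairs into time pairs. A minimal sketch, assuming the frame times are available as plain numeric lists (possibly via humMFCC["Times"] and husMFCC["Times"], which should be checked); timeMapPairs is a hypothetical helper name. For each frame of y it averages the times of the x frames matched to it, giving sorted {timeY, timeX} pairs:

```mathematica
(* Hypothetical helper, not from the answer: path is a list of
   {indexX, indexY} pairs (either order), timesX/timesY are frame times.
   Frames of y matched to several frames of x get the mean x time. *)
timeMapPairs[path_List, timesX_List, timesY_List] :=
 SortBy[
  Map[{timesY[[#[[1, 2]]]], Mean[timesX[[#[[All, 1]]]]]} &,
   GatherBy[path, Last]],                 (* one group per frame of y *)
  First]                                  (* sort by time in y *)
```

The result could then go to Interpolation[pairs, InterpolationOrder -> 1] to get a piecewise-linear map from hus time to human time; resampling the audio samples along that map is a separate step not sketched here.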

Update: thanks to partida for pointing out some indexing issues in the DTW.


In MMA 11.0 there is a new function, WarpingCorrespondence.

It makes DTW (dynamic time warping) very easy:

human = Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/human.wav"];
hus = Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/hus.wav"];
{humMFCC, husMFCC} = AudioLocalMeasurements[#, "MFCC"] & /@ {human, hus};
x = TimeSeriesMap[First, humMFCC]["Values"]; nX = Length[x];
y = TimeSeriesMap[First, husMFCC]["Values"]; nY = Length[y];

{n, m} = WarpingCorrespondence[x, y];
ListLinePlot[{x[[n]], y[[m]]}, PlotStyle -> {Blue, Green}]

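To compare this with the hand-rolled DTW, one can total the pointwise distance along the returned index lists. A minimal sketch, with pathCost as a hypothetical helper name and Abs[a - b] assumed as the distance (matching dist in the DTW answer above):

```mathematica
(* Hypothetical helper: total distance along a warping path given as two
   index lists {n, m}, the form returned by WarpingCorrespondence.
   Assumes the absolute-difference cost used by dist above. *)
pathCost[a_List, b_List, {n_List, m_List}] := Total[Abs[a[[n]] - b[[m]]]]
```

With the data above, pathCost[x, y, {n, m}] gives the cost of the built-in alignment. If WarpingCorrespondence uses the same step pattern and an absolute-difference cost by default (an assumption worth checking in its documentation), this should match the final cumulative DTW cost from the hand-rolled version.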