In-place transposition of a matrix

The problem is that the task is stated incorrectly. If by "the same place" you mean using the same matrix, the task is correct. But if you mean writing into the same area of memory ("the matrix is represented as a single array of size m*n"), you also have to say how the matrix is laid out in that array. Otherwise it is enough to change nothing except the function that reads the matrix: simply swap the indexes in it.
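To illustrate that last point: with row-major storage, reading the transpose only needs the two indexes swapped, and no data is moved at all. This is a minimal sketch; the accessor names get and get_transposed and the element type double are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accessor: element (r, c) of a matrix with `cols` columns,
// stored by rows (row-major) in a single array.
double get(const std::vector<double>& a, std::size_t cols,
           std::size_t r, std::size_t c)
{
    return a[r * cols + c];
}

// Element (r, c) of the transposed matrix: same storage, indexes swapped.
// `cols` is still the column count of the stored (untransposed) matrix.
double get_transposed(const std::vector<double>& a, std::size_t cols,
                      std::size_t r, std::size_t c)
{
    return get(a, cols, c, r);
}
```

For column-major storage the formula in get would become a[c * rows + r] instead, which is why the storage order has to be known before any data is moved.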

You want to transpose the matrix representation in memory so that the function that reads/sets elements by index stays the same, don't you?

Also, we can't write the algorithm without knowing whether the matrix is stored in memory by rows or by columns. OK, let's say it is stored by rows.

Once these two missing conditions are set, the task becomes well defined and is not hard to solve.

We simply take every element of the matrix by its linear index, find its row/column pair, transpose it, compute the resulting linear index and put the value into the new place. The problem is that this transformation is its own inverse (each element just swaps places with its partner) only for square matrices, so for a non-square matrix it cannot be done in place by simple pairwise swaps. Or rather, it can, if we first compute the whole index transformation map and then apply it to the matrix.
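A minimal sketch of building the whole index transformation map for an m × n matrix stored by rows, together with a check of whether the map is its own inverse (the "autosymmetric" property). Both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// trans[li] = linear index where element li of the m x n row-major matrix A
// ends up in the n x m row-major matrix B.
std::vector<std::size_t> transposition_map(std::size_t m, std::size_t n)
{
    std::vector<std::size_t> trans(m * n);
    for (std::size_t li = 0; li < m * n; ++li) {
        std::size_t j = li / n;     // row number in A
        std::size_t i = li % n;     // column number in A
        trans[li] = i * m + j;      // resulting linear index in B
    }
    return trans;
}

// True if applying the map twice brings every index back to itself,
// i.e. the transposition consists of pairwise swaps only.
bool is_involution(const std::vector<std::size_t>& trans)
{
    for (std::size_t li = 0; li < trans.size(); ++li)
        if (trans[trans[li]] != li)
            return false;
    return true;
}
```

For a 3 × 3 matrix is_involution returns true. For a 2 × 3 matrix the map is {0, 2, 4, 1, 3, 5}, and trans[trans[1]] is 4 rather than 1, so elements move along longer cycles (here 1 → 2 → 4 → 3 → 1) and pairwise swaps are not enough.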

Starting matrix A:
m - number of rows
n - number of columns
nm - number of elements
li - linear index
i - column number
j - row number

Resulting matrix B (n rows, m columns):
lir - resulting linear index

trans - transforming array, trans[li] = lir

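#include <stdlib.h>

// Transposes, in place, the m x n matrix a[] stored by rows
// (element type double assumed).
void transpose(double *a, size_t m, size_t n)
{
    size_t nm = n * m, li, lir, cur, i, j;
    size_t *trans = malloc(nm * sizeof *trans); // transforming array
    char *check = calloc(nm, 1);                // 1 = element already moved
    double temp, temp1;

    if (trans == NULL || check == NULL) {       // out of memory: leave a[] as is
        free(trans);
        free(check);
        return;
    }

    // preparation: trans[li] is the resulting linear index of element li
    for (li = 0; li < nm; li++) {
        j = li / n;
        i = li - j * n;
        lir = i * m + j;
        trans[li] = lir;
    }

    // transposition: follow every cycle of trans[] exactly once
    for (li = 0; li < nm; li++) {
        if (check[li])
            continue;           // already moved as part of an earlier cycle
        cur = li;
        temp = a[li];           // value carried along the cycle
        do {
            lir = trans[cur];   // where the carried value belongs
            temp1 = a[lir];     // save the value about to be overwritten
            a[lir] = temp;
            temp = temp1;
            check[lir] = 1;
            cur = lir;
        } while (cur != li);
    }

    free(trans);
    free(check);
}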

Such in-place transposition makes sense only if the matrix elements are heavy, since the trans[] array alone takes nm indexes of extra memory.

The trans[] array can also be implemented as a function that computes lir from li on the fly.
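A sketch of that variant, assuming m rows and n columns stored by rows as above. The names trans_index and transpose_in_place are hypothetical; only the one-flag-per-element check array is kept, and the carried value is exchanged with std::swap.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Replacement for trans[li]: row j = li / n, column i = li % n of A
// go to linear index i * m + j of B.
inline std::size_t trans_index(std::size_t li, std::size_t m, std::size_t n)
{
    return (li % n) * m + li / n;
}

// Cycle-following transposition with the index map computed on the fly.
template<class T>
void transpose_in_place(std::vector<T>& a, std::size_t m, std::size_t n)
{
    std::vector<bool> check(m * n, false);  // true = element already moved
    for (std::size_t li = 0; li < m * n; ++li) {
        if (check[li])
            continue;
        std::size_t cur = li;
        T temp = a[li];                     // value carried along the cycle
        do {
            std::size_t lir = trans_index(cur, m, n);
            std::swap(temp, a[lir]);        // drop the carried value, pick up the old one
            check[lir] = true;
            cur = lir;
        } while (cur != li);
    }
}
```

Computing the index on the fly trades the nm-index trans[] array for a division and a modulo per step.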


Doing this efficiently in the general case requires some effort: the algorithms for square and non-square matrices differ, and so do the in-place and out-of-place ones. Save yourself much effort and just use FFTW. I previously prepared a more complete write-up on the matter, including sample code.


Inspired by the "Following the cycles" algorithm description in Wikipedia's In-place matrix transposition article, I came up with the following C++ implementation:

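#include <algorithm> // std::copy
#include <iostream>  // std::cout
#include <iterator>  // std::ostream_iterator
#include <utility>   // std::swap (since C++11)
#include <vector>    // std::vector

// Transposes, in place, a matrix stored in row-major order in [first, last).
// m is the row size (number of columns); the number of rows n follows
// from the sequence length.
template<class RandomIterator>
void transpose(RandomIterator first, RandomIterator last, int m)
{
    const int size = static_cast<int>(last - first);
    if (size < 2)                 // nothing to move; also keeps ++cycle in range
        return;
    const int mn1 = size - 1;     // last index, a fixed point of the mapping
    const int n   = size / m;     // number of rows
    std::vector<bool> visited(size);
    RandomIterator cycle = first; // index 0 never moves, so start at index 1
    while (++cycle != last) {
        if (visited[cycle - first])
            continue;
        int a = cycle - first;
        do  {
            // destination of the element currently held at *cycle
            a = a == mn1 ? mn1 : (n * a) % mn1;
            std::swap(*(first + a), *cycle);
            visited[a] = true;
        } while ((first + a) != cycle);
    }
}

int main()
{
    int a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    transpose(a, a + 8, 4);
    std::copy(a, a + 8, std::ostream_iterator<int>(std::cout, " "));
}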

The program performs the in-place transposition of the 2 × 4 matrix

0 1 2 3
4 5 6 7

represented in row-major ordering {0, 1, 2, 3, 4, 5, 6, 7} into the 4 × 2 matrix

0 4
1 5
2 6
3 7

represented by the row-major ordering {0, 4, 1, 5, 2, 6, 3, 7}.

The argument m of transpose is the row size (the number of elements per row); the column size n (the number of rows) is determined by the row size and the sequence length. The algorithm needs m × n bits of auxiliary storage to record which elements have already been moved. The indexes of the sequence are mapped according to the following scheme:

0 → 0
1 → 2
2 → 4
3 → 6
4 → 1
5 → 3
6 → 5
7 → 7

The mapping function in general is:

idx → (idx × n) mod (m × n - 1) if idx < m × n - 1, and idx → idx for the last index idx = m × n - 1
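The mapping function can be written down directly. This is a sketch assuming the same roles of m (row size) and n (number of rows) as in transpose() above; next_index is a hypothetical name, and the matrix must be non-empty (m × n ≥ 1).

```cpp
#include <cstddef>

// Destination of index idx when a row-major matrix with rows of m elements
// and n rows is transposed in place.
std::size_t next_index(std::size_t idx, std::size_t m, std::size_t n)
{
    const std::size_t mn1 = m * n - 1;   // the last index maps to itself
    return idx == mn1 ? idx : (idx * n) % mn1;
}
```

For m = 4 and n = 2 it reproduces the table above: 0, 2, 4, 6, 1, 3, 5, 7 for idx = 0 to 7.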

We can identify four cycles within this sequence: { 0 }, { 1, 2, 4 }, { 3, 5, 6 } and { 7 }. Each cycle can be transposed independently of the others. The variable cycle initially points to the second element (the first one does not need to be moved because 0 → 0). The bit array visited records the elements that have already been moved; index 1 (the second element) is not marked, so it needs to be moved. Index 1 is swapped with index 2 (the mapping function gives 1 → 2), so index 1 now holds the element from index 2. That element is then swapped with the element at index 4 (2 → 4), so index 1 now holds the element from index 4. Since 4 → 1, that element is already in the right place, so transposing this cycle is finished, and all touched indexes have been marked visited. The variable cycle is then incremented to the first unvisited index, which is 3, and the procedure continues in the same way until all cycles have been transposed.
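The cycles named above can also be listed programmatically. This sketch (the function name cycles is hypothetical) identifies each cycle by its smallest index, its "leader": an index is a leader if following the mapping from it returns to it without passing a smaller index. The matrix must be non-empty.

```cpp
#include <cstddef>
#include <vector>

// Lists the cycles of the index mapping for rows of m elements and n rows,
// each cycle starting at its smallest index. No visited bits are needed.
std::vector<std::vector<std::size_t>> cycles(std::size_t m, std::size_t n)
{
    const std::size_t mn1 = m * n - 1;
    auto next = [mn1, n](std::size_t idx) {
        return idx == mn1 ? idx : (idx * n) % mn1;
    };
    std::vector<std::vector<std::size_t>> result;
    for (std::size_t start = 0; start <= mn1; ++start) {
        std::size_t k = next(start);
        while (k > start)          // walk until we return or reach a smaller index
            k = next(k);
        if (k < start)
            continue;              // cycle was already listed from a smaller leader
        std::vector<std::size_t> cycle{start};
        for (k = next(start); k != start; k = next(k))
            cycle.push_back(k);
        result.push_back(cycle);
    }
    return result;
}
```

For m = 4 and n = 2 this yields {0}, {1, 2, 4}, {3, 6, 5} and {7}. The same leader test could replace the visited array in transpose(), trading the m × n bits for extra index arithmetic, since a candidate start may walk part of a cycle before being rejected.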