Performance improvement for mirroring bit matrix around the diagonal

You will probably find that BitVector performs a good deal better than BitArray.

BitVector32 is more efficient than BitArray for Boolean values and small integers that are used internally. A BitArray can grow indefinitely as needed, but it has the memory and performance overhead that a class instance requires. In contrast, a BitVector32 uses only 32 bits.

http://msdn.microsoft.com/en-us/library/system.collections.specialized.bitvector32.aspx

If you initialize an array of BitVector32 and operate on those, it should be faster than operating on BitArray as you do now.

You may also get a performance boost if you use one thread to perform the mirroring and a second thread to perform the analysis of consecutive reads. The Task Parallel Library Dataflow provides a nice framework for that type of solution. You could have one Source Block to acquire the data buffer, one Transform Block to perform the mirroring, and one Target Block to perform the data processing.


The obvious solution is just extracting the bits and combining them again. You can do it with a loop but since it uses both left and right shift at the same time, otherwise you need a negative shift amount, so I unrolled it for easier understanding and more speed

out[0] = ((rxData[0] & 0x80)     )  | ((rxData[1] & 0x80) >> 1) | ((rxData[2] & 0x80) >> 2) | ((rxData[3] & 0x80) >> 3) |
         ((rxData[4] & 0x80) >> 4)  | ((rxData[5] & 0x80) >> 5) | ((rxData[6] & 0x80) >> 6) | ((rxData[7] & 0x80) >> 7);

out[1] = ((rxData[0] & 0x40) << 1)  | ((rxData[1] & 0x40)     ) | ((rxData[2] & 0x40) >> 1) | ((rxData[3] & 0x40) >> 2) |
         ((rxData[4] & 0x40) >> 3)  | ((rxData[5] & 0x40) >> 4) | ((rxData[6] & 0x40) >> 5) | ((rxData[7] & 0x40) >> 6);

out[2] = ((rxData[0] & 0x20) << 2)  | ((rxData[1] & 0x20) << 1) | ((rxData[2] & 0x20)     ) | ((rxData[3] & 0x20) >> 1) |
         ((rxData[4] & 0x20) >> 2)  | ((rxData[5] & 0x20) >> 3) | ((rxData[6] & 0x20) >> 4) | ((rxData[7] & 0x20) >> 5);

out[3] = ((rxData[0] & 0x10) << 3)  | ((rxData[1] & 0x10) << 2) | ((rxData[2] & 0x10) << 1) | ((rxData[3] & 0x10)     ) |
         ((rxData[4] & 0x10) >> 1)  | ((rxData[5] & 0x10) >> 2) | ((rxData[6] & 0x10) >> 3) | ((rxData[7] & 0x10) >> 4);

out[4] = ((rxData[0] & 0x08) << 4)  | ((rxData[1] & 0x08) << 3) | ((rxData[2] & 0x08) << 2) | ((rxData[3] & 0x08) << 1) |
         ((rxData[4] & 0x08)    )   | ((rxData[5] & 0x08) >> 1) | ((rxData[6] & 0x08) >> 2) | ((rxData[7] & 0x08) >> 3);

out[5] = ((rxData[0] & 0x04) << 5)  | ((rxData[1] & 0x04) << 4) | ((rxData[2] & 0x04) << 3) | ((rxData[3] & 0x04) << 2) |
         ((rxData[4] & 0x04) << 1)  | ((rxData[5] & 0x04)     ) | ((rxData[6] & 0x04) >> 1) | ((rxData[7] & 0x04) >> 2);

out[6] = ((rxData[0] & 0x02) << 6)  | ((rxData[1] & 0x02) << 5) | ((rxData[2] & 0x02) << 4) | ((rxData[3] & 0x02) << 3) |
         ((rxData[4] & 0x02) << 2)  | ((rxData[5] & 0x02) << 1) | ((rxData[6] & 0x02)     ) | ((rxData[7] & 0x02) >> 1);

out[7] = ((rxData[0] & 0x01) << 7)  | ((rxData[1] & 0x01) << 6) | ((rxData[2] & 0x01) << 5) | ((rxData[3] & 0x01) << 4) |
         ((rxData[4] & 0x01) << 3)  | ((rxData[5] & 0x01) << 2) | ((rxData[6] & 0x01) << 1) | ((rxData[7] & 0x01)     );