GCC: -mthumb versus -marm

Thumb is not the older instruction set; it is in fact the newer one. The current revision is Thumb-2, a mixed 16/32-bit instruction set. The original Thumb instruction set (Thumb-1) was a compressed 16-bit encoding of a subset of the ARM instruction set: the CPU would fetch the instruction, decompress it into its ARM equivalent and then execute it. These days (ARMv7 and above), Thumb-2 is preferred for everything but performance-critical or system code. For example, many GCC toolchains targeting ARMv7 (like your Tegra3) are configured to generate Thumb-2 by default, because the higher code density of the 16/32-bit ISA allows for better icache utilization. Whatever the default is, you can choose explicitly with -mthumb or -marm. The icache benefit is very hard to measure in a normal benchmark, though, because most benchmarks fit into the L1 icache anyway.
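If you are not sure which mode your toolchain actually picks, you can ask the compiler. The following is only a minimal sketch: it relies on the `__thumb2__`, `__thumb__` and `__arm__` macros that GCC predefines on 32-bit ARM targets, and the function name `isa_mode_name` is made up for illustration.

```c
/* Report which instruction set this translation unit is compiled for.
 * On 32-bit ARM targets GCC predefines:
 *   __arm__     always
 *   __thumb__   when generating Thumb code (-mthumb)
 *   __thumb2__  when generating Thumb code and Thumb-2 is available
 * isa_mode_name is a hypothetical helper name, not part of any library. */
const char *isa_mode_name(void)
{
#if defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb (16-bit only)";
#elif defined(__arm__)
    return "ARM";
#else
    return "not a 32-bit ARM target";
#endif
}
```

Building the same file once with `-march=armv7-a -mthumb` and once with `-march=armv7-a -marm` should make the function return "Thumb-2" and "ARM" respectively; building it with no mode flag shows what your toolchain defaults to. Comparing the size of the two object files also gives a rough feel for the code density difference.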

For more information, see the Wikipedia article: http://en.wikipedia.org/wiki/ARM_architecture#Thumb