How to speed up `RotationMatrix`?

Having this problem so often, I also generated some tools to handle it which I'd like to share. This is the code (along with a usage message which is basically a small modification of RotationMatrix::usage. Note that it does not handle exceptions and that it assumes that a C compiler is installed.

Quiet@Block[{angle, v, vv, u, uu, ww, e1, e2, e2prime, e3},
   uu = Table[u[[i]], {i, 1, 3}];
   vv = Table[v[[i]], {i, 1, 3}];

   rotationMatrix2D = Compile[
     {{angle, _Real}},
     {{Cos[angle], -Sin[angle]}, {Sin[angle], Cos[angle]}},
     CompilationTarget -> "C",
     RuntimeAttributes -> {Listable},
     Parallelization -> True,
     RuntimeOptions -> "Speed"
     ];

   With[{code = N[
        Simplify[ComplexExpand[RotationMatrix[angle, uu]], u[[1]] \[Element] Reals]
        ] /. Part -> Compile`GetElement},
    rotationMatrix3DAngleVector = Compile[
      { {angle, _Real},{u, _Real, 1}},
      code,
      CompilationTarget -> "C",
      RuntimeAttributes -> {Listable},
      Parallelization -> True,
      RuntimeOptions -> "Speed"
      ]
    ];

   ww = Cross[uu, vv];
   e2 = Cross[ww, uu];
   e2prime = Cross[ww, vv];

   With[{code = N[
        Plus[
         KroneckerProduct[vv, uu]/Sqrt[uu.uu]/Sqrt[vv.vv], 
         KroneckerProduct[e2prime, e2]/Sqrt[e2.e2]/Sqrt[e2prime.e2prime],
         KroneckerProduct[ww, ww]/ww.ww
         ]
        ] /. Part -> Compile`GetElement},
    rotationMatrix3DVectorVector = Compile[
      {{u, _Real, 1}, {v, _Real, 1}},
      code,
      CompilationTarget -> "C",
      RuntimeAttributes -> {Listable},
      Parallelization -> True,
      RuntimeOptions -> "Speed"
      ]
    ];

   e1 = uu/Sqrt[uu.uu];
   ww = Cross[uu, vv];
   e3 = ww/Sqrt[ww.ww];
   e2 = Simplify[Cross[e3, e1]];

   With[{code = N[Simplify@Plus[
          Cos[angle] Simplify@KroneckerProduct[e1, e1],
          Sin[angle] Simplify@KroneckerProduct[e2, e1],
          -Sin[angle] Simplify@KroneckerProduct[e1, e2],
          Cos[angle] Simplify@KroneckerProduct[e2, e2],
          Simplify@KroneckerProduct[e3, e3]
          ]] /. Part -> Compile`GetElement},
    rotationMatrix3DAngleVectorVector = Compile[
      {{angle, _Real}, {u, _Real, 1}, {v, _Real, 1}},
      code,
      CompilationTarget -> "C",
      RuntimeAttributes -> {Listable},
      Parallelization -> True,
      RuntimeOptions -> "Speed"
      ]
    ];
   ];


ClearAll[MyRotationMatrix];

MyRotationMatrix[angle_] := rotationMatrix2D[angle];

MyRotationMatrix[angle_, u_] := rotationMatrix3DAngleVector[angle, u];

MyRotationMatrix[{u_, v_}] := rotationMatrix3DVectorVector[u, v];

MyRotationMatrix[angle_, {u_, v_}] := rotationMatrix3DAngleVectorVector[angle, u, v];

MyRotationMatrix::usage = 
  "\!\(\*RowBox[{\"MyRotationMatrix\", \"[\", StyleBox[\"\[Theta]\", \
\"TR\"], \"]\"}]\) gives the 2D rotation matrix that rotates 2D \
vectors counterclockwise by \!\(\*StyleBox[\"\[Theta]\", \"TR\"]\) \
radians.\n\!\(\*RowBox[{\"MyRotationMatrix\", \"[\", \
RowBox[{StyleBox[\"\[Theta]\", \"TR\"], \",\", StyleBox[\"w\", \
\"TI\"]}], \"]\"}]\) gives the 3D rotation matrix for a \
counterclockwise rotation around the 3D vector \!\(\*StyleBox[\"w\", \
\"TI\"]\).\n\!\(\*RowBox[{\"MyRotationMatrix\", \"[\", RowBox[{\"{\", \
RowBox[{StyleBox[\"u\", \"TI\"], \",\", StyleBox[\"v\", \"TI\"]}], \
\"}\"}], \"]\"}]\) gives the 3D matrix that rotates the vector \
\!\(\*StyleBox[\"u\", \"TI\"]\) to the direction of the vector \
\!\(\*StyleBox[\"v\", \"TI\"]\).\n\!\(\*RowBox[{\"MyRotationMatrix\", \
\"[\", RowBox[{StyleBox[\"\[Theta]\", \"TR\"], \",\", RowBox[{\"{\", \
RowBox[{StyleBox[\"u\", \"TI\"], \",\", StyleBox[\"v\", \"TI\"]}], \
\"}\"}]}], \"]\"}]\) gives the matrix that rotates by \!\(\*StyleBox[\
\"\[Theta]\", \"TR\"]\) radians in the hyperplane spanned by \
\!\(\*StyleBox[\"u\", \"TI\"]\) and \!\(\*StyleBox[\"v\", \"TI\"]\).";

And here is a short test suite:

n = 1000;
angledata = RandomReal[{-2 Pi, 2 Pi}, n];
udata = RandomReal[{-1, 1}, {n, 3}];
vdata = RandomReal[{-1, 1}, {n, 3}];

t1 = First@RepeatedTiming[aa = MyRotationMatrix[angledata];];
t2 = First@RepeatedTiming[bb = RotationMatrix /@ angledata;];
Association["MyTime" -> t1, "Time" -> t2, "SpeedUp" -> t2/t1, 
 "Error" -> Max[Abs[aa - bb]]]

t1 = First@RepeatedTiming[aa = MyRotationMatrix[angledata , vdata];];
t2 = First@ RepeatedTiming[ bb = RotationMatrix @@@ Transpose[{angledata, vdata}];];
Association["MyTime" -> t1, "Time" -> t2, "SpeedUp" -> t2/t1, "Error" -> Max[Abs[aa - bb]]]


t1 = First@RepeatedTiming[aa = MyRotationMatrix[{udata, vdata}];];
t2 = First@ RepeatedTiming[bb = RotationMatrix /@ Transpose[{udata, vdata}];];
Association["MyTime" -> t1, "Time" -> t2, "SpeedUp" -> t2/t1,  "Error" -> Max[Abs[aa - bb]]]


t1 = First@RepeatedTiming[aa = MyRotationMatrix[angledata, {udata, vdata}];];
t2 = First@RepeatedTiming[bb = RotationMatrix @@@Transpose[{angledata, Transpose[{udata, vdata}]}];];
Association["MyTime" -> t1, "Time" -> t2, "SpeedUp" -> t2/t1, "Error" -> Max[Abs[aa - bb]]]

<|"MyTime" -> 0.000067, "Time" -> 0.032, "SpeedUp" -> 4.9*10^2, "Error" -> 1.11022*10^-16|>

<|"MyTime" -> 0.000098, "Time" -> 0.273, "SpeedUp" -> 2.8*10^3, "Error" -> 9.99201*10^-16|>

<|"MyTime" -> 0.00010, "Time" -> 0.17, "SpeedUp" -> 1.7*10^3, "Error" -> 8.88178*10^-16|>

<|"MyTime" -> 0.000096, "Time" -> 0.16, "SpeedUp" -> 1.7*10^3, "Error" -> 2.03171*10^-14|>

Edit

Fixed the argument pattern of the angle + vector case to make it compatible with RotationMatrix.


I have previously used the following routine based on ideas by Möller and Hughes in these previous answers, but it would be good to have it as an explicit answer here:

vectorRotate[vv1_?VectorQ, vv2_?VectorQ] := 
 Module[{v1 = Normalize[vv1], v2 = Normalize[vv2], c, d, d1, d2, t1, t2},
        d = v1.v2;
        If[TrueQ[Chop[1 + d] == 0],
           c = UnitVector[3, First[Ordering[Abs[v1], 1]]];
           t1 = c - v1; t2 = c - v2; d1 = t1.t1; d2 = t2.t2;
           IdentityMatrix[3] - 2 (Outer[Times, t2, t2]/d2 - 
           2 t2.t1 Outer[Times, t2, t1]/(d2 d1) + Outer[Times, t1, t1]/d1),
           c = Cross[v1, v2];
           d IdentityMatrix[3] + Outer[Times, c, c]/(1 + d) - LeviCivitaTensor[3].c]]

Using the simple test in the OP:

With[{n = 1000},
     udata = RandomReal[{-1, 1}, {n, 3}];
     vdata = RandomReal[{-1, 1}, {n, 3}]];

First @ RepeatedTiming[result1 = RotationMatrix /@ Transpose[{udata, vdata}];]
   0.272

First @ RepeatedTiming[result2 = MapThread[vectorRotate, {udata, vdata}];]
   0.19

Max[Abs[result1 - result2]]
   7.99916*10^-14