What's the difference between the minimum spanning tree algorithm for undirected vs directed graphs?

The core of your question seems to be what makes finding an MST (technically called an optimum branching or minimum-cost arborescence) in a directed graph different and therefore harder than finding an MST in an undirected graph.

Both Prim's and Kruskal's algorithms work because of the cut property. If G = (V, E) is a graph, then for any cut (S, V - S) in G, if there is a least-cost edge {u, v} crossing that cut, that edge must belong to all MSTs of G. Unfortunately, this property isn't true in the directed case. Here's a counterexample:

      2
  A ----> B
  |      | ^
1 |  -99 | | 1
  |      v |
  +-----> C

Here, the minimum-cost arborescence rooted at A is this one:

      2
  A ----> B
          |
      -99 |
          v
          C

However, look at the cut ({A}, {B, C}) The least-cost edge crossing this cut is the edge (A, C), but that edge doesn't appear in any minimum-cost arborescence rooted at A.

Without the cut property, Prim's algorithm and Kruskal's algorithm both fail. Try running Prim's algorithm on the graph given here, starting with node A as your included node. You'll add in edge (A, C), then edge (C, B), giving a suboptimal arborescence. Now, try running Kruskal's algorithm here. You'll first add edge (B, C), then add edge (A, C). Unfortunately, this isn't actually an arborescence because it has no root node.

The standard algorithm for finding minimum-cost arborescences (Edmonds-Chu) is actually probably closer in spirit to Boruvka's algorithm. Boruvka's algorithm works by simultaneously choosing, for each node, the least-cost edge connected to that node and adding it to the candidate MST. You then contract all CC's formed this way into single nodes and repeat this process until you have your tree.

In the undirected case, as long as the edge weights are distinct, this algorithm will never introduce a cycle (it's a good exercise to prove this), but this isn't the case in directed algorithms. The graph given above is a good example of this - if you try this, you pick (A, C) from A, (C, B) from C, and (B, C) from B, forming the cycle (B, C, B). The correction that the Edmonds-Chu algorithm uses works by contracting one of these cycles into a single node, then repeating this process in the reduced graph and "uncontracting" the cycles based on the result. In that sense it's similar to Boruvka's algorithm, though with appropriate modifications.

Hope this helps!