Automatically generating a dependency graph of an arbitrary Mathematica function?

Preamble

The problem is not as trivial as it may seem at first glance. The main difficulty is that many symbols are localized by (lexical) scoping constructs and should not be counted. To solve this fully, we need a parser for Mathematica code that takes scoping into account.
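To make the difficulty concrete, here is a minimal sketch of a scoping-unaware symbol collector. The name naiveSymbols is hypothetical and used only for this illustration; it is not part of the implementation below.

```mathematica
(* Hypothetical helper: collect all non-System` symbols in a held
   expression, paying no attention to scoping constructs. *)
ClearAll[naiveSymbols];
naiveSymbols[held_Hold] :=
  Union @ Cases[held,
    (s_Symbol /; Context[s] =!= "System`") :> HoldComplete[s],
    {1, Infinity}, Heads -> True];

(* x is local to Module, yet it is reported alongside the global y *)
naiveSymbols[Hold[Module[{x}, x + y]]]
```

A scoping-aware analysis should report only y here, which is what the rest of this answer tries to achieve.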

One of the most complete treatments of this problem was given by David Wagner in his Mathematica Journal article, and partially reproduced in his book. I will follow his ideas but show my own implementation: a simplistic recursive descent parser that takes scoping into account. It is not complete, but it illustrates certain subtleties involved (in particular, we must prevent premature evaluation of pieces of code during the analysis, so this is a good exercise in working with held/unevaluated expressions).

Implementation (for illustration only, does not pretend to be complete)

Here is the code:

ClearAll[getDeclaredSymbols, getDependenciesInDeclarations, $OneStepDependencies,
  getSymbolDependencies, getPatternSymbols,inSymbolDependencies, $inDepends];

SetAttributes[{getDeclaredSymbols, getDependenciesInDeclarations, 
   getSymbolDependencies, getPatternSymbols,inSymbolDependencies}, HoldAll];

$OneStepDependencies = False;

inSymbolDependencies[_] = False;

globalProperties[] =
    {DownValues, UpValues, OwnValues, SubValues, FormatValues, NValues, 
     Options, DefaultValues};


getDeclaredSymbols[{decs___}] :=
    Thread@Replace[HoldComplete[{decs}], HoldPattern[a_ = rhs_] :> a, {2}];

getDependenciesInDeclarations[{decs___}, dependsF_] :=
  Flatten@Cases[Unevaluated[{decs}], 
      HoldPattern[Set[a_, rhs_]] :> dependsF[rhs]];

getPatternSymbols[expr_] :=
  Cases[ 
     Unevaluated[expr], 
     Verbatim[Pattern][ss_, _] :> HoldComplete[ss], 
     {0, Infinity},  Heads -> True];

getSymbolDependencies[s_Symbol, dependsF_] :=
  Module[{result},
    inSymbolDependencies[s] = True;
     result = 
       Append[
         Replace[
            Flatten[Function[prop, prop[s]] /@ globalProperties[]],
            {
              (HoldPattern[lhs_] :> rhs_) :>
                With[{excl = getPatternSymbols[lhs]},
                 Complement[
                   Join[
                      withExcludedSymbols[dependsF[rhs], excl],
                      Module[{res},
                         (* To avoid infinite recursion *)
                         depends[s] = {HoldComplete[s]};
                         res = withExcludedSymbols[dependsF[lhs], excl];
                         depends[s] =.;
                         res
                      ]
                   ],
                   excl]
                ],
              x_ :> dependsF[x]
            },
            {1}
         ],
         HoldComplete[s]
       ];
    inSymbolDependencies[s] =.;
    result] /; ! TrueQ[inSymbolDependencies[s]];

getSymbolDependencies[s_Symbol, dependsF_] := {};


(* This function prevents leaking symbols on which global symbols colliding with 
** the pattern names (symbols) may depend 
*)
ClearAll[withExcludedSymbols];
SetAttributes[withExcludedSymbols, HoldFirst];
withExcludedSymbols[code_, syms : {___HoldComplete}] :=
   Module[{result, alreadyDisabled },
     SetAttributes[alreadyDisabled, HoldAllComplete];
     alreadyDisabled[_] = False;
     Replace[syms,
       HoldComplete[s_] :>
         If[! inSymbolDependencies[s],
            inSymbolDependencies[s] = True,
            (* else *)
            alreadyDisabled[s] = True
         ],
       {1}];
     result = code;
     Replace[syms, 
        HoldComplete[s_] :> 
           If[! alreadyDisabled[s], inSymbolDependencies[s] =.], 
        {1}
     ];
     ClearAll[alreadyDisabled];
     result
 ];


(* The main function *)
ClearAll[depends];
SetAttributes[depends, HoldAll];
depends[(RuleDelayed | SetDelayed)[lhs_, rhs_]] :=
   With[{pts = getPatternSymbols[lhs]},
      Complement[
        Join[
          withExcludedSymbols[depends[lhs], pts], 
          withExcludedSymbols[depends[rhs], pts]
        ],
        pts]
   ];
depends[Function[Null, body_, atts_]] := depends[body];
depends[Function[body_]] := depends[body];
depends[Function[var_, body_]] := depends[Function[{var}, body]];
depends[Function[{vars__}, body_]] := 
   Complement[depends[body], Thread[HoldComplete[{vars}]]];
depends[(With | Module)[decs_, body_]] :=
  Complement[
    Join[
      depends[body],
      getDependenciesInDeclarations[decs, depends]
    ],
    getDeclaredSymbols[decs]
  ];
depends[f_[elems___]] :=
  Union[depends[Unevaluated[f]], 
    Sequence @@ Map[depends, Unevaluated[{elems}]]];
depends[s_Symbol /; Context[s] === "System`"] := {};
depends[s_Symbol] /; ! $OneStepDependencies || ! TrueQ[$inDepends] :=  
   Block[{$inDepends = True},
      Union@Flatten@getSymbolDependencies[s, depends ]
   ];
depends[s_Symbol] := {HoldComplete[s]};
depends[a_ /; AtomQ[Unevaluated[a]]] := {};

Illustration

First, a few simple examples:

In[100]:= depends[Function[{a,b,c},a+b+c+d]]
Out[100]= {HoldComplete[d]}

In[101]:= depends[With[{d=e},Function[{a,b,c},a+b+c+d]]]
Out[101]= {HoldComplete[e]}

In[102]:= depends[p:{a_Integer,b_Integer}:>Total[p]]
Out[102]= {}

In[103]:= depends[p:{a_Integer,b_Integer}:>Total[p]*(a+b)^c]
Out[103]= {HoldComplete[c]}

Now, a more powerful example:

In[223]:= depends[depends]
Out[223]= 
{HoldComplete[depends],HoldComplete[getDeclaredSymbols],
 HoldComplete[getDependenciesInDeclarations],HoldComplete[getPatternSymbols],
 HoldComplete[getSymbolDependencies],HoldComplete[globalProperties],
 HoldComplete[inSymbolDependencies],HoldComplete[withExcludedSymbols],
 HoldComplete[$inDepends],HoldComplete[$OneStepDependencies]}

As you can see, my code can handle recursive functions. The code of depends contains many more symbols, but we only found those that are global (not localized by any of the scoping constructs).

Note that by default, all dependent symbols on all levels are included. To get only the "first-level" functions / symbols on which a given symbol depends, one has to set the variable $OneStepDependencies to True:

In[224]:= 
$OneStepDependencies =True;
depends[depends]

Out[225]= {HoldComplete[depends],HoldComplete[getDeclaredSymbols],
HoldComplete[getDependenciesInDeclarations],HoldComplete[getPatternSymbols],
HoldComplete[getSymbolDependencies],HoldComplete[withExcludedSymbols],
HoldComplete[$inDepends],HoldComplete[$OneStepDependencies]}

This last regime can be used to reconstruct the dependency tree, as for example suggested in the answer by @Szabolcs.

Applicability

This answer is considerably more complex than the one by @Szabolcs, and probably also (considerably) slower, at least in some cases. When should one use it? The answer, I think, depends on how critical it is to find all dependencies. If one just needs a rough visual picture of the dependencies, then @Szabolcs's suggestion should work well in most cases. The present answer may have advantages when:

  • You want to analyze dependencies in an arbitrary piece of code, not necessarily placed in a function (this limitation is easily, if not very conveniently, circumvented in @Szabolcs's approach by first wrapping your code in a dummy zero-argument function and then analyzing that)

  • It is critical for you to find all dependencies.

Things like

$functionDoingSomething = Function[var,If[test[var],f[var],g[var]]]
myFunction[x_,y_]:= x+ $functionDoingSomething [y]

will escape the dependencies found by @Szabolcs's code (as he mentioned himself in the comments), and can therefore cut away whole dependency sub-branches (for f, g and test here). There are other cases, for example related to UpValues, dependencies through Options and Defaults, and perhaps other possibilities as well.
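The reason this case is missed can be checked directly: the anonymous function is stored as an OwnValue of $functionDoingSomething, so a test that looks only at DownValues never sees it. A minimal sketch, using the definitions from the example above:

```mathematica
ClearAll[$functionDoingSomething, myFunction];
$functionDoingSomething = Function[var, If[test[var], f[var], g[var]]];
myFunction[x_, y_] := x + $functionDoingSomething[y];

(* No DownValues: the definition was made with Set on the bare symbol *)
DownValues[$functionDoingSomething] === {}   (* True *)

(* The Function expression lives in the OwnValues instead *)
OwnValues[$functionDoingSomething] =!= {}    (* True *)
```

Any walker that follows only symbols with DownValues therefore stops at myFunction and never reaches f, g or test.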

There may be several situations in which finding all dependencies correctly is critical. One is when you use introspection programmatically, as one of your meta-programming tools - in that case you must be sure everything is correct, since you are building on top of this functionality. To generalize, you might need something like what I suggested (bug-free though :)) whenever the end user of this functionality is someone (or something, like another function) other than yourself.

It may also be that you need the precise dependency picture for yourself, even if you don't intend to use it programmatically further.

In many cases however, all this is not very critical, and the suggestion by @Szabolcs may represent a better and easier alternative. The question is basically - do you want to create user-level or system-level tools.

Limitations, flaws and subtleties

EDIT

The current version of the code certainly contains bugs. For example, it cannot handle the GraphEdit example from the answer of @Szabolcs without errors. While I hope to get these bugs fixed soon, I invite anyone interested to help me debug the code. Feel free to update the answer once you are sure that you have correctly identified and truly fixed some bugs.

END EDIT

I did not intend this to be complete, so things like UpSetDelayed and TagSetDelayed are not covered, and probably some others as well. I also did not cover dynamic scoping (Block, Table, Do, etc.), because in most cases dynamic scoping still means dependencies. The code above can, however, be straightforwardly extended to cover the cases missed here (and I might do that soon).

The code can be refactored further to have a more readable / nicer form. I intend to do this soon.


The answers from @LeonidShifrin and @Szabolcs are great, so I just want to share something incomplete that I wrote for analyzing and visualizing compiled "WVM" code. It targets the compiler of Mathematica 7.0.1. Sorry if the code looks messy; it was abandoned long ago (the compiler kept being updated before I could figure out what all the codes mean). If anyone is interested in it, please feel free to modify it.

(testCode = Compile[{{data, _Real, 1}, {y, _Real, 1}},
    Module[{n, z, testdata},
     n = Length[data];
     z = (data - y)/Sqrt[Abs[y]];
     testdata = 1/2 (Erf[#/Sqrt[2]] + 1) & /@ z;
     (Sqrt[n] + .12 + .11/Sqrt[n]) Max[
       Abs[Range[n]/n - Sort[testdata]]]
     ]
    ]) // CodeShow

[Figures: CodeShow output for testCode]

By the way, I am still wondering whether it would be convenient to analyze the code by running it in a simulated way and tracing it.
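As a rough illustration of the tracing idea for ordinary (uncompiled) code, one could collect the heads that appear during an actual evaluation with the built-in Trace. This is only a sketch: calledHeads, sq and addOne are hypothetical names introduced for the example, and it says nothing about compiled WVM code.

```mathematica
ClearAll[calledHeads, sq, addOne];
SetAttributes[calledHeads, HoldFirst];

(* Run expr under Trace and collect the non-System` heads of all
   intermediate expressions that were evaluated. *)
calledHeads[expr_] :=
  Union @ Cases[Trace[expr],
    (HoldForm[h_Symbol[___]] /; Context[h] =!= "System`") :> HoldComplete[h],
    Infinity];

sq[x_] := x^2;
addOne[x_] := sq[x] + 1;

(* expected to include HoldComplete[addOne] and HoldComplete[sq] *)
calledHeads[addOne[2]]
```

The obvious drawback is that only the branches actually executed for the given input are seen, so this complements static analysis rather than replacing it.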


I didn't find my original code, but here's a start for implementing this:

First, let's say that a "function" is a symbol that has DownValues but no OwnValues (this latter requirement is just for safety now). This needs a lot more work to get right: for example, many built-ins have no visible DownValues at all, yet they are not inert (e.g. check that DownValues[Table] === {}). I am completely ignoring any SubValues (f[a][b] := ... type definitions) for now, which should probably be considered, and I didn't even think about how UpValues can cause any trouble. Also, I didn't verify whether it causes stubs to be loaded or not.

SetAttributes[functionQ, HoldAll]
functionQ[sym_Symbol] := (DownValues[sym] =!= {}) && (OwnValues[sym] === {})
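The caveats mentioned above are easy to verify. The sketch below repeats the definition of functionQ so that it can be run on its own; curried is a hypothetical symbol used only for the illustration.

```mathematica
ClearAll[functionQ, curried];
SetAttributes[functionQ, HoldAll];
functionQ[sym_Symbol] := (DownValues[sym] =!= {}) && (OwnValues[sym] === {});

(* A definition of the form f[a][b] := ... is stored as a SubValue *)
curried[a_][b_] := a + b;

DownValues[curried] === {}   (* True *)
SubValues[curried] =!= {}    (* True *)
functionQ[curried]           (* False: the SubValues-only symbol is missed *)

(* Built-ins typically expose no DownValues either *)
DownValues[Table] === {}     (* True *)
```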

The following function finds all dependencies of the function passed to it.

SetAttributes[dependencies, HoldAll]
dependencies[sym_Symbol] := List @@ Select[
   Union@Level[(Hold @@ DownValues[sym])[[All, 2]], {-1}, Hold, 
     Heads -> True],
   functionQ
   ]

This one will build a graph using a very inefficient algorithm (memoization in dependencies[] could help a lot in speeding this up, but then I'd make dependencies a localized symbol in the Module below):

SetAttributes[dependencyGraph, HoldAll]
dependencyGraph[sym_Symbol] :=
 Module[{vertices, edges},
  vertices = 
   FixedPoint[Union@Flatten@Join[#, dependencies /@ #] &, {sym}];
  edges = 
   Flatten[Thread[# \[DirectedEdge] dependencies[#]] & /@ vertices];
  Graph[Tooltip[#, #] & /@ ToString /@ vertices, 
   Map[ToString, edges, {2}]]
  ]
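The memoization mentioned above could be sketched as follows. To keep the cached values out of the definitions of dependencies itself, this sketch stores them on a separate, hypothetical symbol cachedDependencies rather than localizing anything in Module; g and h are toy functions used only for the demonstration.

```mathematica
ClearAll[functionQ, dependencies, cachedDependencies, g, h];
SetAttributes[{functionQ, dependencies, cachedDependencies}, HoldAll];

functionQ[sym_Symbol] := (DownValues[sym] =!= {}) && (OwnValues[sym] === {});

dependencies[sym_Symbol] := List @@ Select[
   Union @ Level[(Hold @@ DownValues[sym])[[All, 2]], {-1}, Hold,
     Heads -> True],
   functionQ];

(* Standard memoization idiom: compute once, then store the result
   as a more specific definition that is found first next time. *)
cachedDependencies[sym_Symbol] :=
  cachedDependencies[sym] = dependencies[sym];

h[x_] := x^2;
g[x_] := h[x] + 1;

cachedDependencies[g]   (* {h}; a second call is a plain lookup *)
```

Replacing dependencies with cachedDependencies inside dependencyGraph would avoid recomputing the same lists on every FixedPoint iteration. The cache has to be cleared (ClearAll and redefine) whenever the analyzed definitions change.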

Let's try it on some package functions. Hover over the nodes to see function names as tooltips.

<< GraphUtilities`

dependencyGraph[MinCut]

[Figure: dependency graph of MinCut]

dependencyGraph[WeakComponents]

[Figure: dependency graph of WeakComponents]

Or on itself:

dependencyGraph[dependencyGraph]

Show@HighlightGraph[
  dependencyGraph[dependencyGraph], {"dependencyGraph"}, 
  VertexLabels -> "Name"]

[Figure: dependency graph of dependencyGraph, with the "dependencyGraph" vertex highlighted]

(Show is used here as a workaround for vertex labels being cut off.)

This is just a starting point and needs a lot more work to make it useful. functionQ needs a lot more improvement, and there should be a way to limit how many dependencies are being followed (this could be implemented by checking symbol contexts: the dependency walker should stop as soon as it reaches a System` or perhaps non-Global` symbol. I'd make it possible to pass the dependency walker function a list of either blacklisted or whitelisted contexts, and specify a default.)
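A minimal sketch of the context-based cutoff described above, using a whitelist of contexts. The names inContextsQ and userFunctionQ are hypothetical, as is the choice of Global` as the default whitelist.

```mathematica
ClearAll[inContextsQ, userFunctionQ, foo];
SetAttributes[{inContextsQ, userFunctionQ}, HoldFirst];

(* True if the symbol lives in one of the whitelisted contexts;
   the default whitelist is an assumption of this sketch. *)
inContextsQ[sym_Symbol] := inContextsQ[sym, {"Global`"}];
inContextsQ[sym_Symbol, contexts_List] := MemberQ[contexts, Context[sym]];

(* Same test as functionQ, but restricted to whitelisted contexts *)
userFunctionQ[sym_Symbol, contexts_List] :=
  inContextsQ[sym, contexts] &&
   DownValues[sym] =!= {} && OwnValues[sym] === {};

foo[x_] := x + 1;

inContextsQ[Plus]                    (* False: Plus lives in System` *)
inContextsQ[Plus, {"System`"}]       (* True *)
userFunctionQ[foo, {Context[foo]}]   (* True *)
```

Using such a predicate in place of functionQ inside Select would make the walker stop at the boundary of the whitelisted contexts; a blacklist variant would simply negate the MemberQ test.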

Note: Please feel free to build on this code and post an improved version as an answer.

Warning: Be careful with this function because it won't stop when it sees a System` symbol and it might produce a huge graph that's slow to lay out and show:

[Figure: a very large dependency graph]


Several people have commented above that what the OP is asking for is impossible or too difficult. I strongly disagree. These arguments could be brought up for any dynamic language (or in fact even for C itself, as it has a preprocessor and macros). You could say we shouldn't even have any code analysis in e.g. a Python IDE because it can't easily be done perfectly. Does that really mean we shouldn't do it at all, even if in the vast majority of cases a simple approach works and gives useful results?

I believe even a simple and imperfect approach can often prove very useful in practice.