How to extract functions used in a python code file?

You can extract all call expressions with:

import ast

class CallCollector(ast.NodeVisitor):
    def __init__(self):
        self.calls = []
        self.current = None

    def visit_Call(self, node):
        # new call, trace the function expression
        self.current = ''
        self.visit(node.func)
        self.calls.append(self.current)
        self.current = None

    def generic_visit(self, node):
        if self.current is not None:
            print "warning: {} node in function expression not supported".format(
                node.__class__.__name__)
        super(CallCollector, self).generic_visit(node)

    # record the func expression 
    def visit_Name(self, node):
        if self.current is None:
            return
        self.current += node.id

    def visit_Attribute(self, node):
        if self.current is None:
            self.generic_visit(node)
        self.visit(node.value)  
        self.current += '.' + node.attr

Use this with a ast parse tree:

tree = ast.parse(yoursource)
cc = CallCollector()
cc.visit(tree)
print cc.calls

Demo:

>>> tree = ast.parse('''\
... def foo():
...     print np.random.rand(4) + np.random.randn(4)
...     print linalg.norm(np.random.rand(4))
... ''')
>>> cc = CallCollector()
>>> cc.visit(tree)
>>> cc.calls
['np.random.rand', 'np.random.randn', 'linalg.norm']

The above walker only handles names and attributes; if you need more complex expression support, you'll have to extend this.

Note that collecting names like this is not a trivial task. Any indirection would not be handled. You could build a dictionary in your code of functions to call and dynamically swap out function objects, and static analysis like the above won't be able to track it.


In general, this problem is undecidable, consider for example getattribute(random, "random")().

If you want static analysis, the best there is now is jedi

If you accept dynamic solutions, then cover coverage is your best friend. It will show all used functions, rather than only directly referenced though.

Finally you can always roll your own dynamic instrumentation along the lines of:

import random
import logging

class Proxy(object):
    def __getattr__(self, name):
        logging.debug("tried to use random.%s", name)
        return getattribute(_random, name)

_random = random
random = Proxy()