Implementing the Prolog Unification algorithm in Python? Backtracking

I will quickly summarize the chapter about Unification Theory by Baader and Snyder from the Handbook of Automated Reasoning:

Terms are built from constants (starting with a lower case letter) and variables (starting with an upper case letter):

  • a constant without arguments is a term: e.g. car
  • a constant with terms as arguments, a so called function application, is a term. e.g. date(1,10,2000)
  • a variable is a term, e.g. Date (variables never have arguments)

A substitution is a map assigning terms to variables. In the literature, this is often written as {f(Y)/X, g(X)/Y} or with arrows {X→f(Y), Y→g(X)}. Applying a substitution to a term replaces each variable by the corresponding term in the list. E.g. the substitution above applied to tuple(X,Y) results in the term tuple(f(Y),g(X)).

Given two terms s and t, a unifier is a substitution that makes s and t equal. E.g. if we apply the substitution {a/X, a/Y} to the term date(X,1,2000), we get date(a,1,2000) and if we apply it to date(Y,1,2000) we also get date(a,1,2000). In other words, the (syntactic) equality date(X,1,2000) = date(Y,1,2000) can be solved by applying the unifier {a/X,a/Y}. Another, simpler unifier would be X/Y. The simplest such unifier is called the most general unifier. For our purposes it's enough to know that we can restrict ourselves to the search of such a most general unifier and that, if it exists, it is unique (up to the names of some variables).

Mortelli and Montanari (see section 2.2. of the article and the references there) gave a set of rules to compute such a most general unifier, if it exists. The input is a set of term pairs (e.g. { f(X,b) = f(a,Y), X = Y } ) and the output is a most general unifier, if it exists or failure if it does not exist. In the example, the substitution {a/X, b/Y} would make the first pair equal (f(a,b) = f(a,b)), but then the second pair would be different (a = b is not true).

The algorithm nondeterministically picks one equality from the set and applies one of the following rules to it:

  • Trivial: an equation s = s (or X=X) is already equal and can be safely removed.
  • Decomposition: an equality f(u,v) = f(s,t) is replaced by the equalities u=s and v=t.
  • Symbol Clash: an equality a=b or f(X) = g(X) terminates the process with failure.
  • Orient: an equality of the form t=X where t is not another variable is flipped to X=t such that the variable is on the left side.
  • Occurs check: if the equation is of the form X=t, t is not X itself and if X occurs somewhere within t, we fail. [1]
  • Variable elimination: of we have an equation X=t where X does not occur in t, we can apply the substitution t/X to all other problems.

When there is no rule left to apply, we end up with a set of equations {X=s, Y=t, ...} that represents the substitution to apply.

Here are some more examples:

  • {f(a,X) = f(Y,b)} is unifiable: decompose to get {a=Y, X=b} and flip to get {Y=a, X=b}
  • {f(a,X,X) = f(a,a,b)} is not unifiable: decompose to get {a=a,X=a, X=b}, eliminate a=a by triviality, then eliminate the variable X to get {a=b} and fail with symbol clash
  • {f(X,X) = f(Y,g(Y))} is not unifiable: decompose to get {X=Y, X=g(Y)}, eliminate the variable X to get {Y=g(Y)}, fail with occurs check

Even though the algorithm is non-deterministic (because we need to pick a equality to work on), the order does not matter. Because you can commit to any order, it is never necessary to undo your work and try a different equation instead. This technique is usually called backtracking and is necessary for the proof search in Prolog, but not for unification itself.

Now you're only left to pick a suitable data-structure for terms and substitutions and implement the algorithms for applying a substitution to a term as well as the rule based unification algorithm.

[1] If we try to solve X = f(X), we would see that X needs to be of the form f(Y) to apply decomposition. That leads to solving the problem f(Y) = f(f(Y)) and subsequently Y = f(Y). Since the left hand side always has one application of f less than the right hand side, they can not be equal as long we see a term as a finite structure.


I get more confused than enlightened

Been there, done that.

Note: For any source code referenced I did not test the code and can not say it is valid, they are given as an example and look correct enough that I would load them up and run test cases against them to determine their validity.

First: You will get much better search results if you use the correct terminology, use backward chaining instead of Backtracking. e.g. backward-chaining/inference.py

Second: Understand that your question has three separate phases listed.
1. Unification algorithm
2. Backward chaining that uses Unification
3. A data structure for a list. You would not implement this as Python source code but as text to be passed to your functions. See: cons

You should first develop and fully test unification before moving onto backward chaining. Then fully develop and test backward chaining before creating a list data structure. Then fully test your list data structure.

Third: There is more than one way to implement the unification algorithm.
a. You noted the one that uses transformation rules, or noted as A rule based approach in Unification Theory by Baader and Snyder, e.g. delete decompose etc.
b. I prefer the algorithm noted as Unification by recursive descent in Unification Theory by Baader and Snyder given in this OCaml example or Python example
c. I have seen some that use permutations but can't find a good reference at present.

Fourth: From personal experience, understand how each phase works by using pen and paper first, then implement it in code.

Fifth: Again from personal experience, there is lots of information out there on how to do this but the math and technical papers can be confusing as many gloss over something critical to a self-learner or are too dense. I would suggest that instead you focus on finding implementations of the source code/data structures and use that to learn.

Sixth: compare your results against actual working code, e.g. SWI-Prolog.

I can't stress enough how much you need to test each phase before moving on to the next and make sure you have a complete set of test cases.

When I wanted to learn how to write this in a functional language the books on AI 1 2 3 and The Programming Languages Zoo were invaluable. Had to install environments for Lisp and OCaml but was worth the effort.