Is there a built-in function I can use to measure the similarity of two trees?

One approach is with Level. If your expression is called struct,

Level[struct, {0, Infinity}]

or perhaps

Level[struct, {0, Infinity}, Heads -> True]

or even

Level[struct, #] & /@ Range[Depth[struct]]

You can also pick individual levels at will. As for measuring how far apart the levels are, you could do something as simple as length, or some function of length and depth. Crudely, this might be similar to

LeafCount[struct]

There's no unique measure, so I would use GraphPropertyDistribution on as many graph properties as you can imagine.


Alright, I'm sure this is an overkill bit of code, and those more experienced with Mathematica could probably think of a lot better way of doing this, but I am going to post my code for this question in hopes of maybe helping someone else that has a similar problem.

The following code takes two equations or lists (I'm labeling them as creatures, for the purposes of the program I am working on), and compares the similarity of structure between them. It takes into account the levels (if they have a different number of levels, and those levels have different items in them, that will add to the total difference between them), as well as the heads. So, {a,1} compared to {a,a} would have a difference of 1, while {a,1} compared to {a,3} or {a,1} compared to {a,12} would have a returned difference of 0. So, it does not take into account the value of different symbols, but it can differentiate between different symbols, such as Plus, etc.

This is also better suited for lists, because I haven't (yet) implemented Hold[] for equations. For my purposes, that is alright, because I am not setting values into the equations yet (so the variable a would not be set to any specific value yet - it would still be colored blue in Mathematica).

Also, there is (at least) one bug, where the integer 0 causes problems, such that comparing, for whatever reason, a + 0 and b + 1 does not work properly. However, comparing a + 0.0 and b + 1 does work.

Here is the code:

    (*This function takes one creature and returns the elementes in each \
level, extracted to lists*)
extractLevels[tList_] := Module[{elements, levels, elPositions},
  levels = Table[
    elements = Position[tList, _];
    elPositions = DeleteCases[elements, {__, 0} | {0} | {}];
    Extract[tList, Cases[elPositions, _?(Length[#] == level &)]]
    , {level, Depth[tList] - 1}]
  ]


(*This function returns the heads of all the individual levels from a \
single creature*)
returnCreatureHeadsByLevel[bestCreature_] := 
 Module[{bestCreatureLevels},
  bestCreatureLevels = extractLevels[bestCreature];
  Table[
   Table[
    Head[bestCreatureLevels[[x]][[itemsInLevel]]]
    , {itemsInLevel, Length[bestCreatureLevels[[x]]]}]
   , {x, Length[bestCreatureLevels]}]
  ]


(*This takes two lists of heads and finds the similarity between \
them, returning two lists to represent where they coincide(0) and \
where they do not(1)*)
returnCreatureLevelLossLists[headList1_, headList2_] := 
 Module[{aligned, creature1LossList, currentPart, lengthPart, 
   partLoss, loss, creatureNumber, divergenceLength, 
   convergenceLength, creature2LossList},
  aligned = SequenceAlignment[headList1, headList2];
  creature1LossList = Flatten[Table[
     currentPart = aligned[[x]];
     lengthPart = Length[currentPart];
     partLoss = If[Depth[currentPart] == 2, 0, 1];
     loss = partLoss;
     (*Create a list for the first creature*)
     creatureNumber = 1;
     If[partLoss == 1,
      (*This subsection represents what do for the divergent parts of \
the list*)
      divergenceLength = Length[currentPart[[creatureNumber]]];
      loss = Table[1, {divergenceLength}]
      ,
      convergenceLength = Length[currentPart];
      loss = Table[0, {convergenceLength}];
      ];
     loss
     , {x, Length[aligned]}]];
  creature2LossList = Flatten[Table[
     currentPart = aligned[[x]];
     lengthPart = Length[currentPart];
     partLoss = If[Depth[currentPart] == 2, 0, 1];
     loss = partLoss;
     (*Create a list for the first creature*)
     creatureNumber = 2;
     If[partLoss == 1,
      (*This subsection represents what do for the divergent parts of \
the list*)
      divergenceLength = Length[currentPart[[creatureNumber]]];
      loss = Table[1, {divergenceLength}]
      ,
      convergenceLength = Length[currentPart];
      loss = Table[0, {convergenceLength}];
      ];
     loss
     , {x, Length[aligned]}]];
  (*This, below, gives two lists, 
  where each of them represents one of the creatures. 
  The 0's are the locations the creatures are similar, 
  and the 1's are the locations where the creatures are different*)
  {creature1LossList, creature2LossList}
  ]



(*This function gives lists of divergent and convergent paths*)
giveDivergentConvergentPaths[creature1_, creature2_] := 
 Module[{divergentPaths, divergentList, convergentList},
  divergentPaths = 
   returnCreatureLevelLossLists[
    returnCreatureHeadsByLevel[creature1][[1]], 
    returnCreatureHeadsByLevel[creature2][[1]]];
  divergentList[1] = 
   Cases[Table[{divergentPaths[[1]][[x]], x}, {x, 
       Length[divergentPaths[[1]]]}], {1, _Integer}] /. {1, 
      r_Integer} -> r;
  divergentList[2] = 
   Cases[Table[{divergentPaths[[2]][[x]], x}, {x, 
       Length[divergentPaths[[2]]]}], {1, _Integer}] /. {1, 
      r_Integer} -> r;
  convergentList[1] = 
   Cases[Table[{divergentPaths[[1]][[x]], x}, {x, 
       Length[divergentPaths[[1]]]}], {0, _Integer}] /. {0, 
      r_Integer} -> r;
  convergentList[2] = 
   Cases[Table[{divergentPaths[[2]][[x]], x}, {x, 
       Length[divergentPaths[[2]]]}], {0, _Integer}] /. {0, 
      r_Integer} -> r;
  (*This might sometimes give confusing results, 
  but that is a result of SequenceAlignment, 
  which seems to pick the element that is closest to the beginning of \
the FIRST list, 
  and use that to find a match on the second list to start at. So, 
  we will get the divergence lists from BOTH creatures*)
  {{divergentList[1], divergentList[2]}, {convergentList[1], 
    convergentList[2]}}
  ]



(*Get the similarity between two creatures, in terms of the FIRST creature*)
tableCounter[creature1_, creature2_] := 
 Module[{sum, c1DivergentConvergent, c2DivergentConvergent, c1Loss, 
   result, c1Part, c2Part, creature1Divergence, c1Count, 
   creature2Divergence, c1Sum, c2Sum, creature1Convergence, 
   creature2Convergence},

  c1DivergentConvergent = 
   giveDivergentConvergentPaths[creature1, creature2];
  c2DivergentConvergent = 
   giveDivergentConvergentPaths[creature2, creature1];

  c1Count = 0;
  c1Loss = Table[
    If[
     (*If the current creature part is one of the converging branches..*)


     MemberQ[c1DivergentConvergent[[2]][[1]], x],
     (*Then, 
     just print out that creature part*)(*This part below is where we \
will call tableCounter again, to follow the convergent branches*)

     (*This part below also needs to code for the situation where the \
branches match until the very end, which it doesn't currently do*)
     c1Count += 1;
     (*Get the parts from each creature by the convergence indices*)
     c1Part = creature1[[c1DivergentConvergent[[2]][[1]][[c1Count]]]];
     c2Part = creature2[[c1DivergentConvergent[[2]][[2]][[c1Count]]]];


      (*Since this is a convergent path, 
      we will now make an if statement that either runs the function \
recurrently to a result, or does not run a recurrent statement, 
      but still gives a result*)
      If[(*Check to make sure neither part is of length\[Equal]0*)
       (*If either part 1 or part 2 is \[Equal] 0 is false, 
       then proceed to recurrence*)
       (Length[c1Part] == 0 || Length[c2Part] == 0) == False
       ,
       (*Execute recurrent function and set output to result*)
       result = 
        tableCounter[c1Part, 
         c2Part](*Remember that we are going to want the output of \
this whole function to be just one value, 
       being the TOTAL loss of the two compared*)
       ,
       (*Execute non-recurrent code and set output to result*)
       result = If[Length[c1Part] == 0,
         (*If the part that was length\[Equal]0 was c1Part, 
         then set its loss to 1*)
         (*1;*)(*We might actually want to set this to 1 or 0 \
depending on how it compares with the other creature at that location*)


         If[(*Test to see if c2Part also has a length of 0*)
          Length[c2Part] == 0,
          (*If the length of c2Part IS also 0, 
          compare the two and see if they have the same heads. 
          If yes, return 0. If no, return 1*)
          If[Head[c1Part] == Head[c2Part], 0, 1],
          (*At this point, 
          we know that the length of c2Part is greater than 0. So we \
will pick its first element, get its head, 
          and compare that head to the head of c1Part. 
          If they are the same, 
          we will return 0. If they are different, we will return 1*)


          If[Head[c2Part[[1]]] == Head[c1Part], 0, 1]
          ]
         ,
         (*Otherwise, flatten the list, 
         get its length and total it*)
         (*At this point, we know that the length of c2Part = 0, 
         and c1Part is > 0*)
         Length[
          Flatten[c1Part, 
           Infinity]](*We might want to use this part to see if they \
match at the first location, at least*)(*This part, right here, 
         is just how I want it, because, 
         if I were to look at the first location of the longer \
part(which I don't want to do), 
         then I would be providing for comparison over different \
levels and, again, I don't want to do that. 
         So I will leave this as it is*)
         ]
       ];

     (*Return the proper result to the table*)
     result


     (*Otherwise, 
     if the current creature part isn't a converging branch, 
     get the total number of elements in the unmatching branch*)
     , If[Length[creature1[[x]]] == 0,
      (*If the length of the current unmatching creature part is 0, 
      then set it's loss to 1*)
      1,
      (*Otherwise, flatten the list, get its length and total it*)
      Length[Flatten[creature1[[x]], Infinity]]
      ]
     ]
    , {x, Length[creature1]}];

  (*Total up the results for c1. 
  Make sure to leave contingencies for examples where the list is \
wierdly shaped*)
  If[Length[c1Loss] > 0,
   (*If the length of the loss table was greater than 0(meaning non-
   atomic), then find its total*)
   sum = Total[Flatten[c1Loss, Infinity]]
   ,
   (*If the length of the c1Loss table was 0, 
   then make sure it was a number, and set that to sum*)
   sum = If[IntegerQ[c1Loss],
     (*If it was an integer, set c1Loss to sum*)
     c1Loss,
     (*If it wasn't an integer, set sum to 0*)
     0]
   ];

  (*Return the sum of the losses*)
  sum
  ]



compareTwoCreatures[creature1_, creature2_] := Module[{a, b},
  a = tableCounter[creature1, creature2];
  b = tableCounter[creature2, creature1];
  If[(*Compare to see whether the loss is bigger according to \
swapping the creature locations in the function*)
   a >= b,
   (*If the loss was greatest with the first creature first, 
   then return the loss of the first creature*)
   a,
   (*If loss of second creature was greatest, return second*)
   b]
  ]

To use the code, we just input a list we want to compare, like the following:

compareTwoCreatures[{a, b, c}, {t, r, g}]

And we will get the output:

0

Or:

compareTwoCreatures[{a, b, {1, a, 3}}, {t, r, g}]

Evaluates to:

3

Or, lastly:

compareTwoCreatures[{a, b, c, {a, b}}, {t, r, g, {a, 1}}]

Gives:

1

More complex lists can also be compared, where there are a lot of dissimilarities - SequenceAlignment[], which I used in one of the functions, does a good job lining up the lists at the best location. SequenceAlignment[] is also called recursively for each level of two lists that are similar - so it is a function that, basically, compares the tree structure of two lists or equations.

I was hesitant to post this, as I am sure there are a lot of bugs and I'm not a professional programmer, but I've gotten a lot of help on this site, so maybe this will help someone else as well.

Edit (2018-10-3): I've come up with a much better function to do this, and thought I would post the results here to help anyone interested.

There are two functions, where the first takes two lists - let's call them a and b - and returns a list of all the positions of individual elements from the list a, with the addition of a number (the loss) that represents whether or not that element is present in the other list, in that specific location. 0 represents that it is present, 1 represents that it is not, and a real value indicates that both the positions, between the two lists, have a real number at that location - and the difference between them is returned. Here is the first function:

(*This function takes two creatures and returns all of the positions where the creatures have the same values, for the first creature, with a 0 *)
returnValidPositions[c1_,c2_]:=Module[{res,c1Parts,c2Parts,temp,creature1,creature2},
creature1=Normal[c1]/.Rule:>List;(*This converts the incoming creature to a list, also making sure to convert all remaining rules to lists, for easy extraction and comparison*)
creature2=Normal[c2]/.Rule:>List;
c1Parts=Position[creature1,_];
c2Parts=Position[creature2,_];
Table[
If[(*If the current c1Part location is a location in c2Parts list, then proceed*)
MemberQ[c2Parts,c1Parts[[x]]]
,
temp=ToString[Extract[creature1,c1Parts[[x]]]]== ToString[Extract[creature2,c1Parts[[x]]]];
(*Print[{temp,x}];*)
(*Print[{"Both have same positions",x}];*)
If[(*Check and see if the two objects are the same. If so, set the result to 0. If not, then proceed, to try and find the difference between them*)
(*Since c1Parts is a location that is shared between both creatures, it should be ok to use it to extract elements from both creatures*)
temp
(*If the shared location between the two creatures holds the same exact element, then...*)
,
(*Print[{"Both have same element",x}];*)
res={0,c1Parts[[x]]}(*If the element was the exact same, then set the loss to 0*)
,
(*If the shared position between the two elements is not the same, then proceed*)
If[(*If both of the parts at this location are reals, then find the difference between them, and use that difference as a loss*)
ToString[Extract[creature1,c1Parts[[x]]][[0]]]=="Real"&&ToString[Extract[creature2,c1Parts[[x]]][[0]]]=="Real"
,
(*Print[{"Both are reals",x}];*)
(*If both of the elements were reals, find the absolute difference between them*)
res={Abs[Extract[creature1,c1Parts[[x]]]-Extract[creature2,c1Parts[[x]]]],c1Parts[[x]]}
,
(*Print[{"Not both reals, but both same positions",x}];*)
(*If both objects weren't reals, then just set the loss equal to 1*)
res={1,c1Parts[[x]]}
]
]
,
(*Print[{"Position not shared between both creatures",x}];*)
(*If the c1Part location didn't exist in c2Parts, then set it to 1, because 1 will be a penalty, and represents difference*)
res={1,c1Parts[[x]]}
];
res(*Output res to show if the current position, in c1Parts, is also present in c2Parts, or similar to any degree*)
,{x,Length[c1Parts]}]
]

The second function takes an input of list a, list b, and a value, called getOnlyLargestDiff, set to True or False. The function gets the difference of position values between a and b, totals the difference for each one and, if getOnlyLargestDiff is set to True, returns the largest totaled value for the difference list of either list a or b. If set to False, it returns the sum of these two difference lists. Here the second, and final, function is:

(*This takes two creatures and finds the amount of difference between \
them*)
twoCreatureTotalDifference[creature1_, creature2_, 
  getOnlyLargestDiff_] := Module[{c1Total, c2Total, total},
  c1Total = 
   Total[returnValidPositions[creature1, creature2][[All, 1]]];
  c2Total = 
   Total[returnValidPositions[creature2, creature1][[All, 1]]];
  total = If[getOnlyLargestDiff == True,
    If[c1Total >= c2Total,
     c1Total
     ,
     c2Total
     ]
    ,
    (*If we aren't only returning the creature with the largest \
difference, then just sum the differences of the two*)
    c1Total + c2Total
    ]
  ]

This second function is the one that we call to find the difference between two lists. The order the lists are inserted into the function does not matter (for this second function only). Also, this code seems to be a lot more bug free than my previous attempt.