Find most even trisection of a tree graph in O(n) time

The Solution

The missing piece of the puzzle

The property of tree graphs that I did not know or intuit, and that I needed in order to find the $O(n \log n)$ solution, is that for every tree of $n$ nodes you can choose a root such that no subtree below the root has more than $n/2$ nodes. As @Dap put it, every tree graph $G$ has at least $1$ vertex $r$ such that the components of $G-r$ all have order at most $n/2$. In other words, every tree has a centroid (not to be confused with the center of a tree, which is defined by distances rather than by component sizes). Once you know that, everything else falls into place.

Intro

Answers from Dap and SmileyCraft contained helpful nuggets, but were overcomplicated and I could not prove their worst case complexity as being better than $O(n^2)$, though they probably are better than that. My solution is pretty close to Dap's, but simpler. For me, though, it was the simplification I was more interested in than the actual solution.

Because tree graphs are so special, they are talked about in graph theory, in set theory, and in computer science, with sometimes different names and symbols, as you may have noticed if you read the other answers to this question. They also have a lot of specially named properties and operations. So much so that I made a Glossary and included it at the bottom of this answer. Please refer to it if you are confused by any terms I use or the way I use them.

When I talk about a tree (rather than a Tree Graph), I am referring to a hierarchical tree as is commonly used in computer science, with a root, branches, children, etc. When I refer to "a cut on subtree $b$" of a tree, I mean either cutting $b$ from the root (and therefore the rest) of the tree or cutting an edge belonging to subtree $b$. In graph theory, the weight of a cut is the number of edges cut, but with trees, that number is always 1, so the weight of a cut is not that interesting, and I sometimes use "weight of cut" as shorthand to actually mean the size of the subtree created by the cut.

Whenever I talk about weight or heaviness, I am referring to the number of nodes of some subtree or the number of nodes remaining in the other tree after removing one or more subtrees. This corresponds to the order of the component of the Tree Graph represented by those nodes of the tree.

I am not going to try to prove everything. I started to but it was getting long and tedious. Leave me a comment if you think I asserted something that is untrue and maybe I will add a proof later.

  • The goal is to find the optimum trisection of a Tree Graph in $O(n \log n)$ time.
  • "Optimum" is defined as $\max(abc)$, where $a$, $b$, and $c$ are the sizes of the three components remaining after the trisection.

Part 1: Converting the Tree Graph into the right kind of Tree

  • Find vertex $r$ such that the components of $G-r$ all have order at most $n/2$. There is guaranteed to be at least $1$ such vertex.
  • Convert to a hierarchical tree by rooting the Tree Graph at $r$.
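The two steps above can be sketched in Python roughly as follows. This is only an illustration, not code from the original answer: the names `build_adj`, `subtree_sizes`, and `find_centroid` are made up here, and the tree is assumed to be given as a list of edges between hashable vertex labels.

```python
from collections import defaultdict

def build_adj(edges):
    """Adjacency lists for an undirected tree given as (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def subtree_sizes(adj, root):
    """Root the tree at `root`; return (parent, size) without recursion."""
    parent = {root: None}
    order = [root]
    for v in order:                      # BFS; `order` grows while we iterate
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)
    size = {v: 1 for v in order}
    for v in reversed(order):            # children are visited before parents
        if parent[v] is not None:
            size[parent[v]] += size[v]
    return parent, size

def find_centroid(adj):
    """Return a vertex r such that every component of G - r has order <= n/2."""
    n = len(adj)
    v = next(iter(adj))                  # arbitrary starting root
    parent, size = subtree_sizes(adj, v)
    while True:
        children = (w for w in adj[v] if w != parent[v])
        heavy = max(children, key=size.get, default=None)
        if heavy is None or 2 * size[heavy] <= n:
            return v                     # no child subtree is too heavy
        v = heavy                        # walk down into the too-heavy child
```

Calling `subtree_sizes(adj, find_centroid(adj))` then gives the weights of every node in the tree rooted as described, in $O(n)$ overall. The walk only ever steps into a child whose subtree has more than $n/2$ nodes, so the part of the tree left behind always has fewer than $n/2$.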

The resulting tree has some great properties. In particular, it has a subtree below the root that I will be calling the "heavy subtree": it contains at least as many nodes as any other subtree below the root, but at the same time is guaranteed to have weight $h \le n/2$.

Part 2: Operating on the Tree

Finding best bisection in $O(n)$

We can create the tree described above, including calculating the weight of every node and finding the heavy subtree, in $O(n)$.

The optimum bisection of the Tree Graph is obtained by cutting the edge between the root of the tree and the heavy subtree.

Show that this is true

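Keep in mind that for integers, $x > y \iff (x-1)(y+1) \ge xy$, because $(x-1)(y+1) = xy + (x - y - 1)$. In words: moving one node from the larger of two pieces to the smaller never lowers the product, and raises it whenever the pieces differ by more than $1$.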

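By definition, the weight of the heavy subtree is $h \le n/2$, and therefore the number of nodes not in it is $r = n - h \ge h$. Also, $h \ge s$ for the weight $s$ of any other subtree hanging off the root, and since any proper subtree of a tree is smaller than the tree itself, $h$ is also greater than the weight of any proper subtree of the heavy subtree or of $s$. So every possible cut splits off a subtree of weight at most $h$ and leaves at least $r$ nodes behind. Because $r \ge h$, making the smaller piece any lighter than $h$ can only lower the product, so $rh$ is the product of the $2$ closest numbers we can make and is therefore the best we can do.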
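As a sanity check on this argument, here is a small Python sketch that compares $h(n-h)$ with a brute-force search over every edge. The function name `bisection_check` and the edge-list input are assumptions made for this example, and it expects a tree with at least $2$ vertices. Instead of re-rooting, it uses the fact that the heavy subtree's weight is the order of the largest component of $G - r$ for a centroid $r$.

```python
from collections import defaultdict

def bisection_check(edges):
    """Return (h * (n - h), brute-force best bisection product)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)

    # Root at an arbitrary vertex and compute every subtree weight.
    root = next(iter(adj))
    parent, order = {root: None}, [root]
    for v in order:                      # BFS; `order` grows while we iterate
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)
    size = dict.fromkeys(order, 1)
    for v in reversed(order):            # children before parents
        if parent[v] is not None:
            size[parent[v]] += size[v]

    def components(v):
        """Orders of the components of G - v."""
        comps = [size[w] for w in adj[v] if w != parent[v]]
        if parent[v] is not None:
            comps.append(n - size[v])
        return comps

    # h = weight of the heavy subtree when the tree is rooted at a centroid.
    h = next(max(components(v)) for v in order
             if all(2 * c <= n for c in components(v)))

    # Brute force: every edge joins some non-root v to parent[v].
    brute = max(size[v] * (n - size[v]) for v in order[1:])
    return h * (n - h), brute
```

On a path with $5$ vertices, for instance, both values come out as $2 \cdot 3 = 6$.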

Finding the best trisection in $O(n \log n)$

The other missing piece

The best trisection requires at least $1$ cut on the heavy subtree. (Actually, because of symmetry, that is a slight overstatement. Formally, at least $1$ element of the set of equivalently optimal trisections can be made by a cut on the heavy subtree.)

The same reasoning that proves cutting the heavy subtree $h$ makes the best bisection proves that any trisection made with 2 cuts not on $h$ would not be made worse by cutting $h$ instead of either of the other 2 cuts.

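Let $h$ be the weight of the heavy subtree, let $x$ and $y$ be the weights of the $2$ subtrees cut off to make a trisection with neither cut on the heavy subtree, and let $r = n - (h + x + y)$ be the number of nodes left that are not in the heavy subtree or in $x$ or $y$. Cutting the heavy subtree instead of $x$ is at least as good exactly when $$h(r+x)y \ge (h+r)xy$$ $$h(r+x) \ge (h+r)x$$ $$hr + hx \ge hx + rx$$ $$hr \ge rx$$ $$h \ge x $$ (The last step divides by $r$, which is at least $1$ because the root is always among those $r$ nodes.)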

Since we specifically chose $h \ge$ all possible values of $x$, we can count on being able to find an optimal trisection while limiting ourselves to trisections that include at least $1$ cut on the heavy subtree. (Notice that $y$ completely drops out of the equation, so it does not matter if $x$ is greater or less than $y$, which means we do not have to worry that $y$ might be a subtree of $x$. If $y$ was a subtree of $x$, preventing us from keeping $y$ while replacing $x$ with $h$, we could replace $y$ with $h$ instead.)

By reasoning similar to the above bisection and trisection arguments, we can also be sure that if the optimal trisection requires 2 cuts on the heavy subtree, then it will be the optimal bisection of the heavy subtree cut from the rest of the tree. If you make $2$ cuts on the heavy subtree, you are committing to $r \ge n/2$ being $a$, the largest of the 3 components of the trisection. With $r > n/3$ you want to make it smaller, so you want $b+c$ to be as large as possible, which means you want a cut between the heavy subtree and the root. With that done and the requirement that the second cut be in the heavy subtree, the best second cut is by definition the optimal bisection of the heavy subtree.

Grinding it out

With that established, we only need some simple data structures. We have already found all possible subtree sizes for the whole tree. We can find all possible cuts on the heavy subtree and the best bisection of the heavy subtree in $O(n)$.

We can then put the sizes of all of the subtrees not on the heavy subtree into a binary search tree in $O(n \log n)$. Then, given a cut on the heavy subtree, we can find the best possible cut not on the heavy subtree in $O(\log n)$ time.

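We then try all $O(n)$ cuts on the heavy subtree, pairing each one with the best matching cut not on the heavy subtree as found by binary search, for $O(n \log n)$ total running time. For a cut of weight $x$ on the heavy subtree, the best partner is the weight closest to $(n-x)/2$, so only the $2$ neighbors of that value in the search tree need to be checked.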

We have then calculated all the possible trisections we have not ruled out, and, provided we kept track of the best one seen so far, we know which one achieves $\max(abc)$. We are done.
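Putting the pieces together, the $O(n \log n)$ procedure might look like the following Python sketch. It is a minimal illustration under a few assumptions, not a reference implementation: the function name `best_trisection` is invented here, the tree is given as an edge list, a sorted list with the standard `bisect` module stands in for the binary search tree, and only the best product is returned rather than the cuts themselves.

```python
from bisect import bisect_left
from collections import defaultdict

def best_trisection(edges):
    """Return max a*b*c over all ways to remove two edges (0 if n < 3)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)
    if n < 3:
        return 0

    def root_at(root):
        parent, order = {root: None}, [root]
        for v in order:                      # BFS; `order` grows as we go
            for w in adj[v]:
                if w != parent[v]:
                    parent[w] = v
                    order.append(w)
        size = dict.fromkeys(order, 1)
        for v in reversed(order):            # children before parents
            if parent[v] is not None:
                size[parent[v]] += size[v]
        return parent, order, size

    # Part 1: find a centroid c, then re-root the tree there.
    parent, order, size = root_at(next(iter(adj)))
    c = next(v for v in order
             if 2 * size[v] >= n
             and all(2 * size[w] <= n for w in adj[v] if w != parent[v]))
    parent, order, size = root_at(c)

    # Part 2: split the non-root nodes into the heavy subtree H and the rest.
    heavy = max(adj[c], key=lambda w: size[w])
    in_h = {c: False}
    for v in order[1:]:                      # parents precede children in BFS
        in_h[v] = (v == heavy) or in_h[parent[v]]
    h = size[heavy]
    h_sizes = [size[v] for v in order[1:] if in_h[v]]            # cuts on H
    r_sizes = sorted(size[v] for v in order[1:] if not in_h[v])  # other cuts

    best = 0
    # Both cuts on H: cut H off the root, then bisect H as evenly as possible.
    for x in h_sizes:
        if x < h:
            best = max(best, x * (h - x) * (n - h))
    # One cut on H (weight x), one elsewhere (weight y nearest to (n - x) / 2).
    for x in h_sizes:
        i = bisect_left(r_sizes, (n - x) / 2)
        for y in r_sizes[max(i - 1, 0):i + 1]:
            best = max(best, x * y * (n - x - y))
    return best
```

For a path with $6$ vertices this finds the $2 \cdot 2 \cdot 2 = 8$ split. Sorting dominates the running time; everything else is a constant number of linear passes plus one binary search per cut on the heavy subtree.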

Optimization to $O(n)$ total running time

If we really care about the time complexity of this, we can get rid of the sorts and binary searches at a cost of increasing the number of $O(n)$ operations and making the solution even more of a grind.

We can, in one $O(n)$ pass through the data structure, find any one number that constitutes the "best" choice according to some arbitrary criteria, so long as either there is always a "best" choice or we do not care which of several equally good choices we pick as "best", which is our case here.

Let us divide the full tree $T$ of size $n$ into the heavy subtree $H$ and the rest of the tree $R$. Using the analysis that got us to the $O(n \log n)$ solution, we only need to find 10 numbers. I will use the shorthand "pair closest to x" to mean the pair of numbers made up of the smallest number $\ge x$ and the largest number $\le x$, even though of course the 2 numbers closest to $x$ might both be greater or less than $x$.

  • $h = |H|$, the weight of $H$
  • $h_{half}$: the weight of the subtree of $H$ closest to $h \over 2$
  • 2 weights of subtrees of $H$: the pair closest to $n \over 3$.
  • 2 weights of subtrees of $R$: the pair closest to $n \over 3$
  • 2 weights of subtrees of $H$: for each weight $w$ in the pair of subtrees of $R$ closest to $n \over 3$, the weight of the subtree of $H$ closest to $(n-w)\over 2$
  • 2 weights of subtrees of $R$: for each weight $w$ in the pair of subtrees of $H$ closest to $n \over 3$, the weight of the subtree of $R$ closest to $(n-w)\over 2$
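Each "pair closest to $x$" in the list above can be found with one pass and no sorting. A minimal sketch, assuming the relevant subtree weights have already been collected into an iterable (the name `pair_closest` is made up for this example):

```python
def pair_closest(weights, x):
    """Return (largest weight <= x, smallest weight >= x) in one pass.

    Either entry is None when no weight lies on that side of x.
    """
    below = above = None
    for w in weights:
        if w <= x and (below is None or w > below):
            below = w
        if w >= x and (above is None or w < above):
            above = w
    return below, above
```

A weight exactly equal to $x$ fills both slots, which is the case where no further searching is needed.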

The best trisection is either going to be $h_{half}(h - h_{half})(n-h)$ or some combination of 1 of the 4 subtrees of $H$ with 1 of the 4 subtrees of $R$ found above. Since some of the subtrees could be the same (and we would know right away), we do not necessarily need to find all 10 numbers. We could get away with just finding $h$ and 2 subtrees of weight exactly $n\over 3$ and not have to look further. So we find 3-10 numbers to make from 1 to 1+4+4 = 9 possible partitions to try and pick the best, all $O(n)$ or $O(1)$ operations, and we have an answer in $O(n)$ time.

Glossary

  • A hierarchical tree of size or weight $n$ = a tree graph $G = (V,E)$ of order $n$
  • is a collection of $n$ nodes = $|G|$ = $|V|$ vertices (singular: vertex) connected by
  • $n-1$ connections = $|V|-1$ edges = $|E|$ edges,
  • such that every node is reachable = connected to every other node by a single unique path, which means there are no
  • nodes with more than one parent = cycles.
  • The $order$ of a component or a graph is the number of vertices it contains.
  • The $degree$ of a node/vertex is the number of connections to the node = edges having the vertex as an endpoint.
  • The $weight$ of a node in a hierarchical tree is the number of nodes in the subtree rooted at that node, which equals the number of the node's descendants plus $1$ for the node itself.
  • Something is heavier than something else if it has greater weight. Likewise, something with less weight is said to be lighter.
  • A component is a maximal connected subgraph = a set of vertices such that there is a path from any vertex to any other vertex, together with all the edges between them. A tree graph is a special kind of graph that is a single component with no cycles or parallel edges. It has the minimal number of edges possible to make the vertices a single component (that is, to keep the graph connected).
  • A hierarchical tree is a specialization of a tree graph. It has a single node designated as the root. All nodes except the root have exactly $1$ parent. (The root does not have a parent). All nodes connected to a node except for its parent are said to be children of that node and have that node as their parent. Thus for all nodes, if $a$ is $b$'s parent, $b$ is $a$'s child, and all nodes connected to the root are children of the root. Children are said to be below their parents and parents are said to be above their children. All the children of a given node are called siblings of each other. A node's children's children are called its grandchildren, and names consistent with genealogy are likewise used to describe the rest of the relationships, except that a node with no children is called a leaf. A node's descendants are its children and all of their descendants.
  • A subtree of a tree is a tree itself, consisting of a node and all of its descendants. The subtree corresponding to the root node is the entire tree.
  • A tree graph may be rooted at a vertex. Any node/vertex may be chosen as the root.
  • To root a tree graph at a vertex $r$ means to map it into a hierarchical tree with $r$ as the root, all the neighbors of $r$ as the root's children and all their parents the root, and then recursively for every vertex $v$ starting with the children of the root, designating neighbors of $v$ except for its parent as its children and designating its children's parent as $v$, recursing on $v$'s children until all the vertices are mapped. At the end, every vertex in the tree graph corresponds to exactly 1 node in the tree, and every edge in the tree graph corresponds to exactly 1 parent-child connection, and so they can be considered interchangeable.
  • When you disconnect a node from its parent, it is said to become the root of a disjoint subtree, and can be referred to as the subtree rooted at that node.
  • Similarly, when you cut an edge of a tree graph, you split (or $bisect$) it into $2$ tree graphs, one containing those vertices that remain connected to the vertex on one side of the edge (and all the edges that connect them) and the other containing those that remain connected to the vertex on the other side of the edge (and all the edges that connect them). This is called a $bisection$. Remove $2$ edges and you $trisect$ the tree (a $trisection$).
  • A node with no children has no descendants and is called a leaf. A vertex of degree $1$ is called a leaf. It is possible to root a tree graph at a vertex of degree $1$, but in a hierarchical tree the root is not considered a leaf unless it is the only node in the tree.