# How to find the "center" of a subset of vertices in a graph?

Here's an approximation algorithm that should offer a decent time-quality trade-off curve. The first part is due to Teofilo F. Gonzalez (Clustering to minimize the maximum intercluster distance, 1985), but I can't find a citation for the second offhand, probably because it's too thin to be a main result.

Let k be the number of times you're willing to run Dijkstra's algorithm (truncated after it reaches all of S, as you suggest). Gonzalez's algorithm runs Dijkstra k − 1 times to partition the terminals S into k clusters so as to 2-approximately minimize the maximum cluster radius. Conveniently, it produces as a byproduct k well-separated centers C and shortest path trees for k − 1 of them. We run Dijkstra one more time and then choose the optimum 1-center with respect to C. This center satisfies

approximate objective ≤ optimal objective + maximum cluster radius.
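
To see where this bound comes from, here is a short derivation. The symbols are my own notation, not from any citation: $v$ is the vertex we return, $v^*$ is an optimal 1-center for S, $R$ is the maximum cluster radius, and $c(s) \in C$ is the center of the cluster containing $s$, so $d(c(s), s) \le R$. It assumes symmetric distances (an undirected graph).

$$
\max_{s \in S} d(v, s)
\;\le\; \max_{s \in S} \bigl[ d(v, c(s)) + d(c(s), s) \bigr]
\;\le\; \max_{c \in C} d(v, c) + R
\;\le\; \max_{c \in C} d(v^*, c) + R
\;\le\; \max_{s \in S} d(v^*, s) + R .
$$

The first step is the triangle inequality, the third holds because $v$ is optimal with respect to C, and the last because C ⊆ S.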


It's a little tricky to quantify an approximation factor here in terms of k. The key is bounded doubling dimension, which I'll illustrate by assuming Euclidean distances for a moment. Suppose that we're trying to find the 1-center of a disk of radius 1. The optimum is the center of the disk. How many disjoint radius-r balls centered inside the disk can there be? Their total area is contained inside a disk of radius 1 + r, which has area π(1+r)², so there are at most (π(1+r)²)/(πr²) = (1/r + 1)² of them. The centers chosen by Gonzalez's algorithm are pairwise at least one maximum cluster radius apart, so balls of half that radius around them are disjoint. The maximum cluster radius will therefore be 2r, or in terms of k, on the order of 2/√k times as large as the optimal objective, so k = 100 will give you a solution within about 20% of optimal. Doubling dimension basically uses this argument as a definition.
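
Spelled out, the packing bound turns into a bound on the cluster radius as follows. This is only for the Euclidean unit-disk illustration, with $R = 2r$ standing for the maximum cluster radius (my notation):

$$
k \le \left(\frac{1}{r} + 1\right)^2
\;\Longrightarrow\;
r \le \frac{1}{\sqrt{k} - 1}
\;\Longrightarrow\;
R = 2r \le \frac{2}{\sqrt{k} - 1} \approx \frac{2}{\sqrt{k}},
\qquad
k = 100:\; R \le \frac{2}{9} \approx 0.22 .
$$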

For reference, Gonzalez's algorithm is

1. Choose any point in S.

2. Repeat k − 1 times: run Dijkstra from the most recently chosen point, choose the next point in S to maximize the minimum distance to all previously chosen points.

Then run Dijkstra once more, from the most recently chosen point, and select the vertex that is the optimal 1-center for the k chosen points.
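
The steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: it assumes an undirected, connected graph stored as a dict of dicts (`graph[u][v]` is the edge weight), it runs Dijkstra to completion rather than truncating once all of S is reached, and the names `dijkstra` and `approx_center` are made up for this example.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> {neighbor: weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def approx_center(graph, terminals, k):
    """Choose k far-apart terminals, then the vertex minimizing the max distance to them."""
    terminals = list(terminals)
    chosen = [terminals[0]]                    # step 1: any point in S
    dists = []                                 # one distance map per chosen point
    nearest = {s: float("inf") for s in terminals}
    for _ in range(k - 1):                     # step 2: farthest-point selection
        d = dijkstra(graph, chosen[-1])
        dists.append(d)
        for s in terminals:
            nearest[s] = min(nearest[s], d.get(s, float("inf")))
        chosen.append(max(terminals, key=nearest.get))
    dists.append(dijkstra(graph, chosen[-1]))  # the one extra Dijkstra run

    def eccentricity(v):
        # Assumes symmetric distances, so d(c, v) == d(v, c).
        return max(d.get(v, float("inf")) for d in dists)

    return min(graph, key=eccentricity), chosen
```

For example, on a unit-weight path `0 - 1 - 2 - 3 - 4` with terminals `[0, 4]` and `k = 2`, the chosen points are the two endpoints and the returned center is the middle vertex.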

As a possible refinement, here is a local-search heuristic. I have no idea how to analyze it and have no citation, but it seems like it might work.

1. Choose a starting center. Your current approximation should work well for this.

2. Compute a shortest path tree to S from the current center.

3. Prune the tree so that all leaves belong to S and compute its center.

4. If this center is better than the root, go back to Step 2.
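
Steps 2 and 3 (minus the center computation) might look like the sketch below. It assumes the same dict-of-dicts graph representation as an illustration, a connected graph so that every terminal is reachable, and a hypothetical function name `pruned_shortest_path_tree`; the tree is returned as parent pointers.

```python
import heapq

def pruned_shortest_path_tree(graph, root, terminals):
    """Return {child: (parent, edge_weight)} for the shortest-path tree from root,
    keeping only vertices that lie on some root-to-terminal path."""
    dist = {root: 0.0}
    parent = {}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = (u, w)
                heapq.heappush(heap, (nd, v))
    # Prune: keep exactly the ancestors of terminals, so every leaf is a terminal.
    keep = {root}
    for s in terminals:
        v = s
        while v not in keep:      # walk up until we reach an already-kept vertex
            keep.add(v)
            v = parent[v][0]      # assumes s is reachable from root
    return {v: parent[v] for v in keep if v != root}
```

Walking up from each terminal and stopping at the first kept vertex means each tree edge is visited at most once, so the pruning is linear in the size of the tree.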

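The only formal properties I can really claim for this algorithm are that it always terminates and that it never does worse than the starting center. If the center of the tree is not the root, then it must be a better center than the root in the graph as well: the edges missing from the tree can only shorten its distances, whereas the root's tree distances are already exact shortest-path distances. Since every iteration strictly improves the objective and there are finitely many vertices, the loop must stop.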

To compute the center of a tree efficiently, label each node with the maximum distance to the root over all of its descendants (in linear time by visiting the nodes in post order). Then descend in the tree via the child with the maximum label as long as it improves the radius. Everything in the child's subtree will get closer by the length of the child's parent edge; everything else will get further by the same amount.
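
Here is one way the labelling and descent could be written. Treat it as a sketch under assumptions: the tree is given as a hypothetical `children` dict mapping each node to a list of `(child, edge_weight)` pairs, every leaf is a terminal (as after pruning), the center is restricted to tree vertices, and only distances to terminals count toward the radius.

```python
def tree_center(children, root, terminals):
    """Return (center, radius): the tree vertex minimizing the maximum
    tree distance to a terminal, found by descending from the root."""
    NEG = float("-inf")
    # Root distance of every node, visiting parents before children.
    depth, order, stack = {root: 0.0}, [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for c, w in children.get(u, []):
            depth[c] = depth[u] + w
            stack.append(c)
    # label[v] = largest root distance of any terminal in v's subtree.
    label = {}
    for u in reversed(order):      # children are labelled before their parent
        best = depth[u] if u in terminals else NEG
        for c, _ in children.get(u, []):
            best = max(best, label[c])
        label[u] = best
    u, up = root, NEG              # up = farthest terminal outside u's subtree
    radius = label[root]
    while children.get(u):
        c, w = max(children[u], key=lambda cw: label[cw[0]])
        # Farthest terminal not in c's subtree, measured from u.
        rest = max([up, 0.0 if u in terminals else NEG] +
                   [label[x] - depth[u] for x, _ in children[u] if x != c])
        # c's subtree gets closer by w; everything else gets further by w.
        new_radius = max(rest + w, label[c] - depth[c])
        if new_radius >= radius:
            break                  # no improvement: u is the center
        u, up, radius = c, rest + w, new_radius
    return u, radius
```

On a unit-weight path rooted at one end with both endpoints as terminals, the descent moves two steps and stops at the middle vertex with radius 2.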