Connecting line features and determining the length of the longest line

First, a little background to indicate why this is not a hard problem. The flow through a river guarantees that its segments, if correctly digitized, can always be oriented to form a directed acyclic graph (DAG). In turn, a graph can be linearly ordered if and only if it is a DAG, using a technique known as a topological sort. Topological sorting is fast: its time and space requirements are both O(|E| + |V|) (E = number of edges, V = number of vertices), which is as good as it gets. Creating such a linear ordering would make it easy to find the major stream bed.

Here, then, is a sketch of an algorithm. The stream's mouth lies along its major bed. Move upstream along each branch attached to the mouth (there may be more than one, if the mouth is a confluence) and recursively find the major bed leading down to that branch. Select the branch for which the total length is the greatest: that's your "backlink" along the major bed.

To make this clearer, I offer some (untested) pseudocode. The input is a set of line segments (or arcs) S (comprising the digitized stream), each having two distinct endpoints start(S) and end(S) and a positive length, length(S); and the river mouth p, which is a point. The output is a sequence of segments uniting the mouth with the most distant upstream point.

We will need to work with "marked segments" (S,p). These consist of one of the segments S along with one of its two endpoints, p. We will need to find all segments S that share an endpoint with a probe point q, mark those segments with their other endpoints, and return the set:

Procedure Extract(q: point, A: set of segments): Set of marked segments.

When no such segment can be found, Extract must return the empty set. As a side effect, Extract must remove all the segments it is returning from the set A, thereby modifying A itself.

I'm not giving an implementation of Extract: your GIS will provide the capability to select segments S sharing an endpoint with q. Marking them is simply a matter of comparing both start(S) and end(S) with q and returning whichever of the two endpoints does not match.

Now we're ready to solve the problem.

Procedure LongestUpstreamReach(p: point, A: set of segments): (Array of segments, length)
    A0 = A                        // Optional: preserves A
    C = Extract(p, A0)            // Removes found segments from the set A0!
    L = 0; B = empty array
    For each (S,q) in C:          // Loop over the segments meeting point p
        (B0, M) = LongestUpstreamReach(q, A0)
        If (length(S) + M > L) then
            B = append(S, B0)
            L = length(S) + M
        End if
    End for
    Return (B, L)
End LongestUpstreamReach

The procedure "append(S, B0)" sticks the segment S at the end of the array B0 and returns the new array.

(If the stream really is a tree: no islands, lakes, braids, etc--then you can dispense with the step of copying A into A0.)

The original question is answered by forming the union of the segments returned by LongestUpstreamReach.

To illustrate, let's consider the stream in the original map. Suppose it is digitized as a collection of seven arcs. Arc a goes from the mouth at point 0 (top of the map, at right in the figure below, which is rotated) upstream to the first confluence at point 1. It's a long arc, say 8 units long. Arc b branches to the left (in the map) and is short, about 2 units long. Arc c branches to the right and is about 4 units long, etc. Letting "b", "d", and "f" denote the left-side branches as we go from top to bottom on the map, and "a", "c", "e", and "g" the other branches, and numbering the vertices 0 through 7, we can abstractly represent the graph as the collection of arcs

A = {a=(0,1), b=(1,2), c=(1,3), d=(3,4), e=(3,5), f=(5,6), g=(5,7)}

I will suppose they have lengths 8, 2, 4, 1, 2, 2, 2 for a through g, respectively. The mouth is vertex 0.

Figure

The first example is the call to Extract(5, {f, g}). It returns the set of marked segments {(f,6), (g,7)}. Note that vertex 5 is at the confluence of arcs f and g (the two arcs at the bottom of the map) and that (f,6) and (g,7) mark each of these arcs with their upstream endpoints.

The next example is the call to LongestUpstreamReach(0, A). The first action it takes is the call to Extract(0, A). This returns a set containing the marked segment (a,1) and it removes segment a from the set A0, which now equals {b,c,d,e,f,g}. There is one iteration of the loop, where (S,q) = (a,1). During this iteration a call is made to LongestUpstreamReach(1, A0). Recursively, it must return either the sequence (g,e,c) or (f,e,c): both are equally valid. The length (M) it returns is 4+2+2 = 8. (Note that LongestUpstreamReach does not modify A0.) At the end of the loop, segment a has been appended to the stream bed and the length has been increased to 8+8 = 16. Thus the first return value consists of either the sequence (g,e,c,a) or (f,e,c,a), with a length of 16 in either case for the second return value. This shows how LongestUpstreamReach just moves upstream from the mouth, selecting at each confluence the branch with the longest distance yet to go, and keeps track of the segments traversed along its route.

A more efficient implementation is possible when there are many braids and islands, but for most purposes there will be little wasted effort if LongestUpstreamReach is implemented exactly as shown, because at each confluence there is no overlap among the searches in the various branches: the computing time (and stack depth) will be directly proportional to the total number of segments.


The Unsplit Line tool can be helpful for what you are trying to do, although you will need to divine some method to differentiate one stream branch from another (for the dissolve field). This assumes you have an ArcInfo license, though.

If you do not have such a license, you might consider the ArcObjects approach taking the XY of each vertex, filling an IPointCollection with them, and then creating an IGeometry as a PolyLineClass.