Exploding overlapping to new non-overlapping polygons?

This is a graph coloring problem.

Recall that a graph coloring is an assignment of a color to the vertices of a graph in such a way that no two vertices which share an edge will also have the same color. Specifically, the (abstract) vertices of the graph are the polygons. Two vertices are connected with an (undirected) edge whenever they intersect (as polygons). If we take any solution to the problem--which is a sequence of (say k) disjoint collections of the polygons--and assign a unique color to each collection in the sequence, then we will have obtained a k-coloring of the graph. It is desirable to find a small k.

This problem is pretty hard and remains unsolved for arbitrary graphs. Consider an approximate solution that's simple to code. A sequential algorithm ought to do. The Welsh-Powell algorithm is a greedy solution based on a descending ordering of the vertices by degree. Translated to the language of the original polygons, first sort the polygons in descending order of the number of other polygons they overlap. Working in order, give the first polygon an initial color. In each successive step, try to color the next polygon with an existing color: that is, choose a color that is not already used by any of that polygon's neighbors. (There are many ways to choose among the available colors; try either the one that has been least used or else choose one randomly.) If the next polygon cannot be colored with an existing color, create a new color and color it with that.

Once you have achieved a coloring with a small number of colors, perform zonalstats color by color: by construction, you're guaranteed that no two polygons of a given color overlap.


Here's sample code in R. (Python code wouldn't be much different.) First, we describe overlaps among the seven polygons shown.

Map of seven polygons

edges <- matrix(c(1,2, 2,3, 3,4, 4,5, 5,1, 2,6, 4,6, 4,7, 5,7, 1,7), ncol=2, byrow=TRUE)

That is, polygons 1 and 2 overlap, and so do polygons 2 and 3, 3 and 4, ..., 1 and 7.

Sort the vertices by descending degree:

vertices <- unique(as.vector(edges))
neighbors <- function(i) union(edges[edges[, 1]==i,2], edges[edges[, 2]==i,1])
nbrhoods <- sapply(vertices, neighbors)
degrees <- sapply(nbrhoods, length)
v <- vertices[rev(order(degrees))]

A (crude) sequential coloring algorithm uses the earliest available color not already used by any overlapping polygon:

color <- function(i) {
  n <- neighbors(i)
  candidate <- min(setdiff(1:color.next, colors[n]))
  if (candidate==color.next) color.next <<- color.next+1
  colors[i] <<- candidate
}

Initialize the data structures (colors and color.next) and apply the algorithm:

colors <- rep(0, length(vertices))
color.next <- 1
temp <- sapply(v, color)

Split the polygons into groups according to color:

split(vertices, colors)

The output in this example uses four colors:

$`1`
[1] 2 4

$`2`
[1] 3 6 7

$`3`
[1] 5

$`4`
[1] 1

Four-coloring of the polygons

It has partitioned the polygons into four non-overlapping groups. In this case the solution is not optimal ({{3,6,5}, {2,4}, {1,7}} is a three-coloring for this graph). In general the solution it gets shouldn't be too bad, though.


The methodology recommended by #whuber inspired me to take a new direction, and here is my arcpy solution, in two functions. The first, called countOverlaps, make two fields, "overlaps" and "ovlpCount" to record for each poly which polys overlapped with it, and how many overlaps occurred. The second function, explodeOverlaps, creates a third field, "expl", which gives a unique integer to each group of non-overlapping polys. The user can then export new fc's based on this field. The process is broken into two functions because I think the countOverlaps tool can prove useful by itself. Please excuse the sloppiness of the code (and the careless naming convention), as it's pretty preliminary, but it works. Also make sure that the "idName" field represents a field with unique IDs (only tested with integer IDs). Thank you whuber for providing me with the framework necessary to approach this problem!

def countOverlaps(fc,idName):
    intersect = arcpy.Intersect_analysis(fc,'intersect')
    findID = arcpy.FindIdentical_management(intersect,"explFindID","Shape")
    arcpy.MakeFeatureLayer_management(intersect,"intlyr")
    arcpy.AddJoin_management("intlyr",arcpy.Describe("intlyr").OIDfieldName,findID,"IN_FID","KEEP_ALL")
    segIDs = {}
    featseqName = "explFindID.FEAT_SEQ"
    idNewName = "intersect."+idName

    for row in arcpy.SearchCursor("intlyr"):
        idVal = row.getValue(idNewName)
        featseqVal = row.getValue(featseqName)
        segIDs[featseqVal] = []
    for row in arcpy.SearchCursor("intlyr"):
        idVal = row.getValue(idNewName)
        featseqVal = row.getValue(featseqName)
        segIDs[featseqVal].append(idVal)

    segIDs2 = {}
    for row in arcpy.SearchCursor("intlyr"):
        idVal = row.getValue(idNewName)
        segIDs2[idVal] = []

    for x,y in segIDs.iteritems():
        for segID in y:
            segIDs2[segID].extend([k for k in y if k != segID])

    for x,y in segIDs2.iteritems():
        segIDs2[x] = list(set(y))

    arcpy.RemoveJoin_management("intlyr",arcpy.Describe(findID).name)

    if 'overlaps' not in [k.name for k in arcpy.ListFields(fc)]:
        arcpy.AddField_management(fc,'overlaps',"TEXT")
    if 'ovlpCount' not in [k.name for k in arcpy.ListFields(fc)]:
        arcpy.AddField_management(fc,'ovlpCount',"SHORT")

    urows = arcpy.UpdateCursor(fc)
    for urow in urows:
        idVal = urow.getValue(idName)
        if segIDs2.get(idVal):
            urow.overlaps = str(segIDs2[idVal]).strip('[]')
            urow.ovlpCount = len(segIDs2[idVal])
        urows.updateRow(urow)

def explodeOverlaps(fc,idName):

    countOverlaps(fc,idName)

    arcpy.AddField_management(fc,'expl',"SHORT")

    urows = arcpy.UpdateCursor(fc,'"overlaps" IS NULL')
    for urow in urows:
        urow.expl = 1
        urows.updateRow(urow)

    i=1
    lyr = arcpy.MakeFeatureLayer_management(fc)
    while int(arcpy.GetCount_management(arcpy.SelectLayerByAttribute_management(lyr,"NEW_SELECTION",'"expl" IS NULL')).getOutput(0)) > 0:
        ovList=[]
        urows = arcpy.UpdateCursor(fc,'"expl" IS NULL','','','ovlpCount D')
        for urow in urows:
            ovVal = urow.overlaps
            idVal = urow.getValue(idName)
            intList = ovVal.replace(' ','').split(',')
            for x in intList:
                intList[intList.index(x)] = int(x)
            if idVal not in ovList:
                urow.expl = i
            urows.updateRow(urow)
            ovList.extend(intList)
        i+=1

It's been awhile, but I used this code for my own application and it's been working great — thank you. I re-wrote part of it to update it, apply it to lines (with a tolerance) and speed it up significantly (below — I'm running it on 50 million intersecting buffers and it only takes a couple of hours).

def ExplodeOverlappingLines(fc, tolerance, keep=True):
        print('Buffering lines...')
        idName = "ORIG_FID"
        fcbuf = arcpy.Buffer_analysis(fc, fc+'buf', tolerance, line_side='FULL', line_end_type='FLAT')
        print('Intersecting buffers...')
        intersect = arcpy.Intersect_analysis(fcbuf,'intersect')

        print('Creating dictionary of overlaps...')
        #Find identical shapes and put them together in a dictionary, unique shapes will only have one value
        segIDs = defaultdict(list)
        with arcpy.da.SearchCursor(intersect, ['Shape@WKT', idName]) as cursor:
            x=0
            for row in cursor:
                if x%100000 == 0:
                    print('Processed {} records for duplicate shapes...'.format(x))
                segIDs[row[0]].append(row[1])
                x+=1

        #Build dictionary of all buffers overlapping each buffer
        segIDs2 = defaultdict(list)
        for v in segIDs.values():
            for segID in v:
                segIDs2[segID].extend([k for k in v if k != segID and k not in segIDs2[segID]])

        print('Assigning lines to non-overlapping sets...')
        grpdict = {}
        # Mark all non-overlapping one to group 1
        for row in arcpy.da.SearchCursor(fcbuf, [idName]):
            if row[0] in segIDs2:
                grpdict[row[0]] = None
            else:
                grpdict[row[0]] = 1

        segIDs2sort = sorted(segIDs2.items(), key=lambda x: (len(x[1]), x[0])) #Sort dictionary by number of overlapping features then by keys
        i = 2
        while None in grpdict.values(): #As long as there remain features not assigned to a group
            print(i)
            ovset = set()  # list of all features overlapping features within current group
            s_update = ovset.update
            for rec in segIDs2sort:
                if grpdict[rec[0]] is None: #If feature has not been assigned a group
                    if rec[0] not in ovset: #If does not overlap with a feature in that group
                        grpdict[rec[0]] = i  # Assign current group to feature
                        s_update(rec[1])  # Add all overlapping feature to ovList
            i += 1 #Iterate to the next group

        print('Writing out results to "expl" field in...'.format(fc))
        arcpy.AddField_management(fc, 'expl', "SHORT")
        with arcpy.da.UpdateCursor(fc,
                                   [arcpy.Describe(fc).OIDfieldName, 'expl']) as cursor:
            for row in cursor:
                if row[0] in grpdict:
                    row[1] = grpdict[row[0]]
                    cursor.updateRow(row)

        if keep == False:
            print('Deleting intermediate outputs...')
            for fc in ['intersect', "explFindID"]:
                arcpy.Delete_management(fc)