How to randomly subset X% of selected points?

Here's a python function that will select random features in a layer based on percent, ignoring current selection:

def SelectRandomByPercent (layer, percent):
    #layer variable is the layer name in TOC
    #percent is percent as whole number  (0-100)
    if percent > 100:
        print "percent is greater than 100"
        return
    if percent < 0:
        print "percent is less than zero"
        return
    import random
    fc = arcpy.Describe (layer).catalogPath
    featureCount = float (arcpy.GetCount_management (fc).getOutput (0))
    count = int (featureCount * float (percent) / float (100))
    if not count:
        arcpy.SelectLayerByAttribute_management (layer, "CLEAR_SELECTION")
        return
    oids = [oid for oid, in arcpy.da.SearchCursor (fc, "OID@")]
    oidFldName = arcpy.Describe (layer).OIDFieldName
    path = arcpy.Describe (layer).path
    delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
    randOids = random.sample (oids, count)
    oidsStr = ", ".join (map (str, randOids))
    sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
    arcpy.SelectLayerByAttribute_management (layer, "", sql)

Copy/paste this into the python shell in ArcMap.

Then in the shell type SelectRandomByPercent ("layer", num), where layer is the name of your layer, and num is a whole number of your percent.

Random selection

A variation to find a subset selection as asked:

def SelectRandomByPercent (layer, percent):
    #layer variable is the layer name in TOC
    #percent is percent as whole number  (0-100)
    if percent > 100:
        print "percent is greater than 100"
        return
    if percent < 0:
        print "percent is less than zero"
        return
    import random
    featureCount = float (arcpy.GetCount_management (layer).getOutput (0))
    count = int (featureCount * float (percent) / float (100))
    if not count:
        arcpy.SelectLayerByAttribute_management (layer, "CLEAR_SELECTION")
        return
    oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
    oidFldName = arcpy.Describe (layer).OIDFieldName
    path = arcpy.Describe (layer).path
    delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
    randOids = random.sample (oids, count)
    oidsStr = ", ".join (map (str, randOids))
    sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
    arcpy.SelectLayerByAttribute_management (layer, "", sql)

Finally, one more variation to select a layer by a count, instead of a percent:

def SelectRandomByCount (layer, count):
    import random
    layerCount = int (arcpy.GetCount_management (layer).getOutput (0))
    if layerCount < count:
        print "input count is greater than layer count"
        return
    oids = [oid for oid, in arcpy.da.SearchCursor (layer, "OID@")]
    oidFldName = arcpy.Describe (layer).OIDFieldName
    path = arcpy.Describe (layer).path
    delimOidFld = arcpy.AddFieldDelimiters (path, oidFldName)
    randOids = random.sample (oids, count)
    oidsStr = ", ".join (map (str, randOids))
    sql = "{0} IN ({1})".format (delimOidFld, oidsStr)
    arcpy.SelectLayerByAttribute_management (layer, "", sql)

Generally, I also recommend using the spatial ecology tools as discussed by blah238.

However, another method you could try would be to add an attribute called Random to store a random number: enter image description here

Then, using the field calculator on that attribute, with the Python Parser, use the following codeblock:

import random
def rand():
  return random.random()

See image below:

This will create random values between 0 and 1. Then, if you want to select 20% of the features, you could select features where the Random value is less than 0.2. Of course, this will work better with many features. I created a feature class with only 7 features as a test and there were no values less than 0.2. However, it looks like you have plenty of features, so that shouldn't matter.

enter image description here


There is also an earlier Select features at random script from @StephenLead available for ArcGIS Desktop. Although written, I think, for ArcGIS 9.x, and last modified in 2008, I used it in about 2010 at 10.0, and it still worked well.