Creating a subset of XY data by most recent date

Maybe you could use the dissolve geoprocessing tool with Point as the dissolve field and specify Date as the statistics field with the MAX statistic type.

http://desktop.arcgis.com/en/arcmap/latest/tools/data-management-toolbox/dissolve.htm

It has many statistics type available too which has the range, first, last types too.


I suggest you to use GeoPandas python package. You can use it within ArcMap Python Console if you install it.

Sample data:

enter image description here


import geopandas as gp
from datetime import datetime

# open data file
df = gp.read_file("C:/PATH/TO/YOUR/FILE.shp")

enter image description here


# convert "SAMP_DATE" to datetime and add to new DATE column
# as far as I know, GeoPandas gets date type as string
df["DATE"] = df["SAMP_DATE"].apply(lambda x:datetime.strptime(x, '%m/%d/%Y'))

Note: If date format in your data is different, change '%m/%d/%Y' pattern. It seems the pattern is '%m/%d/%Y' in your data. Mine is '%Y-%m-%d'.

enter image description here


# sort, group, get first (most recent)
df_recent = df.sort_values(['X', 'Y', 'DATE', 'SAMP_TIME'], ascending=False) \
              .groupby(['X', 'Y']) \ # group by points having same coordinates
              .first() \ # get first row in group
              .reset_index()

Result of .sort_values() and .groupby():

enter image description here

Note: groupby preserves the order of rows within each group. SAMP_TIME is string and '15:50' string is bigger than '10:30' string. So, you don't need to convert that string to datetime etc.

Result of .first(): (most recent data you want)

enter image description here


# remove DATE column
df_recent.drop("DATE", axis=1, inplace=True)

# convert DataFrame to GeoDataFrame
df_recent= gp.GeoDataFrame(df_recent, crs=df.crs) 

# Save the most recent data
df_recent.to_file("C:/PATH/TO/NEW/FILE.shp")

All code without comments:

import geopandas as gp
from datetime import datetime

df = gp.read_file("C:/PATH/TO/YOUR/FILE.shp")
df["DATE"] = df["SAMP_DATE"].apply(lambda x:datetime.strptime(x, '%m/%d/%Y'))
df_recent = df.sort_values(['X', 'Y', 'DATE', 'SAMP_TIME'], ascending=False).groupby(['X', 'Y']).first().reset_index()

df_recent.drop("DATE", axis=1, inplace=True)
df_recent= gp.GeoDataFrame(df_recent, crs=df.crs)
df_recent.to_file("C:/PATH/TO/NEW/FILE.shp")

Here are steps I followed and got correct results. Do let me know me know how it works for you as it was tested with my own sample data

  1. Create new field as StringTime
  2. calculate StringTime = datetime.datetime.strptime( !Date! ,"%d-%m-%Y").strftime('%Y%m%d')
  3. Dissolve layer using field X_Point and Y_Point and in statistics use Max(StringTime) and Max(Time)

enter image description here