How can I use Mathematica to sort the US Congressional Districts by Perimeter÷Area

The US congressional districts are available as geographic entities in the form

Entity["USCongressionalDistrict", "Illinois:District7"]

As such, in principle you can enumerate them using

GeoEntities[Entity["Country", "UnitedStates"], "USCongressionalDistrict"]

but this is liable to timing out because you're asking for a lot of data, so instead you can break down the query into per-state lookups, giving you something like

Length[
 districts = Flatten[GeoEntities[#, "USCongressionalDistrict"] & /@ 
    GeoEntities[Entity["Country", "UnitedStates"], "USState"]]
 ]
(*449*)

(Or, as Michael pointed out, using the built-in EntityList["USCongressionalDistrict"] is way better.)


To get the data you want, you can use GeoArea. To get the perimeter you can apply GeoDistance to the data but you need to fish it out first. You can get the shape of the district out using EntityValue[#, "Polygon"] and this will return things of the form

Polygon[ GeoPosition[ {{lat1, lon1}, {lat2, lon2}, …, } ] ]

and you can't run GeoDistance on the Polygon - only on the GeoPosition object inside it or the lists of latitudes and longitudes inside that. Thus you can use

perimeter[polygon_] := GeoDistance[polygon[[1]]]

and to match it up you can define

area[district_] := EntityValue[district, "Area"]

To do the calculation, then, you can run

polygons = EntityValue[districts, "Polygon"]

which is relatively fast.


There is a slight complication with the data, though, and it's that not all districts appear as a single polygon. In fact, if you define districts = EntityList["USCongressionalDistrict"] and polygons as above, then there are

Select[polygons, Length[#] > 1 &] // Length

= 15 districts with multiple polygons. This is a bit surprising, since US congressional districts are meant to be territorially contiguous (and hence the gerrymandering stuff), but 11 of these have islands, which is fairly reasonable. Nevertheless, there are districts like the Tennessee 2nd district which are not, in fact, contiguous, even though their exclaves are separated only by land. To visualize these 15 districts, use

multiPolygonDistricts = Pick[districts, Length[#] > 1 & /@ polygons]
GeoGraphics[EntityValue[#, "Polygon"], PlotLabel -> #, 
     ImageSize -> {{250}, {250}}] & /@ multiPolygonDistricts

In addition to this, there are some districts which don't have polygon data available, such as

EntityValue[Entity["USCongressionalDistrict","Pennsylvania:District19"],"Polygon"]
(*Missing["NotAvailable"]*)

There are 12 such districts and you can find them using

Pick[districts, MissingQ /@ polygons]

These turn out to be historical districts (like the aforementioned PA 19th) that are no longer in use, and as of November 2016 and MM v11 they are no longer present in the data.


There are also some irregularities in the formatting of the polygon data, so for example

EntityValue[Entity["USCongressionalDistrict", "Tennessee:District9"], "Polygon"]

has an output of the form {Polygon[_]}, but

EntityValue[Entity["USCongressionalDistrict", "Texas:District1"], "Polygon"]

has an output of the form {{Polygon[_]}} with a deeper list, whereas

EntityValue[Entity["USCongressionalDistrict", "Alaska:District0"], "Polygon"]

returns a straight Polygon[_], though its insides are all messed up (i.e. it contains multiple polygons inside it, instead of a list of polygons, which will definitely mess with the perimeter calculator above); either way, if you're studying gerrymandering you presumably want to leave out this district as it's the entirety of Alaska. Again, this is buggy data which I'll leave to you to report.

EDIT: some issues with the data seem to have been resolved, but it's not necessarily all good yet. It's also not clear to me whether the fix is version-dependent (or fixed upstream at the server), either. So: watch out for this if you want to implement this.


To get clean data, then, you can do

cleanPolygons = Flatten /@ DeleteCases[DeleteMissing[polygons], Polygon[_]];
cleanDistricts = DeleteCases[Pick[districts, Not@*MissingQ /@ polygons], 
                   Entity["USCongressionalDistrict", "Alaska:District0"]];

You can now calculate the perimeters using

perimeters = Total /@ Map[perimeter, cleanPolygons, {2}]

which is now quite fast as well (the previous implementation was slowing the whole thing down). Calculating the areas as

areas = EntityValue[cleanDistricts, "Area"]

you finally have enough to get your rankings: doing

TableForm[
 (data = SortBy[{EntityValue[cleanDistricts, "Name"], perimeters^2/areas
           }\[Transpose], Minus@*Last])[[1 ;; 5]]
 ]

gets you for the top five

District                                        p^2/A
"North Carolina 12th Congressional District"    365.611
"Florida 5th Congressional District"            244.518
"North Carolina 1st Congressional District"     234.952
"North Carolina 4th Congressional District"     228.47
"Maryland 3rd Congressional District"           226.788

And, indeed, you get some pretty heavy gerrymandering going on:

Mathematica graphics

As an interesting trivia, note that the fourth place on this list, Maryland's 3rd district, really stretches the definition for "contiguous" - if you zoom in on the map into the Baltimore harbour, there's not a lot of district land connecting Federal Hill and Little Italy.

It's also important to note that it matters what year the data is from: Mathematica seems to be using the 2014 electoral districts, which have since changed in some states (North Carolina in particular, but also Florida and Virginia) precisely because they were deemed to be too gerrymandered. For a good illustration of the changes see this NPR animation for the NC redistricting.


First (update: I guess the missing ones should be deleted),

alldistricts = DeleteMissing@ EntityList["USCongressionalDistrict"];
allpolygons = #["Polygon"] & /@ alldistricts (* // DeleteMissing *);

Then find the GeoArea[poly] and perimeter* of them. Note that you might want to compute a dimensionless quantity such as perimeter^2/area.

Update: The perimeter was a little tricker than my original blithe remark led one to believe. Here is an improvement over GeoDistance, at least in terms of reliability:

perimeters = Total[allpolygons /. Polygon :> GeoLength@*GeoPath, {2}]
areas = Total[GeoArea /@ allpolygons, {2}]  (* why not put in the areas, too *)