Methods for Area Unit Data

Introduction

What is Area Unit Data?

  • Analysis of data associated with spatial zones or areas
  • Areas may be regular in shape and size, or irregular

Focus in Area Unit Data Analysis

  • Variation in an attribute across our spatial units
  • The spatial variation is not continuous
  • Spatial units are polygons
    • variation across polygons
    • no variation within polygons

Notation

Our substantive attribute of interest is \(Y\).

Our process is represented as:

\[ \{ Y(A_i), \ A_i \in \{A_1, A_2, \ldots, A_n\} \} \]

\[ A_1 \cup A_2 \cup \ldots \cup A_n = {R} \]

  • \(\{Y(A_i)\}\) is a set of random variables indexed by the sub-regions \(A_i\)
  • \(A_1, A_2, \ldots , A_n\) are sub-regions of \({R}\)

Areal Unit Data (Lattice)

Spatial Domain: \({R}\)

  • Discrete and fixed

  • Locations nonrandom

  • Locations countable

Examples of lattice data

  • Attributes collected by ZIP code

  • Attributes collected by census tract

Lattice Data: Indexing

Site

  • Each location is now an area or site

  • One observation on \(Y\) for each site

  • Need a spatial index: \(Y(s_i)\)

\(Y(s_i)\)

  • \(s_i\) is a representative location within the site

  • e.g., centroid, largest city

  • Allows for measuring distances between sites
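The role of the representative location can be illustrated with a small sketch. It assumes two hypothetical square sites in planar coordinates, takes the area-weighted centroid of each as \(s_i\) (one of several possible choices), and measures the straight-line distance between the two points. The helper name `polygon_centroid` and the site coordinates are illustrative only.

```python
import math

def polygon_centroid(coords):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    coords: list of (x, y) vertices in order, first vertex not repeated.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

# Two hypothetical sites (squares) in planar coordinates
site_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
site_b = [(4, 0), (6, 0), (6, 2), (4, 2)]

s_a = polygon_centroid(site_a)      # representative location of site A
s_b = polygon_centroid(site_b)      # representative location of site B
distance = math.dist(s_a, s_b)      # Euclidean distance between the sites
print(s_a, s_b, distance)
```

A different representative point (for example the largest city) would generally give a different inter-site distance, so the choice is part of the analysis.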

Lattice Data: County Per Capita Incomes

1969

Objectives

  • Infer whether there is a spatial trend or pattern in the attribute values recorded over the sub-regions
  • First order variation: Trend in the mean
  • Second order variation: Spatial dependence

Visualizing Area Unit Data

Choropleths

Interactivity

Analyzing Area Unit Data

Spatial Dependence

Hell might be a world without spatial dependence since it would be impossible to live there in any practical and meaningful way.

(Longley et al., 2017)

Spatial Autocorrelation

  • Definition: The degree to which objects close to each other in space are also similar in other attributes.
  • Examples: Clustered patterns of disease, similar land uses in neighboring areas.
  • Measurement: Moran’s I, Geary’s C.
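To make the two measures concrete, here is a minimal sketch of the textbook formulas for Moran's I and Geary's C. It assumes four hypothetical sites arranged along a line, with binary contiguity weights (each site neighbors the sites next to it). The function names `morans_i` and `gearys_c` and all data values are illustrative; a tested implementation with inference, such as the one in PySAL's `esda` package, would normally be used in practice.

```python
import numpy as np

def morans_i(y, w):
    """Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = y - y.mean()                      # deviations from the mean
    return (len(y) / w.sum()) * (z @ w @ z) / (z @ z)

def gearys_c(y, w):
    """Geary's C: ((n-1) / (2 S0)) * sum_ij w_ij (y_i - y_j)^2 / sum_i z_i^2."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = y - y.mean()
    diff_sq = (y[:, None] - y[None, :]) ** 2   # squared pairwise differences
    return ((len(y) - 1) / (2.0 * w.sum())) * (w * diff_sq).sum() / (z @ z)

# Binary contiguity weights for four sites on a line: 0-1, 1-2, 2-3
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

smooth = [1, 2, 3, 4]        # neighbors have similar values
alternating = [1, 4, 1, 4]   # neighbors have dissimilar values

i_smooth, c_smooth = morans_i(smooth, w), gearys_c(smooth, w)
i_alt, c_alt = morans_i(alternating, w), gearys_c(alternating, w)
print(i_smooth, c_smooth, i_alt, c_alt)
```

In this toy case the smooth pattern gives a positive Moran's I and a Geary's C below 1, while the alternating pattern gives a negative I and a C above 1, which is the usual reading of the two statistics.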

Spatial Autocorrelation (Homicide Rates 1969)

Area Unit Data in Python

Imports

import geopandas
import libpysal

Loading an example data set

south = libpysal.examples.load_example('South')

Finding out about the example

libpysal.examples.explain('South')

Creating a GeoDataFrame from a file

south_gdf = geopandas.read_file(south.get_path('south.shp'))

Plotting the geometries

south_gdf.plot()

Checking the Coordinate Reference System

south_gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
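Because this CRS is geographic, coordinates are in degrees, so planar distances computed directly on them are not ground distances. The sketch below illustrates the point with the haversine great-circle formula on a sphere. The Earth radius value and the test points are assumptions chosen for illustration, and `haversine_km` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical approximation)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude covers very different ground distances
d_equator = haversine_km(0.0, 0.0, 1.0, 0.0)    # along the equator
d_60n = haversine_km(0.0, 60.0, 1.0, 60.0)      # along latitude 60 N
print(round(d_equator, 1), round(d_60n, 1))
```

For distance-based analysis of a GeoDataFrame, reprojecting to a suitable projected CRS with `to_crs` is the more common route; which projection is suitable depends on the study area.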

Turning off the axis

ax = south_gdf.plot()
ax.set_axis_off();

Inspecting the GDF

south_gdf.shape
(1412, 70)

Inspecting the GeoSeries

south_gdf.geometry
0       POLYGON ((-80.62805 40.39816, -80.60204 40.480...
1       POLYGON ((-80.52625 40.16245, -80.5876 40.1750...
2       POLYGON ((-80.52517 40.02275, -80.73843 40.035...
3       POLYGON ((-80.52447 39.72113, -80.83248 39.718...
4       POLYGON ((-75.7727 39.38301, -75.79144 39.7237...
                              ...                        
1407    POLYGON ((-79.14433 36.54606, -79.21706 36.549...
1408    POLYGON ((-79.43775 37.61596, -79.45834 37.603...
1409    POLYGON ((-80.12475 37.1251, -80.14045 37.1283...
1410    POLYGON ((-76.39569 37.10771, -76.4027 37.0905...
1411    POLYGON ((-77.53178 38.56506, -77.72094 38.840...
Name: geometry, Length: 1412, dtype: geometry

Inspecting the Columns

south_gdf.columns
Index(['NAME', 'STATE_NAME', 'STATE_FIPS', 'CNTY_FIPS', 'FIPS', 'STFIPS',
       'COFIPS', 'FIPSNO', 'SOUTH', 'HR60', 'HR70', 'HR80', 'HR90', 'HC60',
       'HC70', 'HC80', 'HC90', 'PO60', 'PO70', 'PO80', 'PO90', 'RD60', 'RD70',
       'RD80', 'RD90', 'PS60', 'PS70', 'PS80', 'PS90', 'UE60', 'UE70', 'UE80',
       'UE90', 'DV60', 'DV70', 'DV80', 'DV90', 'MA60', 'MA70', 'MA80', 'MA90',
       'POL60', 'POL70', 'POL80', 'POL90', 'DNL60', 'DNL70', 'DNL80', 'DNL90',
       'MFIL59', 'MFIL69', 'MFIL79', 'MFIL89', 'FP59', 'FP69', 'FP79', 'FP89',
       'BLK60', 'BLK70', 'BLK80', 'BLK90', 'GI59', 'GI69', 'GI79', 'GI89',
       'FH60', 'FH70', 'FH80', 'FH90', 'geometry'],
      dtype='object')

Interactive Map

south_gdf.explore(column='HR60')

Describing a column

south_gdf.HR60.describe()
count    1412.000000
mean        7.292144
std         6.421018
min         0.000000
25%         3.213471
50%         6.245125
75%         9.956272
max        92.936803
Name: HR60, dtype: float64

Static Choropleth: HR60

ax = south_gdf.plot(column='HR60', scheme='Quantiles', k=5,
                    legend_kwds = {'loc': 'lower center'},
                    legend=True)
ax.set_axis_off();
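As a rough illustration of what a quantile scheme with `k=5` does, the sketch below computes class upper bounds from percentiles and assigns each value to a class. The rates are hypothetical, and the use of `numpy.percentile` with its default interpolation is an assumption; the classifier used by the plotting call may compute its breaks slightly differently.

```python
import numpy as np

# Hypothetical rates for ten areas
rates = np.array([0.0, 1.5, 2.0, 3.2, 4.1, 5.0, 6.3, 7.7, 9.9, 25.0])
k = 5

# Upper bound of each class: the 20th, 40th, ..., 100th percentiles
upper_bounds = np.percentile(rates, np.linspace(0, 100, k + 1)[1:])

# A value falls in the first class whose upper bound is >= the value
classes = np.searchsorted(upper_bounds, rates, side="left")
classes = np.minimum(classes, k - 1)   # guard against floating-point overshoot

counts = np.bincount(classes, minlength=k)
print(upper_bounds)
print(counts)
```

Quantile classes hold roughly equal numbers of areas, so a single extreme value (25.0 here) does not compress the rest of the color range, at the cost of class intervals with very different widths.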

How many states are there in this dataset?

south_gdf.STATE_NAME.unique().shape
(17,)

How many counties?

south_gdf.shape[0]
1412

How many counties in each state?

south_gdf.groupby(by='STATE_NAME').count()
NAME STATE_FIPS CNTY_FIPS FIPS STFIPS COFIPS FIPSNO SOUTH HR60 HR70 ... BLK90 GI59 GI69 GI79 GI89 FH60 FH70 FH80 FH90 geometry
STATE_NAME
Alabama 67 67 67 67 67 67 67 67 67 67 ... 67 67 67 67 67 67 67 67 67 67
Arkansas 75 75 75 75 75 75 75 75 75 75 ... 75 75 75 75 75 75 75 75 75 75
Delaware 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 3 3 3
District of Columbia 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
Florida 67 67 67 67 67 67 67 67 67 67 ... 67 67 67 67 67 67 67 67 67 67
Georgia 159 159 159 159 159 159 159 159 159 159 ... 159 159 159 159 159 159 159 159 159 159
Kentucky 120 120 120 120 120 120 120 120 120 120 ... 120 120 120 120 120 120 120 120 120 120
Louisiana 64 64 64 64 64 64 64 64 64 64 ... 64 64 64 64 64 64 64 64 64 64
Maryland 24 24 24 24 24 24 24 24 24 24 ... 24 24 24 24 24 24 24 24 24 24
Mississippi 82 82 82 82 82 82 82 82 82 82 ... 82 82 82 82 82 82 82 82 82 82
North Carolina 100 100 100 100 100 100 100 100 100 100 ... 100 100 100 100 100 100 100 100 100 100
Oklahoma 77 77 77 77 77 77 77 77 77 77 ... 77 77 77 77 77 77 77 77 77 77
South Carolina 46 46 46 46 46 46 46 46 46 46 ... 46 46 46 46 46 46 46 46 46 46
Tennessee 95 95 95 95 95 95 95 95 95 95 ... 95 95 95 95 95 95 95 95 95 95
Texas 254 254 254 254 254 254 254 254 254 254 ... 254 254 254 254 254 254 254 254 254 254
Virginia 123 123 123 123 123 123 123 123 123 123 ... 123 123 123 123 123 123 123 123 123 123
West Virginia 55 55 55 55 55 55 55 55 55 55 ... 55 55 55 55 55 55 55 55 55 55

17 rows × 69 columns
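The grouped `count()` repeats the same number in every column because it counts non-missing values column by column. When only the number of rows per group is wanted, `size()` is a more compact alternative. The sketch below uses a small hypothetical table (the column names follow the dataset, the rows are made up) to show the difference between the two.

```python
import pandas as pd

# Hypothetical rows; one HR60 value is missing on purpose
df = pd.DataFrame({
    "NAME": ["c1", "c2", "c3"],
    "STATE_NAME": ["A", "A", "B"],
    "HR60": [1.0, None, 3.0],
})

per_state = df.groupby("STATE_NAME").size()               # rows per state
non_missing = df.groupby("STATE_NAME")["HR60"].count()    # non-missing HR60 per state

print(per_state)
print(non_missing)
```

The two results agree whenever a column has no missing values, which appears to be the case in the output above since every column reports the same count.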

Which state had the highest median county homicide rate in 1960?

south_gdf[['STATE_NAME', 'HR60']].groupby(by='STATE_NAME').median()
                           HR60
STATE_NAME
Alabama                9.623977
Arkansas               4.704111
Delaware               4.228385
District of Columbia  10.471807
Florida                9.970306
Georgia                9.300076
Kentucky               5.235436
Louisiana              6.840286
Maryland               5.335208
Mississippi            8.919274
North Carolina         7.633043
Oklahoma               4.269126
South Carolina         7.509437
Tennessee              4.877751
Texas                  4.326215
Virginia               6.672004
West Virginia          2.623226
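The table lists a median for every state, so the question still has to be answered by scanning it; in the output above the largest value belongs to the District of Columbia, which consists of a single unit. A short sketch of how the answer could be extracted directly, using a small hypothetical table with the same column names:

```python
import pandas as pd

# Hypothetical county rates for two made-up states
df = pd.DataFrame({
    "STATE_NAME": ["A", "A", "A", "B", "B", "B"],
    "HR60": [2.0, 4.0, 9.0, 5.0, 6.0, 7.0],
})

state_median = df.groupby("STATE_NAME")["HR60"].median()
top_state = state_median.idxmax()    # label of the state with the largest median
top_value = state_median.max()       # the median itself

print(state_median.sort_values(ascending=False))
print(top_state, top_value)
```

Sorting the grouped result, as in the `print` call, is often more informative than `idxmax()` alone because it shows how close the runners-up are.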

Which state had the highest maximum county homicide rate in 1960?

south_gdf[['STATE_NAME', 'HR60']].groupby(by='STATE_NAME').max()
                           HR60
STATE_NAME
Alabama               24.903499
Arkansas              21.154427
Delaware               7.286472
District of Columbia  10.471807
Florida               40.744262
Georgia               53.304904
Kentucky              37.250885
Louisiana             18.243736
Maryland              14.327234
Mississippi           24.833923
North Carolina        25.660127
Oklahoma              17.088175
South Carolina        23.345940
Tennessee             20.894275
Texas                 92.936803
Virginia              23.575639
West Virginia         11.482375
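The grouped maximum reports the highest rate in each state but not which county attains it. One way to recover the county is to look up the row label of each maximum with `idxmax()`. The sketch below assumes a small hypothetical table that reuses the dataset's `NAME`, `STATE_NAME`, and `HR60` column names; the county names and rates are made up.

```python
import pandas as pd

# Hypothetical counties in two made-up states
df = pd.DataFrame({
    "NAME": ["c1", "c2", "c3", "c4"],
    "STATE_NAME": ["A", "A", "B", "B"],
    "HR60": [3.0, 8.0, 12.0, 1.0],
})

# Row with the highest rate within each state
top_per_state = df.loc[df.groupby("STATE_NAME")["HR60"].idxmax()]

# Row with the highest rate overall
top_overall = df.loc[df["HR60"].idxmax(), ["NAME", "STATE_NAME", "HR60"]]

print(top_per_state)
print(top_overall)
```

If several counties tie for the maximum, `idxmax()` returns only the first one, so ties would need separate handling.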

Intra-state dispersion

south_gdf[['STATE_NAME', 'HR60']].groupby(by='STATE_NAME').std()
                          HR60
STATE_NAME
Alabama               4.742337
Arkansas              4.574625
Delaware              1.815562
District of Columbia       NaN
Florida               7.990692
Georgia               7.906488
Kentucky              6.354316
Louisiana             4.189146
Maryland              4.064360
Mississippi           4.972698
North Carolina        4.596952
Oklahoma              4.231132
South Carolina        4.018644
Tennessee             4.354979
Texas                 8.223844
Virginia              4.826707
West Virginia         2.773659

# Coefficient of variation (%): within-state std as a percentage of the within-state mean
sgdf = south_gdf[['STATE_NAME', 'HR60']].groupby(by='STATE_NAME').std()
cv = sgdf / south_gdf[['STATE_NAME', 'HR60']].groupby(by='STATE_NAME').mean() * 100
cv.sort_values(by='HR60', ascending=False)
                            HR60
STATE_NAME
Texas                 144.992919
Kentucky               96.815524
West Virginia          93.234007
Arkansas               81.223752
Oklahoma               81.114430
Tennessee              75.426226
Georgia                73.774440
Maryland               71.898559
Florida                68.252692
Virginia               66.924041
Louisiana              59.994571
Mississippi            57.457024
North Carolina         57.013871
Alabama                49.070812
South Carolina         48.083524
Delaware               34.966796
District of Columbia         NaN
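The `NaN` for the District of Columbia follows from the definition of the sample standard deviation: pandas uses `ddof=1` by default, which is undefined for a group with a single observation. The sketch below reproduces this on a small hypothetical table and computes the mean, standard deviation, count, and coefficient of variation in one pass; the data values are made up.

```python
import pandas as pd

# Hypothetical data: state "A" has three counties, state "B" has one
df = pd.DataFrame({
    "STATE_NAME": ["A", "A", "A", "B"],
    "HR60": [2.0, 4.0, 6.0, 10.0],
})

stats = df.groupby("STATE_NAME")["HR60"].agg(["mean", "std", "count"])
stats["cv_pct"] = stats["std"] / stats["mean"] * 100   # coefficient of variation (%)

print(stats)
```

Carrying the `count` column alongside the CV makes it easier to see when a dispersion value rests on very few units.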

Conclusion

Recap of Key Points

  • Definition of Area Unit Data
  • Objectives of Area Unit Data Analysis
  • Area Unit Data in Python

Questions

References