Exercise 4: Geodemographic Analysis of Los Angeles

Instructions

Using the geosnap DataStore at /srv/data/geonsap, extract census tract data for Los Angeles from the 2021 ACS dataset.
Define a list called cluster_variables containing the following variables:
- "median_household_income"
- "median_home_value"
- "p_edu_college_greater"
- "p_edu_hs_less"
- "p_nonhisp_white_persons"
- "p_nonhisp_black_persons"
- "p_hispanic_persons"
- "p_asian_persons"
Create a new GeoDataFrame that keeps only the cluster_variables columns and the tract geometries.
Add the variable n_total_pop to the new GeoDataFrame.
Fill any NaN values with 0.
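A minimal pandas sketch of the subsetting and NaN-filling steps, run on a small hypothetical table rather than the real geosnap output. With the actual tract GeoDataFrame you would also keep the "geometry" column in the selection so the result stays mappable.

```python
import numpy as np
import pandas as pd

cluster_variables = [
    "median_household_income",
    "median_home_value",
    "p_edu_college_greater",
    "p_edu_hs_less",
    "p_nonhisp_white_persons",
    "p_nonhisp_black_persons",
    "p_hispanic_persons",
    "p_asian_persons",
]

# Hypothetical stand-in for the tract table; values are made up.
tracts = pd.DataFrame({var: [1.0, np.nan, 3.0] for var in cluster_variables})
tracts["n_total_pop"] = [4200, 3900, np.nan]
tracts["geoid"] = ["a", "b", "c"]  # an extra column that the subset drops

# Keep the clustering inputs plus the population column,
# then replace every missing value with 0.
subset = tracts[cluster_variables + ["n_total_pop"]].fillna(0)
```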
Plot a map of the tract geometries.
Create a Queen contiguity spatial weights object for this GeoDataFrame.
Create a new GeoDataFrame by dropping the Channel Islands tracts.
Recreate the Queen weights for the new GeoDataFrame.
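Queen contiguity treats two areas as neighbors when they share any boundary point, whether an edge or just a corner. The pure-Python sketch below illustrates that rule on hypothetical unit squares by matching shared vertices, flags polygons with no neighbors ("islands", the role the Channel Islands tracts play here), and rebuilds the neighbor lists after dropping them. In the exercise itself, libpysal's `Queen.from_dataframe` builds the weights and the resulting object's `islands` attribute lists disconnected observations; this toy version assumes neighboring polygons share exact vertex coordinates.

```python
from collections import defaultdict


def queen_neighbors(polygons):
    """Return {id: set of neighbor ids}; polygons sharing any vertex are neighbors."""
    vertex_to_ids = defaultdict(set)
    for pid, ring in polygons.items():
        for vertex in ring:
            vertex_to_ids[vertex].add(pid)
    neighbors = {pid: set() for pid in polygons}
    for ids in vertex_to_ids.values():
        for pid in ids:
            neighbors[pid] |= ids - {pid}
    return neighbors


def square(x, y):
    """Vertices of a hypothetical unit square with lower-left corner (x, y)."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]


# A, B, C form an L-shape; D touches C only at a corner; E is far away.
polygons = {
    "A": square(0, 0),
    "B": square(1, 0),
    "C": square(1, 1),
    "D": square(2, 2),
    "E": square(10, 10),
}
w = queen_neighbors(polygons)
islands = [pid for pid, nbrs in w.items() if not nbrs]

# Drop the islands and rebuild the weights, mirroring the Channel Islands step.
mainland = {pid: ring for pid, ring in polygons.items() if pid not in islands}
w_mainland = queen_neighbors(mainland)
```

Under rook contiguity A and C would not be neighbors, since they meet only at the corner (1, 1); queen contiguity links them.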
Using small multiples, create a quintile choropleth map for each of the cluster variables.
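Quintile classification splits a variable at its 20th, 40th, 60th and 80th percentiles, so roughly a fifth of the tracts falls into each class. A stdlib sketch of the binning follows, on hypothetical values. For the maps, geopandas' `GeoDataFrame.plot(column=..., scheme="quantiles", k=5)` delegates this classification to mapclassify, whose break-point interpolation may differ slightly from `statistics.quantiles`.

```python
import bisect
import statistics


def quintile_classes(values):
    """Assign each value a class from 0 (lowest fifth) to 4 (highest fifth)."""
    # With n=5, statistics.quantiles returns the 4 interior cut points.
    cuts = statistics.quantiles(values, n=5)
    # bisect_left places a value equal to a cut point in the lower class.
    return [bisect.bisect_left(cuts, v) for v in values]


incomes = list(range(10, 110, 10))  # hypothetical tract values 10, 20, ..., 100
classes = quintile_classes(incomes)
```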
Create a pairplot using seaborn for the cluster variables.
Determine the number of neighborhoods required if each neighborhood had an average total population of 8000. Store this as the integer-valued variable n_hoods.
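The neighborhood count is the total population divided by the target average. The sketch below uses floor division, which is an assumed convention: it keeps the implied average at or above 8000, matching the MaxP floor used later, whereas rounding to the nearest integer is an equally defensible reading. The populations are hypothetical.

```python
def number_of_hoods(tract_populations, target=8000):
    """Neighborhoods needed for an average of `target` people each.

    Floor division (an assumed convention) keeps the implied average at or
    above `target` whenever the total reaches it; int() guarantees an integer
    even if the populations are floats, for example after fillna.
    """
    total = sum(tract_populations)
    return max(1, int(total // target))


pops = [4200, 3900, 5100, 6800, 4000]  # hypothetical tract populations
n_hoods = number_of_hoods(pops)
```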
Using KMeans, define neighborhoods with k=n_hoods.
Using Agglomerative Clustering, define neighborhoods with the number of clusters equal to n_hoods.
Add a contiguity constraint to the agglomerative clustering and generate new neighborhood definitions.
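A scikit-learn sketch of the three non-MaxP methods on a hypothetical 6 x 6 lattice of tracts with random attributes. The attributes are standardized first; this is an assumption (the exercise does not require it), but unscaled dollar values would otherwise dominate the percentage variables. The contiguity constraint goes through `AgglomerativeClustering`'s `connectivity` argument, which only permits merges between connected clusters. In the exercise the sparse matrix would come from the Queen weights (libpysal exposes it as `w.sparse`) rather than the hand-built lattice used here, and `is_contiguous` is a hypothetical helper for checking that each cluster forms one connected piece.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical 6 x 6 lattice of "tracts" with two random attributes each.
nrows, ncols = 6, 6
n = nrows * ncols
X = StandardScaler().fit_transform(rng.normal(size=(n, 2)))

# Queen contiguity on the lattice (edge or corner neighbors) as a sparse 0/1 matrix.
rows, cols = [], []
for r in range(nrows):
    for c in range(ncols):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                    rows.append(r * ncols + c)
                    cols.append(rr * ncols + cc)
W = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

n_hoods = 4  # hypothetical cluster count

kmeans_labels = KMeans(n_clusters=n_hoods, n_init=10, random_state=0).fit_predict(X)
ward_labels = AgglomerativeClustering(n_clusters=n_hoods, linkage="ward").fit_predict(X)
ward_contig_labels = AgglomerativeClustering(
    n_clusters=n_hoods, linkage="ward", connectivity=W
).fit_predict(X)


def is_contiguous(labels, W):
    """True if every cluster induces a single connected subgraph of W."""
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        n_comp, _ = connected_components(W[idx][:, idx], directed=False)
        if n_comp != 1:
            return False
    return True
```

Running `is_contiguous` on `kmeans_labels` or `ward_labels` will typically return False, since neither method sees the spatial graph.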
Using MaxP, define neighborhoods using the cluster_variables with a threshold of 8000 for the n_total_pop variable.
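MaxP maximizes the number of contiguous regions subject to a floor: each region's summed n_total_pop must reach the threshold. spopt provides a heuristic implementation (`spopt.region.MaxPHeuristic`; check its documentation for the exact arguments). Whatever tool produces the labels, a small stdlib check such as the hypothetical `meets_floor` below confirms that the floor holds. The labels and populations are made up.

```python
from collections import defaultdict


def region_totals(labels, populations):
    """Sum the population of each region label."""
    totals = defaultdict(float)
    for label, pop in zip(labels, populations):
        totals[label] += pop
    return dict(totals)


def meets_floor(labels, populations, threshold=8000):
    """True if every region's total population reaches the threshold."""
    return all(t >= threshold for t in region_totals(labels, populations).values())


pops = [4200, 3900, 2000, 3000, 3100]  # hypothetical tract populations
ok = meets_floor([0, 0, 1, 1, 1], pops)         # both regions total 8100
too_small = meets_floor([0, 0, 1, 1, 2], pops)  # region 2 totals only 3100
```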
Compare all four cluster solutions by using small multiples to plot four categorical maps of the clusters.
Compare all four cluster solutions using the Rand Index. Comment on which solutions are most similar. Why might this be?
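The Rand index is the share of all observation pairs on which two partitions agree, either by grouping the pair together in both or by separating it in both. It ranges from 0 to 1 and ignores how clusters happen to be numbered. A direct stdlib sketch follows. It loops over every pair, so with a few thousand tracts `sklearn.metrics.rand_score` (or `adjusted_rand_score`, which corrects for chance agreement) is the faster option.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of observation pairs on which two partitions agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees when "same cluster?" has the same answer in both partitions.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

For example, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, because the two partitions are identical up to relabeling.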
Compare all four cluster solutions based on silhouette scores. Comment on the rankings of the four methods in terms of these scores. Provide an explanation for your findings.
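For each observation the silhouette compares a, its mean distance to the rest of its own cluster, with b, its mean distance to the nearest other cluster: s = (b - a) / max(a, b). Values near 1 indicate tight, well-separated clusters, and negative values suggest an observation sits closer to another cluster than to its own. The stdlib sketch below averages s over all observations on hypothetical 2-D points; `sklearn.metrics.silhouette_score(X, labels)` computes the same quantity. To keep the four methods comparable, score them all on the same standardized attribute matrix.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width with Euclidean distance.

    Points in singleton clusters score 0, following the usual convention.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of p's cluster (self adds 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Two well-separated pairs such as `[(0, 0), (0, 1), (10, 0), (10, 1)]` labeled `[0, 0, 1, 1]` score about 0.9, while the crossed labeling `[0, 1, 0, 1]` scores below zero.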

Export your notebook as a PDF and submit it on Canvas.

Due April 11 at midnight.