Exercise 4: Geodemographic Analysis of Los Angeles
Instructions
- Using the DataStore for geosnap at /srv/data/geonsap, extract census tract data for Los Angeles using the ACS 2021 dataset.
- Define a list called cluster_variables containing the following variables:
  - “median_household_income”,
  - “median_home_value”,
  - “p_edu_college_greater”,
  - “p_edu_hs_less”,
  - “p_nonhisp_white_persons”,
  - “p_nonhisp_black_persons”,
  - “p_hispanic_persons”,
  - “p_asian_persons”
- Create a new GeoDataFrame that subsets the cluster_variables.
- Add the variable n_total_pop to the new GeoDataFrame.
- Fill any NaN values with 0.
- Plot a map of the geometries
- Create a Queen contiguity spatial weights object for this dataframe
- Create a new GeoDataFrame by dropping the Channel Islands tracts.
- Recreate the Queen weights for the new GeoDataFrame.
- Using small multiples, create choropleth maps using quintiles for each of the variables.
- Create a pairplot using seaborn for the cluster variables.
- Determine the number of neighborhoods that would be required if each neighborhood had an average total population of 8000. Let this be the integer-valued variable n_hoods.
- Using KMeans, define neighborhoods with k=n_hoods.
- Using Agglomerative Clustering, define neighborhoods with the number of clusters equal to n_hoods.
- Add a contiguity constraint to Agglomerative Clustering and generate new neighborhood definitions.
- Using MaxP, define neighborhoods using the cluster_variables with a threshold of 8000 for the n_total_pop variable.
- Compare all four cluster solutions by using small multiples to plot four categorical maps of the clusters.
- Compare all four cluster solutions using the Rand Index. Comment on which solutions are most similar. Why might this be?
- Compare all four cluster solutions based on silhouette scores. Comment on the rankings of the four methods in terms of these scores. Provide an explanation for your findings.
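
The clustering and comparison steps above can be sketched on synthetic data. This is a minimal illustration rather than a solution: the lattice of made-up "tracts", the random attributes, the standardization step, the rounding rule used for n_hoods, and the rook-style lattice adjacency from scikit-learn's grid_to_graph are all assumptions standing in for the Los Angeles GeoDataFrame and its Queen weights. MaxP is left out because the sketch relies only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.feature_extraction.image import grid_to_graph
from sklearn.metrics import rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the tract table (assumption): a 10 x 10 lattice
# of "tracts" with three made-up attributes and a made-up population.
rng = np.random.default_rng(0)
n_rows, n_cols = 10, 10
X = rng.normal(size=(n_rows * n_cols, 3))
total_pop = rng.integers(2000, 6000, size=n_rows * n_cols)

# Number of neighborhoods if each holds about 8000 people on average.
# Rounding is one possible choice; floor or ceiling would also be defensible.
n_hoods = int(round(total_pop.sum() / 8000))

# Standardize so no single variable dominates the distances (assumption:
# the exercise does not say whether to rescale).
Xs = StandardScaler().fit_transform(X)

# Lattice adjacency as a sparse matrix. In the exercise, the connectivity
# would come from the Queen contiguity weights instead.
connectivity = grid_to_graph(n_rows, n_cols)

solutions = {
    "kmeans": KMeans(
        n_clusters=n_hoods, n_init=10, random_state=0
    ).fit_predict(Xs),
    "ward": AgglomerativeClustering(
        n_clusters=n_hoods, linkage="ward"
    ).fit_predict(Xs),
    "ward_contiguous": AgglomerativeClustering(
        n_clusters=n_hoods, linkage="ward", connectivity=connectivity
    ).fit_predict(Xs),
}

# Pairwise Rand index: share of tract pairs on which two solutions agree
# (both place the pair together, or both place it apart).
names = list(solutions)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"Rand({a}, {b}) = {rand_score(solutions[a], solutions[b]):.3f}")

# Silhouette score: cohesion versus separation in attribute space only,
# so it takes no account of spatial contiguity.
for name, labels in solutions.items():
    print(f"silhouette({name}) = {silhouette_score(Xs, labels):.3f}")
```

Because the silhouette score is computed purely in attribute space, a solution that must also respect contiguity is optimizing under an extra constraint, which is worth keeping in mind when interpreting the rankings.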
Export your notebook as a PDF and submit it on Canvas.
Due April 11 at midnight.