import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
12345)
np.random.seed(= 60
n = np.random.rand(60,2)
xy = pd.DataFrame(data=xy, columns=['x', 'y'])
df ='x', y='y', data=df); sns.scatterplot(x
Nearest Neighbor Methods
Spring 2023
Introduction
Now that we have been introduced to the different statistical models than are used to represent point processes, we turn to the methods that are used to link observed point patterns back to the process that generated the pattern.
More specifically, the challenge that we face is as follows. Given an observed point pattern, we wish to make inferences about the process that generated the observed pattern.
The general approach that is used is to construct measures that characterise the observed point pattern, and then compare these against the proporties of the theoretical process models we explored previously.
For example, if we assume that the underlying process is CSR, we know what kinds of properties the empirical patterns from such a process should exhibit. The critical thing to keep in mind is that we never actually see the underlying process - we only see outcomes of the process (i.e., the pattern).
This raises a number of challenges that we will need to address later on, but for now we are going to build up an inituition of the general strategy for analyzing point patterns.
Example Patterns
To begin we are going to create two different point patterns, one from a CSR process and one from a clustered process. We will use these two patterns to introduce the different statistical methods used to analyze the patterns. Here we are in the rare circumstance in which we actually know what process generated the pattern.
CSR n=60
import pointpats as pp
<string>:1: UserWarning:
Shapely 2.0 is installed, but because PyGEOS is also installed, GeoPandas will still use PyGEOS by default for now. To force to use and test Shapely 2.0, you have to set the environment variable USE_PYGEOS=0. You can do this before starting the Python process, or in your code before importing geopandas:
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas
In a future release, GeoPandas will switch to using Shapely by default. If you are using PyGEOS directly (calling PyGEOS functions on geometries from GeoPandas), this will then stop working and you are encouraged to migrate from PyGEOS to Shapely 2.0 (https://shapely.readthedocs.io/en/latest/migration_pygeos.html).
= pp.PointPattern(xy) csr
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1492: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1208: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
csr.summary()
Point Pattern
60 points
Bounding rectangle [(0.00838829794155349,0.024676210429265266), (0.9940145858999619,0.9613067360728214)]
Area of window: 0.9231676681785911
Intensity estimate for window: 64.99361066054225
x y
0 0.929616 0.316376
1 0.183919 0.204560
2 0.567725 0.595545
3 0.964515 0.653177
4 0.748907 0.653570
= pp.Window([(0,0), (0,1), (1,1), (1,0), (0,0)])
w = pp.PoissonClusterPointProcess(w, n, 2, 0.05, 1, asPP=True, conditioning=False)
draw 0].plot(window=True, title='Contagion Point Process (2 parents)') draw.realizations[
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1492: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1208: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1923: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:103: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
= draw.realizations[0] clustered
clustered.summary()
Point Pattern
60 points
Bounding rectangle [(0.47331760265312733,0.023178703349462502), (0.9696584457277277,0.6150208352748628)]
Area of window: 1.0
Intensity estimate for window: 60.0
x y
0 0.513060 0.541971
1 0.473318 0.578385
2 0.508373 0.536200
3 0.881716 0.060328
4 0.894221 0.059273
Quadrat Statistics
import pointpats.quadrat_statistics as qs
= qs.QStatistic(csr, shape='rectangle', nx=3, ny=3)
csr_qr csr_qr.plot()
csr_qr.chi2
10.8
csr_qr.chi2_pvalue
0.21329101843394052
= qs.QStatistic(clustered, shape='rectangle', nx=3, ny=3)
clustered_qr clustered_qr.plot()
clustered_qr.chi2
209.99999999999994
clustered_qr.chi2_pvalue
4.976940117448032e-41
Nearest Neighbor Distances
; plt.scatter(csr.points.x, csr.points.y)
import networkx as nx
= nx.DiGraph()
G for idx, point in enumerate(csr.points.values):
=point)
G.add_node(idx, pos
= nx.get_node_attributes(G, 'pos')
pos =2) nx.draw(G, pos, node_size
= csr.knn(1) # here we have the indices of the nearest neighbors (nidx) and the distances (nnd) nidx, nnd
for idx, neighbor in enumerate(nidx):
= (idx, neighbor[0])
edge
G.add_edges_from([edge])
= nx.get_node_attributes(G, 'pos')
pos =2) nx.draw(G, pos, node_size
Here we draw an arrow towards the nearest neighbor for a given observation.
In some cases, an observation is also the nearest neighbor to its nearest neighbor, or so-called “mutual nearest neighbors”. These pairs would appear at the the end of a segment with arrows on both ends.
If we look in the extreme northwest three points, we see one pair of mutual nearest neighbors, while the third point is not a mutual nearest neighbor.
It is also possible for a point to be a nearest neighbor to more than a single point, as is also seen in this case.
The nearest neighbor distances are the lengths of these segments.
Mean Nearest Neighbor Distance
Our first distance based statistic was suggested by Clark and Evans (1954) as the average nearest neighbor distances:
\[\bar{d}_{min} = \frac{1}{n} \sum_{i} d_{i, min} \]
where \(d_{i, min}\) is the nearest neighbor distance for observation \(i\), and \(n\) is the number of observations.
Under a CSR process, the expected value of this statistic is:
\[E[\bar{d}_{min}] = \frac{1}{2 \sqrt{\lambda}}\]
The logic of the statistic is to compare the observed mean nearest neighbor distance to this expectation forming their ratio:
\[R = \frac{\bar{d}_{min}}{\frac{1}{2 \sqrt{\lambda}}} = 2 \bar{d}_{min} \sqrt{\lambda}\]
Values of \(R<1\) are indicative of a tendancy towards clustering since the observed nearest neighbor distances are smaller than expected under CSR.
Values of \(R>1\) are indicative of a uniform or dispersed pattern.
# the mean nearest neighbor distance nnd.mean()
0.07360281110243255
# the intensity using the window for the point pattern csr.lambda_window
64.99361066054225
= nnd.mean()
dmin = csr.lambda_window
lam = 2 * dmin * lam**(1/2)
R R
1.1867513365512292
Let’s compare this to the same statistic based on the mean nearest neighbor distance for the clustered pattern:
= clustered.knn(1) # here we have the indices of the nearest neighbors (nidx) and the distances (nnd) nidx, nnd
= nnd.mean()
dmin = clustered.lambda_window
lam = 2 * dmin * lam**(1/2)
R R
0.13087134840769868
So we see that the \(R\) value for the clustered pattern is much below 1, while the R value for the CSR pattern is slightly over 1.
What we would like to know is if these values are significantly different from what we would expect if the underlying process that generated the patterns was CSR?
One approach is to use theoretical results on the distribution for the \(R\) statistic from Petrere (1985). The expected value of \(R\) is \(E[R]=1\). The variance of the \(R\) statistic is: \[ \sigma^2_R = \frac{0.2732}{n}\]
import scipy.stats
def R_test(pattern):
= pattern.knn(1) # here we have the indices of the nearest neighbors (nidx) and the distances (nnd)
nidx, nnd = pattern.lambda_window
lam = 2 * nnd.mean() * lam**(1/2)
R = nnd.shape[0]
n = 0.2732 / n
var = var**(1/2)
se = (R - 1 )/ se
stat = scipy.stats.norm.sf(abs(stat)) * 2
p_value return R, stat, p_value
R_test(csr)
(1.1867513365512292, 2.7675724348891184, 0.005647549379017388)
R_test(clustered)
(0.13087134840769868, -12.880103258909552, 5.825756575218341e-38)
Inference via simulation
import pointpats
import numpy
= pointpats.PoissonPointProcess(csr.window, n, 99, asPP=True)
samples
= np.array([R_test(samples.realizations[k]) for k in samples.realizations])
r_tests
r_tests
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1923: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:103: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
array([[ 9.81105197e-01, -2.80012645e-01, 7.79467803e-01],
[ 9.28415089e-01, -1.06085681e+00, 2.88754981e-01],
[ 1.05435206e+00, 8.05473519e-01, 4.20546481e-01],
[ 1.11821360e+00, 1.75187335e+00, 7.97955886e-02],
[ 1.12387424e+00, 1.83576157e+00, 6.63929272e-02],
[ 1.16724422e+00, 2.47848561e+00, 1.31941433e-02],
[ 1.10700813e+00, 1.58581325e+00, 1.12781678e-01],
[ 1.09108797e+00, 1.34988348e+00, 1.77053361e-01],
[ 9.69002352e-01, -4.59371472e-01, 6.45967431e-01],
[ 1.03760497e+00, 5.57289124e-01, 5.77329905e-01],
[ 1.02704882e+00, 4.00851546e-01, 6.88529426e-01],
[ 1.10508460e+00, 1.55730742e+00, 1.19397514e-01],
[ 1.15793416e+00, 2.34051463e+00, 1.92571837e-02],
[ 1.04250723e+00, 6.29938355e-01, 5.28734918e-01],
[ 1.10758728e+00, 1.59439606e+00, 1.10847354e-01],
[ 1.07177650e+00, 1.06369612e+00, 2.87466383e-01],
[ 1.02551521e+00, 3.78124168e-01, 7.05338356e-01],
[ 9.46840660e-01, -7.87797970e-01, 4.30814888e-01],
[ 1.03757582e+00, 5.56857151e-01, 5.77625033e-01],
[ 9.90592969e-01, -1.39408044e-01, 8.89127716e-01],
[ 1.04856233e+00, 7.19672330e-01, 4.71726767e-01],
[ 1.10811705e+00, 1.60224699e+00, 1.09101003e-01],
[ 9.54688187e-01, -6.71501077e-01, 5.01901374e-01],
[ 1.10995667e+00, 1.62950930e+00, 1.03205247e-01],
[ 1.15162238e+00, 2.24697675e+00, 2.46415129e-02],
[ 9.31182120e-01, -1.01985062e+00, 3.07799310e-01],
[ 9.71814911e-01, -4.17690588e-01, 6.76173355e-01],
[ 1.14149435e+00, 2.09688392e+00, 3.60038522e-02],
[ 1.07165594e+00, 1.06190948e+00, 2.88276781e-01],
[ 1.14437709e+00, 2.13960485e+00, 3.23867145e-02],
[ 9.26536025e-01, -1.08870370e+00, 2.76284570e-01],
[ 1.00659687e+00, 9.77627322e-02, 9.22120701e-01],
[ 1.19240160e+00, 2.85130688e+00, 4.35399261e-03],
[ 1.02219381e+00, 3.28902433e-01, 7.42229435e-01],
[ 1.17405296e+00, 2.57938804e+00, 9.89755372e-03],
[ 1.15649472e+00, 2.31918276e+00, 2.03851289e-02],
[ 9.45617539e-01, -8.05924083e-01, 4.20286624e-01],
[ 1.05893614e+00, 8.73407565e-01, 3.82440969e-01],
[ 1.09368529e+00, 1.38837464e+00, 1.65022993e-01],
[ 1.07631454e+00, 1.13094776e+00, 2.58077078e-01],
[ 9.78530574e-01, -3.18167420e-01, 7.50357945e-01],
[ 1.07064731e+00, 1.04696203e+00, 2.95117090e-01],
[ 1.08860485e+00, 1.31308486e+00, 1.89154355e-01],
[ 1.19551229e+00, 2.89740597e+00, 3.76262510e-03],
[ 1.07442246e+00, 1.10290799e+00, 2.70067126e-01],
[ 1.09651586e+00, 1.43032247e+00, 1.52624489e-01],
[ 1.10573862e+00, 1.56699976e+00, 1.17114749e-01],
[ 9.18296882e-01, -1.21080418e+00, 2.25970464e-01],
[ 9.75959250e-01, -3.56273307e-01, 7.21635897e-01],
[ 1.12270070e+00, 1.81837021e+00, 6.90075678e-02],
[ 1.16124927e+00, 2.38964311e+00, 1.68647518e-02],
[ 1.14288622e+00, 2.11751071e+00, 3.42165274e-02],
[ 8.95782250e-01, -1.54446109e+00, 1.22476670e-01],
[ 1.05289923e+00, 7.83943286e-01, 4.33073389e-01],
[ 1.21757563e+00, 3.22437488e+00, 1.26248010e-03],
[ 1.03138599e+00, 4.65126525e-01, 6.41840852e-01],
[ 1.07305973e+00, 1.08271294e+00, 2.78935859e-01],
[ 1.11578890e+00, 1.71594046e+00, 8.61729410e-02],
[ 9.58111837e-01, -6.20764093e-01, 5.34754852e-01],
[ 9.37476396e-01, -9.26572231e-01, 3.54148678e-01],
[ 9.63173031e-01, -5.45759439e-01, 5.85231308e-01],
[ 1.03085430e+00, 4.57247129e-01, 6.47493427e-01],
[ 1.12588430e+00, 1.86554984e+00, 6.21043732e-02],
[ 1.04886527e+00, 7.24161732e-01, 4.68966450e-01],
[ 1.11417108e+00, 1.69196511e+00, 9.06526264e-02],
[ 9.09443001e-01, -1.34201478e+00, 1.79591204e-01],
[ 1.11892848e+00, 1.76246763e+00, 7.79903206e-02],
[ 1.14728332e+00, 2.18267389e+00, 2.90598342e-02],
[ 9.66649872e-01, -4.94234183e-01, 6.21140802e-01],
[ 1.01975820e+00, 2.92807770e-01, 7.69669089e-01],
[ 1.06807716e+00, 1.00887343e+00, 3.13035340e-01],
[ 8.87332323e-01, -1.66968527e+00, 9.49816494e-02],
[ 1.06116818e+00, 9.06485480e-01, 3.64678947e-01],
[ 9.73221131e-01, -3.96851022e-01, 6.91477323e-01],
[ 1.10831264e+00, 1.60514552e+00, 1.08461783e-01],
[ 1.15121609e+00, 2.24095569e+00, 2.50289451e-02],
[ 1.06442269e+00, 9.54715861e-01, 3.39721407e-01],
[ 1.09045697e+00, 1.34053244e+00, 1.80072304e-01],
[ 1.21321146e+00, 3.15969983e+00, 1.57931759e-03],
[ 1.03753327e+00, 5.56226553e-01, 5.78055989e-01],
[ 9.92381824e-01, -1.12898015e-01, 9.10111410e-01],
[ 9.89597771e-01, -1.54156441e-01, 8.77486386e-01],
[ 8.69471915e-01, -1.93436865e+00, 5.30678183e-02],
[ 1.07456759e+00, 1.10505883e+00, 2.69134096e-01],
[ 1.01501782e+00, 2.22557427e-01, 8.23879974e-01],
[ 1.06339883e+00, 9.39542766e-01, 3.47452146e-01],
[ 1.04251892e+00, 6.30111598e-01, 5.28621572e-01],
[ 1.01595310e+00, 2.36417893e-01, 8.13108413e-01],
[ 9.76030325e-01, -3.55220016e-01, 7.22424770e-01],
[ 1.02700974e+00, 4.00272457e-01, 6.88955852e-01],
[ 1.05167838e+00, 7.65850808e-01, 4.43765079e-01],
[ 1.03268084e+00, 4.84315702e-01, 6.28161834e-01],
[ 1.07237329e+00, 1.07254022e+00, 2.83477458e-01],
[ 1.17864464e+00, 2.64743472e+00, 8.11050176e-03],
[ 1.00495856e+00, 7.34836527e-02, 9.41421252e-01],
[ 9.57599581e-01, -6.28355512e-01, 5.29771074e-01],
[ 1.01309195e+00, 1.94016950e-01, 8.46162610e-01],
[ 1.06834012e+00, 1.01277043e+00, 3.11169828e-01],
[ 1.04123161e+00, 6.11034274e-01, 5.41176890e-01]])
= R_test(csr) R_csr
0] R_csr[
1.1867513365512292
0] >= R_csr[0]).sum() (r_tests[:,
4
= R_test(clustered) R_clustered
import pandas
import seaborn as sns
= pandas.DataFrame(data=r_tests, columns=['R', 'z', 'p'])
df
='kde', x="R")
sns.displot(df, kind0], 0, 0.1, color='g');
plt.axvline(R_csr[0], 0, 0.1, color='r'); plt.axvline(R_clustered[
df.describe()
R | z | p | |
---|---|---|---|
count | 99.000000 | 99.000000 | 99.000000 |
mean | 1.052286 | 0.774857 | 0.357230 |
std | 0.078631 | 1.165280 | 0.282182 |
min | 0.869472 | -1.934369 | 0.001262 |
25% | 0.991487 | -0.126153 | 0.099093 |
50% | 1.054352 | 0.805474 | 0.295117 |
75% | 1.108215 | 1.603696 | 0.581644 |
max | 1.217576 | 3.224375 | 0.941421 |
= pointpats.PoissonPointProcess(csr.window, n, 999, asPP=True)
samples
= np.array([R_test(samples.realizations[k]) for k in samples.realizations])
r_tests
r_tests
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:103: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
array([[ 1.04108274, 0.60882811, 0.54263838],
[ 1.08726785, 1.2932711 , 0.19591731],
[ 0.96240692, -0.5571128 , 0.57745036],
...,
[ 1.18467435, 2.73679237, 0.00620414],
[ 1.05998867, 0.88900566, 0.37400004],
[ 0.92024492, -1.18193507, 0.23723146]])
= pandas.DataFrame(data=r_tests, columns=['R', 'z', 'p'])
df
='kde', x="R")
sns.displot(df, kind0], 0, 0.1, color='g');
plt.axvline(R_csr[0], 0, 0.1, color='r'); plt.axvline(R_clustered[
df.describe()
R | z | p | |
---|---|---|---|
count | 999.000000 | 999.000000 | 999.000000 |
mean | 1.057947 | 0.858756 | 0.357294 |
std | 0.077832 | 1.153438 | 0.296165 |
min | 0.805327 | -2.884969 | 0.000001 |
25% | 1.004814 | 0.071334 | 0.092286 |
50% | 1.061867 | 0.916842 | 0.283196 |
75% | 1.111235 | 1.648461 | 0.577714 |
max | 1.328334 | 4.865765 | 0.998563 |