import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(12345)
n = 60
xy = np.random.rand(60,2)
df = pd.DataFrame(data=xy, columns=['x', 'y'])
sns.scatterplot(x='x', y='y', data=df);
Spring 2023
Now that we have been introduced to the different statistical models than are used to represent point processes, we turn to the methods that are used to link observed point patterns back to the process that generated the pattern.
More specifically, the challenge that we face is as follows. Given an observed point pattern, we wish to make inferences about the process that generated the observed pattern.
The general approach that is used is to construct measures that characterise the observed point pattern, and then compare these against the proporties of the theoretical process models we explored previously.
For example, if we assume that the underlying process is CSR, we know what kinds of properties the empirical patterns from such a process should exhibit. The critical thing to keep in mind is that we never actually see the underlying process - we only see outcomes of the process (i.e., the pattern).
This raises a number of challenges that we will need to address later on, but for now we are going to build up an inituition of the general strategy for analyzing point patterns.
To begin we are going to create two different point patterns, one from a CSR process and one from a clustered process. We will use these two patterns to introduce the different statistical methods used to analyze the patterns. Here we are in the rare circumstance in which we actually know what process generated the pattern.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(12345)
n = 60
xy = np.random.rand(60,2)
df = pd.DataFrame(data=xy, columns=['x', 'y'])
sns.scatterplot(x='x', y='y', data=df);
import pointpats as pp<string>:1: UserWarning:
Shapely 2.0 is installed, but because PyGEOS is also installed, GeoPandas will still use PyGEOS by default for now. To force to use and test Shapely 2.0, you have to set the environment variable USE_PYGEOS=0. You can do this before starting the Python process, or in your code before importing geopandas:
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas
In a future release, GeoPandas will switch to using Shapely by default. If you are using PyGEOS directly (calling PyGEOS functions on geometries from GeoPandas), this will then stop working and you are encouraged to migrate from PyGEOS to Shapely 2.0 (https://shapely.readthedocs.io/en/latest/migration_pygeos.html).
csr = pp.PointPattern(xy)/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1492: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1208: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
csr.summary()Point Pattern
60 points
Bounding rectangle [(0.00838829794155349,0.024676210429265266), (0.9940145858999619,0.9613067360728214)]
Area of window: 0.9231676681785911
Intensity estimate for window: 64.99361066054225
x y
0 0.929616 0.316376
1 0.183919 0.204560
2 0.567725 0.595545
3 0.964515 0.653177
4 0.748907 0.653570
w = pp.Window([(0,0), (0,1), (1,1), (1,0), (0,0)])
draw = pp.PoissonClusterPointProcess(w, n, 2, 0.05, 1, asPP=True, conditioning=False)
draw.realizations[0].plot(window=True, title='Contagion Point Process (2 parents)')/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1492: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1208: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1923: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:103: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.

clustered = draw.realizations[0]clustered.summary()Point Pattern
60 points
Bounding rectangle [(0.47331760265312733,0.023178703349462502), (0.9696584457277277,0.6150208352748628)]
Area of window: 1.0
Intensity estimate for window: 60.0
x y
0 0.513060 0.541971
1 0.473318 0.578385
2 0.508373 0.536200
3 0.881716 0.060328
4 0.894221 0.059273
import pointpats.quadrat_statistics as qscsr_qr = qs.QStatistic(csr, shape='rectangle', nx=3, ny=3)
csr_qr.plot()
csr_qr.chi210.8
csr_qr.chi2_pvalue0.21329101843394052
clustered_qr = qs.QStatistic(clustered, shape='rectangle', nx=3, ny=3)
clustered_qr.plot()
clustered_qr.chi2209.99999999999994
clustered_qr.chi2_pvalue4.976940117448032e-41
plt.scatter(csr.points.x, csr.points.y);
import networkx as nxG = nx.DiGraph()
for idx, point in enumerate(csr.points.values):
G.add_node(idx, pos=point)
pos = nx.get_node_attributes(G, 'pos')
nx.draw(G, pos, node_size=2)
nidx, nnd = csr.knn(1) # here we have the indices of the nearest neighbors (nidx) and the distances (nnd)for idx, neighbor in enumerate(nidx):
edge = (idx, neighbor[0])
G.add_edges_from([edge])
pos = nx.get_node_attributes(G, 'pos')
nx.draw(G, pos, node_size=2)
Here we draw an arrow towards the nearest neighbor for a given observation.
In some cases, an observation is also the nearest neighbor to its nearest neighbor, or so-called “mutual nearest neighbors”. These pairs would appear at the the end of a segment with arrows on both ends.
If we look in the extreme northwest three points, we see one pair of mutual nearest neighbors, while the third point is not a mutual nearest neighbor.
It is also possible for a point to be a nearest neighbor to more than a single point, as is also seen in this case.
The nearest neighbor distances are the lengths of these segments.
Our first distance based statistic was suggested by Clark and Evans (1954) as the average nearest neighbor distances:
\[\bar{d}_{min} = \frac{1}{n} \sum_{i} d_{i, min} \]
where \(d_{i, min}\) is the nearest neighbor distance for observation \(i\), and \(n\) is the number of observations.
Under a CSR process, the expected value of this statistic is:
\[E[\bar{d}_{min}] = \frac{1}{2 \sqrt{\lambda}}\]
The logic of the statistic is to compare the observed mean nearest neighbor distance to this expectation forming their ratio:
\[R = \frac{\bar{d}_{min}}{\frac{1}{2 \sqrt{\lambda}}} = 2 \bar{d}_{min} \sqrt{\lambda}\]
Values of \(R<1\) are indicative of a tendancy towards clustering since the observed nearest neighbor distances are smaller than expected under CSR.
Values of \(R>1\) are indicative of a uniform or dispersed pattern.
nnd.mean() # the mean nearest neighbor distance0.07360281110243255
csr.lambda_window # the intensity using the window for the point pattern64.99361066054225
dmin = nnd.mean()
lam = csr.lambda_window
R = 2 * dmin * lam**(1/2)
R1.1867513365512292
Let’s compare this to the same statistic based on the mean nearest neighbor distance for the clustered pattern:
nidx, nnd = clustered.knn(1) # here we have the indices of the nearest neighbors (nidx) and the distances (nnd)dmin = nnd.mean()
lam = clustered.lambda_window
R = 2 * dmin * lam**(1/2)
R0.13087134840769868
So we see that the \(R\) value for the clustered pattern is much below 1, while the R value for the CSR pattern is slightly over 1.
What we would like to know is if these values are significantly different from what we would expect if the underlying process that generated the patterns was CSR?
One approach is to use theoretical results on the distribution for the \(R\) statistic from Petrere (1985). The expected value of \(R\) is \(E[R]=1\). The variance of the \(R\) statistic is: \[ \sigma^2_R = \frac{0.2732}{n}\]
import scipy.stats
def R_test(pattern):
nidx, nnd = pattern.knn(1) # here we have the indices of the nearest neighbors (nidx) and the distances (nnd)
lam = pattern.lambda_window
R = 2 * nnd.mean() * lam**(1/2)
n = nnd.shape[0]
var = 0.2732 / n
se = var**(1/2)
stat = (R - 1 )/ se
p_value = scipy.stats.norm.sf(abs(stat)) * 2
return R, stat, p_valueR_test(csr)(1.1867513365512292, 2.7675724348891184, 0.005647549379017388)
R_test(clustered)(0.13087134840769868, -12.880103258909552, 5.825756575218341e-38)
import pointpats
import numpy
samples = pointpats.PoissonPointProcess(csr.window, n, 99, asPP=True)
r_tests = np.array([R_test(samples.realizations[k]) for k in samples.realizations])
r_tests/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:1923: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:103: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
array([[ 9.81105197e-01, -2.80012645e-01, 7.79467803e-01],
[ 9.28415089e-01, -1.06085681e+00, 2.88754981e-01],
[ 1.05435206e+00, 8.05473519e-01, 4.20546481e-01],
[ 1.11821360e+00, 1.75187335e+00, 7.97955886e-02],
[ 1.12387424e+00, 1.83576157e+00, 6.63929272e-02],
[ 1.16724422e+00, 2.47848561e+00, 1.31941433e-02],
[ 1.10700813e+00, 1.58581325e+00, 1.12781678e-01],
[ 1.09108797e+00, 1.34988348e+00, 1.77053361e-01],
[ 9.69002352e-01, -4.59371472e-01, 6.45967431e-01],
[ 1.03760497e+00, 5.57289124e-01, 5.77329905e-01],
[ 1.02704882e+00, 4.00851546e-01, 6.88529426e-01],
[ 1.10508460e+00, 1.55730742e+00, 1.19397514e-01],
[ 1.15793416e+00, 2.34051463e+00, 1.92571837e-02],
[ 1.04250723e+00, 6.29938355e-01, 5.28734918e-01],
[ 1.10758728e+00, 1.59439606e+00, 1.10847354e-01],
[ 1.07177650e+00, 1.06369612e+00, 2.87466383e-01],
[ 1.02551521e+00, 3.78124168e-01, 7.05338356e-01],
[ 9.46840660e-01, -7.87797970e-01, 4.30814888e-01],
[ 1.03757582e+00, 5.56857151e-01, 5.77625033e-01],
[ 9.90592969e-01, -1.39408044e-01, 8.89127716e-01],
[ 1.04856233e+00, 7.19672330e-01, 4.71726767e-01],
[ 1.10811705e+00, 1.60224699e+00, 1.09101003e-01],
[ 9.54688187e-01, -6.71501077e-01, 5.01901374e-01],
[ 1.10995667e+00, 1.62950930e+00, 1.03205247e-01],
[ 1.15162238e+00, 2.24697675e+00, 2.46415129e-02],
[ 9.31182120e-01, -1.01985062e+00, 3.07799310e-01],
[ 9.71814911e-01, -4.17690588e-01, 6.76173355e-01],
[ 1.14149435e+00, 2.09688392e+00, 3.60038522e-02],
[ 1.07165594e+00, 1.06190948e+00, 2.88276781e-01],
[ 1.14437709e+00, 2.13960485e+00, 3.23867145e-02],
[ 9.26536025e-01, -1.08870370e+00, 2.76284570e-01],
[ 1.00659687e+00, 9.77627322e-02, 9.22120701e-01],
[ 1.19240160e+00, 2.85130688e+00, 4.35399261e-03],
[ 1.02219381e+00, 3.28902433e-01, 7.42229435e-01],
[ 1.17405296e+00, 2.57938804e+00, 9.89755372e-03],
[ 1.15649472e+00, 2.31918276e+00, 2.03851289e-02],
[ 9.45617539e-01, -8.05924083e-01, 4.20286624e-01],
[ 1.05893614e+00, 8.73407565e-01, 3.82440969e-01],
[ 1.09368529e+00, 1.38837464e+00, 1.65022993e-01],
[ 1.07631454e+00, 1.13094776e+00, 2.58077078e-01],
[ 9.78530574e-01, -3.18167420e-01, 7.50357945e-01],
[ 1.07064731e+00, 1.04696203e+00, 2.95117090e-01],
[ 1.08860485e+00, 1.31308486e+00, 1.89154355e-01],
[ 1.19551229e+00, 2.89740597e+00, 3.76262510e-03],
[ 1.07442246e+00, 1.10290799e+00, 2.70067126e-01],
[ 1.09651586e+00, 1.43032247e+00, 1.52624489e-01],
[ 1.10573862e+00, 1.56699976e+00, 1.17114749e-01],
[ 9.18296882e-01, -1.21080418e+00, 2.25970464e-01],
[ 9.75959250e-01, -3.56273307e-01, 7.21635897e-01],
[ 1.12270070e+00, 1.81837021e+00, 6.90075678e-02],
[ 1.16124927e+00, 2.38964311e+00, 1.68647518e-02],
[ 1.14288622e+00, 2.11751071e+00, 3.42165274e-02],
[ 8.95782250e-01, -1.54446109e+00, 1.22476670e-01],
[ 1.05289923e+00, 7.83943286e-01, 4.33073389e-01],
[ 1.21757563e+00, 3.22437488e+00, 1.26248010e-03],
[ 1.03138599e+00, 4.65126525e-01, 6.41840852e-01],
[ 1.07305973e+00, 1.08271294e+00, 2.78935859e-01],
[ 1.11578890e+00, 1.71594046e+00, 8.61729410e-02],
[ 9.58111837e-01, -6.20764093e-01, 5.34754852e-01],
[ 9.37476396e-01, -9.26572231e-01, 3.54148678e-01],
[ 9.63173031e-01, -5.45759439e-01, 5.85231308e-01],
[ 1.03085430e+00, 4.57247129e-01, 6.47493427e-01],
[ 1.12588430e+00, 1.86554984e+00, 6.21043732e-02],
[ 1.04886527e+00, 7.24161732e-01, 4.68966450e-01],
[ 1.11417108e+00, 1.69196511e+00, 9.06526264e-02],
[ 9.09443001e-01, -1.34201478e+00, 1.79591204e-01],
[ 1.11892848e+00, 1.76246763e+00, 7.79903206e-02],
[ 1.14728332e+00, 2.18267389e+00, 2.90598342e-02],
[ 9.66649872e-01, -4.94234183e-01, 6.21140802e-01],
[ 1.01975820e+00, 2.92807770e-01, 7.69669089e-01],
[ 1.06807716e+00, 1.00887343e+00, 3.13035340e-01],
[ 8.87332323e-01, -1.66968527e+00, 9.49816494e-02],
[ 1.06116818e+00, 9.06485480e-01, 3.64678947e-01],
[ 9.73221131e-01, -3.96851022e-01, 6.91477323e-01],
[ 1.10831264e+00, 1.60514552e+00, 1.08461783e-01],
[ 1.15121609e+00, 2.24095569e+00, 2.50289451e-02],
[ 1.06442269e+00, 9.54715861e-01, 3.39721407e-01],
[ 1.09045697e+00, 1.34053244e+00, 1.80072304e-01],
[ 1.21321146e+00, 3.15969983e+00, 1.57931759e-03],
[ 1.03753327e+00, 5.56226553e-01, 5.78055989e-01],
[ 9.92381824e-01, -1.12898015e-01, 9.10111410e-01],
[ 9.89597771e-01, -1.54156441e-01, 8.77486386e-01],
[ 8.69471915e-01, -1.93436865e+00, 5.30678183e-02],
[ 1.07456759e+00, 1.10505883e+00, 2.69134096e-01],
[ 1.01501782e+00, 2.22557427e-01, 8.23879974e-01],
[ 1.06339883e+00, 9.39542766e-01, 3.47452146e-01],
[ 1.04251892e+00, 6.30111598e-01, 5.28621572e-01],
[ 1.01595310e+00, 2.36417893e-01, 8.13108413e-01],
[ 9.76030325e-01, -3.55220016e-01, 7.22424770e-01],
[ 1.02700974e+00, 4.00272457e-01, 6.88955852e-01],
[ 1.05167838e+00, 7.65850808e-01, 4.43765079e-01],
[ 1.03268084e+00, 4.84315702e-01, 6.28161834e-01],
[ 1.07237329e+00, 1.07254022e+00, 2.83477458e-01],
[ 1.17864464e+00, 2.64743472e+00, 8.11050176e-03],
[ 1.00495856e+00, 7.34836527e-02, 9.41421252e-01],
[ 9.57599581e-01, -6.28355512e-01, 5.29771074e-01],
[ 1.01309195e+00, 1.94016950e-01, 8.46162610e-01],
[ 1.06834012e+00, 1.01277043e+00, 3.11169828e-01],
[ 1.04123161e+00, 6.11034274e-01, 5.41176890e-01]])
R_csr = R_test(csr)R_csr[0]1.1867513365512292
(r_tests[:,0] >= R_csr[0]).sum()4
R_clustered = R_test(clustered)import pandas
import seaborn as snsdf = pandas.DataFrame(data=r_tests, columns=['R', 'z', 'p'])
sns.displot(df, kind='kde', x="R")
plt.axvline(R_csr[0], 0, 0.1, color='g');
plt.axvline(R_clustered[0], 0, 0.1, color='r');
df.describe()| R | z | p | |
|---|---|---|---|
| count | 99.000000 | 99.000000 | 99.000000 |
| mean | 1.052286 | 0.774857 | 0.357230 |
| std | 0.078631 | 1.165280 | 0.282182 |
| min | 0.869472 | -1.934369 | 0.001262 |
| 25% | 0.991487 | -0.126153 | 0.099093 |
| 50% | 1.054352 | 0.805474 | 0.295117 |
| 75% | 1.108215 | 1.603696 | 0.581644 |
| max | 1.217576 | 3.224375 | 0.941421 |
samples = pointpats.PoissonPointProcess(csr.window, n, 999, asPP=True)
r_tests = np.array([R_test(samples.realizations[k]) for k in samples.realizations])
r_tests/home/serge/miniconda3/envs/classes/lib/python3.10/site-packages/libpysal/cg/shapes.py:103: FutureWarning:
Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal.
array([[ 1.04108274, 0.60882811, 0.54263838],
[ 1.08726785, 1.2932711 , 0.19591731],
[ 0.96240692, -0.5571128 , 0.57745036],
...,
[ 1.18467435, 2.73679237, 0.00620414],
[ 1.05998867, 0.88900566, 0.37400004],
[ 0.92024492, -1.18193507, 0.23723146]])
df = pandas.DataFrame(data=r_tests, columns=['R', 'z', 'p'])
sns.displot(df, kind='kde', x="R")
plt.axvline(R_csr[0], 0, 0.1, color='g');
plt.axvline(R_clustered[0], 0, 0.1, color='r');
df.describe()| R | z | p | |
|---|---|---|---|
| count | 999.000000 | 999.000000 | 999.000000 |
| mean | 1.057947 | 0.858756 | 0.357294 |
| std | 0.077832 | 1.153438 | 0.296165 |
| min | 0.805327 | -2.884969 | 0.000001 |
| 25% | 1.004814 | 0.071334 | 0.092286 |
| 50% | 1.061867 | 0.916842 | 0.283196 |
| 75% | 1.111235 | 1.648461 | 0.577714 |
| max | 1.328334 | 4.865765 | 0.998563 |