1/31/23
facts and statistics collected together for reference or analysis
the quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
things known or assumed as facts, making the basis of reasoning or calculate
Source: Oxford languages
A data set is a collection of observations recorded for individual units on a set of variables.
Variables are sometimes referred to as attributes or features (in machine learning parlance).
Scale | Operations | Example |
---|---|---|
nominal | mode, frequencies | Zip Code |
ordinal | A > B | Ranks, Primary, Intermediate |
interval | + - | Time |
ratio | + - * / | Weight, Kelvin |
Spatial data comes in many varieties and it is not easy to arrive at a system of classification that is simultaneously exclusive, exhaustive, imaginative, and satisfying.
– G. Upton & B. Fingleton
where matters
attribute data together with location
Geocode Input
Geocode Output
Map of Geocode Output
Errors in Geocode Output
“everything depends on everything else, but closer things more so”
http://en.wikipedia.org/wiki/Modifiable_areal_unit_problem
a mathemtical construct to capture randomness of values distributed over space
\[\{Z(s):s \in D \} \]
Quadrat Analysis
Distance Distributions
Spatial Domain: \(D\)
Discrete and fixed
Locations nonrandom
Locations countable
Examples of lattice data
Attributes collected by ZIP code
census tract
Site
Each location is now an area or site
One observation on \(Z\) for each site
Need a spatial index: \(Z(s_i)\)
\(Z(s_i)\)
\(s_i\) is a representative location within the site
e.g., centroid, largest city
Allows for measuring distances between sites
1969
Spatial Domain: \(D\)
A continuous and fixed set.
Meaning \(Z(s)\) can be observed everywhere within \(D\).
Between any two sample locations \(s_i\) and \(s_j\) you can theoretically place an infinite number of other samples.
By fixed: the points in \(D\) are non-stochastic
Continuous Variation
Because of the continuity of \(D\)
Geostatistical data is referred to as “spatial data with continuous variation.”
Continuity is associated with \(D\).
Attribute \(Z\) may, or may not, be continuous.
Sites
Tessellation
Interpolation
Kriging
A network is a system of linear features connected at intersections and interchanges.
These intersections and interchanges are called nodes.
The linear feature connecting any given pair of nodes is called an arc.
Formally, a network is defined as a directed graph \(G = (N, A)\) consisting of an indexed set of nodes \(N\) with \(n = |N|\) and a spanning set of directed arcs \(A\) with \(m = |A|\), where \(n\) is the number of nodes and \(m\) is the number of arcs.
Each arc on a network is represented as an ordered pair of nodes, in the form from node \(i\) to node \(j\), denoted by \((i, j)\).
Point Pattern Basics