momepy.describe_reached_agg

momepy.describe_reached_agg(y, graph_index, graph, q=None, statistics=None)

Describe the distribution of values reached on a neighbourhood graph.

Given a neighbourhood graph or a grouping, computes descriptive statistics of the values reached. Optionally, the values can be limited to a given percentile range before the statistics are computed.

By default, the statistics calculated are count, mean, median, std, min, max, sum, nunique, and mode. A subset of these can be requested through the statistics parameter.

The neighbourhood is defined in graph. If graph is None, the function will assume topological distance 0 (element itself) and result_index is required in order to arrange the results. If graph is given, the results are arranged according to the spatial weights ordering.
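The aggregation described above can be sketched with plain pandas. This is only an illustration of the idea, not momepy's implementation: the `neighbours` dict is a hypothetical stand-in for the graph, and counting the focal group as part of its own neighbourhood is an assumption made for this sketch.

```python
import pandas as pd

# Values measured on individual elements (e.g. building areas) and the ID of
# the group (e.g. the nearest street) each element is attached to.
y = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0])
graph_index = pd.Series([0, 0, 1, 2, 2])

# Hypothetical neighbourhood definition: group ID -> neighbouring group IDs.
# ASSUMPTION: the focal group is treated as part of its own neighbourhood.
neighbours = {0: [1], 1: [0, 2], 2: [1]}

rows = []
for focal, others in neighbours.items():
    reached_groups = [focal, *others]
    # All values attached to any group reached from the focal group.
    reached = y[graph_index.isin(reached_groups)]
    rows.append(reached.agg(["count", "sum", "mean", "median"]).rename(focal))

# One row per focal group, one column per statistic.
result = pd.DataFrame(rows)
print(result)
```

Each row of `result` describes the values reached from one focal group, which mirrors the shape of the DataFrame shown in the examples on this page.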

Adapted from [Hermosilla et al., 2012] and [Feliciotti, 2018].

Parameters:
y : Series | numpy.array

A Series or numpy.array containing values to analyse.

graph_index : Series | numpy.array

The unique ID that specifies the aggregation of y objects to graph groups.

graph : libpysal.graph.Graph (default None)

A spatial weights graph of the elements into which y is grouped.

q : tuple[float, float] | None, optional

Tuple of percentages for the percentiles to compute. Values must be between 0 and 100 inclusive. When set, values below the lower percentile and above the upper percentile are discarded before the statistics are computed. The percentiles are computed for each neighbourhood. By default None.

statistics : list[str]

A list of stats functions to pass to groupby.agg.

Returns:
DataFrame

Notes

The numba package is used extensively in this function to accelerate the computation of statistics. Without numba, these computations may become slow on large data.

Examples

>>> import geopandas
>>> import momepy
>>> from libpysal import graph
>>> path = momepy.datasets.get_path("bubenec")
>>> buildings = geopandas.read_file(path, layer="buildings")
>>> streets = geopandas.read_file(path, layer="streets")
>>> buildings["street_index"] = momepy.get_nearest_street(buildings, streets)
>>> buildings.head()
   uID                                           geometry  street_index
0    1  POLYGON ((1603599.221 6464369.816, 1603602.984...           0.0
1    2  POLYGON ((1603042.88 6464261.498, 1603038.961 ...          33.0
2    3  POLYGON ((1603044.65 6464178.035, 1603049.192 ...          10.0
3    4  POLYGON ((1603036.557 6464141.467, 1603036.969...           8.0
4    5  POLYGON ((1603082.387 6464142.022, 1603081.574...           8.0
>>> queen_contig = graph.Graph.build_contiguity(streets, rook=False)
>>> queen_contig
<Graph of 35 nodes and 148 nonzero edges indexed by
 [0, 1, 2, 3, 4, ...]>
>>> momepy.describe_reached_agg(
...     buildings.area,
...     buildings["street_index"],
...     queen_contig,
... ).head()  
   count        mean      median         std         min          max           sum  nunique        mode
0   43.0  643.595418  633.692589  412.563790   53.851509  2127.752228  27674.602973     43.0   53.851509
1   41.0  735.058515  662.921280  381.827737   51.246377  2127.752228  30137.399128     41.0   51.246377
2   50.0  636.304006  625.190488  450.182157   53.851509  2127.752228  31815.200298     50.0   53.851509
3    6.0  405.782514  370.352071  334.848563   57.138700   863.828420   2434.695086      6.0   57.138700
4    1.0  683.514930  683.514930         NaN  683.514930   683.514930    683.514930      1.0  683.514930

The result can be directly assigned as columns of the streets GeoDataFrame.
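A minimal sketch of such an assignment, using plain pandas stand-ins rather than the real data: `streets` and `stats` below are hypothetical frames, the column names `area_mean` and `area_count` are invented for illustration, and the sketch assumes the result shares its index with the streets frame.

```python
import pandas as pd

# Stand-in for the streets GeoDataFrame (a plain DataFrame is enough here).
streets = pd.DataFrame({"name": ["a", "b", "c"]}, index=[0, 1, 2])

# Stand-in for the DataFrame returned by describe_reached_agg.
# ASSUMPTION: it is indexed by the same IDs as the streets frame.
stats = pd.DataFrame(
    {"mean": [200.0, 300.0, 400.0], "count": [3.0, 5.0, 3.0]},
    index=[0, 1, 2],
)

# Column assignment aligns on the index, so each street receives its own row.
streets["area_mean"] = stats["mean"]
streets["area_count"] = stats["count"]
print(streets)
```

Because pandas aligns on the index during assignment, any street missing from the result would receive NaN rather than a value from another row.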

To eliminate the effect of outliers, you can take into account only values within a specified percentile range (q). At the same time, you can specify only a subset of statistics to compute:

>>> momepy.describe_reached_agg(
...     buildings.area,
...     buildings["street_index"],
...     queen_contig,
...     q=(10, 90),
...     statistics=["mean", "std"],
... ).head()
        mean         std
0  619.104840  250.369496
1  721.441808  216.516469
2  597.379925  297.213321
3  378.431992  274.631290
4  683.514930         NaN
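The effect of q on a single neighbourhood can be sketched with NumPy. The helper `limit_to_percentile_range` is hypothetical, and the use of `numpy.percentile` with its default linear interpolation is an assumption; momepy may determine the bounds differently.

```python
import numpy as np


def limit_to_percentile_range(values, q):
    """Keep only the values between the q[0]-th and q[1]-th percentiles.

    Hypothetical helper for illustration; bounds are inclusive and use
    numpy's default (linear) percentile interpolation.
    """
    values = np.asarray(values, dtype=float)
    lower, upper = np.percentile(values, q)
    return values[(values >= lower) & (values <= upper)]


# One neighbourhood with a single large outlier.
neighbourhood = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
kept = limit_to_percentile_range(neighbourhood, (10, 90))

print(neighbourhood.mean())  # mean of all values, pulled up by the outlier
print(kept.mean())           # mean of the values inside the 10-90 range
```

Here the bounds work out to 1.4 and 61.6, so both the smallest value and the outlier are dropped and the mean falls from 22.0 to 3.0.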