momepy.describe_agg

momepy.describe_agg(y, aggregation_key, q=None, statistics=None)

Describe the distribution of values within the groups of an aggregation.

The statistics to compute can be passed to statistics. By default, the statistics calculated are count, mean, median, std, min, max, sum, nunique, and mode.

Adapted from [Hermosilla et al., 2012] and [Feliciotti, 2018].

Parameters:
y : Series | numpy.array

A Series or numpy.array containing values to analyse.

aggregation_key : Series | numpy.array

The unique ID that assigns each value in y to a group.

q : tuple[float, float] | None, optional

Tuple of percentages (lower, upper) for the percentiles to compute. Values must be between 0 and 100 inclusive. When set, values below the lower percentile and above the upper percentile are discarded before the statistics are computed. The percentiles are computed separately for each group. By default None.

statistics : list[str] | None, optional

A list of statistics functions to pass to groupby.agg. By default None, in which case the default set of statistics is computed.

Returns:
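DataFrame

A DataFrame indexed by the unique values of aggregation_key, with one column per computed statistic.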

Notes

The numba package is used extensively in this function to accelerate the computation of statistics. Without numba, these computations may become slow on large data.

Examples

>>> path = momepy.datasets.get_path("bubenec")
>>> buildings = geopandas.read_file(path, layer="buildings")
>>> streets = geopandas.read_file(path, layer="streets")
>>> buildings["street_index"] = momepy.get_nearest_street(buildings, streets)
>>> buildings.head()
   uID                                           geometry  street_index
0    1  POLYGON ((1603599.221 6464369.816, 1603602.984...           0.0
1    2  POLYGON ((1603042.88 6464261.498, 1603038.961 ...          33.0
2    3  POLYGON ((1603044.65 6464178.035, 1603049.192 ...          10.0
3    4  POLYGON ((1603036.557 6464141.467, 1603036.969...           8.0
4    5  POLYGON ((1603082.387 6464142.022, 1603081.574...           8.0
>>> momepy.describe_agg(buildings.area, buildings["street_index"]).head()   
              count         mean       median         std         min          max          sum  nunique        mode
street_index
0.0             9.0   366.827019   339.636871  266.747247   68.336193   800.045495  3301.443174      9.0   68.336193
1.0             1.0   618.447036   618.447036         NaN  618.447036   618.447036   618.447036      1.0  618.447036
2.0            12.0   504.523575   535.973108  318.660691   92.280807  1057.998520  6054.282903     12.0   92.280807
5.0             5.0  1150.865099  1032.693716  580.660030  673.015192  2127.752228  5754.325496      5.0  673.015192
6.0             7.0   662.179187   662.192603  291.397747  184.798661  1188.294675  4635.254306      7.0  184.798661

The result can be directly assigned as columns of the streets GeoDataFrame.
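As a minimal sketch of why that assignment works, the snippet below uses plain pandas stand-ins rather than the real bubenec data. The frames `streets`, `areas`, and `street_index`, and the column names `building_count` and `mean_area`, are hypothetical; the `groupby(...).agg(...)` call only imitates the shape of the `describe_agg` result (one row per group, one column per statistic).

```python
import pandas as pd

# Hypothetical stand-ins: three streets and five buildings.
streets = pd.DataFrame({"name": ["a", "b", "c"]}, index=[0, 1, 2])
areas = pd.Series([100.0, 300.0, 50.0, 150.0, 250.0])
street_index = pd.Series([0, 0, 2, 2, 2])

# Imitates the shape of the describe_agg result:
# indexed by group ID, one column per statistic.
stats = areas.groupby(street_index).agg(["count", "mean"])

# Column assignment aligns on the index, so each street receives
# the statistics of its own group.
streets["building_count"] = stats["count"]
streets["mean_area"] = stats["mean"]

print(streets)
```

Because pandas aligns on the index, a street with no buildings attached (street 1 here) ends up with NaN rather than shifting the other rows.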

To eliminate the effect of outliers, you can take into account only values within a specified percentile range (q). At the same time, you can specify only a subset of statistics to compute:

>>> momepy.describe_agg(
...     buildings.area,
...     buildings["street_index"],
...     q=(10, 90),
...     statistics=["mean", "std"],
... ).head()
                    mean         std
street_index
0.0           347.580212  219.797123
1.0           618.447036         NaN
2.0           476.592190  206.011102
5.0           984.519359  203.718644
6.0           652.432194   32.829824
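The effect of q can be approximated in plain pandas, which may help when reasoning about which values survive the filter. This is a sketch under stated assumptions, not momepy's implementation: the helper `trimmed_describe` is hypothetical, the bounds are treated as inclusive, and the percentiles use NumPy's default linear interpolation, so results may differ from `describe_agg` in edge cases.

```python
import numpy as np
import pandas as pd


def trimmed_describe(y, key, q, statistics):
    """Per-group statistics after discarding values outside the q percentiles.

    Hypothetical helper for illustration only.
    """
    y = pd.Series(np.asarray(y, dtype=float))
    key = pd.Series(np.asarray(key))
    grouped = y.groupby(key)
    # Percentile bounds are computed separately for each group and
    # broadcast back to every row of that group.
    lower = grouped.transform(lambda g: np.percentile(g, q[0]))
    upper = grouped.transform(lambda g: np.percentile(g, q[1]))
    # Assumption: bounds are inclusive.
    keep = (y >= lower) & (y <= upper)
    return y[keep].groupby(key[keep]).agg(statistics)


values = [1.0, 2.0, 3.0, 4.0, 100.0, 10.0, 20.0, 30.0]
groups = ["a", "a", "a", "a", "a", "b", "b", "b"]
result = trimmed_describe(values, groups, q=(10, 90), statistics=["mean", "count"])
print(result)
```

In group "a" the outlier 100.0 and the minimum 1.0 fall outside the 10th to 90th percentile range, so the mean is taken over the remaining three values only.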