momepy.describe_agg
- momepy.describe_agg(y, aggregation_key, q=None, statistics=None)
Describe the distribution of values within the groups of an aggregation.
The desired statistics to compute can be passed via the statistics parameter. By default the statistics calculated are count, sum, mean, median, std, nunique, mode.
Adapted from [Hermosilla et al., 2012] and [Feliciotti, 2018].
- Parameters:
- y : Series | numpy.array
A Series or numpy.array containing values to analyse.
- aggregation_key : Series | numpy.array
The unique ID that specifies the aggregation of y objects to groups.
- q : tuple[float, float] | None, optional
Tuple of percentages for the percentiles to compute. Values must be between 0 and 100 inclusive. When set, values below the lower percentile and above the upper percentile are discarded before the statistics are computed. The percentiles are computed separately for each group. By default None.
- statistics : list[str]
A list of stats functions to pass to groupby.agg.
- Returns:
- DataFrame
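Since the statistics parameter is described as a list of functions passed to groupby.agg, the following minimal sketch shows what such a list does in plain pandas. It is only an illustration under stated assumptions: the toy values and the group labels "a" and "b" are made up, and the sketch does not reproduce momepy's numba-accelerated code path.

```python
import pandas as pd

# Toy inputs (hypothetical): five values assigned to two groups.
y = pd.Series([10.0, 20.0, 30.0, 5.0, 7.0])
aggregation_key = pd.Series(["a", "a", "a", "b", "b"])

# Each name in the list becomes one column of the result,
# and each group becomes one row.
stats = y.groupby(aggregation_key).agg(["count", "mean", "median"])
print(stats)
```

The result has one row per unique key and one column per requested statistic, which is the same shape as the DataFrame returned by describe_agg.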
Notes
The numba package is used extensively in this function to accelerate the computation of statistics. Without numba, these computations may become slow on large data.
Examples
>>> path = momepy.datasets.get_path("bubenec")
>>> buildings = geopandas.read_file(path, layer="buildings")
>>> streets = geopandas.read_file(path, layer="streets")
>>> buildings["street_index"] = momepy.get_nearest_street(buildings, streets)
>>> buildings.head()
   uID                                           geometry  street_index
0    1  POLYGON ((1603599.221 6464369.816, 1603602.984...           0.0
1    2  POLYGON ((1603042.88 6464261.498, 1603038.961 ...          33.0
2    3  POLYGON ((1603044.65 6464178.035, 1603049.192 ...          10.0
3    4  POLYGON ((1603036.557 6464141.467, 1603036.969...           8.0
4    5  POLYGON ((1603082.387 6464142.022, 1603081.574...           8.0

>>> momepy.describe_agg(buildings.area, buildings["street_index"]).head()
              count         mean       median         std         min          max          sum  nunique        mode
street_index
0.0             9.0   366.827019   339.636871  266.747247   68.336193   800.045495  3301.443174      9.0   68.336193
1.0             1.0   618.447036   618.447036         NaN  618.447036   618.447036   618.447036      1.0  618.447036
2.0            12.0   504.523575   535.973108  318.660691   92.280807  1057.998520  6054.282903     12.0   92.280807
5.0             5.0  1150.865099  1032.693716  580.660030  673.015192  2127.752228  5754.325496      5.0  673.015192
6.0             7.0   662.179187   662.192603  291.397747  184.798661  1188.294675  4635.254306      7.0  184.798661
The result can be directly assigned as columns of the streets GeoDataFrame.

To eliminate the effect of outliers, you can take into account only values within a specified percentile range (q). At the same time, you can specify only a subset of statistics to compute:

>>> momepy.describe_agg(
...     buildings.area,
...     buildings["street_index"],
...     q=(10, 90),
...     statistics=["mean", "std"],
... ).head()
                     mean         std
street_index
0.0            347.580212  219.797123
1.0            618.447036         NaN
2.0            476.592190  206.011102
5.0            984.519359  203.718644
6.0            652.432194   32.829824
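To make the effect of q more concrete, here is a minimal pure-Python sketch of per-group percentile trimming followed by a mean. It is not momepy's implementation: the helper names percentile and trimmed_group_mean are hypothetical, and the linear interpolation of percentiles and the inclusive bounds are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def percentile(sorted_vals, p):
    """Percentile p (0-100) of an ascending list, linearly interpolated (assumed method)."""
    pos = (len(sorted_vals) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def trimmed_group_mean(values, keys, q):
    """Mean per group, using only values inside that group's own q percentile range."""
    groups = defaultdict(list)
    for value, key in zip(values, keys):
        groups[key].append(value)

    result = {}
    for key, vals in groups.items():
        vals.sort()
        # Percentile bounds are computed separately for every group.
        low, high = percentile(vals, q[0]), percentile(vals, q[1])
        kept = [v for v in vals if low <= v <= high]  # inclusive bounds (assumed)
        result[key] = mean(kept) if kept else float("nan")
    return result


# Toy data (hypothetical): group "a" contains one extreme outlier.
areas = [10, 20, 30, 40, 1000, 5, 6, 7]
street = ["a", "a", "a", "a", "a", "b", "b", "b"]
trimmed = trimmed_group_mean(areas, street, q=(10, 90))
print(trimmed)
```

In this toy data the untrimmed mean of group "a" is 220, dominated by the single value 1000, while the trimmed mean is 30 because only 20, 30 and 40 fall inside that group's 10th to 90th percentile range.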