Note

This page was generated from examples/clustering.ipynb.
Interactive online version: Binder badge

Simplified detection of urban types#

Example adapted from the SDSC 2021 Workshop led by Martin Fleischmann. You can see the recording of the workshop on YouTube.

This example illustrates the potential of morphometrics captured by momepy in capturing the structure of cities. We will pick a town, fetch its data from the OpenStreetMap, and analyse it to detect individual types of urban structure within it.

This method is only illustrative and is based on the more extensive one published by Fleischmann et al. (2021) available from martinfleis/numerical-taxonomy-paper.

Fleischmann M, Feliciotti A, Romice O and Porta S (2021) Methodological Foundation of a Numerical Taxonomy of Urban Form. Environment and Planning B: Urban Analytics and City Science, doi: 10.1177/23998083211059835

It depends on the following packages:

- momepy
- osmnx
- clustergram
- bokeh
- scikit-learn
- geopy
- ipywidgets
[1]:
import geopandas
import libpysal
import matplotlib.pyplot as plt
import momepy
import osmnx
import pandas
from bokeh.io import output_notebook
from bokeh.plotting import show
from clustergram import Clustergram

output_notebook()
Loading BokehJS ...

Pick a place, ideally a town with a good coverage in OpenStreetMap and its local CRS.

[2]:
place = "Znojmo, Czechia"
local_crs = 5514

We can interactively explore the place we just selected.

[3]:
geopandas.tools.geocode(place).explore()
[3]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Input data#

We can use OSMnx to quickly download data from OpenStreetMap. If you intend to download larger areas, we recommend using pyrosm instead.

Buildings#

[4]:
buildings = osmnx.features_from_place(place, tags={"building": True})
buildings.head()
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/osmnx/features.py:294: FutureWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  polygon = gdf_place["geometry"].unary_union
[4]:
power geometry amenity brand brand:wikidata brand:wikipedia check_date name operator operator:wikidata ... material name:signed ref monitoring:water_level takeaway shelter_type construction ways type emergency
element_type osmid
node 3372076291 NaN POINT (16.05376 48.84683) NaN NaN NaN NaN NaN 7/I/10/A-120 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3372076393 NaN POINT (16.05581 48.84158) NaN NaN NaN NaN NaN 7/I/11/A-140 Z NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3372076394 NaN POINT (16.05867 48.83522) NaN NaN NaN NaN NaN 7/I/12/A-220 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3372076428 NaN POINT (16.03949 48.85599) NaN NaN NaN NaN NaN 7/I/8/E NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3372076429 NaN POINT (16.04133 48.85501) NaN NaN NaN NaN NaN 7/I/9/E NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 119 columns

The OSM input may need a bit of cleaning to ensure only proper polygons are kept.

[5]:
buildings.geom_type.value_counts()
[5]:
Polygon    12214
Point          7
Name: count, dtype: int64
[6]:
buildings = buildings[buildings.geom_type == "Polygon"].reset_index(drop=True)

And we should re-project the data from WGS84 to the local projection in meters (momepy default values assume meters not feet or degrees). We will also drop unnecessary columns.

[7]:
buildings = buildings[["geometry"]].to_crs(local_crs)
buildings.head()
[7]:
geometry
0 POLYGON ((-643743.474 -1193358.749, -643743.30...
1 POLYGON ((-643751.446 -1193530.633, -643749.37...
2 POLYGON ((-643281.601 -1193130.831, -643283.76...
3 POLYGON ((-643381.904 -1193174.697, -643388.48...
4 POLYGON ((-643370.45 -1193130.215, -643398.26 ...

Streets#

Similar operations are done with streets.

[8]:
osm_graph = osmnx.graph_from_place(place, network_type="drive")
osm_graph = osmnx.projection.project_graph(osm_graph, to_crs=local_crs)
streets = osmnx.graph_to_gdfs(
    osmnx.convert.to_undirected(osm_graph),
    nodes=False,
    edges=True,
    node_geometry=False,
    fill_edge_geometry=True,
).reset_index(drop=True)
streets.head()
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/osmnx/graph.py:392: FutureWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  polygon = gdf_place["geometry"].unary_union
[8]:
osmid ref name highway maxspeed oneway reversed length from to geometry lanes bridge junction width tunnel access
0 33733060 361 Přímětická secondary 50 False True 24.574 639231391 74103628 LINESTRING (-643229.639 -1192872.949, -643239.... NaN NaN NaN NaN NaN NaN
1 33733060 361 Přímětická secondary 50 False False 60.354 3775990798 74103628 LINESTRING (-643236.395 -1192790.304, -643237.... NaN NaN NaN NaN NaN NaN
2 50313252 NaN Raisova residential NaN True False 74.763 639231413 74103628 LINESTRING (-643291.344 -1192797.012, -643288.... NaN NaN NaN NaN NaN NaN
3 33733060 361 Přímětická secondary 50 False True 54.260 74142638 639231391 LINESTRING (-643205.434 -1192921.533, -643219.... NaN NaN NaN NaN NaN NaN
4 50313241 NaN Mičurinova residential NaN True False 101.376 639231391 639231314 LINESTRING (-643229.639 -1192872.949, -643233.... NaN NaN NaN NaN NaN NaN

We can also do some preprocessing using momepy to ensure we have proper network topology.

[9]:
streets = momepy.remove_false_nodes(streets)
streets = streets[["geometry"]]
[10]:
streets.head()
[10]:
geometry
0 LINESTRING (-643229.639 -1192872.949, -643239....
1 LINESTRING (-643236.395 -1192790.304, -643237....
2 LINESTRING (-643291.344 -1192797.012, -643288....
3 LINESTRING (-643205.434 -1192921.533, -643219....
4 LINESTRING (-643229.639 -1192872.949, -643233....

Generated data#

Tessellation#

Given building footprints:

blg

We can generate a spatial unit using morphological tessellation:

tess

[11]:
limit = momepy.buffered_limit(buildings, "adaptive")

tessellation = momepy.morphological_tessellation(buildings, clip=limit)
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/pandas/core/arraylike.py:492: RuntimeWarning: invalid value encountered in create_collection
  return getattr(ufunc, method)(*new_inputs, **kwargs)

OpenStreetMap data are often problematic due to low quality of some polygons. If some collapse, we get a mismatch between the length of buildings and the length of polygons.

[12]:
collapsed, _ = momepy.verify_tessellation(tessellation, buildings)
/var/folders/2f/fhks6w_d0k556plcv3rfmshw0000gn/T/ipykernel_96250/3509021287.py:1: UserWarning: Tessellation does not fully match buildings. 20 element(s) disappeared during generation. Index of the affected elements: Index([ 3958,  3968,  4177,  4185,  4188,  4215,  4218,  4222,  4223,  4227,
        8534,  9026,  9061, 10465, 10686, 10687, 11255, 11473, 11474, 11478],
      dtype='int64').
  collapsed, _ = momepy.verify_tessellation(tessellation, buildings)
/var/folders/2f/fhks6w_d0k556plcv3rfmshw0000gn/T/ipykernel_96250/3509021287.py:1: UserWarning: Tessellation contains MultiPolygon elements. Initial objects should  be edited. Index of affected elements: [1, 209, 521, 524, 525, 636, 709, 762, 787, 1635, 1640, 1641, 1645, 1647, 1653, 1662, 1697, 1706, 2337, 2882, 2914, 2929, 3173, 3782, 4172, 4404, 4564, 4673, 4685, 4731, 4939, 5152, 5184, 5315, 5490, 5696, 5795, 6141, 6376, 6483, 6577, 6583, 6971, 7171, 7184, 7232, 7377, 7378, 7379, 7380, 7381, 7383, 7384, 7385, 7386, 7388, 7390, 7391, 7394, 7395, 7396, 7397, 7479, 7481, 7482, 7487, 7729, 7755, 7766, 7792, 7958, 8420, 8428, 8439, 8441, 8765, 8945, 8957, 9297, 9457, 9470, 9542, 9609, 9800, 9851, 9988, 10056, 10083, 10210, 10318, 10504, 10657, 10997, 11072, 11266, 11729, 11785, 11878].
  collapsed, _ = momepy.verify_tessellation(tessellation, buildings)

Better to drop affected buildings and re-create tessellation.

[13]:
buildings = buildings.drop(collapsed)
limit = momepy.buffered_limit(buildings, "adaptive")
tessellation = momepy.morphological_tessellation(buildings, clip=limit)

Check the result.

[14]:
tessellation.shape[0] == buildings.shape[0]
[14]:
True

Measure#

Measure individual morphometric characters. For details see the User Guide and the API reference.

Dimensions#

[17]:
buildings["building_area"] = buildings.area
tessellation["tess_area"] = tessellation.area
streets["length"] = streets.length

Shape#

[18]:
buildings["eri"] = momepy.equivalent_rectangular_index(buildings)
buildings["elongation"] = momepy.elongation(buildings)
tessellation["convexity"] = momepy.convexity(tessellation)
streets["linearity"] = momepy.linearity(streets)
[19]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))

buildings.plot("eri", ax=ax[0], scheme="natural_breaks", legend=True)
buildings.plot("elongation", ax=ax[1], scheme="natural_breaks", legend=True)

ax[0].set_axis_off()
ax[1].set_axis_off()
../_images/examples_clustering_40_0.png
[20]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))

tessellation.plot("convexity", ax=ax[0], scheme="natural_breaks", legend=True)
streets.plot("linearity", ax=ax[1], scheme="natural_breaks", legend=True)

ax[0].set_axis_off()
ax[1].set_axis_off()
../_images/examples_clustering_41_0.png

Spatial distribution#

[21]:
buildings["shared_walls"] = momepy.shared_walls(buildings) / buildings.length
buildings.plot(
    "shared_walls", figsize=(12, 12), scheme="natural_breaks", legend=True
).set_axis_off()
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/shapely/set_operations.py:131: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
../_images/examples_clustering_43_1.png

Generate spatial graph using libpysal.

[22]:
queen_1 = libpysal.graph.Graph.build_contiguity(tessellation, rook=False)
[23]:
tessellation["neighbors"] = momepy.neighbors(
    tessellation, queen_1, weighted=True
)
tessellation["covered_area"] = queen_1.describe(tessellation.area)["sum"]
buildings["neighbor_distance"] = momepy.neighbor_distance(buildings, queen_1)
[24]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))

buildings.plot(
    "neighbor_distance", ax=ax[0], scheme="natural_breaks", legend=True
)
tessellation.plot(
    "covered_area", ax=ax[1], scheme="natural_breaks", legend=True
)

ax[0].set_axis_off()
ax[1].set_axis_off()
../_images/examples_clustering_47_0.png
[25]:
queen_3 = queen_1.higher_order(3)
buildings_q1 = libpysal.graph.Graph.build_contiguity(buildings, rook=False)

buildings["interbuilding_distance"] = momepy.mean_interbuilding_distance(
    buildings, queen_1, queen_3
)
buildings["adjacency"] = momepy.building_adjacency(buildings_q1, queen_3)
/Users/martin/Git/momepy/momepy/functional/_distribution.py:224: RuntimeWarning: invalid value encountered in scalar divide
  mean_distances[i] = sub_matrix.sum() / sub_matrix.nnz
[26]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))

buildings.plot(
    "interbuilding_distance", ax=ax[0], scheme="natural_breaks", legend=True
)
buildings.plot("adjacency", ax=ax[1], scheme="natural_breaks", legend=True)

ax[0].set_axis_off()
ax[1].set_axis_off()
../_images/examples_clustering_49_0.png
[27]:
profile = momepy.street_profile(streets, buildings)
streets[profile.columns] = profile
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/pandas/core/arraylike.py:492: RuntimeWarning: invalid value encountered in intersection
  return getattr(ufunc, method)(*new_inputs, **kwargs)
[28]:
fig, ax = plt.subplots(1, 3, figsize=(24, 12))

streets.plot("width", ax=ax[0], scheme="natural_breaks", legend=True)
streets.plot("width_deviation", ax=ax[1], scheme="natural_breaks", legend=True)
streets.plot("openness", ax=ax[2], scheme="natural_breaks", legend=True)

ax[0].set_axis_off()
ax[1].set_axis_off()
ax[2].set_axis_off()
../_images/examples_clustering_51_0.png

Intensity#

[29]:
tessellation["car"] = buildings.area / tessellation.area
tessellation.plot(
    "car", figsize=(12, 12), vmin=0, vmax=1, legend=True
).set_axis_off()
../_images/examples_clustering_53_0.png

Connectivity#

[30]:
graph = momepy.gdf_to_nx(streets)
graph = momepy.node_degree(graph)
graph = momepy.closeness_centrality(graph, radius=400, distance="mm_len")
graph = momepy.meshedness(graph, radius=400, distance="mm_len")
nodes, edges = momepy.nx_to_gdf(graph)
[31]:
fig, ax = plt.subplots(1, 3, figsize=(24, 12))

nodes.plot(
    "degree", ax=ax[0], scheme="natural_breaks", legend=True, markersize=1
)
nodes.plot(
    "closeness",
    ax=ax[1],
    scheme="natural_breaks",
    legend=True,
    markersize=1,
    legend_kwds={"fmt": "{:.6f}"},
)
nodes.plot(
    "meshedness", ax=ax[2], scheme="natural_breaks", legend=True, markersize=1
)

ax[0].set_axis_off()
ax[1].set_axis_off()
ax[2].set_axis_off()
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/mapclassify/classifiers.py:686: UserWarning: Not enough unique values in array to form 5 classes. Setting k to 4.
  self._classify()
../_images/examples_clustering_56_1.png
[32]:
buildings["edge_index"] = momepy.get_nearest_street(buildings, edges)
buildings["node_index"] = momepy.get_nearest_node(
    buildings, nodes, edges, buildings["edge_index"]
)

Link all data together (to tessellation cells or buildings).

[33]:
tessellation.head()
[33]:
geometry street_index tess_area convexity neighbors covered_area car
0 POLYGON ((-643705.747 -1193399.463, -643706.39... 937.0 17802.172927 0.991874 0.007811 14723.097694 0.009702
1 MULTIPOLYGON (((-643656.811 -1193513.936, -643... NaN 10599.038596 0.904145 0.012666 45000.300257 0.002072
2 POLYGON ((-643281.123 -1193135.364, -643281.27... 184.0 109.816857 0.993284 0.044640 12081.015691 0.744270
3 POLYGON ((-643396.488 -1193186.59, -643396.51 ... 603.0 948.152133 0.944649 0.046599 17126.108315 0.271025
4 POLYGON ((-643409.362 -1193171.532, -643409.64... 603.0 1750.723353 0.963757 0.052952 15559.865772 0.341471
[34]:
buildings.head()
[34]:
geometry street_index building_area eri elongation shared_walls neighbor_distance interbuilding_distance adjacency edge_index node_index
0 POLYGON ((-643743.474 -1193358.749, -643743.30... 937.0 172.724408 0.932381 0.844461 0.000000 56.674679 16.599365 0.764706 965.0 740.0
1 POLYGON ((-643751.446 -1193530.633, -643749.37... NaN 21.965841 0.999317 0.988333 0.000000 109.409786 22.629104 0.782609 837.0 580.0
2 POLYGON ((-643281.601 -1193130.831, -643283.76... 184.0 81.733340 0.997493 0.493378 0.000000 0.417396 10.858335 0.575000 193.0 98.0
3 POLYGON ((-643381.904 -1193174.697, -643388.48... 603.0 256.972722 1.000666 0.387946 0.485839 22.889341 12.210405 0.750000 297.0 486.0
4 POLYGON ((-643370.45 -1193130.215, -643398.26 ... 603.0 597.821399 0.998219 0.443877 0.347311 24.389141 11.803361 0.666667 297.0 486.0
[35]:
tessellation[buildings.columns.drop(["geometry", "street_index"])] = (
    buildings.drop(columns=["geometry", "street_index"])
)
merged = tessellation.merge(
    edges.drop(columns="geometry"),
    left_on="edge_index",
    right_index=True,
    how="left",
)
merged = merged.merge(
    nodes.drop(columns="geometry"),
    left_on="node_index",
    right_index=True,
    how="left",
)
[36]:
merged.columns
[36]:
Index(['geometry', 'street_index', 'tess_area', 'convexity', 'neighbors',
       'covered_area', 'car', 'building_area', 'eri', 'elongation',
       'shared_walls', 'neighbor_distance', 'interbuilding_distance',
       'adjacency', 'edge_index', 'node_index', 'length', 'linearity', 'width',
       'openness', 'width_deviation', 'mm_len', 'node_start', 'node_end', 'x',
       'y', 'degree', 'closeness', 'meshedness', 'nodeID'],
      dtype='object')

Understanding the context#

Measure first, second and third quartile of distribution of values within an area around each building.

[37]:
percentiles = []
for column in merged.columns.drop(
    [
        "street_index",
        "node_index",
        "edge_index",
        "nodeID",
        "mm_len",
        "node_start",
        "node_end",
        "geometry",
    ]
):
    perc = momepy.percentile(merged[column], queen_3)
    perc.columns = [f"{column}_" + str(x) for x in perc.columns]
    percentiles.append(perc)
[38]:
percentiles_joined = pandas.concat(percentiles, axis=1)
percentiles_joined.head()
[38]:
tess_area_25 tess_area_50 tess_area_75 convexity_25 convexity_50 convexity_75 neighbors_25 neighbors_50 neighbors_75 covered_area_25 ... y_75 degree_25 degree_50 degree_75 closeness_25 closeness_50 closeness_75 meshedness_25 meshedness_50 meshedness_75
focal
0 369.693629 626.079151 2359.235097 0.924459 0.953589 0.982927 0.025615 0.046237 0.063540 4011.681217 ... -1.193359e+06 3.0 3.0 3.0 0.000077 0.000107 0.000125 0.081081 0.092265 0.150943
1 442.501768 785.401076 2877.923774 0.902451 0.925140 0.973085 0.030303 0.037937 0.065890 5625.533503 ... -1.193374e+06 1.0 3.0 3.0 0.000009 0.000061 0.000107 0.000000 0.075023 0.105263
2 334.430672 443.283914 751.082577 0.931477 0.970869 0.986360 0.047184 0.055626 0.063359 3843.048071 ... -1.193079e+06 1.0 3.0 3.0 0.000093 0.000165 0.000220 0.066667 0.142857 0.178218
3 401.971416 544.201538 920.555185 0.896331 0.954957 0.977835 0.038791 0.056499 0.062440 4018.728582 ... -1.193084e+06 1.0 3.0 4.0 0.000093 0.000135 0.000175 0.066667 0.142857 0.162162
4 334.454239 471.247433 759.940534 0.916389 0.962276 0.989571 0.046767 0.056901 0.063341 3739.954823 ... -1.193084e+06 3.0 3.0 4.0 0.000093 0.000165 0.000220 0.066667 0.142857 0.178218

5 rows × 66 columns

See the difference between original convexity and spatially lagged one.

[39]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))

tessellation.plot("convexity", ax=ax[0], scheme="natural_breaks", legend=True)
merged.plot(
    percentiles_joined["convexity_50"].values,
    ax=ax[1],
    scheme="natural_breaks",
    legend=True,
)

ax[0].set_axis_off()
ax[1].set_axis_off()
../_images/examples_clustering_67_0.png

Clustering#

Now we can use obtained values within a cluster analysis that should detect types of urban structure.

Standardize values before clustering.

[40]:
standardized = (
    percentiles_joined - percentiles_joined.mean()
) / percentiles_joined.std()
standardized.head()
[40]:
tess_area_25 tess_area_50 tess_area_75 convexity_25 convexity_50 convexity_75 neighbors_25 neighbors_50 neighbors_75 covered_area_25 ... y_75 degree_25 degree_50 degree_75 closeness_25 closeness_50 closeness_75 meshedness_25 meshedness_50 meshedness_75
focal
0 -0.232604 -0.327126 0.182779 0.313037 -0.072057 0.182388 -0.820302 -0.117941 -0.025669 -0.291723 ... 0.026289 0.576119 0.253427 -0.073790 0.109853 0.320154 0.306863 -0.000513 -0.098333 0.160738
1 -0.137802 -0.188085 0.439424 -0.209634 -1.160943 -0.493619 -0.442184 -0.624040 0.073661 0.042363 ... 0.019032 -1.573344 0.253427 -0.073790 -1.124838 -0.434267 0.043104 -0.937209 -0.261538 -0.178540
2 -0.278519 -0.486652 -0.612927 0.479691 0.589316 0.418150 0.919593 0.454523 -0.033311 -0.326633 ... 0.162906 -1.573344 0.253427 -0.073790 0.410419 1.255785 1.714864 -0.167037 0.380569 0.363312
3 -0.190576 -0.398581 -0.529072 -0.354960 -0.019717 -0.167393 0.242536 0.507783 -0.072154 -0.290265 ... 0.160515 -1.573344 0.253427 1.364353 0.410419 0.777991 1.049198 -0.167037 0.380569 0.244063
4 -0.278488 -0.462248 -0.608544 0.121378 0.260422 0.638709 0.886009 0.532254 -0.034044 -0.347974 ... 0.160515 0.576119 0.253427 1.364353 0.410419 1.255785 1.714864 -0.167037 0.380569 0.363312

5 rows × 66 columns

How many clusters?#

To determine how many clusters we should aim for, we can use a little package called clustergram. See its documentation for details.

[41]:
cgram = Clustergram(range(1, 12), n_init=10, random_state=42)
cgram.fit(standardized.fillna(0))

show(cgram.bokeh())
K=1 skipped. Mean computed from data directly.
K=2 fitted in 0.046 seconds.
K=3 fitted in 0.227 seconds.
K=4 fitted in 0.124 seconds.
K=5 fitted in 0.137 seconds.
K=6 fitted in 0.098 seconds.
K=7 fitted in 0.142 seconds.
K=8 fitted in 0.145 seconds.
K=9 fitted in 0.186 seconds.
K=10 fitted in 0.218 seconds.
K=11 fitted in 0.204 seconds.
BokehDeprecationWarning: 'circle() method with size value' was deprecated in Bokeh 3.4.0 and will be removed, use 'scatter(size=...) instead' instead.

Clustegram gives us also the final labels. (Normally, you would run the final clustering on much larger number of initialisations.)

[42]:
cgram.labels.head()
[42]:
1 2 3 4 5 6 7 8 9 10 11
0 0 0 1 3 0 0 0 4 0 5 9
1 0 1 1 3 0 0 0 4 3 5 9
2 0 0 2 0 3 2 1 1 4 1 0
3 0 0 2 3 0 0 0 4 0 1 0
4 0 0 2 0 3 2 1 1 4 1 0
[43]:
merged["cluster"] = cgram.labels[10].values
buildings["cluster"] = merged["cluster"]
buildings.plot(
    "cluster", categorical=True, figsize=(16, 16), legend=True
).set_axis_off()
../_images/examples_clustering_74_0.png
[44]:
ax = buildings.plot("cluster", categorical=True, figsize=(16, 16), legend=True)
ax.set_xlim(-645000, -641000)
ax.set_ylim(-1195500, -1191000)
ax.set_axis_off()
../_images/examples_clustering_75_0.png