Note
Simplified detection of urban types#
Example adapted from the SDSC 2021 Workshop led by Martin Fleischmann. You can see the recording of the workshop on YouTube.
This example illustrates the potential of morphometrics captured by momepy
in capturing the structure of cities. We will pick a town, fetch its data from the OpenStreetMap, and analyse it to detect individual types of urban structure within it.
This method is only illustrative and is based on the more extensive one published by Fleischmann et al. (2021) available from martinfleis/numerical-taxonomy-paper.
Fleischmann M, Feliciotti A, Romice O and Porta S (2021) Methodological Foundation of a Numerical Taxonomy of Urban Form. Environment and Planning B: Urban Analytics and City Science, doi: 10.1177/23998083211059835
It depends on the following packages:
- momepy
- osmnx
- clustergram
- bokeh
- scikit-learn
- geopy
- ipywidgets
[1]:
import geopandas
import libpysal
import matplotlib.pyplot as plt
import momepy
import osmnx
import pandas
from bokeh.io import output_notebook
from bokeh.plotting import show
from clustergram import Clustergram
output_notebook()
Pick a place, ideally a town with a good coverage in OpenStreetMap and its local CRS.
[2]:
place = "Znojmo, Czechia"
local_crs = 5514
We can interactively explore the place we just selected.
[3]:
geopandas.tools.geocode(place).explore()
[3]:
Input data#
We can use OSMnx
to quickly download data from OpenStreetMap. If you intend to download larger areas, we recommend using pyrosm
instead.
Buildings#
[4]:
buildings = osmnx.features_from_place(place, tags={"building": True})
buildings.head()
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/osmnx/features.py:294: FutureWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
polygon = gdf_place["geometry"].unary_union
[4]:
power | geometry | amenity | brand | brand:wikidata | brand:wikipedia | check_date | name | operator | operator:wikidata | ... | material | name:signed | ref | monitoring:water_level | takeaway | shelter_type | construction | ways | type | emergency | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
element_type | osmid | |||||||||||||||||||||
node | 3372076291 | NaN | POINT (16.05376 48.84683) | NaN | NaN | NaN | NaN | NaN | 7/I/10/A-120 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3372076393 | NaN | POINT (16.05581 48.84158) | NaN | NaN | NaN | NaN | NaN | 7/I/11/A-140 Z | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
3372076394 | NaN | POINT (16.05867 48.83522) | NaN | NaN | NaN | NaN | NaN | 7/I/12/A-220 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
3372076428 | NaN | POINT (16.03949 48.85599) | NaN | NaN | NaN | NaN | NaN | 7/I/8/E | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
3372076429 | NaN | POINT (16.04133 48.85501) | NaN | NaN | NaN | NaN | NaN | 7/I/9/E | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 119 columns
The OSM input may need a bit of cleaning to ensure only proper polygons are kept.
[5]:
buildings.geom_type.value_counts()
[5]:
Polygon 12214
Point 7
Name: count, dtype: int64
[6]:
buildings = buildings[buildings.geom_type == "Polygon"].reset_index(drop=True)
And we should re-project the data from WGS84 to the local projection in meters (momepy default values assume meters not feet or degrees). We will also drop unnecessary columns.
[7]:
buildings = buildings[["geometry"]].to_crs(local_crs)
buildings.head()
[7]:
geometry | |
---|---|
0 | POLYGON ((-643743.474 -1193358.749, -643743.30... |
1 | POLYGON ((-643751.446 -1193530.633, -643749.37... |
2 | POLYGON ((-643281.601 -1193130.831, -643283.76... |
3 | POLYGON ((-643381.904 -1193174.697, -643388.48... |
4 | POLYGON ((-643370.45 -1193130.215, -643398.26 ... |
Streets#
Similar operations are done with streets.
[8]:
osm_graph = osmnx.graph_from_place(place, network_type="drive")
osm_graph = osmnx.projection.project_graph(osm_graph, to_crs=local_crs)
streets = osmnx.graph_to_gdfs(
osmnx.convert.to_undirected(osm_graph),
nodes=False,
edges=True,
node_geometry=False,
fill_edge_geometry=True,
).reset_index(drop=True)
streets.head()
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/osmnx/graph.py:392: FutureWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
polygon = gdf_place["geometry"].unary_union
[8]:
osmid | ref | name | highway | maxspeed | oneway | reversed | length | from | to | geometry | lanes | bridge | junction | width | tunnel | access | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 33733060 | 361 | Přímětická | secondary | 50 | False | True | 24.574 | 639231391 | 74103628 | LINESTRING (-643229.639 -1192872.949, -643239.... | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 33733060 | 361 | Přímětická | secondary | 50 | False | False | 60.354 | 3775990798 | 74103628 | LINESTRING (-643236.395 -1192790.304, -643237.... | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 50313252 | NaN | Raisova | residential | NaN | True | False | 74.763 | 639231413 | 74103628 | LINESTRING (-643291.344 -1192797.012, -643288.... | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 33733060 | 361 | Přímětická | secondary | 50 | False | True | 54.260 | 74142638 | 639231391 | LINESTRING (-643205.434 -1192921.533, -643219.... | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 50313241 | NaN | Mičurinova | residential | NaN | True | False | 101.376 | 639231391 | 639231314 | LINESTRING (-643229.639 -1192872.949, -643233.... | NaN | NaN | NaN | NaN | NaN | NaN |
We can also do some preprocessing using momepy to ensure we have proper network topology.
[9]:
streets = momepy.remove_false_nodes(streets)
streets = streets[["geometry"]]
[10]:
streets.head()
[10]:
geometry | |
---|---|
0 | LINESTRING (-643229.639 -1192872.949, -643239.... |
1 | LINESTRING (-643236.395 -1192790.304, -643237.... |
2 | LINESTRING (-643291.344 -1192797.012, -643288.... |
3 | LINESTRING (-643205.434 -1192921.533, -643219.... |
4 | LINESTRING (-643229.639 -1192872.949, -643233.... |
Generated data#
Tessellation#
Given building footprints:
We can generate a spatial unit using morphological tessellation:
[11]:
limit = momepy.buffered_limit(buildings, "adaptive")
tessellation = momepy.morphological_tessellation(buildings, clip=limit)
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/pandas/core/arraylike.py:492: RuntimeWarning: invalid value encountered in create_collection
return getattr(ufunc, method)(*new_inputs, **kwargs)
OpenStreetMap data are often problematic due to low quality of some polygons. If some collapse, we get a mismatch between the length of buildings and the length of polygons.
[12]:
collapsed, _ = momepy.verify_tessellation(tessellation, buildings)
/var/folders/2f/fhks6w_d0k556plcv3rfmshw0000gn/T/ipykernel_96250/3509021287.py:1: UserWarning: Tessellation does not fully match buildings. 20 element(s) disappeared during generation. Index of the affected elements: Index([ 3958, 3968, 4177, 4185, 4188, 4215, 4218, 4222, 4223, 4227,
8534, 9026, 9061, 10465, 10686, 10687, 11255, 11473, 11474, 11478],
dtype='int64').
collapsed, _ = momepy.verify_tessellation(tessellation, buildings)
/var/folders/2f/fhks6w_d0k556plcv3rfmshw0000gn/T/ipykernel_96250/3509021287.py:1: UserWarning: Tessellation contains MultiPolygon elements. Initial objects should be edited. Index of affected elements: [1, 209, 521, 524, 525, 636, 709, 762, 787, 1635, 1640, 1641, 1645, 1647, 1653, 1662, 1697, 1706, 2337, 2882, 2914, 2929, 3173, 3782, 4172, 4404, 4564, 4673, 4685, 4731, 4939, 5152, 5184, 5315, 5490, 5696, 5795, 6141, 6376, 6483, 6577, 6583, 6971, 7171, 7184, 7232, 7377, 7378, 7379, 7380, 7381, 7383, 7384, 7385, 7386, 7388, 7390, 7391, 7394, 7395, 7396, 7397, 7479, 7481, 7482, 7487, 7729, 7755, 7766, 7792, 7958, 8420, 8428, 8439, 8441, 8765, 8945, 8957, 9297, 9457, 9470, 9542, 9609, 9800, 9851, 9988, 10056, 10083, 10210, 10318, 10504, 10657, 10997, 11072, 11266, 11729, 11785, 11878].
collapsed, _ = momepy.verify_tessellation(tessellation, buildings)
Better to drop affected buildings and re-create tessellation.
[13]:
buildings = buildings.drop(collapsed)
limit = momepy.buffered_limit(buildings, "adaptive")
tessellation = momepy.morphological_tessellation(buildings, clip=limit)
Check the result.
[14]:
tessellation.shape[0] == buildings.shape[0]
[14]:
True
Link streets#
Link unique IDs of streets to buildings and tessellation cells based on the nearest neighbor join.
[15]:
buildings["street_index"] = momepy.get_nearest_street(
buildings, streets, max_distance=100
)
buildings
[15]:
geometry | street_index | |
---|---|---|
0 | POLYGON ((-643743.474 -1193358.749, -643743.30... | 937.0 |
1 | POLYGON ((-643751.446 -1193530.633, -643749.37... | NaN |
2 | POLYGON ((-643281.601 -1193130.831, -643283.76... | 184.0 |
3 | POLYGON ((-643381.904 -1193174.697, -643388.48... | 603.0 |
4 | POLYGON ((-643370.45 -1193130.215, -643398.26 ... | 603.0 |
... | ... | ... |
12209 | POLYGON ((-642672.758 -1193279.306, -642675.66... | 110.0 |
12210 | POLYGON ((-642784.166 -1193282.597, -642782.58... | 109.0 |
12211 | POLYGON ((-641909.939 -1196185.823, -641920.49... | 239.0 |
12212 | POLYGON ((-641157.498 -1193763.763, -641173.34... | 962.0 |
12213 | POLYGON ((-645060.824 -1196234.041, -645050.36... | NaN |
12194 rows × 2 columns
Aattach the network index to the tessellation as well.
[16]:
tessellation["street_index"] = buildings["street_index"]
Measure#
Measure individual morphometric characters. For details see the User Guide and the API reference.
Dimensions#
[17]:
buildings["building_area"] = buildings.area
tessellation["tess_area"] = tessellation.area
streets["length"] = streets.length
Shape#
[18]:
buildings["eri"] = momepy.equivalent_rectangular_index(buildings)
buildings["elongation"] = momepy.elongation(buildings)
tessellation["convexity"] = momepy.convexity(tessellation)
streets["linearity"] = momepy.linearity(streets)
[19]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))
buildings.plot("eri", ax=ax[0], scheme="natural_breaks", legend=True)
buildings.plot("elongation", ax=ax[1], scheme="natural_breaks", legend=True)
ax[0].set_axis_off()
ax[1].set_axis_off()
[20]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))
tessellation.plot("convexity", ax=ax[0], scheme="natural_breaks", legend=True)
streets.plot("linearity", ax=ax[1], scheme="natural_breaks", legend=True)
ax[0].set_axis_off()
ax[1].set_axis_off()
Spatial distribution#
[21]:
buildings["shared_walls"] = momepy.shared_walls(buildings) / buildings.length
buildings.plot(
"shared_walls", figsize=(12, 12), scheme="natural_breaks", legend=True
).set_axis_off()
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/shapely/set_operations.py:131: RuntimeWarning: invalid value encountered in intersection
return lib.intersection(a, b, **kwargs)
Generate spatial graph using libpysal
.
[22]:
queen_1 = libpysal.graph.Graph.build_contiguity(tessellation, rook=False)
[23]:
tessellation["neighbors"] = momepy.neighbors(
tessellation, queen_1, weighted=True
)
tessellation["covered_area"] = queen_1.describe(tessellation.area)["sum"]
buildings["neighbor_distance"] = momepy.neighbor_distance(buildings, queen_1)
[24]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))
buildings.plot(
"neighbor_distance", ax=ax[0], scheme="natural_breaks", legend=True
)
tessellation.plot(
"covered_area", ax=ax[1], scheme="natural_breaks", legend=True
)
ax[0].set_axis_off()
ax[1].set_axis_off()
[25]:
queen_3 = queen_1.higher_order(3)
buildings_q1 = libpysal.graph.Graph.build_contiguity(buildings, rook=False)
buildings["interbuilding_distance"] = momepy.mean_interbuilding_distance(
buildings, queen_1, queen_3
)
buildings["adjacency"] = momepy.building_adjacency(buildings_q1, queen_3)
/Users/martin/Git/momepy/momepy/functional/_distribution.py:224: RuntimeWarning: invalid value encountered in scalar divide
mean_distances[i] = sub_matrix.sum() / sub_matrix.nnz
[26]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))
buildings.plot(
"interbuilding_distance", ax=ax[0], scheme="natural_breaks", legend=True
)
buildings.plot("adjacency", ax=ax[1], scheme="natural_breaks", legend=True)
ax[0].set_axis_off()
ax[1].set_axis_off()
[27]:
profile = momepy.street_profile(streets, buildings)
streets[profile.columns] = profile
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/pandas/core/arraylike.py:492: RuntimeWarning: invalid value encountered in intersection
return getattr(ufunc, method)(*new_inputs, **kwargs)
[28]:
fig, ax = plt.subplots(1, 3, figsize=(24, 12))
streets.plot("width", ax=ax[0], scheme="natural_breaks", legend=True)
streets.plot("width_deviation", ax=ax[1], scheme="natural_breaks", legend=True)
streets.plot("openness", ax=ax[2], scheme="natural_breaks", legend=True)
ax[0].set_axis_off()
ax[1].set_axis_off()
ax[2].set_axis_off()
Intensity#
[29]:
tessellation["car"] = buildings.area / tessellation.area
tessellation.plot(
"car", figsize=(12, 12), vmin=0, vmax=1, legend=True
).set_axis_off()
Connectivity#
[30]:
graph = momepy.gdf_to_nx(streets)
graph = momepy.node_degree(graph)
graph = momepy.closeness_centrality(graph, radius=400, distance="mm_len")
graph = momepy.meshedness(graph, radius=400, distance="mm_len")
nodes, edges = momepy.nx_to_gdf(graph)
[31]:
fig, ax = plt.subplots(1, 3, figsize=(24, 12))
nodes.plot(
"degree", ax=ax[0], scheme="natural_breaks", legend=True, markersize=1
)
nodes.plot(
"closeness",
ax=ax[1],
scheme="natural_breaks",
legend=True,
markersize=1,
legend_kwds={"fmt": "{:.6f}"},
)
nodes.plot(
"meshedness", ax=ax[2], scheme="natural_breaks", legend=True, markersize=1
)
ax[0].set_axis_off()
ax[1].set_axis_off()
ax[2].set_axis_off()
/Users/martin/miniforge3/envs/momepy/lib/python3.11/site-packages/mapclassify/classifiers.py:686: UserWarning: Not enough unique values in array to form 5 classes. Setting k to 4.
self._classify()
[32]:
buildings["edge_index"] = momepy.get_nearest_street(buildings, edges)
buildings["node_index"] = momepy.get_nearest_node(
buildings, nodes, edges, buildings["edge_index"]
)
Link all data together (to tessellation cells or buildings).
[33]:
tessellation.head()
[33]:
geometry | street_index | tess_area | convexity | neighbors | covered_area | car | |
---|---|---|---|---|---|---|---|
0 | POLYGON ((-643705.747 -1193399.463, -643706.39... | 937.0 | 17802.172927 | 0.991874 | 0.007811 | 14723.097694 | 0.009702 |
1 | MULTIPOLYGON (((-643656.811 -1193513.936, -643... | NaN | 10599.038596 | 0.904145 | 0.012666 | 45000.300257 | 0.002072 |
2 | POLYGON ((-643281.123 -1193135.364, -643281.27... | 184.0 | 109.816857 | 0.993284 | 0.044640 | 12081.015691 | 0.744270 |
3 | POLYGON ((-643396.488 -1193186.59, -643396.51 ... | 603.0 | 948.152133 | 0.944649 | 0.046599 | 17126.108315 | 0.271025 |
4 | POLYGON ((-643409.362 -1193171.532, -643409.64... | 603.0 | 1750.723353 | 0.963757 | 0.052952 | 15559.865772 | 0.341471 |
[34]:
buildings.head()
[34]:
geometry | street_index | building_area | eri | elongation | shared_walls | neighbor_distance | interbuilding_distance | adjacency | edge_index | node_index | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | POLYGON ((-643743.474 -1193358.749, -643743.30... | 937.0 | 172.724408 | 0.932381 | 0.844461 | 0.000000 | 56.674679 | 16.599365 | 0.764706 | 965.0 | 740.0 |
1 | POLYGON ((-643751.446 -1193530.633, -643749.37... | NaN | 21.965841 | 0.999317 | 0.988333 | 0.000000 | 109.409786 | 22.629104 | 0.782609 | 837.0 | 580.0 |
2 | POLYGON ((-643281.601 -1193130.831, -643283.76... | 184.0 | 81.733340 | 0.997493 | 0.493378 | 0.000000 | 0.417396 | 10.858335 | 0.575000 | 193.0 | 98.0 |
3 | POLYGON ((-643381.904 -1193174.697, -643388.48... | 603.0 | 256.972722 | 1.000666 | 0.387946 | 0.485839 | 22.889341 | 12.210405 | 0.750000 | 297.0 | 486.0 |
4 | POLYGON ((-643370.45 -1193130.215, -643398.26 ... | 603.0 | 597.821399 | 0.998219 | 0.443877 | 0.347311 | 24.389141 | 11.803361 | 0.666667 | 297.0 | 486.0 |
[35]:
tessellation[buildings.columns.drop(["geometry", "street_index"])] = (
buildings.drop(columns=["geometry", "street_index"])
)
merged = tessellation.merge(
edges.drop(columns="geometry"),
left_on="edge_index",
right_index=True,
how="left",
)
merged = merged.merge(
nodes.drop(columns="geometry"),
left_on="node_index",
right_index=True,
how="left",
)
[36]:
merged.columns
[36]:
Index(['geometry', 'street_index', 'tess_area', 'convexity', 'neighbors',
'covered_area', 'car', 'building_area', 'eri', 'elongation',
'shared_walls', 'neighbor_distance', 'interbuilding_distance',
'adjacency', 'edge_index', 'node_index', 'length', 'linearity', 'width',
'openness', 'width_deviation', 'mm_len', 'node_start', 'node_end', 'x',
'y', 'degree', 'closeness', 'meshedness', 'nodeID'],
dtype='object')
Understanding the context#
Measure first, second and third quartile of distribution of values within an area around each building.
[37]:
percentiles = []
for column in merged.columns.drop(
[
"street_index",
"node_index",
"edge_index",
"nodeID",
"mm_len",
"node_start",
"node_end",
"geometry",
]
):
perc = momepy.percentile(merged[column], queen_3)
perc.columns = [f"{column}_" + str(x) for x in perc.columns]
percentiles.append(perc)
[38]:
percentiles_joined = pandas.concat(percentiles, axis=1)
percentiles_joined.head()
[38]:
tess_area_25 | tess_area_50 | tess_area_75 | convexity_25 | convexity_50 | convexity_75 | neighbors_25 | neighbors_50 | neighbors_75 | covered_area_25 | ... | y_75 | degree_25 | degree_50 | degree_75 | closeness_25 | closeness_50 | closeness_75 | meshedness_25 | meshedness_50 | meshedness_75 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
focal | |||||||||||||||||||||
0 | 369.693629 | 626.079151 | 2359.235097 | 0.924459 | 0.953589 | 0.982927 | 0.025615 | 0.046237 | 0.063540 | 4011.681217 | ... | -1.193359e+06 | 3.0 | 3.0 | 3.0 | 0.000077 | 0.000107 | 0.000125 | 0.081081 | 0.092265 | 0.150943 |
1 | 442.501768 | 785.401076 | 2877.923774 | 0.902451 | 0.925140 | 0.973085 | 0.030303 | 0.037937 | 0.065890 | 5625.533503 | ... | -1.193374e+06 | 1.0 | 3.0 | 3.0 | 0.000009 | 0.000061 | 0.000107 | 0.000000 | 0.075023 | 0.105263 |
2 | 334.430672 | 443.283914 | 751.082577 | 0.931477 | 0.970869 | 0.986360 | 0.047184 | 0.055626 | 0.063359 | 3843.048071 | ... | -1.193079e+06 | 1.0 | 3.0 | 3.0 | 0.000093 | 0.000165 | 0.000220 | 0.066667 | 0.142857 | 0.178218 |
3 | 401.971416 | 544.201538 | 920.555185 | 0.896331 | 0.954957 | 0.977835 | 0.038791 | 0.056499 | 0.062440 | 4018.728582 | ... | -1.193084e+06 | 1.0 | 3.0 | 4.0 | 0.000093 | 0.000135 | 0.000175 | 0.066667 | 0.142857 | 0.162162 |
4 | 334.454239 | 471.247433 | 759.940534 | 0.916389 | 0.962276 | 0.989571 | 0.046767 | 0.056901 | 0.063341 | 3739.954823 | ... | -1.193084e+06 | 3.0 | 3.0 | 4.0 | 0.000093 | 0.000165 | 0.000220 | 0.066667 | 0.142857 | 0.178218 |
5 rows × 66 columns
See the difference between original convexity and spatially lagged one.
[39]:
fig, ax = plt.subplots(1, 2, figsize=(24, 12))
tessellation.plot("convexity", ax=ax[0], scheme="natural_breaks", legend=True)
merged.plot(
percentiles_joined["convexity_50"].values,
ax=ax[1],
scheme="natural_breaks",
legend=True,
)
ax[0].set_axis_off()
ax[1].set_axis_off()
Clustering#
Now we can use obtained values within a cluster analysis that should detect types of urban structure.
Standardize values before clustering.
[40]:
standardized = (
percentiles_joined - percentiles_joined.mean()
) / percentiles_joined.std()
standardized.head()
[40]:
tess_area_25 | tess_area_50 | tess_area_75 | convexity_25 | convexity_50 | convexity_75 | neighbors_25 | neighbors_50 | neighbors_75 | covered_area_25 | ... | y_75 | degree_25 | degree_50 | degree_75 | closeness_25 | closeness_50 | closeness_75 | meshedness_25 | meshedness_50 | meshedness_75 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
focal | |||||||||||||||||||||
0 | -0.232604 | -0.327126 | 0.182779 | 0.313037 | -0.072057 | 0.182388 | -0.820302 | -0.117941 | -0.025669 | -0.291723 | ... | 0.026289 | 0.576119 | 0.253427 | -0.073790 | 0.109853 | 0.320154 | 0.306863 | -0.000513 | -0.098333 | 0.160738 |
1 | -0.137802 | -0.188085 | 0.439424 | -0.209634 | -1.160943 | -0.493619 | -0.442184 | -0.624040 | 0.073661 | 0.042363 | ... | 0.019032 | -1.573344 | 0.253427 | -0.073790 | -1.124838 | -0.434267 | 0.043104 | -0.937209 | -0.261538 | -0.178540 |
2 | -0.278519 | -0.486652 | -0.612927 | 0.479691 | 0.589316 | 0.418150 | 0.919593 | 0.454523 | -0.033311 | -0.326633 | ... | 0.162906 | -1.573344 | 0.253427 | -0.073790 | 0.410419 | 1.255785 | 1.714864 | -0.167037 | 0.380569 | 0.363312 |
3 | -0.190576 | -0.398581 | -0.529072 | -0.354960 | -0.019717 | -0.167393 | 0.242536 | 0.507783 | -0.072154 | -0.290265 | ... | 0.160515 | -1.573344 | 0.253427 | 1.364353 | 0.410419 | 0.777991 | 1.049198 | -0.167037 | 0.380569 | 0.244063 |
4 | -0.278488 | -0.462248 | -0.608544 | 0.121378 | 0.260422 | 0.638709 | 0.886009 | 0.532254 | -0.034044 | -0.347974 | ... | 0.160515 | 0.576119 | 0.253427 | 1.364353 | 0.410419 | 1.255785 | 1.714864 | -0.167037 | 0.380569 | 0.363312 |
5 rows × 66 columns
How many clusters?#
To determine how many clusters we should aim for, we can use a little package called clustergram
. See its documentation for details.
[41]:
cgram = Clustergram(range(1, 12), n_init=10, random_state=42)
cgram.fit(standardized.fillna(0))
show(cgram.bokeh())
K=1 skipped. Mean computed from data directly.
K=2 fitted in 0.046 seconds.
K=3 fitted in 0.227 seconds.
K=4 fitted in 0.124 seconds.
K=5 fitted in 0.137 seconds.
K=6 fitted in 0.098 seconds.
K=7 fitted in 0.142 seconds.
K=8 fitted in 0.145 seconds.
K=9 fitted in 0.186 seconds.
K=10 fitted in 0.218 seconds.
K=11 fitted in 0.204 seconds.
BokehDeprecationWarning: 'circle() method with size value' was deprecated in Bokeh 3.4.0 and will be removed, use 'scatter(size=...) instead' instead.
Clustegram gives us also the final labels. (Normally, you would run the final clustering on much larger number of initialisations.)
[42]:
cgram.labels.head()
[42]:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 4 | 0 | 5 | 9 |
1 | 0 | 1 | 1 | 3 | 0 | 0 | 0 | 4 | 3 | 5 | 9 |
2 | 0 | 0 | 2 | 0 | 3 | 2 | 1 | 1 | 4 | 1 | 0 |
3 | 0 | 0 | 2 | 3 | 0 | 0 | 0 | 4 | 0 | 1 | 0 |
4 | 0 | 0 | 2 | 0 | 3 | 2 | 1 | 1 | 4 | 1 | 0 |
[43]:
merged["cluster"] = cgram.labels[10].values
buildings["cluster"] = merged["cluster"]
buildings.plot(
"cluster", categorical=True, figsize=(16, 16), legend=True
).set_axis_off()
[44]:
ax = buildings.plot("cluster", categorical=True, figsize=(16, 16), legend=True)
ax.set_xlim(-645000, -641000)
ax.set_ylim(-1195500, -1191000)
ax.set_axis_off()