Analysing CityModels with pandas and scikit-learn

zurich.png

In the following I show how to compute some attributes from CityObject geometry and use these attributes as input for machine learning. For this we use the LoD2 model of Zürich.

Download the Zürich data set from https://3d.bk.tudelft.nl/opendata/cityjson/1.0/Zurich_Building_LoD2_V10.json

For this tutorial you’ll need pandas, shapely, matplotlib, numpy and scikit-learn.

[1]:
from pathlib import Path
from copy import deepcopy
from cjio import cityjson
from shapely.geometry import Polygon
import matplotlib.pyplot as plt
plt.close('all')
from sklearn.preprocessing import FunctionTransformer
from sklearn import cluster
import numpy as np

Set up the paths for the tutorial.

[2]:
package_dir = Path(__name__).resolve().parent.parent.parent
data_dir = package_dir / 'tests' / 'data'

path = data_dir / 'zurich' / 'zurich.json'
zurich = cityjson.load(path, transform=True)

Feature engineering

Here is a simple geometry function that computes the area of the groundsurface (footprint) of buildings in the model. It also show how to cast surfaces, in this case the ground surface, to Shapely Polygons.

[3]:
def compute_footprint_area(co):
    """Compute the area of the footprint"""
    footprint_area = 0
    for geom in co.geometry:

        # only LoD2 (or higher) objects have semantic surfaces
        if geom.lod >= 2.0:
            footprints = geom.get_surfaces(type='groundsurface')

            # there can be many surfaces with label 'groundsurface'
            for i,f in footprints.items():
                for multisurface in geom.get_surface_boundaries(f):
                    for surface in multisurface:

                        # cast to Shapely polygon
                        shapely_poly = Polygon(surface)
                        footprint_area += shapely_poly.area

    return footprint_area

Compute new attributes

Then we need to loop through the CityObjects and update add the new attributes. Note that the attributes CityObject attribute is just a dictionary.

Thus we compute the number of vertices of the CityObject and the area of is footprint. Then we going to cluster these two variables. This is completely arbitrary excercise which is simply meant to illustrate how to transform a city model into machine-learnable features.

[4]:
for co_id, co in zurich.cityobjects.items():
    co.attributes['nr_vertices'] = len(co.get_vertices())
    co.attributes['fp_area'] = compute_footprint_area(co)
    zurich.cityobjects[co_id] = co

Export to DataFrame

It is possible to export the city model into a pandas DataFrame. Note that only the CityObject attributes are exported into the dataframe, with CityObject IDs as the index of the dataframe. Thus if you want to export the attributes of SemanticSurfaces for example, then you need to add them as CityObject attributes.

The function below illustrates this operation.

[5]:
def assign_cityobject_attribute(cm):
    """Copy the semantic surface attributes to CityObject attributes.
    Returns a copy of the citymodel.
    """
    new_cos = {}
    cm_copy = deepcopy(cm)
    for co_id, co in cm.cityobjects.items():
        for geom in co.geometry:
            for srf in geom.surfaces.values():
                if 'attributes' in srf:
                    for attr,a_v in srf['attributes'].items():
                        if (attr not in co.attributes) or (co.attributes[attr] is None):
                            co.attributes[attr] = [a_v]
                        else:
                            co.attributes[attr].append(a_v)
        new_cos[co_id] = co
    cm_copy.cityobjects = new_cos
    return cm_copy
[6]:
df = zurich.to_dataframe()
df.head()
[6]:
creationDate Geomtype nr_vertices fp_area class Herkunft QualitaetStatus FileCreationDate Region GebaeudeStatus
UUID_93fc5bae-4446-4336-9ff8-6679ebfdfde3 2017-01-23 1.0 24 65.209763 NaN NaN NaN NaN NaN NaN
UUID_c9884c4e-1cac-47f5-b88b-6fb074c0ae50 2017-01-23 NaN 0 0.000000 BB01 EE_LB_2007 1.0 2012-02-23 2.0 1.0
UUID_a4a09780-153f-4385-ad19-3a92a6c4eec4 2017-01-23 1.0 38 20.784309 NaN NaN NaN NaN NaN NaN
UUID_ba0bb815-5276-4e35-b4c1-878cbf6ba934 2017-01-23 NaN 0 0.000000 BB07 EE_LB_2007 1.0 2012-02-23 2.0 1.0
UUID_bb1835bc-7437-453f-ac08-885de0503aaa 2017-01-23 1.0 87 69.363823 NaN NaN NaN NaN NaN NaN

In order to have a nicer distribution of the data, we remove the missing values and apply a log-transform on the two variables. Note that the FuntionTransformer.transform transforms a DataFrame to a numpy array that is ready to be used in scikit-learn. The details of a machine learning workflow is beyond the scope of this tutorial however.

[7]:
df_subset = df[df['Geomtype'].notnull() & df['fp_area'] > 0.0].loc[:, ['nr_vertices', 'fp_area']]
transformer = FunctionTransformer(np.log, validate=True)
df_logtransform = transformer.transform(df_subset)
[14]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(df_logtransform[:,0], df_logtransform[:,1], alpha=0.3, s=2.0)
plt.show()
[9]:
def plot_model_results(model, data):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    colormap = np.array(['lightblue', 'red', 'lime', 'blue','black'])
    ax.scatter(data[:,0], data[:,1], c=colormap[model.labels_], s=10, alpha=0.5)
    ax.set_xlabel('Number of vertices [log]')
    ax.set_ylabel('Footprint area [log]')
    plt.title(f"DBSCAN clustering with estimated {len(set(model.labels_))} clusters")
    plt.show()

Since we transformed our DataFrame, we can fit any model in scikit-learn. I use DBSCAN because I wanted to find the data points on the fringes of the central cluster.

[15]:
%matplotlib notebook
model = cluster.DBSCAN(eps=0.2).fit(df_logtransform)

plot_model_results(model, df_logtransform)

Merge the cluster labels back to the data frame.

[11]:
df_subset['dbscan'] = model.labels_

Save the results back to CityJSON

And merge the DataFrame with cluster labels back to the city model.

[12]:
for co_id, co in zurich.cityobjects.items():
    if co_id in df_subset.index:
        ml_results = dict(df_subset.loc[co_id])
    else:
        ml_results = {'nr_vertices': 'nan', 'fp_area': 'nan', 'dbscan': 'nan'}
    new_attrs = {**co.attributes, **ml_results}
    co.attributes = new_attrs
    zurich.cityobjects[co_id] = co

At the end, the save() method saves the edited city model into a CityJSON file.

[13]:
path_out = data_dir / 'zurich_output.json'
cityjson.save(zurich, path_out)

And view the results in QGIS again

zurichmlresult.png

However, you’ll need to set up the styling based on the cluster labels by hand.