Analysing CityModels with `pandas` and `scikit-learn`¶

In the following I show how to compute some attributes from CityObject geometry and use these attributes as input for machine learning. For this we use the LoD2 model of Zürich.

Download the Zürich data set from https://3d.bk.tudelft.nl/opendata/cityjson/1.0/Zurich_Building_LoD2_V10.json

For this tutorial you’ll need pandas, shapely, matplotlib, numpy and scikit-learn.

[1]:

from pathlib import Path
from copy import deepcopy
from cjio import cityjson
from shapely.geometry import Polygon
import matplotlib.pyplot as plt
plt.close('all')
from sklearn.preprocessing import FunctionTransformer
from sklearn import cluster
import numpy as np

Set up the paths for the tutorial.

[2]:

package_dir = Path(__name__).resolve().parent.parent.parent
data_dir = package_dir / 'tests' / 'data'

path = data_dir / 'zurich' / 'zurich.json'
zurich = cityjson.load(path, transform=True)

Feature engineering¶

Here is a simple geometry function that computes the area of the groundsurface (footprint) of buildings in the model. It also show how to cast surfaces, in this case the ground surface, to Shapely Polygons.

[3]:

def compute_footprint_area(co):
    """Compute the area of the footprint"""
    footprint_area = 0
    for geom in co.geometry:

        # only LoD2 (or higher) objects have semantic surfaces
        if geom.lod >= 2.0:
            footprints = geom.get_surfaces(type='groundsurface')

            # there can be many surfaces with label 'groundsurface'
            for i,f in footprints.items():
                for multisurface in geom.get_surface_boundaries(f):
                    for surface in multisurface:

                        # cast to Shapely polygon
                        shapely_poly = Polygon(surface)
                        footprint_area += shapely_poly.area

    return footprint_area

Compute new attributes¶

Then we need to loop through the CityObjects and update add the new attributes. Note that the attributes CityObject attribute is just a dictionary.

Thus we compute the number of vertices of the CityObject and the area of is footprint. Then we going to cluster these two variables. This is completely arbitrary excercise which is simply meant to illustrate how to transform a city model into machine-learnable features.

[4]:

for co_id, co in zurich.cityobjects.items():
    co.attributes['nr_vertices'] = len(co.get_vertices())
    co.attributes['fp_area'] = compute_footprint_area(co)
    zurich.cityobjects[co_id] = co

Export to DataFrame¶

It is possible to export the city model into a pandas DataFrame. Note that only the CityObject attributes are exported into the dataframe, with CityObject IDs as the index of the dataframe. Thus if you want to export the attributes of SemanticSurfaces for example, then you need to add them as CityObject attributes.

The function below illustrates this operation.

[5]:

def assign_cityobject_attribute(cm):
    """Copy the semantic surface attributes to CityObject attributes.
    Returns a copy of the citymodel.
    """
    new_cos = {}
    cm_copy = deepcopy(cm)
    for co_id, co in cm.cityobjects.items():
        for geom in co.geometry:
            for srf in geom.surfaces.values():
                if 'attributes' in srf:
                    for attr,a_v in srf['attributes'].items():
                        if (attr not in co.attributes) or (co.attributes[attr] is None):
                            co.attributes[attr] = [a_v]
                        else:
                            co.attributes[attr].append(a_v)
        new_cos[co_id] = co
    cm_copy.cityobjects = new_cos
    return cm_copy

[6]:

df = zurich.to_dataframe()
df.head()

[6]:

	creationDate	Geomtype	nr_vertices	fp_area	class	Herkunft	QualitaetStatus	FileCreationDate	Region	GebaeudeStatus
UUID_93fc5bae-4446-4336-9ff8-6679ebfdfde3	2017-01-23	1.0	24	65.209763	NaN	NaN	NaN	NaN	NaN	NaN
UUID_c9884c4e-1cac-47f5-b88b-6fb074c0ae50	2017-01-23	NaN	0	0.000000	BB01	EE_LB_2007	1.0	2012-02-23	2.0	1.0
UUID_a4a09780-153f-4385-ad19-3a92a6c4eec4	2017-01-23	1.0	38	20.784309	NaN	NaN	NaN	NaN	NaN	NaN
UUID_ba0bb815-5276-4e35-b4c1-878cbf6ba934	2017-01-23	NaN	0	0.000000	BB07	EE_LB_2007	1.0	2012-02-23	2.0	1.0
UUID_bb1835bc-7437-453f-ac08-885de0503aaa	2017-01-23	1.0	87	69.363823	NaN	NaN	NaN	NaN	NaN	NaN

In order to have a nicer distribution of the data, we remove the missing values and apply a log-transform on the two variables. Note that the FuntionTransformer.transform transforms a DataFrame to a numpy array that is ready to be used in scikit-learn. The details of a machine learning workflow is beyond the scope of this tutorial however.

[7]:

df_subset = df[df['Geomtype'].notnull() & df['fp_area'] > 0.0].loc[:, ['nr_vertices', 'fp_area']]
transformer = FunctionTransformer(np.log, validate=True)
df_logtransform = transformer.transform(df_subset)

[14]:

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(df_logtransform[:,0], df_logtransform[:,1], alpha=0.3, s=2.0)
plt.show()

[9]:

def plot_model_results(model, data):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    colormap = np.array(['lightblue', 'red', 'lime', 'blue','black'])
    ax.scatter(data[:,0], data[:,1], c=colormap[model.labels_], s=10, alpha=0.5)
    ax.set_xlabel('Number of vertices [log]')
    ax.set_ylabel('Footprint area [log]')
    plt.title(f"DBSCAN clustering with estimated {len(set(model.labels_))} clusters")
    plt.show()

Since we transformed our DataFrame, we can fit any model in scikit-learn. I use DBSCAN because I wanted to find the data points on the fringes of the central cluster.

[15]:

%matplotlib notebook
model = cluster.DBSCAN(eps=0.2).fit(df_logtransform)

plot_model_results(model, df_logtransform)

Merge the cluster labels back to the data frame.

[11]:

df_subset['dbscan'] = model.labels_

Save the results back to CityJSON¶

And merge the DataFrame with cluster labels back to the city model.

[12]:

for co_id, co in zurich.cityobjects.items():
    if co_id in df_subset.index:
        ml_results = dict(df_subset.loc[co_id])
    else:
        ml_results = {'nr_vertices': 'nan', 'fp_area': 'nan', 'dbscan': 'nan'}
    new_attrs = {**co.attributes, **ml_results}
    co.attributes = new_attrs
    zurich.cityobjects[co_id] = co

At the end, the save() method saves the edited city model into a CityJSON file.

[13]:

path_out = data_dir / 'zurich_output.json'
cityjson.save(zurich, path_out)

And view the results in QGIS again¶

However, you’ll need to set up the styling based on the cluster labels by hand.

cjio API tutorial

Creating city models and objects

Analysing CityModels with pandas and scikit-learn¶