Set-Operations with Overlay

When working with multiple spatial datasets – especially multiple polygon or line datasets – users often wish to create new shapes based on places where those datasets overlap (or don’t overlap). These manipulations are often referred using the language of sets – intersections, unions, and differences. These types of operations are made available in the geopandas library through the overlay function.

The basic idea is demonstrated by the graphic below but keep in mind that overlays operate at the DataFrame level, not on individual geometries, and the properties from both are retained. In effect, for every shape in the first GeoDataFrame, this operation is executed against every other shape in the other GeoDataFrame:

_images/overlay_operations.png

Source: QGIS Documentation

(Note to users familiar with the shapely library: overlay can be thought of as offering versions of the standard shapely set-operations that deal with the complexities of applying set operations to two GeoSeries. The standard shapely set-operations are also available as GeoSeries methods.)

The different Overlay operations

First, we create some example data:

In [1]: from shapely.geometry import Polygon

In [2]: polys1 = geopandas.GeoSeries([Polygon([(0,0), (2,0), (2,2), (0,2)]),
   ...:                               Polygon([(2,2), (4,2), (4,4), (2,4)])])
   ...: 

In [3]: polys2 = geopandas.GeoSeries([Polygon([(1,1), (3,1), (3,3), (1,3)]),
   ...:                               Polygon([(3,3), (5,3), (5,5), (3,5)])])
   ...: 

In [4]: df1 = geopandas.GeoDataFrame({'geometry': polys1, 'df1':[1,2]})

In [5]: df2 = geopandas.GeoDataFrame({'geometry': polys2, 'df2':[1,2]})

These two GeoDataFrames have some overlapping areas:

In [6]: ax = df1.plot(color='red');

In [7]: df2.plot(ax=ax, color='green', alpha=0.5);
_images/overlay_example.png

We illustrate the different overlay modes with the above example. The overlay function will determine the set of all individual geometries from overlaying the two input GeoDataFrames. This result covers the area covered by the two input GeoDataFrames, and also preserves all unique regions defined by the combined boundaries of the two GeoDataFrames.

When using how='union', all those possible geometries are returned:

In [8]: res_union = geopandas.overlay(df1, df2, how='union')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-08d89f109843> in <module>()
----> 1 res_union = geopandas.overlay(df1, df2, how='union')

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in overlay(df1, df2, how, make_valid, use_sindex)
    369         result = _overlay_symmetric_diff(df1, df2)
    370     elif how == 'union':
--> 371         result = _overlay_union(df1, df2)
    372     elif how == 'identity':
    373         dfunion = _overlay_union(df1, df2)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_union(df1, df2)
    296     Overlay Union operation used in overlay function
    297     """
--> 298     dfinter = _overlay_intersection(df1, df2)
    299     dfsym = _overlay_symmetric_diff(df1, df2)
    300     dfunion = pd.concat([dfinter, dfsym], ignore_index=True, **CONCAT_KWARGS)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_intersection(df1, df2)
    210     spatial_index = df2.sindex
    211     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 212     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    213     # Create pairs of geometries in both dataframes to be intersected
    214     nei = []

/usr/lib/python3/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in <lambda>(x)
    210     spatial_index = df2.sindex
    211     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 212     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    213     # Create pairs of geometries in both dataframes to be intersected
    214     nei = []

AttributeError: 'NoneType' object has no attribute 'intersection'

In [9]: res_union
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-92e78ff8a48f> in <module>()
----> 1 res_union

NameError: name 'res_union' is not defined

In [10]: ax = res_union.plot(alpha=0.5, cmap='tab10')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-abed36dca550> in <module>()
----> 1 ax = res_union.plot(alpha=0.5, cmap='tab10')

NameError: name 'res_union' is not defined

In [11]: df1.plot(ax=ax, facecolor='none', edgecolor='k');

In [12]: df2.plot(ax=ax, facecolor='none', edgecolor='k');
_images/overlay_example_union.png

The other how operations will return different subsets of those geometries. With how='intersection', it returns only those geometries that are contained by both GeoDataFrames:

In [13]: res_intersection = geopandas.overlay(df1, df2, how='intersection')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-2d18ebb3a836> in <module>()
----> 1 res_intersection = geopandas.overlay(df1, df2, how='intersection')

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in overlay(df1, df2, how, make_valid, use_sindex)
    365         return _overlay_difference(df1, df2)
    366     elif how == 'intersection':
--> 367         result = _overlay_intersection(df1, df2)
    368     elif how == 'symmetric_difference':
    369         result = _overlay_symmetric_diff(df1, df2)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_intersection(df1, df2)
    210     spatial_index = df2.sindex
    211     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 212     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    213     # Create pairs of geometries in both dataframes to be intersected
    214     nei = []

/usr/lib/python3/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in <lambda>(x)
    210     spatial_index = df2.sindex
    211     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 212     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    213     # Create pairs of geometries in both dataframes to be intersected
    214     nei = []

AttributeError: 'NoneType' object has no attribute 'intersection'

In [14]: res_intersection
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-fc41d7acac3a> in <module>()
----> 1 res_intersection

NameError: name 'res_intersection' is not defined

In [15]: ax = res_intersection.plot(cmap='tab10')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-15-152451446e79> in <module>()
----> 1 ax = res_intersection.plot(cmap='tab10')

NameError: name 'res_intersection' is not defined

In [16]: df1.plot(ax=ax, facecolor='none', edgecolor='k');

In [17]: df2.plot(ax=ax, facecolor='none', edgecolor='k');
_images/overlay_example_intersection.png

how='symmetric_difference' is the opposite of 'intersection' and returns the geometries that are only part of one of the GeoDataFrames but not of both:

In [18]: res_symdiff = geopandas.overlay(df1, df2, how='symmetric_difference')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-74ac00908539> in <module>()
----> 1 res_symdiff = geopandas.overlay(df1, df2, how='symmetric_difference')

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in overlay(df1, df2, how, make_valid, use_sindex)
    367         result = _overlay_intersection(df1, df2)
    368     elif how == 'symmetric_difference':
--> 369         result = _overlay_symmetric_diff(df1, df2)
    370     elif how == 'union':
    371         result = _overlay_union(df1, df2)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_symmetric_diff(df1, df2)
    271     Overlay Symmetric Difference operation used in overlay function
    272     """
--> 273     dfdiff1 = _overlay_difference(df1, df2)
    274     dfdiff2 = _overlay_difference(df2, df1)
    275     dfdiff1['__idx1'] = range(len(dfdiff1))

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_difference(df1, df2)
    253     spatial_index = df2.sindex
    254     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 255     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    256     # Create differences
    257     new_g = []

/usr/lib/python3/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in <lambda>(x)
    253     spatial_index = df2.sindex
    254     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 255     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    256     # Create differences
    257     new_g = []

AttributeError: 'NoneType' object has no attribute 'intersection'

In [19]: res_symdiff
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-19-81159af402c3> in <module>()
----> 1 res_symdiff

NameError: name 'res_symdiff' is not defined

In [20]: ax = res_symdiff.plot(cmap='tab10')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-20-96caff01b875> in <module>()
----> 1 ax = res_symdiff.plot(cmap='tab10')

NameError: name 'res_symdiff' is not defined

In [21]: df1.plot(ax=ax, facecolor='none', edgecolor='k');

In [22]: df2.plot(ax=ax, facecolor='none', edgecolor='k');
_images/overlay_example_symdiff.png

To obtain the geometries that are part of df1 but are not contained in df2, you can use how='difference':

In [23]: res_difference = geopandas.overlay(df1, df2, how='difference')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-2f631a1979e5> in <module>()
----> 1 res_difference = geopandas.overlay(df1, df2, how='difference')

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in overlay(df1, df2, how, make_valid, use_sindex)
    363 
    364     if how == 'difference':
--> 365         return _overlay_difference(df1, df2)
    366     elif how == 'intersection':
    367         result = _overlay_intersection(df1, df2)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_difference(df1, df2)
    253     spatial_index = df2.sindex
    254     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 255     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    256     # Create differences
    257     new_g = []

/usr/lib/python3/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in <lambda>(x)
    253     spatial_index = df2.sindex
    254     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 255     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    256     # Create differences
    257     new_g = []

AttributeError: 'NoneType' object has no attribute 'intersection'

In [24]: res_difference
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-24-d8f1a9e0dbdf> in <module>()
----> 1 res_difference

NameError: name 'res_difference' is not defined

In [25]: ax = res_difference.plot(cmap='tab10')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-8ca9a1a26abd> in <module>()
----> 1 ax = res_difference.plot(cmap='tab10')

NameError: name 'res_difference' is not defined

In [26]: df1.plot(ax=ax, facecolor='none', edgecolor='k');

In [27]: df2.plot(ax=ax, facecolor='none', edgecolor='k');
_images/overlay_example_difference.png

Finally, with how='identity', the result consists of the surface of df1, but with the geometries obtained from overlaying df1 with df2:

In [28]: res_identity = geopandas.overlay(df1, df2, how='identity')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-28-e662a83cbbed> in <module>()
----> 1 res_identity = geopandas.overlay(df1, df2, how='identity')

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in overlay(df1, df2, how, make_valid, use_sindex)
    371         result = _overlay_union(df1, df2)
    372     elif how == 'identity':
--> 373         dfunion = _overlay_union(df1, df2)
    374         result = dfunion[dfunion['__idx1'].notnull()].copy()
    375         result.reset_index(drop=True, inplace=True)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_union(df1, df2)
    296     Overlay Union operation used in overlay function
    297     """
--> 298     dfinter = _overlay_intersection(df1, df2)
    299     dfsym = _overlay_symmetric_diff(df1, df2)
    300     dfunion = pd.concat([dfinter, dfsym], ignore_index=True, **CONCAT_KWARGS)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_intersection(df1, df2)
    210     spatial_index = df2.sindex
    211     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 212     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    213     # Create pairs of geometries in both dataframes to be intersected
    214     nei = []

/usr/lib/python3/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in <lambda>(x)
    210     spatial_index = df2.sindex
    211     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 212     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    213     # Create pairs of geometries in both dataframes to be intersected
    214     nei = []

AttributeError: 'NoneType' object has no attribute 'intersection'

In [29]: res_identity
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-29-3161c912a7ae> in <module>()
----> 1 res_identity

NameError: name 'res_identity' is not defined

In [30]: ax = res_identity.plot(cmap='tab10')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-30-2b781fec7375> in <module>()
----> 1 ax = res_identity.plot(cmap='tab10')

NameError: name 'res_identity' is not defined

In [31]: df1.plot(ax=ax, facecolor='none', edgecolor='k');

In [32]: df2.plot(ax=ax, facecolor='none', edgecolor='k');
_images/overlay_example_identity.png

Overlay Countries Example

First, we load the countries and cities example datasets and select :

In [33]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

In [34]: capitals = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))

# Select South Amarica and some columns
In [35]: countries = world[world['continent'] == "South America"]

In [36]: countries = countries[['geometry', 'name']]

# Project to crs that uses meters as distance measure
In [37]: countries = countries.to_crs('+init=epsg:3395')

In [38]: capitals = capitals.to_crs('+init=epsg:3395')

To illustrate the overlay function, consider the following case in which one wishes to identify the “core” portion of each country – defined as areas within 500km of a capital – using a GeoDataFrame of countries and a GeoDataFrame of capitals.

# Look at countries:
In [39]: countries.plot();

# Now buffer cities to find area within 500km.
# Check CRS -- World Mercator, units of meters.
In [40]: capitals.crs
Out[40]: '+init=epsg:3395'

# make 500km buffer
In [41]: capitals['geometry']= capitals.buffer(500000)

In [42]: capitals.plot();
_images/world_basic.png _images/capital_buffers.png

To select only the portion of countries within 500km of a capital, we specify the how option to be “intersect”, which creates a new set of polygons where these two layers overlap:

In [43]: country_cores = geopandas.overlay(countries, capitals, how='intersection')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-43-2b44852a4019> in <module>()
----> 1 country_cores = geopandas.overlay(countries, capitals, how='intersection')

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in overlay(df1, df2, how, make_valid, use_sindex)
    365         return _overlay_difference(df1, df2)
    366     elif how == 'intersection':
--> 367         result = _overlay_intersection(df1, df2)
    368     elif how == 'symmetric_difference':
    369         result = _overlay_symmetric_diff(df1, df2)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_intersection(df1, df2)
    210     spatial_index = df2.sindex
    211     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 212     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    213     # Create pairs of geometries in both dataframes to be intersected
    214     nei = []

/usr/lib/python3/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in <lambda>(x)
    210     spatial_index = df2.sindex
    211     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 212     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    213     # Create pairs of geometries in both dataframes to be intersected
    214     nei = []

AttributeError: 'NoneType' object has no attribute 'intersection'

In [44]: country_cores.plot(alpha=0.5, edgecolor='k', cmap='tab10');
_images/country_cores.png

Changing the “how” option allows for different types of overlay operations. For example, if we were interested in the portions of countries far from capitals (the peripheries), we would compute the difference of the two.

In [45]: country_peripheries = geopandas.overlay(countries, capitals, how='difference')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-45-cada8b87a9e6> in <module>()
----> 1 country_peripheries = geopandas.overlay(countries, capitals, how='difference')

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in overlay(df1, df2, how, make_valid, use_sindex)
    363 
    364     if how == 'difference':
--> 365         return _overlay_difference(df1, df2)
    366     elif how == 'intersection':
    367         result = _overlay_intersection(df1, df2)

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in _overlay_difference(df1, df2)
    253     spatial_index = df2.sindex
    254     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 255     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    256     # Create differences
    257     new_g = []

/usr/lib/python3/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

/build/python-geopandas-0.4.0/geopandas/tools/overlay.py in <lambda>(x)
    253     spatial_index = df2.sindex
    254     bbox = df1.geometry.apply(lambda x: x.bounds)
--> 255     sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    256     # Create differences
    257     new_g = []

AttributeError: 'NoneType' object has no attribute 'intersection'

In [46]: country_peripheries.plot(alpha=0.5, edgecolor='k', cmap='tab10');
_images/country_peripheries.png

More Examples

A larger set of examples of the use of overlay can be found here