dask_geopandas.GeoDataFrame.spatial_shuffle

GeoDataFrame.spatial_shuffle(by='hilbert', level=None, calculate_partitions=True, npartitions=None, divisions=None, **kwargs)

Shuffle the data into spatially consistent partitions.

This realigns the dataset so that it is spatially sorted, i.e. geometries that are spatially near each other end up in the same partition. This is especially useful for overlay operations such as a spatial join, because it reduces the number of interactions between individual partitions.

The spatial information is stored in the index and will replace the existing index.

Note that spatial_shuffle uses set_index under the hood and comes with all its potential performance drawbacks.
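
The idea of spatial sorting can be illustrated with a small pure-Python sketch of a Morton (Z-order) code. This is not the dask_geopandas implementation; the function morton_code, its bounds argument, and the sample points are hypothetical and exist only for illustration. Coordinates are quantized onto a 2**level by 2**level grid and the bits of the two cell indices are interleaved, so points that are close in space tend to receive close integer values.

```python
def morton_code(x, y, bounds, level=16):
    """Illustrative Morton code: interleave the bits of quantized x and y."""
    minx, miny, maxx, maxy = bounds
    n = 2**level  # number of grid cells along each axis
    # Quantize each coordinate to a cell index in [0, n - 1].
    ix = min(int((x - minx) / (maxx - minx) * n), n - 1)
    iy = min(int((y - miny) / (maxy - miny) * n), n - 1)
    code = 0
    for bit in range(level):
        code |= ((ix >> bit) & 1) << (2 * bit)      # x bits go to even positions
        code |= ((iy >> bit) & 1) << (2 * bit + 1)  # y bits go to odd positions
    return code


# Hypothetical sample points: "a" and "c" are neighbours, as are "b" and "d".
points = {"a": (0.1, 0.1), "b": (0.9, 0.9), "c": (0.15, 0.12), "d": (0.85, 0.95)}
bounds = (0.0, 0.0, 1.0, 1.0)

codes = {name: morton_code(x, y, bounds) for name, (x, y) in points.items()}
ordered = sorted(points, key=codes.get)
print(ordered)  # neighbouring points end up next to each other in the ordering
```

Sorting by such a value and then cutting the sorted sequence into contiguous ranges is what places nearby geometries in the same partition. A lower level means a coarser grid: in this sketch, with level=1 the points "a" and "c" fall into the same cell and get the same code.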

Parameters:
by: string (default ‘hilbert’)

Spatial sorting method, one of {‘hilbert’, ‘morton’, ‘geohash’}. See hilbert_distance, morton_distance and geohash methods for details.

level: int (default None)

Level (precision) of the Hilbert and Morton curves used as a sorting method. If None, defaults to 16. Has no effect for the 'geohash' option.

calculate_partitions: bool (default True)

Calculate new spatial partitions after shuffling.

npartitions: int, None, or ‘auto’

The ideal number of output partitions. If None, use the same as the input. If ‘auto’ then decide by memory use. Only used when divisions is not given. If divisions is given, the number of output partitions will be len(divisions) - 1.

divisions: list, optional

The “dividing lines” used to split the new index into partitions. Needs to match the values returned by the sorting method.

**kwargs

Keyword arguments passed to set_index.
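
The relationship between divisions and the number of output partitions can be sketched in plain Python. The helper assign_partition and the sample values below are hypothetical and only model the usual Dask convention, in which each division is the lower bound of a partition and the last division is the inclusive upper bound of the final partition; this is not code from dask_geopandas.

```python
from bisect import bisect_right


def assign_partition(value, divisions):
    """Return the index of the partition that would hold an index value."""
    n_partitions = len(divisions) - 1
    # A value equal to an inner division belongs to the partition starting there.
    position = bisect_right(divisions, value) - 1
    # The last division is an inclusive upper bound, so clamp to the final partition.
    return min(max(position, 0), n_partitions - 1)


# Four dividing lines give three partitions.
divisions = [0, 100, 200, 300]
index_values = [0, 99, 100, 250, 300]

assignments = [assign_partition(v, divisions) for v in index_values]
print(len(divisions) - 1, assignments)
```

When passing divisions to spatial_shuffle, the values need to be in the range produced by the chosen sorting method (for example Hilbert distances), not raw coordinates.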

Returns:
dask_geopandas.GeoDataFrame

Notes

Like calculate_spatial_partitions, this method is computed partially eagerly: it needs to calculate the distances for all existing partitions before it can determine the divisions for the new spatially shuffled partitions.