The relationship between Dask-GeoPandas and GeoPandas is the same as the relationship
between dask.dataframe and pandas. We recommend checking the
Dask documentation to better understand how
DataFrames are scaled before diving into Dask-GeoPandas.
Given a GeoPandas dataframe:
import geopandas
df = geopandas.read_file('...')
We can repartition it into a Dask-GeoPandas dataframe:
import dask_geopandas
ddf = dask_geopandas.from_geopandas(df, npartitions=4)
By default, this repartitions the data naively by rows. However, you can also partition the data spatially to take advantage of the spatial structure of the GeoDataFrame:
ddf = ddf.spatial_shuffle()
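To illustrate why spatial partitioning helps, here is a minimal pure-Python sketch that contrasts naive row chunking with chunking after sorting points along a Hilbert curve. This is a conceptual stand-in, not dask-geopandas' implementation: we assume a Hilbert ordering (one of the space-filling-curve orderings `spatial_shuffle` can use), and the helpers `hilbert_distance`, `split_into_chunks` and `hilbert_partitions` are hypothetical names introduced only for this example.

```python
# Conceptual sketch, NOT dask-geopandas' implementation: it contrasts naive
# row chunking with chunking after sorting points along a Hilbert curve.


def hilbert_distance(n, x, y):
    """Position of integer cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so consecutive cells stay adjacent.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def split_into_chunks(items, npartitions):
    """Split a list into npartitions contiguous, near-equal chunks (naive, by row)."""
    size, extra = divmod(len(items), npartitions)
    chunks, start = [], 0
    for i in range(npartitions):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def hilbert_partitions(points, npartitions, grid=16):
    """Sort (x, y) points by Hilbert distance, then chunk them by row."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)

    def to_cell(v, lo, hi):
        # Map a coordinate onto an integer grid cell in [0, grid - 1].
        if hi == lo:
            return 0
        return min(grid - 1, int((v - lo) / (hi - lo) * grid))

    def key(p):
        return hilbert_distance(grid, to_cell(p[0], xmin, xmax), to_cell(p[1], ymin, ymax))

    return split_into_chunks(sorted(points, key=key), npartitions)


# Two spatial clusters whose rows are interleaved.
points = [(0, 0), (100, 100), (1, 1), (101, 101), (0, 1), (100, 101), (1, 0), (101, 100)]
naive = split_into_chunks(points, 2)     # each chunk mixes both clusters
spatial = hilbert_partitions(points, 2)  # each chunk holds a single cluster
```

With naive chunking, every partition spans both clusters, so a spatial query touches all partitions; after the Hilbert sort, each partition covers a compact region and can be skipped when it does not intersect the query area.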
The familiar spatial attributes and methods of GeoPandas are also available and are computed in parallel, partition by partition.
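The partition-by-partition model can be sketched in pure Python: one task per partition computes an attribute for its own rows, and the results are concatenated in partition order. This is an illustration under stated assumptions, not how dask-geopandas works internally (it delegates each partition to GeoPandas); here the shoelace formula stands in for GeoPandas' `area`, a thread pool stands in for the Dask scheduler, and `shoelace_area`, `partition_areas` and `parallel_areas` are hypothetical helpers.

```python
# Conceptual sketch, NOT dask-geopandas' implementation: compute an attribute
# (polygon area) independently on each partition, then concatenate results.
from concurrent.futures import ThreadPoolExecutor


def shoelace_area(polygon):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def partition_areas(partition):
    """Per-partition task: one area per polygon in the partition."""
    return [shoelace_area(polygon) for polygon in partition]


def parallel_areas(partitions, max_workers=4):
    """Run one task per partition, then concatenate in partition order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_partition = list(pool.map(partition_areas, partitions))
    return [area for areas in per_partition for area in areas]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (4, 0), (0, 3)]
rectangle = [(0, 0), (2, 0), (2, 3), (0, 3)]
areas = parallel_areas([[square, triangle], [rectangle]])  # [1.0, 6.0, 6.0]
```

Because pure-Python work is limited by the GIL, threads here only show the structure of the computation; the real speedup comes from the vectorized GeoPandas/Shapely code that each Dask task runs.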
Additionally, if you have a distributed dask.dataframe, you can pass columns of
x-y points to the set_geometry method. Note that x is the longitude and y is the latitude:
import dask.dataframe as dd
import dask_geopandas

ddf = dd.read_csv('...')
ddf = ddf.set_geometry(
    dask_geopandas.points_from_xy(ddf, 'longitude', 'latitude')
)
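Because `points_from_xy` takes the x column first, passing latitude as x silently swaps the axes. The following minimal sketch, built on the assumption that each partition can be modeled as a plain dict of columns, pairs two columns into `(x, y)` tuples per partition and rejects pairs outside the valid longitude/latitude ranges. The helper `points_from_xy_partition` is hypothetical, and the coordinate values are made-up sample data.

```python
# Conceptual sketch, NOT dask-geopandas' implementation: build (x, y) point
# tuples for each partition, with x = longitude and y = latitude.


def points_from_xy_partition(partition, x, y):
    """Pair two columns of one partition into (x, y) points, checking lon/lat ranges."""
    points = list(zip(partition[x], partition[y]))
    for lon, lat in points:
        # This check only catches a swap when some |longitude| exceeds 90.
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"({lon}, {lat}) is not a valid (longitude, latitude) pair")
    return points


# Made-up sample rows split across two partitions (column name -> values).
partitions = [
    {"longitude": [-122.42, -73.99], "latitude": [37.77, 40.73]},
    {"longitude": [2.35], "latitude": [48.86]},
]
geometry = [points_from_xy_partition(p, "longitude", "latitude") for p in partitions]
```

Calling `points_from_xy_partition(partitions[0], "latitude", "longitude")` raises `ValueError`, because the swapped call places -122.42 in the latitude slot.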
Writing files (and reading back) is currently supported for the Parquet and Feather file formats.
ddf.to_parquet("path/to/dir/")
ddf = dask_geopandas.read_parquet("path/to/dir/")
Traditional GIS file formats can be read into a partitioned GeoDataFrame
(this requires pyogrio) but not written:
ddf = dask_geopandas.read_file("file.gpkg", npartitions=4)
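One way to picture `npartitions` for a file read is to give each partition a contiguous slice of the layer's features. The sketch below computes one `(skip_features, max_features)` pair per partition; those two names match parameters of `pyogrio.read_dataframe`, but whether dask-geopandas splits files exactly this way is an assumption, and `feature_ranges` is a hypothetical helper.

```python
# Conceptual sketch: split a layer of n_features rows into contiguous
# (skip_features, max_features) ranges, one per partition. Whether
# dask-geopandas splits files exactly this way is an assumption.


def feature_ranges(n_features, npartitions):
    """Return one (skip_features, max_features) pair per partition."""
    size, extra = divmod(n_features, npartitions)
    ranges, start = [], 0
    for i in range(npartitions):
        # The first `extra` partitions take one additional feature each.
        count = size + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


ranges = feature_ranges(10, 4)  # [(0, 3), (3, 3), (6, 2), (8, 2)]
# Each pair could then drive one lazy read, for example:
# pyogrio.read_dataframe("file.gpkg", skip_features=skip, max_features=count)
```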