dask_geopandas.GeoDataFrame
- class dask_geopandas.GeoDataFrame(expr, spatial_partitions=None)
Parallel GeoPandas GeoDataFrame
Do not use this class directly. Instead, use functions like dask_geopandas.read_parquet() or dask_geopandas.from_geopandas().
- __init__(expr, spatial_partitions=None)
Methods
__init__(expr[, spatial_partitions])
abs() Return a Series/DataFrame with absolute numeric value of each element.
add(other[, axis, level, fill_value])
add_prefix(prefix) Prefix labels with string prefix.
add_suffix(suffix) Suffix labels with string suffix.
affine_transform(matrix) Return a GeoSeries with translated geometries.
align(other[, join, axis, fill_value]) Align two objects on their axes with the specified join method.
all([axis, skipna, split_every]) Return whether all elements are True, potentially over an axis.
analyze([filename, format]) Outputs statistics about every node in the expression.
any([axis, skipna, split_every]) Return whether any element is True, potentially over an axis.
apply(function, *args[, meta, axis]) Parallel version of pandas.DataFrame.apply
assign(**pairs) Assign new columns to a DataFrame.
astype(dtypes) Cast a pandas object to a specified dtype.
bfill([axis, limit]) Fill NA/NaN values by using the next valid observation to fill the gap.
buffer(distance[, resolution]) Returns a GeoSeries of geometries representing all points within a given distance of each geometric object.
calculate_spatial_partitions() Calculate spatial partitions
categorize([columns, index, split_every]) Convert columns of the DataFrame to category dtype.
clear_divisions() Forget division information.
clip(mask[, keep_geom_type]) Clip points, lines, or polygon geometries to the mask extent.
combine(other, func[, fill_value, overwrite]) Perform column-wise combine with another DataFrame.
combine_first(other) Update null elements with value in the same location in other.
compute(**kwargs) Compute this dask collection
compute_current_divisions([col, set_divisions]) Compute the current divisions of the DataFrame.
contains(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry that contains other.
copy() Make a copy of the dataframe
corr([method, min_periods, numeric_only, ...]) Compute pairwise correlation of columns, excluding NA/null values.
count([axis, numeric_only, split_every]) Count non-NA cells for each column or row.
cov([min_periods, numeric_only, split_every]) Compute pairwise covariance of columns, excluding NA/null values.
covered_by(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covered by other.
covers(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covering other.
crosses(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry that crosses other.
cummax([axis, skipna]) Return cumulative maximum over a DataFrame or Series axis.
cummin([axis, skipna]) Return cumulative minimum over a DataFrame or Series axis.
cumprod([axis, skipna]) Return cumulative product over a DataFrame or Series axis.
cumsum([axis, skipna]) Return cumulative sum over a DataFrame or Series axis.
describe([split_every, percentiles, ...]) Generate descriptive statistics.
diff([periods, axis]) First discrete difference of element.
difference(other, *args, **kwargs) Returns a GeoSeries of the points in each aligned geometry that are not in other.
disjoint(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry disjoint to other.
dissolve([by, aggfunc, split_out]) Dissolve geometries within groupby into a single geometry.
distance(other, *args, **kwargs) Returns a Series containing the distance to aligned other.
div(other[, axis, level, fill_value])
divide(other[, axis, level, fill_value])
dot(other[, meta]) Compute the dot product between the Series and the columns of other.
drop([labels, axis, columns, errors]) Drop specified labels from rows or columns.
drop_duplicates([subset, split_every, ...]) Return DataFrame with duplicate rows removed.
dropna([how, subset, thresh]) Remove missing values.
enforce_runtime_divisions() Enforce the current divisions at runtime.
eq(other[, level, axis])
eval(expr, **kwargs) Evaluate a string describing operations on DataFrame columns.
explain([stage, format]) Create a graph representation of the Expression.
explode([column, ignore_index, index_parts]) Explode multi-part geometries into multiple single geometries.
ffill([axis, limit]) Fill NA/NaN values by propagating the last valid observation to next valid.
fillna([value, axis]) Fill NA/NaN values using the specified method.
floordiv(other[, axis, level, fill_value])
from_dict(data, *[, npartitions, orient, ...]) Construct a Dask DataFrame from a Python Dictionary
ge(other[, level, axis])
geohash([as_string, precision]) Calculate geohash based on the middle points of the geometry bounds for a given precision.
geom_equals(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry equal to other.
geom_equals_exact(other, tolerance) Return True for all geometries that equal aligned other to a given tolerance, else False.
get_partition(n) Get a dask DataFrame/Series representing the nth partition.
groupby(by[, group_keys, sort, observed, dropna]) Group DataFrame using a mapper or by a Series of columns.
gt(other[, level, axis])
head([n, npartitions, compute]) First n rows of the dataset
hilbert_distance([total_bounds, level]) Calculate the distance along a Hilbert curve.
idxmax([axis, skipna, numeric_only, split_every]) Return index of first occurrence of maximum over requested axis.
idxmin([axis, skipna, numeric_only, split_every]) Return index of first occurrence of minimum over requested axis.
info([buf, verbose, memory_usage]) Concise summary of a Dask DataFrame
interpolate(distance[, normalized]) Return a point at the specified distance along each geometry
intersection(other, *args, **kwargs) Returns a GeoSeries of the intersection of points in each aligned geometry with other.
intersects(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry that intersects other.
isin(values) Whether each element in the DataFrame is contained in values.
isna() Detect missing values.
isnull() DataFrame.isnull is an alias for DataFrame.isna.
items() Iterate over (column name, Series) pairs.
iterrows() Iterate over DataFrame rows as (index, Series) pairs.
itertuples([index, name]) Iterate over DataFrame rows as namedtuples.
join(other[, on, how, lsuffix, rsuffix, ...]) Join columns of another DataFrame.
kurt([axis, fisher, bias, nan_policy, ...]) Return unbiased kurtosis over requested axis.
kurtosis([axis, fisher, bias, nan_policy, ...]) Return unbiased kurtosis over requested axis.
le(other[, level, axis])
lower_once()
lt(other[, level, axis])
map(func[, na_action, meta])
map_overlap(func, before, after, *args[, ...]) Apply a function to each partition, sharing rows with adjacent partitions.
map_partitions(func, *args[, meta, ...]) Apply a Python function to each partition
mask(cond[, other]) Replace values where the condition is True.
max([axis, skipna, numeric_only, split_every]) Return the maximum of the values over the requested axis.
mean([axis, skipna, numeric_only, split_every]) Return the mean of the values over the requested axis.
median([axis, numeric_only]) Return the median of the values over the requested axis.
median_approximate([axis, method, numeric_only]) Return the approximate median of the values over the requested axis.
melt([id_vars, value_vars, var_name, ...]) Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
memory_usage([deep, index]) Return the memory usage of each column in bytes.
memory_usage_per_partition([index, deep]) Return the memory usage of each partition
merge(right[, how, on, left_on, right_on, ...]) Merge the DataFrame with another DataFrame
min([axis, skipna, numeric_only, split_every]) Return the minimum of the values over the requested axis.
mod(other[, axis, level, fill_value])
mode([dropna, split_every, numeric_only]) Get the mode(s) of each element along the selected axis.
morton_distance([total_bounds, level]) Calculate the distance of geometries along the Morton curve
mul(other[, axis, level, fill_value])
ne(other[, level, axis])
nlargest([n, columns, split_every]) Return the first n rows ordered by columns in descending order.
notnull() DataFrame.notnull is an alias for DataFrame.notna.
nsmallest([n, columns, split_every]) Return the first n rows ordered by columns in ascending order.
nunique([axis, dropna, split_every]) Count number of distinct elements in specified axis.
nunique_approx([split_every]) Approximate number of unique rows.
optimize([fuse]) Optimizes the DataFrame.
overlaps(other, *args, **kwargs) Returns True for all aligned geometries that overlap other, else False.
persist([fuse]) Persist this dask collection into memory
pipe(func, *args, **kwargs) Apply chainable functions that expect Series or DataFrames.
pivot_table(index, columns, values[, aggfunc]) Create a spreadsheet-style pivot table as a DataFrame.
pop(item) Return item and drop from frame.
pow(other[, axis, level, fill_value])
pprint() Outputs a string representation of the DataFrame.
prod([axis, skipna, numeric_only, ...]) Return the product of the values over the requested axis.
product([axis, skipna, numeric_only, ...]) Return the product of the values over the requested axis.
project(other, *args, **kwargs) Return the distance along each geometry nearest to other
quantile([q, axis, numeric_only, method]) Approximate row-wise and precise column-wise quantiles of DataFrame
query(expr, **kwargs) Filter dataframe with complex expression
radd(other[, axis, level, fill_value])
random_split(frac[, random_state, shuffle]) Pseudorandomly split dataframe into different pieces row-wise
rdiv(other[, axis, level, fill_value])
reduction(chunk[, aggregate, combine, meta, ...]) Generic row-wise reductions.
relate(other, *args, **kwargs) Returns the DE-9IM intersection matrices for the geometries
rename([index, columns]) Rename columns or index labels.
rename_axis([mapper, index, columns, axis]) Set the name of the axis for the index or columns.
rename_geometry(col) Renames the GeoDataFrame geometry column to the specified name.
repartition([divisions, npartitions, ...]) Repartition a collection
replace([to_replace, value, regex]) Replace values given in to_replace with value.
representative_point() Returns a GeoSeries of (cheaply computed) points that are guaranteed to be within each geometry.
resample(rule[, closed, label]) Resample time-series data.
reset_index([drop]) Reset the index to the default index.
rfloordiv(other[, axis, level, fill_value])
rmod(other[, axis, level, fill_value])
rmul(other[, axis, level, fill_value])
rolling(window, **kwargs) Provides rolling transformations.
rotate(angle[, origin, use_radians]) Returns a GeoSeries with rotated geometries.
round([decimals]) Round a DataFrame to a variable number of decimal places.
rpow(other[, axis, level, fill_value])
rsub(other[, axis, level, fill_value])
rtruediv(other[, axis, level, fill_value])
sample([n, frac, replace, random_state]) Random sample of items
scale([xfact, yfact, zfact, origin]) Returns a GeoSeries with scaled geometries.
select_dtypes([include, exclude]) Return a subset of the DataFrame's columns based on the column dtypes.
sem([axis, skipna, ddof, split_every, ...]) Return unbiased standard error of the mean over requested axis.
set_crs(value[, allow_override]) Set the value of the crs on a new object
set_geometry(col) Set the GeoDataFrame geometry using either an existing column or the specified input.
set_index(*args, **kwargs) Set the DataFrame index (row labels) using an existing column.
shift([periods, freq, axis]) Shift index by desired number of periods with an optional time freq.
shuffle([on, ignore_index, npartitions, ...]) Rearrange DataFrame into new partitions
simplify(*args, **kwargs) Returns a GeoSeries containing a simplified representation of each geometry.
sjoin(df[, how, predicate]) Spatial join of two GeoDataFrames.
skew([xs, ys, origin, use_radians]) Returns a GeoSeries with skewed geometries.
sort_values(by[, npartitions, ascending, ...]) Sort the dataset by a single column.
spatial_shuffle([by, level, ...]) Shuffle the data into spatially consistent partitions.
squeeze([axis]) Squeeze 1 dimensional axis objects into scalars.
std([axis, skipna, ddof, numeric_only, ...]) Return sample standard deviation over requested axis.
sub(other[, axis, level, fill_value])
sum([axis, skipna, numeric_only, min_count, ...]) Return the sum of the values over the requested axis.
symmetric_difference(other, *args, **kwargs) Returns a GeoSeries of the symmetric difference of points in each aligned geometry with other.
tail([n, compute]) Last n rows of the dataset
to_backend([backend]) Move to a new DataFrame backend
to_bag([index, format]) Create a Dask Bag from a Series
to_crs([crs, epsg]) Returns a GeoSeries with all geometries transformed to a new coordinate reference system.
to_csv(filename, **kwargs) See dd.to_csv docstring for more information
to_dask_array([lengths, meta, optimize]) Convert a dask DataFrame to a dask array.
Create a dask.dataframe object from a dask_geopandas object
to_delayed([optimize_graph]) Convert into a list of dask.delayed objects, one per partition.
to_feather(path, *args, **kwargs) See dask_geopandas.to_feather docstring for more information
to_hdf(path_or_buf, key[, mode, append]) See dd.to_hdf docstring for more information
to_html([max_rows]) Render a DataFrame as an HTML table.
to_json(filename, *args, **kwargs) See dd.to_json docstring for more information
to_orc(path, *args, **kwargs) See dd.to_orc docstring for more information
to_parquet(path, *args, **kwargs) See dask_geopandas.to_parquet docstring for more information
to_records([index, lengths])
to_sql(name, uri[, schema, if_exists, ...])
to_string([max_rows]) Render a DataFrame to a console-friendly tabular output.
to_timestamp([freq, how]) Cast to DatetimeIndex of timestamps, at beginning of period.
to_wkb([hex]) Encode all geometry columns in the GeoDataFrame to WKB.
to_wkt(**kwargs) Encode all geometry columns in the GeoDataFrame to WKT.
touches(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry that touches other.
translate([xoff, yoff, zoff]) Returns a GeoSeries with translated geometries.
truediv(other[, axis, level, fill_value])
union(other, *args, **kwargs) Returns a GeoSeries of the union of points in each aligned geometry with other.
union_all()
var([axis, skipna, ddof, numeric_only, ...]) Return unbiased variance over requested axis.
visualize([tasks]) Visualize the expression or task graph
where(cond[, other]) Replace values where the condition is False.
within(other, *args, **kwargs) Returns a Series of dtype('bool') with value True for each aligned geometry that is within other.
Attributes
area Returns a Series containing the area of each geometry in the GeoSeries expressed in the units of the CRS.
axes
boundary Returns a GeoSeries of lower dimensional objects representing each geometry's set-theoretic boundary.
bounds Returns a DataFrame with columns minx, miny, maxx, maxy values containing the bounds for each geometry.
centroid Returns a GeoSeries of points representing the centroid of each geometry.
columns
convex_hull Returns a GeoSeries of geometries representing the convex hull of each geometry.
The Coordinate Reference System (CRS) represented as a pyproj.CRS object.
Coordinate based indexer to select by intersection with bounding box.
dask
divisions Tuple of npartitions + 1 values, in ascending order, marking the lower/upper bounds of each partition's index.
dtypes Return data types
empty
envelope Returns a GeoSeries of geometries representing the envelope of each geometry.
expr
exterior Returns a GeoSeries of LinearRings representing the outer boundary of each polygon in the GeoSeries.
geom_type Returns a Series of strings specifying the Geometry Type of each object.
geometry
has_z Returns a Series of dtype('bool') with value True for features that have a z-component.
iloc Purely integer-location based indexing for selection by position.
index Return dask Index instance
interiors Returns a Series of List representing the inner rings of each polygon in the GeoSeries.
is_empty Returns a Series of dtype('bool') with value True for empty geometries.
is_ring Returns a Series of dtype('bool') with value True for features that are closed.
is_simple Returns a Series of dtype('bool') with value True for geometries that do not cross themselves.
is_valid Returns a Series of dtype('bool') with value True for geometries that are valid.
known_divisions Whether the divisions are known.
length Returns a Series containing the length of each geometry expressed in the units of the CRS.
loc Purely label-location based indexer for selection by label.
nbytes
ndim Return dimensionality
npartitions Return number of partitions
partitions Slice dataframe by partitions
shape
sindex Need to figure out how to concatenate spatial indexes
size Size of the Series or DataFrame as a Delayed object.
spatial_partitions The spatial extent of each of the partitions of the dask GeoDataFrame.
total_bounds Returns a tuple containing minx, miny, maxx, maxy values for the bounds of the series as a whole.
type Return the geometry type of each geometry in the GeoSeries
unary_union Returns a geometry containing the union of all geometries in the GeoSeries.
values Return a dask.array of the values of this dataframe
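The divisions attribute above describes how index labels map onto partitions. As a rough illustration only (this is not dask's own lookup code), the pure-Python sketch below finds the partition that would hold a given label. It assumes the usual dask convention that each partition covers a half-open index range, except the last, which also includes its upper bound. The function name partition_for and the sample divisions are hypothetical.

```python
from bisect import bisect_right


def partition_for(label, divisions):
    """Return the position of the partition whose index range holds label."""
    if not divisions[0] <= label <= divisions[-1]:
        raise KeyError(label)
    npartitions = len(divisions) - 1
    # bisect_right counts the divisions <= label; the final division is
    # inclusive, so clamp the result to the last partition.
    return min(bisect_right(divisions, label) - 1, npartitions - 1)


# npartitions + 1 ascending values describe 3 partitions.
divisions = (0, 10, 20, 29)
print([partition_for(label, divisions) for label in (0, 9, 10, 29)])
```

When known_divisions is False, no such lookup is possible without first computing the divisions.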