dask_histogram.histogram2d
dask_histogram.histogram2d(x, y, bins=10, range=None, normed=None, weights=None, density=False, *, histogram=None, storage=Double(), threads=None, split_every=None)
Histogram Dask data in two dimensions.
- Parameters
x (dask.array.Array or dask.dataframe.Series) – Array representing the x coordinates of the data to be histogrammed.
y (dask.array.Array or dask.dataframe.Series) – Array representing the y coordinates of the data to be histogrammed.
bins (int, (int, int), array, (array, array), optional) –
The bin specification:
If a single int, both dimensions will have that number of bins.
If a pair of ints, the first int is the total number of bins along the x-axis, and the second is the total number of bins along the y-axis.
If a single array, the array represents the bin edges along each dimension.
If a pair of arrays, the first array corresponds to the edges along x-axis, the second corresponds to the edges along the y-axis.
range (((float, float), (float, float)), optional) – If integers are passed to the bins argument, range is required to define the min and max of each axis, that is: ((xmin, xmax), (ymin, ymax)).
normed (bool, optional) – An unsupported argument that has been deprecated in the NumPy API (preserved to maintain calls dependent on argument order).
weights (dask.array.Array or dask.dataframe.Series, optional) – An array of values weighting each sample in the input data. The chunks of the weights must be identical to the chunking along the 0th (row) axis of the data sample.
density (bool) – If False (default), the returned array represents the number of samples in each bin. If True, the returned array represents the probability density function at each bin.
histogram (Any, optional) – If not None, a collection instance is returned instead of the array style return.
storage (boost_histogram.storage.Storage) – Define the storage used by the Histogram object.
threads (int, optional) – Ignored argument kept for compatibility with boost-histogram. We let Dask have complete control over threads.
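The bins and range rules above can be illustrated with a small pure-Python sketch. The helpers `edges_for_axis` and `resolve_bins` are hypothetical names introduced only for this illustration; they are not part of dask_histogram, and the library's internal handling may differ. The sketch assumes a length-2 sequence is always read as one entry per axis.

```python
from typing import List, Optional, Sequence, Tuple, Union

BinSpec = Union[int, Sequence[float]]


def edges_for_axis(spec: BinSpec, axis_range: Optional[Tuple[float, float]]) -> List[float]:
    """Hypothetical helper: turn one axis specification into explicit bin edges."""
    if isinstance(spec, int):
        # An integer bin count needs (min, max) to place equal-width edges.
        if axis_range is None:
            raise ValueError("range is required when bins is given as an integer")
        lo, hi = axis_range
        width = (hi - lo) / spec
        return [lo + i * width for i in range(spec)] + [hi]
    # Otherwise the specification already is a sequence of edges.
    return [float(edge) for edge in spec]


def resolve_bins(bins, ranges=None) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: return (x_edges, y_edges) for the four accepted forms."""
    if isinstance(bins, int):
        spec_x, spec_y = bins, bins      # single int: same count on both axes
    elif len(bins) == 2:
        spec_x, spec_y = bins            # pair of ints or pair of edge arrays
    else:
        spec_x, spec_y = bins, bins      # single array: same edges on both axes
    range_x, range_y = ranges if ranges is not None else (None, None)
    return edges_for_axis(spec_x, range_x), edges_for_axis(spec_y, range_y)


x_edges, y_edges = resolve_bins((12, 4), ranges=((0, 1), (0.4, 0.6)))
print(len(x_edges) - 1, len(y_edges) - 1)  # number of bins along x and y
```

Note that n bins always correspond to n + 1 edges, which is why integer bin counts cannot be resolved without a range.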
- Returns
The default return is the style of dask.array.histogram2d(): An array of bin contents, an array of the x-edges, and an array of the y-edges. If the histogram argument is used then the return is a dask_histogram.AggHistogram collection instance.
- Return type
tuple(dask.array.Array, dask.array.Array, dask.array.Array) or AggHistogram
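To make the difference between the count and density return values concrete, the following sketch converts an array of bin counts into a density by hand. It assumes the NumPy convention (each count divided by the total count times the bin area); `density_from_counts` is a hypothetical helper for illustration and operates on plain Python lists rather than Dask arrays.

```python
def density_from_counts(counts, edges_x, edges_y):
    """Hypothetical helper: normalize 2D counts so they integrate to 1."""
    total = sum(sum(row) for row in counts)
    density = []
    for i, row in enumerate(counts):
        width_x = edges_x[i + 1] - edges_x[i]
        density_row = []
        for j, count in enumerate(row):
            width_y = edges_y[j + 1] - edges_y[j]
            # Divide by total count and by the area of bin (i, j).
            density_row.append(count / (total * width_x * width_y))
        density.append(density_row)
    return density


counts = [[1, 3], [2, 2]]
edges_x = [0.0, 1.0, 3.0]
edges_y = [0.0, 0.5, 1.0]
density = density_from_counts(counts, edges_x, edges_y)

# Summing density * bin area over all bins recovers 1 (up to rounding).
integral = sum(
    density[i][j] * (edges_x[i + 1] - edges_x[i]) * (edges_y[j + 1] - edges_y[j])
    for i in range(2)
    for j in range(2)
)
print(integral)
```

With variable-width bins the density values are not simply the counts rescaled by a constant, because each bin is divided by its own area.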
Examples
Uniform distributions along each dimension with the array return style:
>>> import dask_histogram as dh
>>> import dask.array as da
>>> x = da.random.uniform(0.0, 1.0, size=(1000,), chunks=200)
>>> y = da.random.uniform(0.4, 0.6, size=(1000,), chunks=200)
>>> h, edgesx, edgesy = dh.histogram2d(x, y, bins=(12, 4), range=((0, 1), (0.4, 0.6)))
Now with the collection object return style:
>>> h = dh.histogram2d(
...     x, y, bins=(12, 4), range=((0, 1), (0.4, 0.6)), histogram=True
... )
>>> type(h)
<class 'dask_histogram.core.AggHistogram'>
With variable bins and sample weights from a dask.dataframe.Series originating from a dask.dataframe.DataFrame column (df below must have npartitions equal to the number of chunks in x and y):
>>> x = da.random.uniform(0.0, 1.0, size=(1000,), chunks=200)
>>> y = da.random.uniform(0.4, 0.6, size=(1000,), chunks=200)
>>> df = dask_dataframe_factory()
>>> w = df["weights"]
>>> binsx = [0.0, 0.2, 0.6, 0.8, 1.0]
>>> binsy = [0.40, 0.45, 0.50, 0.55, 0.60]
>>> h, edges1, edges2 = dh.histogram2d(
...     x, y, bins=[binsx, binsy], weights=w
... )
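As a rough picture of what variable bins and sample weights mean, the sketch below fills a weighted 2D histogram in pure Python. `bin_index` and `weighted_hist2d` are hypothetical helpers, not dask_histogram functions. The sketch assumes NumPy's convention that the last bin includes its upper edge; whether dask_histogram treats that edge identically is not confirmed here.

```python
from bisect import bisect_right


def bin_index(edges, value):
    """Hypothetical helper: index of the bin holding value, or None if outside."""
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[-1]:
        return len(edges) - 2  # assumed: last bin is closed on the right
    return bisect_right(edges, value) - 1


def weighted_hist2d(xs, ys, ws, edges_x, edges_y):
    """Hypothetical helper: each sample adds its weight to the bin it falls in."""
    counts = [[0.0] * (len(edges_y) - 1) for _ in range(len(edges_x) - 1)]
    for x, y, w in zip(xs, ys, ws):
        i = bin_index(edges_x, x)
        j = bin_index(edges_y, y)
        if i is not None and j is not None:
            counts[i][j] += w
    return counts


binsx = [0.0, 0.2, 0.6, 0.8, 1.0]
binsy = [0.40, 0.45, 0.50, 0.55, 0.60]
xs = [0.1, 0.1, 0.7, 1.0]
ys = [0.41, 0.41, 0.52, 0.60]
ws = [0.5, 1.5, 2.0, 1.0]
counts = weighted_hist2d(xs, ys, ws, binsx, binsy)
print(counts[0][0], counts[2][2], counts[3][3])
```

Because each weight is paired with one sample by position, the weights must line up with the data row for row, which is why the weights and the data need matching chunking in the Dask version.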