dask_histogram.histogram2d

dask_histogram.histogram2d(x, y, bins=10, range=None, normed=None, weights=None, density=False, *, histogram=None, storage=Double(), threads=None, split_every=None)

Histogram Dask data in two dimensions.

Parameters

x (dask.array.Array or dask.dataframe.Series) – Array representing the x coordinates of the data to be histogrammed.
y (dask.array.Array or dask.dataframe.Series) – Array representing the y coordinates of the data to be histogrammed.
bins (int, (int, int), array, (array, array), optional) –
The bin specification:
- If a single int, both dimensions will have that number of bins.
- If a pair of ints, the first int is the total number of bins along the x-axis, and the second is the total number of bins along the y-axis.
- If a single array, the array represents the bin edges along each dimension.
- If a pair of arrays, the first array corresponds to the edges along the x-axis, the second to the edges along the y-axis.
range (((float, float), (float, float)), optional) – If integers are passed to the bins argument, range is required to define the min and max of each axis, that is: ((xmin, xmax), (ymin, ymax)).
normed (bool, optional) – An unsupported argument that has been deprecated in the NumPy API (preserved to maintain calls dependent on argument order).
weights (dask.array.Array or dask.dataframe.Series, optional) – An array of values weighting each sample in the input data. The chunks of the weights must be identical to the chunking along the 0th (row) axis of the data sample.
density (bool) – If False (default), the returned array represents the number of samples in each bin. If True, the returned array represents the probability density function at each bin.
histogram (Any, optional) – If not None, a dask_histogram.AggHistogram collection instance is returned instead of the array-style tuple.
storage (boost_histogram.storage.Storage) – Define the storage used by the Histogram object.
threads (int, optional) – Ignored argument kept for compatibility with boost-histogram. We let Dask have complete control over threads.
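
The bin specifications above can be summarised by the per-axis edges they imply. The following is a minimal pure-Python sketch, not dask-histogram's own code: `edges_from_spec` and its `bin_range` argument (standing in for `range`) are hypothetical names, and it assumes integer bin counts expand to equally spaced edges over `range`, as in `numpy.histogram2d`.

```python
from typing import Optional, Tuple


def _uniform_edges(lo: float, hi: float, n: int) -> list:
    # n equal-width bins need n + 1 edges running from lo to hi inclusive.
    step = (hi - lo) / n
    return [lo + i * step for i in range(n)] + [hi]


def edges_from_spec(
    bins,
    bin_range: Optional[Tuple[Tuple[float, float], Tuple[float, float]]] = None,
):
    """Hypothetical helper: map a histogram2d-style ``bins`` spec to (x_edges, y_edges).

    ``bin_range`` plays the role of ``range``: ((xmin, xmax), (ymin, ymax)).
    """
    if isinstance(bins, int):
        specs = [bins, bins]  # single int: same bin count on both axes
    elif len(bins) == 2:
        specs = list(bins)  # pair: (nx, ny), (x_edges, y_edges), or a mix
    else:
        specs = [bins, bins]  # single edge array shared by both axes
    edges = []
    for axis, spec in enumerate(specs):
        if isinstance(spec, int):
            # Integer counts need range to know where the edges start and stop.
            if bin_range is None:
                raise ValueError("range is required when bins are given as integers")
            lo, hi = bin_range[axis]
            edges.append(_uniform_edges(lo, hi, spec))
        else:
            # Explicit edges are used as given.
            edges.append([float(e) for e in spec])
    return edges[0], edges[1]


# bins=(12, 4) over ((0, 1), (0.4, 0.6)) gives 13 x-edges and 5 y-edges.
x_edges, y_edges = edges_from_spec((12, 4), ((0, 1), (0.4, 0.6)))
```

In this sketch any length-2 sequence is read as a per-axis pair, so a single two-edge array such as `[0.0, 1.0]` would be misread; pass it as `([0.0, 1.0], [0.0, 1.0])` instead.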

Returns

The default return follows the style of dask.array.histogram2d(): an array of bin contents, an array of the x-edges, and an array of the y-edges. If the histogram argument is used, the return is a dask_histogram.AggHistogram collection instance.
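
To make the three arrays of the default return concrete, here is a small in-memory reference written in pure Python. It is a sketch under stated assumptions, not how dask-histogram computes its result: `histogram2d_reference` is a hypothetical name, it works on plain lists rather than Dask collections, and it assumes the `numpy.histogram2d` conventions (half-open bins with the last bin closed on the right; with density=True, each bin holds its weighted count divided by the total in-range weight and by the bin's area).

```python
from bisect import bisect_right


def _bin_index(value, edges):
    """Index of the bin containing value, or None if value lies outside the edges.

    Bins are half-open [e_i, e_{i+1}), except the last, which also includes its
    right edge (the numpy.histogram2d convention, assumed here).
    """
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, value) - 1


def histogram2d_reference(x, y, x_edges, y_edges, weights=None, density=False):
    """Hypothetical in-memory reference returning (contents, x_edges, y_edges)."""
    nx, ny = len(x_edges) - 1, len(y_edges) - 1
    # contents[i][j] accumulates samples with x in x-bin i and y in y-bin j.
    contents = [[0.0] * ny for _ in range(nx)]
    if weights is None:
        weights = [1.0] * len(x)
    for xi, yi, wi in zip(x, y, weights):
        ix, iy = _bin_index(xi, x_edges), _bin_index(yi, y_edges)
        if ix is not None and iy is not None:  # out-of-range samples are dropped
            contents[ix][iy] += wi
    if density:
        # Divide by the total in-range weight and by each bin's area so that
        # the sum of contents[i][j] * area over all bins equals 1.
        total = sum(sum(row) for row in contents)
        for i in range(nx):
            for j in range(ny):
                area = (x_edges[i + 1] - x_edges[i]) * (y_edges[j + 1] - y_edges[j])
                contents[i][j] /= total * area
    return contents, list(x_edges), list(y_edges)


h, ex, ey = histogram2d_reference(
    [0.1, 0.1, 0.6, 0.9, 1.0], [0.1, 0.6, 0.6, 0.2, 0.5],
    [0.0, 0.5, 1.0], [0.0, 0.5, 1.0],
)
```

The bin-contents array indexes x-bins along its first axis, so its shape is (len(x_edges) - 1, len(y_edges) - 1); under this convention the bins=(12, 4) call in the Examples section yields a 12 × 4 array of bin contents.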

Return type

tuple(dask.array.Array, dask.array.Array, dask.array.Array) or AggHistogram

See also

histogram, histogramdd

Examples

Uniform distributions along each dimension with the array return style:

>>> import dask_histogram as dh
>>> import dask.array as da
>>> x = da.random.uniform(0.0, 1.0, size=(1000,), chunks=200)
>>> y = da.random.uniform(0.4, 0.6, size=(1000,), chunks=200)
>>> h, edgesx, edgesy = dh.histogram2d(x, y, bins=(12, 4), range=((0, 1), (0.4, 0.6)))

Now with the collection object return style:

>>> h = dh.histogram2d(
...     x, y, bins=(12, 4), range=((0, 1), (0.4, 0.6)), histogram=True
... )
>>> type(h)
<class 'dask_histogram.core.AggHistogram'>

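With variable bins and sample weights from a dask.dataframe.Series taken from a dask.dataframe.DataFrame column. The partitions of df below must line up with the chunks of x and y (here, five partitions of 200 rows each), and dask_dataframe_factory is a placeholder for however that DataFrame is built: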

>>> x = da.random.uniform(0.0, 1.0, size=(1000,), chunks=200)
>>> y = da.random.uniform(0.4, 0.6, size=(1000,), chunks=200)
>>> df = dask_dataframe_factory()  
>>> w = df["weights"]              
>>> binsx = [0.0, 0.2, 0.6, 0.8, 1.0]
>>> binsy = [0.40, 0.45, 0.50, 0.55, 0.60]
>>> h, edges1, edges2 = dh.histogram2d(
...     x, y, bins=[binsx, binsy], weights=w
... ) 
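
Because the weights must share the sample's chunking, it can help to check the alignment before building the graph. The sketch below is a hypothetical pure-Python helper (`check_weight_alignment` is not part of dask-histogram); it only compares two lists of chunk lengths. It assumes you gather the sample chunk sizes from `x.chunks[0]` and the per-partition row counts of the weights with something like `w.map_partitions(len).compute()`.

```python
from itertools import accumulate
from typing import Optional, Sequence


def check_weight_alignment(
    sample_chunks: Sequence[int], weight_lengths: Sequence[int]
) -> Optional[str]:
    """Return None when the weights line up with the sample, else a message.

    sample_chunks:  chunk sizes of x (and y) along axis 0, e.g. x.chunks[0].
    weight_lengths: number of rows in each partition of the weights Series.
    """
    if sum(sample_chunks) != sum(weight_lengths):
        return (
            f"length mismatch: {sum(sample_chunks)} samples "
            f"vs {sum(weight_lengths)} weights"
        )
    # Compare the running row offsets at which each chunk / partition ends.
    sample_ends = list(accumulate(sample_chunks))
    weight_ends = list(accumulate(weight_lengths))
    if sample_ends != weight_ends:
        return (
            f"boundaries differ: chunks end at {sample_ends}, "
            f"partitions end at {weight_ends}"
        )
    return None


# Five chunks of 200 samples against five partitions of 200 rows: aligned.
print(check_weight_alignment((200,) * 5, [200] * 5))
```

If the check fails, one option (untested here, and not a documented dask-histogram workflow) is to rechunk the array inputs to match the partitions, for example `x.rechunk({0: tuple(lengths)})`, and likewise for y.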

dask-histogram 2023.10.0 documentation