dask_histogram.histogram2d

dask_histogram.histogram2d(x, y, bins=10, range=None, normed=None, weights=None, density=False, *, histogram=None, storage=Double(), threads=None, split_every=None)

Histogram Dask data in two dimensions.

Parameters
  • x (dask.array.Array or dask.dataframe.Series) – Array representing the x coordinates of the data to be histogrammed.

  • y (dask.array.Array or dask.dataframe.Series) – Array representing the y coordinates of the data to be histogrammed.

  • bins (int, (int, int), array, (array, array), optional) –

    The bin specification:

    • If a single int, both dimensions will have that number of bins.

    • If a pair of ints, the first int is the total number of bins along the x-axis, and the second is the total number of bins along the y-axis.

    • If a single array, the array represents the bin edges along each dimension.

    • If a pair of arrays, the first array corresponds to the edges along the x-axis, and the second corresponds to the edges along the y-axis.

  • range (((float, float), (float, float)), optional) – If integers are passed to the bins argument, range is required to define the min and max of each axis, that is: ((xmin, xmax), (ymin, ymax)).

  • normed (bool, optional) – An unsupported argument that has been deprecated in the NumPy API (preserved to maintain calls dependent on argument order).

  • weights (dask.array.Array or dask.dataframe.Series, optional) – An array of values weighting each sample in the input data. The chunks of the weights must be identical to the chunking along the 0th (row) axis of the data sample.

  • density (bool) – If False (default), the returned array represents the number of samples in each bin. If True, the returned array represents the probability density function at each bin.

  • histogram (Any, optional) – If not None, a collection instance is returned instead of the array style return.

  • storage (boost_histogram.storage.Storage) – Define the storage used by the Histogram object.

  • threads (int, optional) – Ignored argument kept for compatibility with boost-histogram. We let Dask have complete control over threads.
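
The four forms of the bins argument, together with range, all reduce to one list of edges per axis. The following is a minimal pure-Python sketch of that resolution, not the library's implementation: the helper names resolve_edges and uniform_edges and the limits parameter (standing in for range) are hypothetical, and the rule for telling a pair from a single array is a simplifying assumption.

```python
def uniform_edges(lo, hi, nbins):
    """Return nbins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / nbins
    return [lo + i * width for i in range(nbins)] + [hi]


def is_pair(bins):
    # Assumption for this sketch: a length-2 sequence of ints or of
    # sequences is a per-axis pair; anything else is one shared edge array.
    return len(bins) == 2 and all(isinstance(b, (int, list, tuple)) for b in bins)


def resolve_edges(bins, limits=None):
    """Expand a histogram2d-style bins argument into (x_edges, y_edges)."""
    if isinstance(bins, int):
        spec = (bins, bins)        # single int: same bin count on both axes
    elif is_pair(bins):
        spec = tuple(bins)         # pair of ints or pair of edge arrays
    else:
        spec = (bins, bins)        # single array: same edges on both axes

    edges = []
    for axis, axis_spec in enumerate(spec):
        if isinstance(axis_spec, int):
            # Integer bin counts need explicit axis limits.
            if limits is None:
                raise ValueError("range is required when bins are integers")
            lo, hi = limits[axis]
            edges.append(uniform_edges(lo, hi, axis_spec))
        else:
            edges.append(list(axis_spec))
    return edges[0], edges[1]


x_edges, y_edges = resolve_edges((12, 4), limits=((0, 1), (0.4, 0.6)))
print(len(x_edges), len(y_edges))  # 13 5
```

An axis with n bins always has n + 1 edges, which is why the edge arrays returned by the array-style call are one element longer than the corresponding axis of the bin contents.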

Returns

The default return is the style of dask.array.histogram2d(): An array of bin contents, an array of the x-edges, and an array of the y-edges. If the histogram argument is used then the return is a dask_histogram.AggHistogram collection instance.

Return type

tuple(dask.array.Array, dask.array.Array, dask.array.Array) or AggHistogram

Examples

Uniform distributions along each dimension with the array return style:

>>> import dask_histogram as dh
>>> import dask.array as da
>>> x = da.random.uniform(0.0, 1.0, size=(1000,), chunks=200)
>>> y = da.random.uniform(0.4, 0.6, size=(1000,), chunks=200)
>>> h, edgesx, edgesy = dh.histogram2d(x, y, bins=(12, 4), range=((0, 1), (0.4, 0.6)))

Now with the collection object return style:

>>> h = dh.histogram2d(
...     x, y, bins=(12, 4), range=((0, 1), (0.4, 0.6)), histogram=True
... )
>>> type(h)
<class 'dask_histogram.core.AggHistogram'>

With variable bins and sample weights from a dask.dataframe.Series originating from a dask.dataframe.DataFrame column (df below must have npartitions equal to the number of chunks in x and y):

>>> x = da.random.uniform(0.0, 1.0, size=(1000,), chunks=200)
>>> y = da.random.uniform(0.4, 0.6, size=(1000,), chunks=200)
>>> df = dask_dataframe_factory()  
>>> w = df["weights"]              
>>> binsx = [0.0, 0.2, 0.6, 0.8, 1.0]
>>> binsy = [0.40, 0.45, 0.50, 0.55, 0.60]
>>> h, edges1, edges2 = dh.histogram2d(
...     x, y, bins=[binsx, binsy], weights=w
... )
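
To make the density parameter concrete, the sketch below converts a small grid of bin counts into a probability density by dividing each count by the total count times the bin area. It assumes the NumPy normalization convention, which this page does not spell out, and the helper name to_density is hypothetical. It uses plain Python lists rather than Dask arrays.

```python
def to_density(counts, x_edges, y_edges):
    """Convert 2D bin counts into a probability density.

    counts[i][j] is the count in x-bin i and y-bin j.
    Assumes NumPy-style normalization: count / (total * bin_area).
    """
    total = sum(sum(row) for row in counts)
    density = []
    for i, row in enumerate(counts):
        dx = x_edges[i + 1] - x_edges[i]
        density_row = []
        for j, count in enumerate(row):
            dy = y_edges[j + 1] - y_edges[j]
            density_row.append(count / (total * dx * dy))
        density.append(density_row)
    return density


counts = [[1, 3], [2, 2]]
x_edges = [0, 1, 3]   # variable-width bins along x
y_edges = [0, 2, 4]
density = to_density(counts, x_edges, y_edges)
print(density[0][0])  # 0.0625
```

With this normalization, the density values multiplied by their bin areas sum to 1, so bins of different widths remain comparable.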