dask_histogram.histogram

dask_histogram.histogram(x, bins=10, range=None, normed=None, weights=None, density=False, *, histogram=None, storage=Double(), threads=None, split_every=None)

Histogram Dask data in one dimension.

Parameters
  • x (dask.array.Array or dask.dataframe.Series) – Data to be histogrammed.

  • bins (int or sequence of scalars) – If bins is an int, it defines the total number of bins to be used (this requires the range argument to be defined). If bins is a sequence of scalars (e.g. an array), it defines the bin edges.

  • range ((float, float), optional) – The minimum and maximum of the histogram axis. Required when bins is an int.

  • normed (bool, optional) – An unsupported argument that has been deprecated in the NumPy API (preserved to maintain calls dependent on argument order).

  • weights (dask.array.Array or dask.dataframe.Series, optional) – An array of values weighting each sample in the input data. The chunks of the weights must be identical to the chunking along the 0th (row) axis of the data sample.

  • density (bool) – If False (default), the returned array represents the number of samples in each bin. If True, the returned array represents the probability density function at each bin.

  • histogram (Any, optional) – If not None, a dask_histogram.AggHistogram collection instance is returned instead of the array-style return.

  • storage (boost_histogram.storage.Storage) – Define the storage used by the Histogram object.

  • threads (int, optional) – Ignored argument kept for compatibility with boost-histogram. We let Dask have complete control over threads.
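
The bins, range, and density conventions above can be illustrated with plain NumPy. This is only a sketch: it calls numpy.histogram rather than dask_histogram.histogram, on the assumption that the Dask version follows the same conventions for its array-style return. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Integer bins plus an explicit range: 10 equal-width bins, hence 11 edges.
counts, edges = np.histogram(x, bins=10, range=(-3, 3))

# A sequence of scalars is taken as the bin edges themselves,
# so 8 edges give 7 (variable-width) bins.
var_edges = [-3, -2.2, -1.0, -0.2, 0.2, 1.2, 2.2, 3.2]
var_counts, _ = np.histogram(x, bins=var_edges)

# density=True normalizes by the in-range sample count and the bin widths,
# so the density integrates to 1 over the histogram axis.
dens, _ = np.histogram(x, bins=var_edges, density=True)
area = float(np.sum(dens * np.diff(var_edges)))
```

Samples that fall outside the edges are not counted in the array-style result, which is why the density is normalized against the in-range count only.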

Returns

By default, the return follows the style of dask.array.histogram(): an array of bin contents and an array of bin edges. If the histogram argument is used, the return is a dask_histogram.AggHistogram collection instance.

Return type

tuple(dask.array.Array, dask.array.Array) or AggHistogram

Examples

Gaussian distribution with object return style and Weight storage:

>>> import dask_histogram as dh
>>> import dask.array as da
>>> import boost_histogram as bh
>>> x = da.random.standard_normal(size=(1000,), chunks=(250,))
>>> h = dh.histogram(
...     x, bins=10, range=(-3, 3), histogram=True, storage=bh.storage.Weight()
... )

Now with variable width bins and the array return style:

>>> bins = [-3, -2.2, -1.0, -0.2, 0.2, 1.2, 2.2, 3.2]
>>> h, edges = dh.histogram(x, bins=bins)

Now with weights and the object return style:

>>> w = da.random.uniform(0.0, 1.0, size=x.shape[0], chunks=x.chunksize[0])
>>> h = dh.histogram(x, bins=bins, weights=w, histogram=True)
>>> h
dask_histogram.AggHistogram<histreduce-agg, ndim=1, storage=Double()>