dask_histogram.histogram
- dask_histogram.histogram(x, bins=10, range=None, normed=None, weights=None, density=False, *, histogram=None, storage=Double(), threads=None, split_every=None)
Histogram Dask data in one dimension.
- Parameters:
x (dask.array.Array or dask.dataframe.Series) – Data to be histogrammed.
bins (int or sequence of scalars) – If bins is an int, it defines the total number of bins to be used (this requires the range argument to be defined). If bins is a sequence of scalars (e.g. an array), it defines the bin edges.
range ((float, float)) – The minimum and maximum of the histogram axis.
normed (bool, optional) – An unsupported argument that has been deprecated in the NumPy API (preserved to maintain calls dependent on argument order).
weights (dask.array.Array or dask.dataframe.Series, optional) – An array of values weighting each sample in the input data. The chunks of the weights must be identical to the chunking along the 0th (row) axis of the data sample.
density (bool) – If False (default), the returned array represents the number of samples in each bin. If True, the returned array represents the probability density function at each bin.
histogram (Any, optional) – If not None, a collection instance is returned instead of the array style return.
storage (boost_histogram.storage.Storage) – Define the storage used by the Histogram object.
threads (int, optional) – Ignored argument kept for compatibility with boost-histogram. We let Dask have complete control over threads.
- Returns:
The default return is the style of dask.array.histogram(): An array of bin contents and an array of bin edges. If the histogram argument is used then the return is a dask_histogram.AggHistogram collection instance.
- Return type:
See also
Examples
Gaussian distribution with object return style and Weight storage:
>>> import dask_histogram as dh
>>> import dask.array as da
>>> import boost_histogram as bh
>>> x = da.random.standard_normal(size=(1000,), chunks=(250,))
>>> h = dh.histogram(
...     x, bins=10, range=(-3, 3), histogram=True, storage=bh.storage.Weight()
... )
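As a rough illustration of what bins=10 with range=(-3, 3) implies, the plain-Python sketch below computes the equal-width bin edges for such a call. The helper uniform_edges is hypothetical and not part of dask-histogram; it only assumes the NumPy-style convention that N bins are delimited by N + 1 evenly spaced edges.

```python
def uniform_edges(bins, range_):
    """Return bins + 1 evenly spaced edges spanning range_.

    Hypothetical helper for illustration; not a dask-histogram function.
    """
    lo, hi = range_
    width = (hi - lo) / bins
    # Derive every edge from the lower bound so rounding error does not accumulate.
    return [lo + i * width for i in range(bins + 1)]


edges = uniform_edges(10, (-3, 3))
print(len(edges))  # 11 edges delimit 10 bins
print(edges)
```

This is also why an integer bins needs range: without a minimum and maximum there is nothing to divide into equal widths.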
Now with variable width bins and the array return style:
>>> bins = [-3, -2.2, -1.0, -0.2, 0.2, 1.2, 2.2, 3.2]
>>> h, edges = dh.histogram(x, bins=bins)
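With variable-width bins, the difference between counts and density=True becomes visible, because each count is divided by both the total number of samples and the width of its own bin. The following is a minimal plain-Python sketch of that normalization, not dask-histogram's implementation. The helpers histogram_counts and to_density and the sample values are made up for illustration, and the binning rule (half-open bins, with the last bin including its upper edge) is the NumPy convention assumed here.

```python
from bisect import bisect_right


def histogram_counts(samples, edges):
    """Count samples per bin (hypothetical helper, NumPy-style binning assumed)."""
    counts = [0] * (len(edges) - 1)
    for value in samples:
        if value < edges[0] or value > edges[-1]:
            continue  # samples outside the edges are dropped in this sketch
        # Index of the last edge <= value; clamp so the final edge falls in the last bin.
        index = min(bisect_right(edges, value) - 1, len(counts) - 1)
        counts[index] += 1
    return counts


def to_density(counts, edges):
    """Divide each count by (total count * bin width)."""
    total = sum(counts)
    return [c / (total * (hi - lo)) for c, lo, hi in zip(counts, edges, edges[1:])]


bins = [-3, -2.2, -1.0, -0.2, 0.2, 1.2, 2.2, 3.2]
samples = [-2.5, -1.5, -0.5, 0.0, 0.1, 0.7, 1.5, 3.0]  # made-up data

counts = histogram_counts(samples, bins)
density = to_density(counts, bins)

# The density integrates to 1 over the binned range.
area = sum(d * (hi - lo) for d, lo, hi in zip(density, bins, bins[1:]))
print(counts)
print(area)
```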
Now with weights and the object return style:
>>> w = da.random.uniform(0.0, 1.0, size=x.shape[0], chunks=x.chunksize[0])
>>> h = dh.histogram(x, bins=bins, weights=w, histogram=True)
>>> h
dask_histogram.AggHistogram<histreduce-agg, ndim=1, storage=Double()>
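To make the effect of weights concrete, the sketch below fills a histogram in plain Python: each sample adds its weight, rather than 1, to the bin it falls in. It also accumulates the sum of squared weights per bin, which is the extra quantity that boost-histogram's Weight storage tracks as a variance estimate; the default Double storage keeps only the sum. The helper weighted_histogram and the data are hypothetical, and the NumPy-style binning rule is an assumption of this sketch rather than a statement about dask-histogram internals.

```python
from bisect import bisect_right


def weighted_histogram(samples, weights, edges):
    """Return per-bin sum of weights and sum of squared weights (hypothetical helper)."""
    nbins = len(edges) - 1
    sums = [0.0] * nbins
    sums_sq = [0.0] * nbins
    for value, weight in zip(samples, weights):
        if not edges[0] <= value <= edges[-1]:
            continue  # samples outside the edges are dropped in this sketch
        # Clamp so a value equal to the final edge lands in the last bin.
        index = min(bisect_right(edges, value) - 1, nbins - 1)
        sums[index] += weight
        sums_sq[index] += weight * weight
    return sums, sums_sq


bins = [-3, -2.2, -1.0, -0.2, 0.2, 1.2, 2.2, 3.2]
samples = [-2.5, 0.0, 0.1, 1.5]    # made-up data
weights = [0.5, 1.0, 0.25, 2.0]    # one weight per sample

sums, sums_sq = weighted_histogram(samples, weights, bins)
print(sums)
print(sums_sq)
```

The one-weight-per-sample pairing in the loop is the reason the weights must be chunked identically to the data: each Dask task needs the matching slice of both arrays.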