dask_histogram.histogram
- dask_histogram.histogram(x, bins=10, range=None, normed=None, weights=None, density=False, *, histogram=None, storage=Double(), threads=None, split_every=None)
Histogram Dask data in one dimension.
- Parameters
x (dask.array.Array or dask.dataframe.Series) – Data to be histogrammed.
bins (int or sequence of scalars) – If bins is an int, it defines the total number of bins to be used (this requires the range argument to be defined). If bins is a sequence of scalars (e.g. an array), it defines the bin edges.
range ((float, float)) – The minimum and maximum of the histogram axis. Required when bins is an int.
normed (bool, optional) – An unsupported argument that has been deprecated in the NumPy API (preserved to maintain calls dependent on argument order).
weights (dask.array.Array or dask.dataframe.Series, optional) – An array of values weighting each sample in the input data. The weights must be chunked identically to the 0th (row) axis of the data sample.
density (bool) – If False (default), the returned array represents the number of samples in each bin. If True, the returned array represents the probability density function at each bin.
histogram (Any, optional) – If not None, a collection instance is returned instead of the array style return.
storage (boost_histogram.storage.Storage) – Define the storage used by the Histogram object.
threads (int, optional) – Ignored argument kept for compatibility with boost-histogram. We let Dask have complete control over threads.
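The bins, range, and density parameters follow the NumPy histogram convention. The sketch below uses plain NumPy (not dask-histogram itself) to illustrate what those arguments mean: an integer bin count plus a range yields equally spaced edges, and density=True divides each count by the total count times the bin width. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# An integer `bins` together with `range` gives 10 equal-width bins,
# i.e. 11 equally spaced edges between -3 and 3.
counts, edges = np.histogram(x, bins=10, range=(-3, 3))
expected_edges = np.linspace(-3.0, 3.0, 10 + 1)

# density=True returns a probability density instead of raw counts.
pdf, _ = np.histogram(x, bins=10, range=(-3, 3), density=True)

# The same density computed by hand: counts / (samples in range * bin width).
widths = np.diff(edges)
manual_pdf = counts / (counts.sum() * widths)
```

Because the density is normalised over the histogram range, the sum of `pdf * widths` is 1 even when some samples fall outside the range.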
- Returns
The default return is the style of dask.array.histogram(): an array of bin contents and an array of bin edges. If the histogram argument is used, the return is a dask_histogram.AggHistogram collection instance.
- Return type
See also
Examples
Gaussian distribution with object return style and Weight storage:
>>> import dask_histogram as dh
>>> import dask.array as da
>>> import boost_histogram as bh
>>> x = da.random.standard_normal(size=(1000,), chunks=(250,))
>>> h = dh.histogram(
...     x, bins=10, range=(-3, 3), histogram=True, storage=bh.storage.Weight()
... )
Now with variable width bins and the array return style:
>>> bins = [-3, -2.2, -1.0, -0.2, 0.2, 1.2, 2.2, 3.2]
>>> h, edges = dh.histogram(x, bins=bins)
Now with weights and the object return style:
>>> w = da.random.uniform(0.0, 1.0, size=x.shape[0], chunks=x.chunksize[0])
>>> h = dh.histogram(x, bins=bins, weights=w, histogram=True)
>>> h
dask_histogram.AggHistogram<histreduce-agg, ndim=1, storage=Double()>
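The requirement that weights share the chunking of the data can be motivated with a conceptual NumPy sketch. It is an illustration of chunk-wise histogramming followed by a sum, not dask-histogram's actual implementation: each data chunk is paired with the weight chunk covering the same rows, which only works if both are split at the same boundaries.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
w = rng.uniform(0.0, 1.0, size=x.shape[0])
edges = np.array([-3, -2.2, -1.0, -0.2, 0.2, 1.2, 2.2, 3.2])

# Split data and weights at identical boundaries (four chunks of 250 rows),
# mirroring chunks=(250,) in the examples above.
x_chunks = np.split(x, 4)
w_chunks = np.split(w, 4)

# Histogram each (data, weights) chunk pair against the shared edges.
partials = [
    np.histogram(xc, bins=edges, weights=wc)[0]
    for xc, wc in zip(x_chunks, w_chunks)
]

# Summing the per-chunk histograms reproduces the single-pass result.
total = np.sum(partials, axis=0)
reference, _ = np.histogram(x, bins=edges, weights=w)
```

If the weights were split at different boundaries, the chunk pairs would no longer line up row for row, and the per-chunk calls would either fail on mismatched shapes or pair samples with the wrong weights.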