Data¶
The top-level component data contains an array of data sets in struct format. Each data set needs to contain the components type and name. Other components depend on the type of data set, as demonstrated below:
- name: custom string
- type: string that determines the format of the observations
- ...: each type of observations has different parameter keys. Some of these are optional and marked accordingly in the more detailed description below.
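As a minimal sketch of the rule that every data set carries name and type, the following Python snippet walks the data array and reports missing components. The helper name check_data_sets and the sample document are illustrative assumptions, not part of the standard.

```python
import json

# Hypothetical helper, not defined by the standard: it checks only the two
# components that every data set is required to contain.
REQUIRED_KEYS = ("name", "type")


def check_data_sets(document):
    """Return a list of problems found in the top-level "data" component."""
    problems = []
    for index, data_set in enumerate(document.get("data", [])):
        for key in REQUIRED_KEYS:
            if key not in data_set:
                problems.append(f"data set {index} is missing '{key}'")
    return problems


# Illustrative document: the second data set lacks the "name" component.
example = json.loads("""
{
  "data": [
    {"name": "data1", "type": "point", "value": 0.0},
    {"type": "binned", "contents": [9.0, 18.4]}
  ]
}
""")

problems = check_data_sets(example)
print(problems)
```

Type-specific components, such as value or contents, would need their own checks on top of this.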
A detailed description of the different types with examples can be found below. While data, in most settings, has no uncertainty attached to it, this standard does allow data to be provided with uncertainty, as there are some settings in which uncorrelated or correlated errors need to be taken into account. These include cases such as:
- generated data obtained from importance sampling, including a (potentially Gaussian) uncertainty from the frequency weights
- unfolded data, resulting from arbitrarily complex transformation functions involving statistical models folding some degree of uncertainty into the data points themselves
While publishing "raw" data should always be preferred, allowing pre-processed data with corresponding uncertainties to be included expands the possible applications considerably.
Point Data¶
Point data describes a measurement of a single number, with a possible uncertainty (error).
- name: custom string
- type: point
- value: value of this data point
- uncertainty: (optional) uncertainty of this data point
"data":[
{
"name":"data1",
"type":"point",
"value":0.0,
"uncertainty":1.0
}
]
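A point data set could be read along the following lines. This is only a sketch: the function name read_point is hypothetical, and returning None for a missing uncertainty is an assumption about how a consumer might represent "no uncertainty given".

```python
def read_point(data_set):
    """Return (value, uncertainty) for a data set of type "point".

    The uncertainty component is optional; None is returned when it is
    absent (an assumed convention, not prescribed by the standard).
    """
    if data_set.get("type") != "point":
        raise ValueError("expected a data set of type 'point'")
    return data_set["value"], data_set.get("uncertainty")


with_uncertainty = read_point(
    {"name": "data1", "type": "point", "value": 0.0, "uncertainty": 1.0}
)
without_uncertainty = read_point({"name": "data1", "type": "point", "value": 0.0})
print(with_uncertainty, without_uncertainty)
```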
Unbinned Data¶
Unbinned data describes a measurement of multiple data points in a possibly multi-dimensional space of variables. These data points can be weighted.
- name: custom string
- type: unbinned
- entries: array of arrays containing the coordinates/entries of the data
- axes: array of structs representing the axes. Each struct must have the components name as well as max and min.
- weights: (optional) array of values containing the weights of the individual data points, to be used for \(\chi^2\) comparisons and fits. If this component is not given, weight 1 is assumed for all data points. If given, the array needs to be of the same length as entries.
- entries_uncertainties: (optional) array of arrays containing the errors/uncertainties of each entry. If given, the array needs to be of the same shape as entries.

Example: Unbinned Data
"data":[
{
"name":"data1",
"type":"unbinned",
"weights":[ 9.0, 18.4 ],
"entries":[ [1,3], [2,9] ],
"entries_uncertainties":[ [0.3], [0.6] ],
"axes":[
{ "name":"variable1", "min":1, "max":3 },
{ "name":"variable2", "min":-10, "max":10 },
...
]
},
...
]
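The length and shape requirements on weights and entries_uncertainties can be checked mechanically. The sketch below is one possible way to do so; the function names and the sample values are illustrative assumptions and do not come from the standard.

```python
def check_unbinned(data_set):
    """Return a list of shape problems found in one unbinned data set."""
    entries = data_set["entries"]
    problems = []

    # weights is optional, but if given it needs one value per entry.
    weights = data_set.get("weights")
    if weights is not None and len(weights) != len(entries):
        problems.append("weights must have the same length as entries")

    # entries_uncertainties is optional, but if given it needs the same
    # shape as entries: same number of rows, same length per row.
    uncertainties = data_set.get("entries_uncertainties")
    if uncertainties is not None:
        same_shape = len(uncertainties) == len(entries) and all(
            len(u) == len(e) for u, e in zip(uncertainties, entries)
        )
        if not same_shape:
            problems.append("entries_uncertainties must have the same shape as entries")
    return problems


def effective_weights(data_set):
    """Weights as given, or weight 1 for every data point if absent."""
    return data_set.get("weights", [1.0] * len(data_set["entries"]))


# Illustrative data set with made-up values.
good = {
    "name": "toy",
    "type": "unbinned",
    "entries": [[1, 3], [2, 9]],
    "entries_uncertainties": [[0.3, 0.1], [0.6, 0.2]],
    "axes": [
        {"name": "variable1", "min": 1, "max": 3},
        {"name": "variable2", "min": -10, "max": 10},
    ],
}
bad = dict(good, weights=[9.0])  # one weight for two entries

print(check_unbinned(good), effective_weights(good))
print(check_unbinned(bad))
```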
Binned Data¶
Binned data describes a histogram of data points with bin contents in a possibly multi-dimensional space of variables. Whether entries that fall precisely on a bin boundary are sorted into the lower or upper bin is at the discretion of the creator of the model and is thus not defined.
- name: custom string
- type: binned
- contents: array of values representing the contents of the binned data set
- axes: array of structs representing the axes. Each struct must have the component name. Further, it must specify the binning through one of these two options:
  - regular binnings are specified through the components max, min and nbins
  - potentially irregular binnings are specified through the component edges, which contains an array of length \(n+1\) for \(n\) bins, where the first and last entries denote the minimum and maximum of the variable, and all entries between denote the intermediate bin boundaries.
- uncertainty: (optional) struct representing the uncertainty of the contents. It consists of up to three components:
  - type: denoting the kind of uncertainty; for now, only Gaussian distributed uncertainties, denoted as gaussian_uncertainty, are supported
  - sigma: array of the standard deviations of the entries in contents. Needs to be of the same length as contents.
  - correlation: (optional) array of arrays denoting the correlation between the contents in matrix format. Must be of dimension length of contents \(\times\) length of contents. It can also be set to 0 to indicate no correlation.
"data":[
{
"name":"data2",
"type":"binned",
"contents":[ 9.0, 18.4 ],
"axes":[ { "name":"variable1", "nbins":2, "min":1, "max":3 } ]
},
{
"name":"asimov_data2",
"type":"binned",
"contents":[ 9.0, 18.4, 13, 0.0 ],
"axes":[
{ "name":"variable1", "nbins":2, "min":1, "max":3 },
{ "name":"variable2", "edges":[0,10,100] }
]
},
...
]
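The two ways of specifying a binning can be brought into a common form by expanding each axis into its list of bin edges. The following is a sketch under two assumptions that the text above does not state explicitly: that "regular" means equal-width bins between min and max, and that the total number of bins is the product of the per-axis bin counts. The function name bin_edges is hypothetical.

```python
def bin_edges(axis):
    """Return the n+1 bin edges of one axis struct."""
    if "edges" in axis:
        # Potentially irregular binning: the edges are given directly.
        return list(axis["edges"])
    # Regular binning (assumed equal width): derive edges from min, max, nbins.
    low, high, nbins = axis["min"], axis["max"], axis["nbins"]
    width = (high - low) / nbins
    return [low + i * width for i in range(nbins + 1)]


axes = [
    {"name": "variable1", "nbins": 2, "min": 1, "max": 3},
    {"name": "variable2", "edges": [0, 10, 100]},
]
edges = [bin_edges(axis) for axis in axes]

# Total bin count, assuming one content value per combination of bins.
total_bins = 1
for axis_edges in edges:
    total_bins *= len(axis_edges) - 1

print(edges, total_bins)
```

For the two axes used here, this gives two bins per axis and four bins in total, which matches the four values in contents of the asimov_data2 example.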
This type can also be used to store pre-processed data utilizing the uncertainty component:
"data":[
{
"name":"data4",
"type":"binned",
"contents":[ 9.0, 18.4 ],
"uncertainty" : {
"type": "gaussian_uncertainty",
"correlation" : 0,
"sigma" : [ 3, 4 ]
},
"axes":[
{ "name":"variable1", "nbins":2, "min":1, "max":3 },
...
]
},
...
]
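A consumer of such a data set will often want a covariance matrix rather than separate sigma and correlation components. The sketch below builds it with the usual relation cov[i][j] = sigma[i] * corr[i][j] * sigma[j]. Treating a missing correlation the same as 0, and assuming the correlation matrix has ones on its diagonal, are assumptions of this sketch; the function name covariance is hypothetical.

```python
def covariance(uncertainty):
    """Build a covariance matrix from a gaussian_uncertainty struct."""
    sigma = uncertainty["sigma"]
    n = len(sigma)
    # Assumption: an absent correlation is treated like 0 (no correlation).
    correlation = uncertainty.get("correlation", 0)
    if correlation == 0:
        # No correlation: identity matrix, so only the variances remain.
        correlation = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [
        [sigma[i] * correlation[i][j] * sigma[j] for j in range(n)]
        for i in range(n)
    ]


uncorrelated = covariance(
    {"type": "gaussian_uncertainty", "correlation": 0, "sigma": [3, 4]}
)
# Illustrative correlation matrix with made-up off-diagonal values.
correlated = covariance(
    {
        "type": "gaussian_uncertainty",
        "correlation": [[1.0, 0.5], [0.5, 1.0]],
        "sigma": [3, 4],
    }
)
print(uncorrelated)
print(correlated)
```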