Data¶

The top-level component data contains an array of data sets, each given as a struct. Every data set must contain the components name and type. The remaining components depend on the type of the data set, as outlined below:

name: custom string
type: string that determines the format of the observations
...: type-specific components. Each type of observation has its own parameter keys; some of these are optional and are marked as such in the detailed descriptions below.
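The common structure can be sketched in Python as a small check that every data set carries a name and a type. The function name `check_common_components` and the `KNOWN_TYPES` set are hypothetical helpers introduced for illustration; rejecting types other than the three described on this page is an assumption of the sketch, not a rule stated by the standard.

```python
import json

# Types described on this page. Treating any other value as a problem is an
# assumption of this sketch, not something the standard states.
KNOWN_TYPES = {"point", "unbinned", "binned"}


def check_common_components(document):
    """Return a list of problems found in the top-level "data" array."""
    problems = []
    for index, data_set in enumerate(document.get("data", [])):
        # Every data set needs the string components "name" and "type".
        for key in ("name", "type"):
            if not isinstance(data_set.get(key), str):
                problems.append(f"data[{index}]: missing or non-string '{key}'")
        data_type = data_set.get("type")
        if isinstance(data_type, str) and data_type not in KNOWN_TYPES:
            problems.append(f"data[{index}]: unknown type '{data_type}'")
    return problems


example = json.loads('{"data": [{"name": "data1", "type": "point", "value": 0.0}]}')
print(check_common_components(example))
```

An empty list means that no problems were found in the common components; the type-specific components would still need their own checks.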

A detailed description of the different types, with examples, is given below. In most settings, data has no uncertainty attached to it. This standard nevertheless allows data to be provided with uncertainties, since in some settings uncorrelated or correlated errors need to be taken into account. These include cases such as

generated data obtained from importance sampling, including a (potentially Gaussian) uncertainty from the frequency weights
unfolded data, resulting from arbitrarily complex transformation functions involving statistical models folding some degree of uncertainty into the data points themselves

While publishing "raw" data should always be preferred, allowing pre-processed data with corresponding uncertainties to be included expands the possible applications considerably.

Point Data¶

Point data describes a measurement of a single number, with a possible uncertainty (error).

name: custom string
type: point
value: value of this data point
uncertainty: (optional) uncertainty of this data point

Example: Point Data

"data":[ 
    { 
        "name":"data1", 
        "type":"point", 
        "value":0.0, 
        "uncertainty":1.0 
    } 
]

Unbinned Data¶

Unbinned data describes a measurement of multiple data points in a possibly multi-dimensional space of variables. These data points can be weighted.

name: custom string
type: unbinned
entries: array of arrays containing the coordinates/entries of the data
axes: array of structs representing the axes. Each struct must have the components name as well as max and min.
weights: (optional) array of values containing the weights of the individual data points, to be used for \(\chi^2\) comparisons and fits. If this component is not given, weight 1 is assumed for all data points. If given, the array needs to be of the same length as entries.

entries_uncertainties: (optional) array of arrays containing the errors/uncertainties of each entry. If given, the array needs to be of the same shape as entries.

Example: Unbinned Data

"data":[ 
  { 
    "name":"data1", 
    "type":"unbinned", 
    "weights":[ 9.0, 18.4 ], 
    "entries":[ [1,3], [2,9] ], 
    "entries_uncertainties":[ [0.3, 0.3], [0.6, 0.6] ], 
    "axes":[ 
      { "name":"variable1", "min":1, "max":3 }, 
      { "name":"variable2", "min":-10, "max":10 }, 
      ... 
    ] 
  }, 
  ... 
]
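The length and shape rules for unbinned data can be checked with a few lines of Python. This is a minimal sketch: the function name `check_unbinned` is hypothetical, and the requirement that each entry holds exactly one coordinate per axis is an assumption that the text above implies but does not state explicitly.

```python
def check_unbinned(data_set):
    """Check the array-shape rules of an 'unbinned' data set; return its weights."""
    entries = data_set["entries"]
    axes = data_set["axes"]

    # Assumption of this sketch: one coordinate per axis in every entry.
    for entry in entries:
        if len(entry) != len(axes):
            raise ValueError("entry length does not match the number of axes")

    # If no weights are given, weight 1 is assumed for all data points.
    weights = data_set.get("weights")
    if weights is None:
        weights = [1.0] * len(entries)
    elif len(weights) != len(entries):
        raise ValueError("weights must have the same length as entries")

    # If given, entries_uncertainties must have the same shape as entries.
    uncertainties = data_set.get("entries_uncertainties")
    if uncertainties is not None:
        same_shape = len(uncertainties) == len(entries) and all(
            len(u) == len(e) for u, e in zip(uncertainties, entries)
        )
        if not same_shape:
            raise ValueError("entries_uncertainties must have the same shape as entries")

    return weights


data_set = {
    "name": "data1",
    "type": "unbinned",
    "entries": [[1, 3], [2, 9]],
    "axes": [
        {"name": "variable1", "min": 1, "max": 3},
        {"name": "variable2", "min": -10, "max": 10},
    ],
}
print(check_unbinned(data_set))  # no weights given, so [1.0, 1.0]
```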

Binned Data¶

Binned data describes a histogram of data points with bin contents in a possibly multi-dimensional space of variables. Whether entries that fall precisely on a bin boundary are sorted into the lower or the upper bin is at the discretion of the creator of the model and is thus not defined by this standard.

name: custom string
type: binned
contents: array of values representing the contents of the binned data set
axes: array of structs representing the axes. Each struct must have the component name. Further, it must specify the binning through one of these two options:
1. regular binnings are specified through the components max, min and nbins
2. potentially irregular binnings are specified through the component edges, which contains an array of length \(n+1\) for \(n\) bins, where the first and last entries denote the minimum and maximum of the variable, and all entries in between denote the intermediate bin boundaries.
uncertainty: (optional) struct representing the uncertainty of the contents. It consists of up to three components:
- type: denotes the kind of uncertainty. For now, only Gaussian-distributed uncertainties, denoted as gaussian_uncertainty, are supported
- sigma: array of the standard deviations of the entries in contents. Needs to be of the same length as contents
- correlation: (optional) array of arrays denoting the correlation between the contents in matrix format. Must be of dimension length of contents \(\times\) length of contents. It can also be set to 0 to indicate no correlation.

Example: Binned Data

"data":[ 
  { 
    "name":"data2", 
    "type":"binned", 
    "contents":[ 9.0, 18.4 ], 
    "axes":[ { "name":"variable1", "nbins":2, "min":1, "max":3 } ] 
  }, 
  { 
    "name":"asimov_data2",
    "type":"binned", 
    "contents":[ 9.0, 18.4, 13, 0.0 ], 
    "axes":[ 
      { "name":"variable1", "nbins":2, "min":1, "max":3 }, 
      { "name":"variable2", "edges":[0,10,100] } 
    ] 
  }, 
  ... 
]
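The two ways of specifying a binning can be brought into a common form by expanding a regular binning into explicit edges. The Python sketch below does this under two assumptions that are not spelled out in the text above: that a regular binning means bins of equal width, and that the length of contents equals the product of the number of bins over all axes (which is consistent with the second example, where 2 × 2 bins go with four contents). The function names are hypothetical, and the sketch makes no statement about the order in which multi-dimensional contents are flattened.

```python
def axis_edges(axis):
    """Return the list of bin edges for one axis struct."""
    if "edges" in axis:
        # Potentially irregular binning: the edges are given explicitly.
        return list(axis["edges"])
    # Regular binning, assumed here to mean equal-width bins.
    low, high, nbins = axis["min"], axis["max"], axis["nbins"]
    width = (high - low) / nbins
    # Append the maximum itself so the last edge is not affected by rounding.
    return [low + i * width for i in range(nbins)] + [high]


def expected_contents_length(axes):
    """Number of bins implied by the axes (product over all axes)."""
    total = 1
    for axis in axes:
        total *= len(axis_edges(axis)) - 1
    return total


axes = [
    {"name": "variable1", "nbins": 2, "min": 1, "max": 3},
    {"name": "variable2", "edges": [0, 10, 100]},
]
contents = [9.0, 18.4, 13, 0.0]

for axis in axes:
    print(axis["name"], axis_edges(axis))
print(len(contents) == expected_contents_length(axes))
```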

This type can also be used to store pre-processed data by means of the uncertainty component.

Example: Pre-processed binned Data

"data":[ 
  { 
    "name":"data4", 
    "type":"binned", 
    "contents":[ 9.0, 18.4 ], 
    "uncertainty" : { 
      "type": "gaussian_uncertainty", 
      "correlation" : 0, 
      "sigma" : [ 3, 4 ]
    }, 
    "axes":[ 
      { "name":"variable1", "nbins":2, "min":1, "max":3 }, 
      ... 
    ] 
  }, 
  ... 
]
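A consumer of such a data set will often need the covariance matrix rather than sigma and correlation separately. The Python sketch below builds it as \(C_{ij} = \sigma_i \, \rho_{ij} \, \sigma_j\), which is the usual relation between standard deviations, correlation and covariance. The function name `covariance_matrix` is hypothetical, and treating a missing correlation component in the same way as the value 0 (no correlation) is an assumption of this sketch.

```python
def covariance_matrix(uncertainty):
    """Build the covariance matrix from an 'uncertainty' struct."""
    if uncertainty["type"] != "gaussian_uncertainty":
        raise ValueError("only gaussian_uncertainty is supported")

    sigma = uncertainty["sigma"]
    n = len(sigma)

    # Assumption: an absent correlation is handled like the value 0,
    # i.e. the contents are taken to be uncorrelated.
    correlation = uncertainty.get("correlation", 0)
    if correlation == 0:
        correlation = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    # The correlation matrix must be of dimension n x n.
    if len(correlation) != n or any(len(row) != n for row in correlation):
        raise ValueError("correlation must be a square matrix matching sigma")

    return [
        [sigma[i] * correlation[i][j] * sigma[j] for j in range(n)]
        for i in range(n)
    ]


uncertainty = {"type": "gaussian_uncertainty", "correlation": 0, "sigma": [3, 4]}
print(covariance_matrix(uncertainty))  # [[9.0, 0.0], [0.0, 16.0]]
```

With no correlation, the result is diagonal and holds the variances of the individual bins.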