Skip to content

Data

Data

The top-level component data contains an array of data sets in struct format. Each data set needs to contain the components type and name. Other components are dependent on the type of data set as demonstrated below:

  • name: custom string
  • type: string that determines the format of the observations
  • ...: each type of observations has different parameter keys. Some of these are optional and marked accordingly in the more detailed description below

A detailed description of the different types with examples can be found below. While data, in most settings, has no uncertainty attached to it, this standard does allow to provide data with uncertainty, as there are some settings in which uncorrelated or correlated errors need to be taken into account. These include cases such as

  • generated data obtained from importance sampling, including a (potentially Gaussian) uncertainty from the frequency weights
  • unfolded data, resulting from arbitrarily complex transformation functions involving statistical models folding some degree of uncertainty into the data points themselves

While it should always be preferred to publish "raw" data, allowing to include pre-processed data with corresponding uncertainties expands the possible applications considerably.

Point Data

Point data describes a measurement of a single number, with a possible uncertainty (error).

  • name: custom string
  • type: point
  • value: value of this data point
  • uncertainty: (optional) uncertainty of this data point
Example: Point Data
"data":[ 
    { 
        "name":"data1", 
        "type":"point", 
        "value":0., 
        "uncertainty":1. 
    } 
]

Unbinned Data

Unbinned data describes a measurement of multiple data points in a possibly multi-dimensional space of variables. These data points can be weighted.

  • name: custom string
  • type: unbinned
  • entries: array of arrays containing the coordinates/entries of the data
  • axes: array of structs representing the axes. Each struct must have the components name as well as max and min.
  • weights: (optional) array of values containing the weights of the individual data points, to be used for \(\chi^2\) comparisons and fits. If this component is not given, weight 1 is assumed for all data points. If given, the array needs to be of the same length as entries.
  • entries_uncertainties: (optional) array of arrays containing the errors/uncertainties of each entry. If given, the array needs to be of the same shape as entries.
    Example: Unbinned Data
    "data":[ 
      { 
        "name":"data1", 
        "type":"unbinned", 
        "weights":[ 9.0, 18.4 ], 
        "entries":[ [1,3], [2,9] ], 
        "entries_uncertainties":[ [0.3], [0.6] ], 
        "axes":[ 
          { "name":"variable1", "min":1, "max":3 }, 
          { "name":"variable2", "min":-10, "max":10 }, 
          ... 
        ] 
      }, 
      ... 
    ]
    

Binned Data

Binned data describes a histogram of data points with bin contents in a possibly multi-dimensional space of variables. Whether entries that fall precisely on the bin boundaries are sorted into the smaller or larger bin is under the discretion of the creator of the model and thus not defined.

  • name: custom string
  • type: binned
  • contents: array of values representing the contents of the binned data set
  • axes: array of structs representing the axes. Each struct must have the component name. Further, it must specify the binning through one of these two options:
    1. regular binnings are specified through the components max, min and nbins
    2. potentially irregular binnings are specified through the component edges, which contains an array of length \(n+1\), where the first and last entries denote the minimum and and maximum of the variable, and all entries between denote the intermediate bin boundaries.
  • uncertainty: (optional) struct representing the uncertainty of the contents. It consists of up to three components:
    • type: denoting the kind of uncertainty, for now only Gaussian distributed uncertainties denoted as gaussian_uncertainty are supported
    • sigma: array of the standard deviation of the entries in contents. Needs to be of the same length as contents
    • correlation: (optional) array of arrays denoting the correlation between the contents in matrix format. Must be of dimension length of contents \(\times\) length of contents. It can also be set to 0 to indicate no correlation.
Example: Binned Data
"data":[ 
  { 
    "name":"data2", 
    "type":"binned", 
    "contents":[ 9.0, 18.4 ], 
    "axes":[ { "name":"variable1", "nbins":2, "min":1, "max":3 } ] 
  }, 
  { 
    "name":"asimov_data2",
    "type":"binned", 
    "contents":[ 9.0, 18.4, 13, 0. ], 
    "axes":[ 
      { "name":"variable1", "nbins":2, "min":1, "max":3 }, 
      { "name":"variable2", "edges"[0,10,100] } 
    ] 
  }, 
    ... 
  ]

This type can also be used to store pre-processed data utilizing the uncertainty component

Example: Pre-processed binned Data
"data":[ 
  { 
    "name":"data4", 
    "type":"binned", 
    "contents":[ 9.0, 18.4 ], 
    "uncertainty" : { 
      "type": "gaussian_uncertainty", 
      "correlation" : 0, 
      "sigma" : [ 3, 4 ]
     }, 
    "axes":[ 
      { "name":"variable1", "nbins":2, "min":1, "max":3 }, 
      ... 
    ] 
  }, 
  ... 
]