First meeting summary (31/05/2023)

Data to be saved (i.e. registered in the database and stored in the main repository):

  • Simulations
    1. raw output from simulators (tar+gzip of directory)
    2. Grand root files after conversion
      The intermediate raw root files will be kept during a first stage for testing, but will be removed once the whole chain has been validated.
      Models used to run simulations are part of simulators and do not need to be saved.
  • Detectors (experiment)
    1. Raw data as bin files
    2. Grand root files after conversion by gtot
  • Models
    1. Antenna response models (npy files)
      Presently it seems that all extra information about the experiment is already stored in the raw root files. Some extra monitoring data (electromagnetic environment, atmosphere, ...) should be available at a later stage (it does not exist now) and may be saved in another database dedicated to monitoring. Some data quality information (created during analysis) may be stored at some stage, but this is not yet determined.

Grand root files structure

Simulation data will contain at least: Trun, Trunefieldsim, Tshower and Tshowersim
Experiment data will contain at least: Trun, Tadc, Trawvoltage
Data will be stored in directories.
One directory will correspond to an observation run or to a simulation.
Each directory will contain one trun file describing the run parameters, plus additional root files for the events.
To limit file size, files containing trees with traces may be split (e.g. events 1-1000 in file 1, events 1001-2000 in file 2, etc.).
Some trees should be grouped into the same file (e.g. Trun, TrunShowersim, Trunefieldsim for simulations).
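
The splitting rule above can be sketched as follows. This is only an illustration: the helper name event_chunks and the default chunk size of 1000 (taken from the example above) are assumptions, not an agreed interface.

```python
def event_chunks(first_event, last_event, chunk_size=1000):
    """Yield (start, stop) inclusive event ranges, one range per output file."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = first_event
    while start <= last_event:
        # The last chunk may be shorter than chunk_size.
        stop = min(start + chunk_size - 1, last_event)
        yield start, stop
        start = stop + 1


# Events 1 to 2500 split into files of at most 1000 events each.
chunks = list(event_chunks(1, 2500))
```

Each range would then go into its own file, with the run-level trees kept together in a separate file.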

Naming conventions

  • Directory names will match the following structure:
    [sim|exp]_[site]_[date]_[extra]_[serial]
    e.g.
    for experimental data: exp_nancay_20230531__1
    for simulation: sim_gp300_20230420_zhairesml_2
  • Files inside the directory will match the following structure:
    [grouptreename]_[events]_L[analysis level]_[serial]
    e.g.
    run.root for initial run trees
    shower_1-100_L0_0001.root for shower trees of events 1 to 100 at level 0.
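
As a rough check of these conventions, the two structures can be written as regular expressions. The patterns below are a sketch under several assumptions that the notes do not settle: the date is YYYYMMDD, the site and extra fields contain no underscore, the extra field may be empty, and serial numbers are plain digits. The names DIR_RE, FILE_RE, parse_dir_name and parse_file_name are hypothetical.

```python
import re

# Hypothetical patterns derived from the structures above (see assumptions
# in the text: YYYYMMDD dates, no underscore inside a field, optional extra).
DIR_RE = re.compile(
    r"^(?P<kind>sim|exp)_(?P<site>[^_]+)_(?P<date>\d{8})"
    r"_(?P<extra>[^_]*)_(?P<serial>\d+)$"
)
FILE_RE = re.compile(
    r"^(?P<group>[^_]+)_(?P<first>\d+)-(?P<last>\d+)"
    r"_L(?P<level>\d+)_(?P<serial>\d+)\.root$"
)


def parse_dir_name(name):
    """Return the directory name fields as a dict, or None if there is no match."""
    match = DIR_RE.match(name)
    return match.groupdict() if match else None


def parse_file_name(name):
    """Return the event file name fields as a dict, or None if there is no match."""
    match = FILE_RE.match(name)
    return match.groupdict() if match else None


sim = parse_dir_name("sim_gp300_20230420_zhairesml_2")
exp = parse_dir_name("exp_nancay_20230531__1")
shower = parse_file_name("shower_1-100_L0_0001.root")
```

The three examples given above all match. run.root does not follow the event file structure, so parse_file_name returns None for it and it would need to be handled as a special case.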

Rules

Once created and registered, a file is never modified. New analyses generate new files.
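
One possible way to enforce this rule at registration time is sketched below. It assumes that registering means copying the file into a repository directory, which the notes do not specify; register_file is a hypothetical helper, and the demonstration uses a temporary directory with placeholder content.

```python
import os
import shutil
import stat
import tempfile


def register_file(source, repository):
    """Copy source into repository, refusing to overwrite, then make it read-only."""
    target = os.path.join(repository, os.path.basename(source))
    # Mode "xb" raises FileExistsError if the target already exists.
    with open(source, "rb") as src, open(target, "xb") as dst:
        shutil.copyfileobj(src, dst)
    # Drop write permission so the registered copy is not edited by accident.
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target


# Demonstration in a throwaway directory (placeholder content, not real data).
with tempfile.TemporaryDirectory() as workdir:
    repository = os.path.join(workdir, "repository")
    os.mkdir(repository)
    source = os.path.join(workdir, "run.root")
    with open(source, "wb") as handle:
        handle.write(b"placeholder content")
    registered_name = os.path.basename(register_file(source, repository))
    try:
        register_file(source, repository)
        second_attempt_refused = False
    except FileExistsError:
        second_attempt_refused = True
```

A second attempt to register a file under the same name is refused, so a new analysis has to produce a file with a new name.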

Remarks

Serial numbers can only be determined when the directory is registered into the official repository! This could be a problem if someone wants to create several files with the same parameters on their own computer (that is why I initially thought of using a hash, since it is almost universal).
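
To illustrate the hash idea: an identifier computed from the file content can be generated on any computer without asking the central repository. The sketch below assumes the hash would be taken over the file content (the remark does not say what would be hashed) and uses SHA-256 truncated to 12 hexadecimal characters, which is an arbitrary choice. content_hash is a hypothetical helper and the file contents are placeholders.

```python
import hashlib
import os
import tempfile


def content_hash(path, length=12, block_size=1 << 20):
    """Return the first `length` hex characters of the SHA-256 of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in blocks so that large files do not have to fit in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()[:length]


# Two files produced with the same parameters but different content.
with tempfile.TemporaryDirectory() as workdir:
    path_a = os.path.join(workdir, "a.root")
    path_b = os.path.join(workdir, "b.root")
    with open(path_a, "wb") as handle:
        handle.write(b"placeholder content, first production")
    with open(path_b, "wb") as handle:
        handle.write(b"placeholder content, second production")
    id_a = content_hash(path_a)
    id_a_again = content_hash(path_a)
    id_b = content_hash(path_b)
```

Identical content always gives the same identifier, while files with different content get different identifiers except in the unlikely event of a collision on the truncated hash.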
For the date, do we use YYYYMMDD or something more precise like YYYYMMDDhhmnss (this could partially solve the problem mentioned above)?
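
For comparison, both date formats can be produced with the standard datetime module. The timestamp below is made up for illustration.

```python
from datetime import datetime

# Hypothetical creation time, for illustration only.
stamp = datetime(2023, 5, 31, 14, 7, 9)

day_only = stamp.strftime("%Y%m%d")         # YYYYMMDD
to_second = stamp.strftime("%Y%m%d%H%M%S")  # date followed by hours, minutes, seconds
```

Here day_only is "20230531" and to_second is "20230531140709". The longer form only separates directories created at least one second apart.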
For directory naming, I suggest that if there is no extra info (production simulation lib for simulations), we still keep the structure and use a double _ to facilitate searches by regex (not sure whether this will be useful, but it costs nothing).
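
A small sketch of why the double _ helps: every directory name then splits into the same five fields, whether or not the extra info is present. This assumes that no field contains an underscore itself; split_dir_name is a hypothetical helper.

```python
def split_dir_name(name):
    """Split a directory name into its five fields; extra may be an empty string."""
    fields = name.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {name!r}")
    kind, site, date, extra, serial = fields
    return {"kind": kind, "site": site, "date": date, "extra": extra, "serial": serial}


names = ["exp_nancay_20230531__1", "sim_gp300_20230420_zhairesml_2"]
# Select the directories registered without extra info.
without_extra = [name for name in names if split_dir_name(name)["extra"] == ""]
```

If the empty field were dropped instead, the first name would have only four fields and every search would need a special case for it.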