Quickstart

This quickstart assumes basic familiarity with Pydantic, HDF5 files and the h5py library.

By convention, h5pydantic models are defined in a model.py file; the usage examples later in this quickstart import from it.

h5pydantic was motivated by a synchrotron use case, so this quickstart works through a greatly simplified one.

Specifying the Model

To get started, some imports:

from h5pydantic import H5Dataset, H5Group, H5Int64

Now, let's define a baseline measurement of our beamline:

class Baseline(H5Group):
    temperature: float
    humidity: float

Model fields of atomic types, such as float, are stored as HDF5 attributes.
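To make this mapping concrete, here is a minimal sketch using plain h5py rather than h5pydantic. It is not h5pydantic's implementation; it only illustrates the layout a Baseline ends up with, namely a group carrying scalar attributes. The file name baseline_sketch.hdf and the group name start are placeholders chosen for this illustration, and the in-memory "core" driver is used so that nothing is written to disk.

```python
import h5py

# Assumption: an in-memory file ("core" driver, no backing store) stands in
# for a real output file, so this sketch leaves nothing on disk.
with h5py.File("baseline_sketch.hdf", "w", driver="core", backing_store=False) as f:
    start = f.create_group("start")

    # Scalar values are attached to the group as HDF5 attributes.
    start.attrs["temperature"] = 25.0
    start.attrs["humidity"] = 0.4

    # Read the attributes back while the file is still open.
    temperature = float(start.attrs["temperature"])
    attribute_names = sorted(start.attrs.keys())
```

Here attribute_names lists the two fields of Baseline, which is the same pair of attributes that appears under each baseline group in the h5dump output later in this quickstart.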

Next, let's record two baseline measurements, one at the start and one at the end:

class Metadata(H5Group):
    start: Baseline
    end: Baseline

Now, let's take some experimental measurements:

class Acquisition(H5Dataset, shape=(3, 5), dtype=H5Int64):
    beamstop: H5Int64

H5Dataset classes map directly to HDF5 datasets. HDF5 datasets have many options, which h5pydantic supports through extra arguments passed to the class, such as shape and dtype here. We’ve also added a per-acquisition metadata field, “beamstop”, to the acquisition.

We now have all the bits and pieces to create our entire experiment:

class Experiment(H5Group):
    metadata: Metadata
    data: list[Acquisition] = []

This introduces our first container type, a list of Acquisitions, which is mapped to an HDF5 group whose members are named by index, e.g. /data/0, /data/1 and so on.
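The index-based naming can be sketched with plain h5py. This is an illustration of the resulting layout only, not of how h5pydantic builds it; the file name list_sketch.hdf and the beamstops list are assumptions made for this example, and the file lives in memory.

```python
import h5py
import numpy as np

beamstops = [11, 12]  # example per-acquisition metadata values

# Assumption: an in-memory file stands in for a real output file.
with h5py.File("list_sketch.hdf", "w", driver="core", backing_store=False) as f:
    data = f.create_group("data")
    for index, beamstop in enumerate(beamstops):
        # Each list element becomes a member named after its list position.
        acquisition = data.create_dataset(str(index), shape=(3, 5), dtype=np.int64)
        # The atomic field is stored as an attribute on that dataset.
        acquisition.attrs["beamstop"] = np.int64(beamstop)

    member_paths = sorted(data[name].name for name in data)
    first_shape = f["/data/0"].shape
    second_beamstop = int(f["/data/1"].attrs["beamstop"])
```

After the loop, member_paths holds the two HDF5 paths /data/0 and /data/1, one per list element.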

Using the Model

Now, let's use the model. In a real experiment the data would come from your beamline; for this example we’ll just use made-up values.

from model import Experiment, Acquisition, Baseline, Metadata

import numpy as np
from pathlib import Path

experiment = Experiment(data=[Acquisition(beamstop=11), Acquisition(beamstop=12)],
                        metadata=Metadata(start=Baseline(temperature=25.0, humidity=0.4),
                                          end=Baseline(temperature=26.0, humidity=0.4)))

Now we’re ready to dump this experiment to a file. There’s a lot going on in this snippet. We begin by creating an h5pydantic.H5Group.dumper() context manager, which opens the output file experiment.hdf at the start of the context block. Inside the block, users can write to the datasets using h5py-style array assignment. At the end of the block, h5pydantic closes the output file.

with experiment.dumper(Path("experiment.hdf")):
    experiment.data[0][()] = np.random.randint(255, size=(3, 5))
    experiment.data[1][()] = np.random.randint(255, size=(3, 5))
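The [()] index used above is h5py's way of selecting an entire dataset. The following sketch shows the same whole-dataset assignment on a plain h5py dataset, assuming that h5pydantic datasets follow h5py's indexing semantics. The file name write_sketch.hdf is a placeholder, and the file is kept in memory.

```python
import h5py
import numpy as np

# Random integers in the range [0, 255), matching the dataset's (3, 5) shape.
values = np.random.randint(255, size=(3, 5))

# Assumption: an in-memory file stands in for experiment.hdf.
with h5py.File("write_sketch.hdf", "w", driver="core", backing_store=False) as f:
    acquisition = f.create_dataset("0", shape=(3, 5), dtype=np.int64)

    # An empty-tuple index selects the whole dataset, so this writes all 15 values.
    acquisition[()] = values

    # The same index on the right-hand side reads everything back as a NumPy array.
    round_trip = acquisition[()]
```

The array being assigned has to be compatible with the dataset's declared shape, which is why both use (3, 5) here.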

Our example experiment HDF5 file has now been created. An ASCII form of it, the output of a call to h5dump, is shown below. The dataset values are random, so yours will differ:

HDF5 "experiment.hdf" {
GROUP "/" {
   GROUP "data" {
      DATASET "0" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 3, 5 ) / ( 3, 5 ) }
         DATA {
         (0,0): 73, 86, 221, 21, 119,
         (1,0): 213, 54, 196, 59, 139,
         (2,0): 1, 209, 76, 228, 145
         }
         ATTRIBUTE "beamstop" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 11
            }
         }
      }
      DATASET "1" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 3, 5 ) / ( 3, 5 ) }
         DATA {
         (0,0): 240, 141, 170, 226, 241,
         (1,0): 118, 156, 10, 168, 23,
         (2,0): 95, 195, 100, 202, 200
         }
         ATTRIBUTE "beamstop" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 12
            }
         }
      }
   }
   GROUP "metadata" {
      GROUP "end" {
         ATTRIBUTE "humidity" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SCALAR
            DATA {
            (0): 0.4
            }
         }
         ATTRIBUTE "temperature" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SCALAR
            DATA {
            (0): 26
            }
         }
      }
      GROUP "start" {
         ATTRIBUTE "humidity" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SCALAR
            DATA {
            (0): 0.4
            }
         }
         ATTRIBUTE "temperature" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SCALAR
            DATA {
            (0): 25
            }
         }
      }
   }
}
}

Now, when it comes to analysis, we want to load the HDF5 file from disk. We use the h5pydantic.H5Group.load() context manager, which opens the experiment.hdf file, lets users access all the data, including datasets, within the context block, and closes the file at the end of the context block.

from model import Experiment
from pathlib import Path

with Experiment.load(Path("experiment.hdf")) as experiment:
    print(experiment.data[1][()])
    print(experiment.metadata.start.temperature)
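Since the file is closed at the end of the context block, any dataset values needed afterwards should be copied out while the block is still active. The sketch below shows this with plain h5py, on the assumption that h5pydantic datasets behave like the underlying h5py datasets in this respect. The file name load_sketch.hdf and the example values are made up for illustration.

```python
import h5py
import numpy as np

# Assumption: an in-memory file stands in for experiment.hdf.
with h5py.File("load_sketch.hdf", "w", driver="core", backing_store=False) as f:
    dataset = f.create_dataset("0", data=np.arange(15, dtype=np.int64).reshape(3, 5))

    # Indexing with [()] reads the whole dataset into an in-memory NumPy array.
    copied = dataset[()]

# Outside the block the file is closed: the NumPy copy is still usable,
# but the h5py dataset handle is no longer valid.
handle_still_valid = bool(dataset)
total = int(copied.sum())
```

In this sketch handle_still_valid ends up False, while copied can still be used for analysis after the file has been closed.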