Quickstart
This quickstart assumes basic familiarity with Pydantic, HDF5 files and the h5py library.
By convention, h5pydantic models are defined in a
model.py file.
h5pydantic was motivated by a synchrotron use case, so this quickstart uses a greatly simplified synchrotron experiment.
Specifying the Model
To get started, some imports:
from h5pydantic import H5Dataset, H5Group, H5Int64
Now, let's define a baseline measurement for our beamline:
class Baseline(H5Group):
    temperature: float
    humidity: float
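Fields with atomic types, such as the float fields above, are stored as HDF5 attributes of the group.
Next, let's record two baseline measurements, one at the start and one at the end of the experiment: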
class Metadata(H5Group):
    start: Baseline
    end: Baseline
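So atomic-typed fields become attributes, while fields whose type is another model (start and end here) become subgroups. The split can be illustrated with a minimal standard-library sketch. Plain annotated classes stand in for the H5Group models so it runs without h5pydantic, and `split_fields` and `ATOMIC_TYPES` are hypothetical names; h5pydantic's real rules (for example, exactly which types count as atomic) may differ.

```python
from typing import get_type_hints

# Hypothetical: the set of types this sketch treats as "atomic".
ATOMIC_TYPES = (int, float, str, bool)


class Baseline:  # stand-in for the H5Group model above
    temperature: float
    humidity: float


class Metadata:  # stand-in for the H5Group model above
    start: Baseline
    end: Baseline


def split_fields(model: type) -> tuple[list[str], list[str]]:
    """Return (attribute_fields, group_fields) for an annotated class."""
    attributes, groups = [], []
    for name, hint in get_type_hints(model).items():
        if hint in ATOMIC_TYPES:
            attributes.append(name)  # would be stored as an HDF5 attribute
        else:
            groups.append(name)  # nested model: would become an HDF5 subgroup
    return attributes, groups


print(split_fields(Baseline))  # (['temperature', 'humidity'], [])
print(split_fields(Metadata))  # ([], ['start', 'end'])
```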
Now, let's take some experimental measurements:
class Acquisition(H5Dataset, shape=(3, 5), dtype=H5Int64):
    beamstop: H5Int64
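H5Dataset subclasses map directly to HDF5 datasets. HDF5 datasets support many options, and h5pydantic exposes these as extra keyword arguments in the class definition, such as shape and dtype above. We've also added a per-acquisition metadata field, beamstop, which is stored as an HDF5 attribute on each acquisition's dataset.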
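The `shape=(3, 5), dtype=H5Int64` arguments in the class statement are ordinary Python class keyword arguments. One standard way for a base class to receive them is `__init_subclass__`; the sketch below assumes that mechanism (h5pydantic may use a metaclass or Pydantic hooks instead). `DatasetBase`, `_shape` and `_dtype` are hypothetical names, and the string "int64" stands in for H5Int64.

```python
class DatasetBase:
    """Hypothetical stand-in for a dataset base class such as H5Dataset."""

    def __init_subclass__(cls, shape=None, dtype=None, **kwargs):
        # Runs once per subclass definition and receives the keyword
        # arguments written in that subclass's class statement.
        super().__init_subclass__(**kwargs)
        cls._shape = shape
        cls._dtype = dtype


# Mirrors the Acquisition model above, without h5pydantic.
class Acquisition(DatasetBase, shape=(3, 5), dtype="int64"):
    beamstop: int


print(Acquisition._shape, Acquisition._dtype)  # (3, 5) int64
```

Because the options live on the class rather than on each instance, every Acquisition dataset shares the same shape and dtype.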
We now have all the bits and pieces to create our entire experiment:
class Experiment(H5Group):
    metadata: Metadata
    data: list[Acquisition] = []
This introduces our first container type, a list of Acquisitions, which is mapped to an HDF5 group whose members are indexed by number, e.g. /data/0, /data/1, etc.
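The resulting layout can be illustrated with a small standard-library sketch that walks a nested structure and lists the HDF5 path of every group and dataset. Here dicts stand in for groups, lists become groups with numbered members, and the string "dataset" marks a leaf dataset. `hdf5_paths` is a hypothetical helper, not h5pydantic code.

```python
def hdf5_paths(node, path=""):
    """Yield (path, kind) pairs for every group and dataset under node."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        # List members are stored under their index: /data/0, /data/1, ...
        items = ((str(i), child) for i, child in enumerate(node))
    else:
        yield path, "dataset"
        return
    yield path or "/", "group"
    for name, child in items:
        yield from hdf5_paths(child, f"{path}/{name}")


experiment = {
    "data": ["dataset", "dataset"],  # two Acquisitions
    "metadata": {"start": {}, "end": {}},  # Baselines hold only attributes
}

for p, kind in hdf5_paths(experiment):
    print(kind, p)
# group /
# group /data
# dataset /data/0
# dataset /data/1
# group /metadata
# group /metadata/start
# group /metadata/end
```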
Using the Model
Now, let's use the model. In a real experiment the data would come from your beamline; for this example we'll use made-up values.
from model import Experiment, Acquisition, Baseline, Metadata
import numpy as np
from pathlib import Path

experiment = Experiment(data=[Acquisition(beamstop=11), Acquisition(beamstop=12)],
                        metadata=Metadata(start=Baseline(temperature=25.0, humidity=0.4),
                                          end=Baseline(temperature=26.0, humidity=0.4)))
Now we're ready to dump this experiment to a file. There's a lot
going on in this snippet. The h5pydantic.H5Group.dumper() context
manager opens the output file, experiment.hdf, at the start of the
with block. Inside the block, users write to the datasets using h5py
array assignment; at the end of the block, h5pydantic closes the
output file.
with experiment.dumper(Path("experiment.hdf")):
    experiment.data[0][()] = np.random.randint(255, size=(3, 5))
    experiment.data[1][()] = np.random.randint(255, size=(3, 5))
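For readers who know h5py, the following sketch writes roughly the same layout with plain h5py and NumPy, based on the h5dump listing of experiment.hdf. It is an approximation for comparison, not how h5pydantic is implemented, and the file name experiment_h5py.hdf is hypothetical.

```python
import h5py
import numpy as np

beamstops = [11, 12]  # one entry per acquisition
baselines = {"start": (25.0, 0.4), "end": (26.0, 0.4)}  # (temperature, humidity)

# "w" creates or truncates the file; leaving the with-block closes it,
# similar to what the dumper context manager does.
with h5py.File("experiment_h5py.hdf", "w") as f:
    data = f.create_group("data")
    for i, beamstop in enumerate(beamstops):
        # Each acquisition is a (3, 5) int64 dataset named by its list index.
        dset = data.create_dataset(str(i), shape=(3, 5), dtype=np.int64)
        dset.attrs["beamstop"] = np.int64(beamstop)
        dset[...] = np.random.randint(255, size=(3, 5))  # write the whole array
    metadata = f.create_group("metadata")
    for name, (temperature, humidity) in baselines.items():
        # Baselines are groups whose float fields are stored as attributes.
        group = metadata.create_group(name)
        group.attrs["temperature"] = temperature
        group.attrs["humidity"] = humidity
```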
Our example experiment file has now been created. Its contents, as printed by h5dump, are:
HDF5 "experiment.hdf" {
GROUP "/" {
   GROUP "data" {
      DATASET "0" {
         DATATYPE H5T_STD_I64LE
         DATASPACE SIMPLE { ( 3, 5 ) / ( 3, 5 ) }
         DATA {
         (0,0): 73, 86, 221, 21, 119,
         (1,0): 213, 54, 196, 59, 139,
         (2,0): 1, 209, 76, 228, 145
         }
         ATTRIBUTE "beamstop" {
            DATATYPE H5T_STD_I64LE
            DATASPACE SCALAR
            DATA {
            (0): 11
            }
         }
      }
      DATASET "1" {
         DATATYPE H5T_STD_I64LE
         DATASPACE SIMPLE { ( 3, 5 ) / ( 3, 5 ) }
         DATA {
         (0,0): 240, 141, 170, 226, 241,
         (1,0): 118, 156, 10, 168, 23,
         (2,0): 95, 195, 100, 202, 200
         }
         ATTRIBUTE "beamstop" {
            DATATYPE H5T_STD_I64LE
            DATASPACE SCALAR
            DATA {
            (0): 12
            }
         }
      }
   }
   GROUP "metadata" {
      GROUP "end" {
         ATTRIBUTE "humidity" {
            DATATYPE H5T_IEEE_F64LE
            DATASPACE SCALAR
            DATA {
            (0): 0.4
            }
         }
         ATTRIBUTE "temperature" {
            DATATYPE H5T_IEEE_F64LE
            DATASPACE SCALAR
            DATA {
            (0): 26
            }
         }
      }
      GROUP "start" {
         ATTRIBUTE "humidity" {
            DATATYPE H5T_IEEE_F64LE
            DATASPACE SCALAR
            DATA {
            (0): 0.4
            }
         }
         ATTRIBUTE "temperature" {
            DATATYPE H5T_IEEE_F64LE
            DATASPACE SCALAR
            DATA {
            (0): 25
            }
         }
      }
   }
}
}
For analysis, we want to load the HDF5 file back from disk. The
h5pydantic.H5Group.load() context manager opens experiment.hdf, gives
access to all the data, including datasets, inside the context block,
and closes the file at the end of the block.
from model import Experiment
from pathlib import Path

with Experiment.load(Path("experiment.hdf")) as experiment:
    print(experiment.data[1][()])
    print(experiment.metadata.start.temperature)
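Why read datasets inside the with block? A likely reason, assumed here from how h5py behaves, is that dataset fields are lazy handles into the open file. Their values are read only when indexed, so the handles stop working once the file is closed. The hypothetical `LazyDataset` and `open_experiment` below illustrate that pattern with only the standard library; they are not h5pydantic's API.

```python
from contextlib import contextmanager


class LazyDataset:
    """Hypothetical dataset handle whose values are read only when indexed."""

    def __init__(self, values):
        self._values = values  # stands in for data stored in the open file
        self.closed = False

    def __getitem__(self, key):
        if self.closed:
            raise ValueError("file is closed; read datasets inside the with block")
        # This sketch ignores the key and returns every value, like dset[()].
        return self._values


@contextmanager
def open_experiment(values):
    """Hypothetical loader: yields lazy handles and invalidates them on exit."""
    handles = [LazyDataset(v) for v in values]
    try:
        yield handles
    finally:
        for handle in handles:  # emulate closing the underlying file
            handle.closed = True


with open_experiment([[1, 2, 3], [4, 5, 6]]) as data:
    print(data[1][()])  # [4, 5, 6]: read while the "file" is open

try:
    data[1][()]
except ValueError as err:
    print(err)  # file is closed; read datasets inside the with block
```

If you need values after the block, read them inside it, e.g. `values = experiment.data[1][()]`. Assuming the dataset behaves like an h5py dataset, this returns an in-memory NumPy array that remains usable after the file is closed.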