Supporting a New VDIF EDV

Users may encounter VDIF files with unusual headers not currently supported by Baseband. These may either have novel EDV, or they may purport to be a supported EDV but not conform to its formal specification. To handle such situations, Baseband supports implementation of new EDVs and overriding of existing EDVs without the need to modify Baseband’s source code.

The tutorials below assume the following modules have been imported:

>>> import numpy as np
>>> import astropy.units as u
>>> from baseband import vdif, base as vlbi

VDIF Headers

Each VDIF frame begins with a 32-byte, or eight 32-bit word, header that is structured as follows:

Figure (VDIFHeader.png): Schematic of the standard 32-byte VDIF header, from the VDIF specification release 1.1.1 document, Fig. 3. The 32-bit words are labelled on the left, while byte and bit numbers above indicate relative addresses within each word. Subscripts indicate field length in bits.

where the abbreviated labels are

  • \(\mathrm{I}_1\) - invalid data

  • \(\mathrm{L}_1\) - if 1, header is VDIF legacy

  • \(\mathrm{V}_3\) - VDIF version number

  • \(\mathrm{log}_2\mathrm{(\#chns)}_5\) - \(\mathrm{log}_2\) of the number of sub-bands in the frame

  • \(\mathrm{C}_1\) - if 1, complex data

  • \(\mathrm{EDV}_8\) - “extended data version” number; see below

Detailed definitions of terms are found on pages 5 to 7 of the VDIF specification document.

Words 4 - 7 hold optional extended user data, using a layout specified by the EDV, in word 4 of the header. EDV formats can be registered on the VDIF website; Baseband aims to support all registered formats (but does not currently support EDV = 4).
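As a rough illustration of the layout described above, the sketch below decodes a few of the standard fields directly from the raw bytes, using only the Python standard library. The function name decode_vdif_words and the example values are hypothetical; the bit positions follow the header schematic, and the sketch assumes a full eight-word (non-legacy) header stored as little-endian words.

```python
import struct


def decode_vdif_words(header_bytes):
    """Decode a few standard fields from a 32-byte (non-legacy) VDIF header."""
    # A VDIF header consists of eight little-endian 32-bit words.
    words = struct.unpack('<8I', header_bytes)
    return {
        'invalid_data': bool((words[0] >> 31) & 0x1),   # I_1
        'legacy_mode': bool((words[0] >> 30) & 0x1),    # L_1
        'seconds': words[0] & 0x3FFFFFFF,               # 30 bits
        'vdif_version': (words[2] >> 29) & 0x7,         # V_3
        'lg2_nchan': (words[2] >> 24) & 0x1F,           # log2(#chns)_5
        'complex_data': bool((words[3] >> 31) & 0x1),   # C_1
        'edv': (words[4] >> 24) & 0xFF,                 # EDV_8
    }


# Hypothetical header: version 1, 2**3 = 8 channels, real data, EDV 4.
example_words = [0] * 8
example_words[0] = 14363767
example_words[2] = (1 << 29) | (3 << 24)
example_words[4] = 4 << 24
fields = decode_vdif_words(struct.pack('<8I', *example_words))
print(fields['edv'], 2 ** fields['lg2_nchan'])
```

In practice Baseband performs this decoding itself; the sketch only shows what the field positions mean.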

Implementing a New EDV

In this tutorial, we follow the implementation of an EDV=4 header. This is a first and required step toward supporting that format, but it does not suffice: full support also needs a new frame class that serves the purpose of EDV=4, which is to store the validity of individual sub-band channels within a single data frame independently, rather than relying on the single invalid-data bit. From the EDV=4 specification, we see that we need to add the following to the standard VDIF header:

  • Validity header mask (word 4, bits 16 - 23): integer value between 1 and 64 inclusive indicating the number of validity bits. (This is different from \(\mathrm{log}_2\mathrm{(\#chns)}_5\), since some channels can be unused.)

  • Synchronization pattern (word 5): constant byte sequence 0xACABFEED, for finding the locations of headers in a data stream.

  • Validity mask (words 6 - 7): 64-bit binary mask indicating the validity of sub-bands. Any fraction of 64 sub-bands can be stored in this format, with any unused bands labelled as invalid (0) in the mask. If the number of bands exceeds 64, each bit indicates the validity of a group of sub-bands; see specification for details.
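The relation between the mask length and the 64-bit validity mask can be sketched in plain Python. The helper names pack_validity and unpack_validity are hypothetical, and two conventions are assumptions of this sketch: the first sub-band maps to the least significant bit, and the lower 32 bits of the mask are stored in word 6.

```python
def pack_validity(valid):
    """Pack 1 to 64 booleans into (mask length, 64-bit integer mask).

    Assumption for this sketch: entry 0 maps to the least significant bit.
    """
    if not 1 <= len(valid) <= 64:
        raise ValueError("need between 1 and 64 validity flags")
    mask = 0
    for bit, is_valid in enumerate(valid):
        if is_valid:
            mask |= 1 << bit
    return len(valid), mask


def unpack_validity(length, mask):
    """Recover the list of booleans from a mask length and integer mask."""
    return [bool((mask >> bit) & 1) for bit in range(length)]


# 60 sub-bands, of which only the first and the last are valid.
flags = [True] + [False] * 58 + [True]
length, mask = pack_validity(flags)

# Split into two 32-bit words (assumed order: low half in word 6).
word6, word7 = mask & 0xFFFFFFFF, mask >> 32
print(length, mask == 2**59 + 1)
```

Unused bands beyond the mask length stay 0 in the mask, consistent with labelling them invalid.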

See Sec. 3.1 of the specification for best practices on using the invalid data bit \(\mathrm{I}_1\) in word 0.

In Baseband, a header is parsed using VDIFHeader, which returns a header instance of one of its subclasses, corresponding to the header EDV. This can be seen in the baseband.vdif.header module class inheritance diagram. To support a new EDV, we create a new subclass of baseband.vdif.VDIFHeader:

>>> class VDIFHeader4(vdif.header.VDIFHeader):
...     _edv = 4
...
...     _header_parser = vlbi.header.HeaderParser(
...         (('invalid_data', (0, 31, 1, False)),
...          ('legacy_mode', (0, 30, 1, False)),
...          ('seconds', (0, 0, 30)),
...          ('_1_30_2', (1, 30, 2, 0x0)),
...          ('ref_epoch', (1, 24, 6)),
...          ('frame_nr', (1, 0, 24, 0x0)),
...          ('vdif_version', (2, 29, 3, 0x1)),
...          ('lg2_nchan', (2, 24, 5)),
...          ('frame_length', (2, 0, 24)),
...          ('complex_data', (3, 31, 1)),
...          ('bits_per_sample', (3, 26, 5)),
...          ('thread_id', (3, 16, 10, 0x0)),
...          ('station_id', (3, 0, 16)),
...          ('edv', (4, 24, 8)),
...          ('validity_mask_length', (4, 16, 8, 0)),
...          ('sync_pattern', (5, 0, 32, 0xACABFEED)),
...          ('validity_mask', (6, 0, 64, 0))))

VDIFHeader has a metaclass that ensures that whenever it is subclassed, the subclass definition is inserted into the VDIF_HEADER_CLASSES dictionary using its EDV value as the dictionary key. Methods in VDIFHeader use this dictionary to determine the type of object to return for a particular EDV. How all this works is further discussed in the documentation of the VDIF baseband.vdif.header module.
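The registration pattern can be illustrated with a self-contained sketch. This is not Baseband's actual code: the names Header, HEADER_CLASSES and for_edv are hypothetical stand-ins, and the sketch uses __init_subclass__ instead of a metaclass for brevity.

```python
HEADER_CLASSES = {}  # hypothetical stand-in for VDIF_HEADER_CLASSES


class Header:
    """Base class that records its subclasses by EDV."""
    _edv = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only register classes that define their own _edv.
        edv = cls.__dict__.get('_edv')
        if edv is None:
            return
        if edv in HEADER_CLASSES:
            raise ValueError(
                f"EDV {edv} already registered in HEADER_CLASSES")
        HEADER_CLASSES[edv] = cls

    @classmethod
    def for_edv(cls, edv):
        """Look up the class that handles a given EDV."""
        return HEADER_CLASSES[edv]


class Header4(Header):
    _edv = 4


# Defining a second class with the same EDV is rejected.
try:
    class AnotherHeader4(Header):
        _edv = 4
except ValueError as exc:
    duplicate_error = str(exc)

print(Header.for_edv(4).__name__, '|', duplicate_error)
```

The design choice illustrated here is that merely defining a subclass is enough to make it available for lookup; no separate registration call is needed.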

The class must have a private _edv attribute for it to properly be registered in VDIF_HEADER_CLASSES. It must also feature a _header_parser that reads these words to return header properties. For this, we use baseband.base.header.HeaderParser. To initialize a header parser, we pass it a tuple of header properties, where each entry follows the syntax:

('property_name', (word_index, bit_index, bit_length, default))

where

  • property_name: name of the header property; this will be the key;

  • word_index: index into the header words for this key;

  • bit_index: index to the starting bit of the part used;

  • bit_length: number of bits used, normally between 1 and 32, but can be 64 for adding two words together; and

  • default: (optional) default value to use in initialization.

For further details, see the documentation of HeaderParser.
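To make the meaning of (word_index, bit_index, bit_length) concrete, here is a minimal mask-and-shift sketch in plain Python. The helpers get_field and set_field are hypothetical and are not how HeaderParser is implemented; the word order used for 64-bit fields (lower 32 bits first) is an assumption of this sketch.

```python
def get_field(words, word_index, bit_index, bit_length):
    """Read a bit field from a sequence of 32-bit words."""
    if bit_length == 64:
        # Assumed order: lower 32 bits in the first of the two words.
        return words[word_index] | (words[word_index + 1] << 32)
    mask = (1 << bit_length) - 1
    return (words[word_index] >> bit_index) & mask


def set_field(words, word_index, bit_index, bit_length, value):
    """Write a bit field in place, leaving neighbouring bits untouched."""
    mask = (1 << bit_length) - 1
    if value < 0 or value > mask:
        raise ValueError(f"{value} does not fit in {bit_length} bits")
    if bit_length == 64:
        words[word_index] = value & 0xFFFFFFFF
        words[word_index + 1] = value >> 32
        return
    cleared = words[word_index] & ~(mask << bit_index)
    words[word_index] = cleared | (value << bit_index)


words = [0] * 8
set_field(words, 4, 24, 8, 4)            # 'edv'
set_field(words, 4, 16, 8, 60)           # 'validity_mask_length'
set_field(words, 5, 0, 32, 0xACABFEED)   # 'sync_pattern'
set_field(words, 6, 0, 64, (1 << 59) + 1)  # 'validity_mask'
print(hex(words[4]), get_field(words, 4, 16, 8))
```

Two fields sharing word 4 do not disturb each other, which is why the EDV and the validity mask length can be set independently.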

Once defined, we can use our new header like any other:

>>> myheader = vdif.header.VDIFHeader.fromvalues(
...     edv=4, seconds=14363767, nchan=1, samples_per_frame=1024,
...     station=65532, bps=2, complex_data=False,
...     thread_id=3, validity_mask_length=60,
...     validity_mask=(1 << 59) + 1)
>>> myheader
<VDIFHeader4 invalid_data: False,
             legacy_mode: False,
             seconds: 14363767,
             _1_30_2: 0,
             ref_epoch: 0,
             frame_nr: 0,
             vdif_version: 1,
             lg2_nchan: 0,
             frame_length: 36,
             complex_data: False,
             bits_per_sample: 1,
             thread_id: 3,
             station_id: 65532,
             edv: 4,
             validity_mask_length: 60,
             sync_pattern: 0xacabfeed,
             validity_mask: 576460752303423489>
>>> myheader['validity_mask'] == 2**59 + 1
True
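Two of the values printed above are derived rather than passed in directly: frame_length and bits_per_sample. The arithmetic can be checked with a short sketch; the function frame_length_words is hypothetical, and it assumes that the frame length is stored in units of 8 bytes including the 32-byte header, and that complex data use two values of bps bits per sample.

```python
def frame_length_words(samples_per_frame, nchan, bps, complex_data=False,
                       header_bytes=32):
    """Frame length in units of 8 bytes, header included."""
    values_per_sample = 2 if complex_data else 1
    payload_bits = samples_per_frame * nchan * bps * values_per_sample
    payload_bytes, remainder = divmod(payload_bits, 8)
    if remainder:
        raise ValueError("payload is not a whole number of bytes")
    total_bytes = header_bytes + payload_bytes
    if total_bytes % 8:
        raise ValueError("frame is not a multiple of 8 bytes")
    return total_bytes // 8


# Values used in the fromvalues call above.
frame_length = frame_length_words(samples_per_frame=1024, nchan=1, bps=2)

# The header stores bits per sample minus one.
bits_per_sample_code = 2 - 1
print(frame_length, bits_per_sample_code)
```

Here 1024 samples of 2 bits give 256 payload bytes, so the frame is 288 bytes, or 36 units of 8 bytes.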

There is an easier means of instantiating the header parser. As can be seen in the class inheritance diagram for the header module, many VDIF headers are subclassed from other VDIFHeader subclasses, namely VDIFBaseHeader and VDIFSampleRateHeader. This is because many EDV specifications share common header values, and so their functions and derived properties should be shared as well. Moreover, header parsers can be combined with one another using the | operator, which saves repetitious coding because the first four words of any VDIF header are the same. Indeed, we can create the same header as above by subclassing VDIFBaseHeader:

>>> class VDIFHeader4Enhanced(vdif.header.VDIFBaseHeader):
...     _edv = 42
...
...     _header_parser = (vdif.header.VDIFBaseHeader._header_parser
...                       | vlbi.header.HeaderParser((
...                           ('validity_mask_length', (4, 16, 8, 0)),
...                           ('sync_pattern', (5, 0, 32, 0xACABFEED)),
...                           ('validity_mask', (6, 0, 64, 0)))))
...
...     _properties = vdif.header.VDIFBaseHeader._properties + ('validity',)
...
...     def verify(self):
...         """Basic checks of header integrity."""
...         super(VDIFHeader4Enhanced, self).verify()
...         assert 1 <= self['validity_mask_length'] <= 64
...
...     @property
...     def validity(self):
...         """Validity mask array with proper length.
...
...         If set, writes both ``validity_mask`` and ``validity_mask_length``.
...         """
...         bitmask = np.unpackbits(self['validity_mask'].astype('>u8')
...                                 .view('u1'))[::-1].astype(bool)
...         return bitmask[:self['validity_mask_length']]
...
...     @validity.setter
...     def validity(self, validity):
...         bitmask = np.zeros(64, dtype=bool)
...         bitmask[:len(validity)] = validity
...         self['validity_mask_length'] = len(validity)
...         self['validity_mask'] = np.packbits(bitmask[::-1]).view('>u8')

Here, we set _edv = 42 because VDIFHeader’s metaclass is designed to prevent accidental overwriting of existing entries in VDIF_HEADER_CLASSES. If we had used _edv = 4, we would have gotten an exception:

ValueError: EDV 4 already registered in VDIF_HEADER_CLASSES

We shall see how to override header classes in the next section. Except for the EDV, VDIFHeader4Enhanced’s header structure is identical to VDIFHeader4. It also contains a few extra functions to enhance the header’s usability.

The verify function is an optional function that runs upon header initialization to check its veracity. Ours simply checks that the validity mask length is in the allowed range, but we also call the same function in the superclass (VDIFBaseHeader), which checks that the header is not in 4-word “legacy mode”, that the header’s EDV matches that read from the words, that there are eight words, and that the sync pattern matches 0xACABFEED.

The validity_mask is a bit mask, which is not necessarily the easiest to use directly. Hence, we implement a derived validity property that generates a boolean mask of the right length (note that this is not right for cases where the number of channels in the header exceeds 64). We also define a corresponding setter, and add this to the private _properties attribute, so that we can use validity as a keyword in fromvalues:

>>> myenhancedheader = vdif.header.VDIFHeader.fromvalues(
...     edv=42, seconds=14363767, nchan=1, samples_per_frame=1024,
...     station=65532, bps=2, complex_data=False,
...     thread_id=3, validity=[True]+[False]*58+[True])
>>> myenhancedheader
<VDIFHeader4Enhanced invalid_data: False,
                     legacy_mode: False,
                     seconds: 14363767,
                     _1_30_2: 0,
                     ref_epoch: 0,
                     frame_nr: 0,
                     vdif_version: 1,
                     lg2_nchan: 0,
                     frame_length: 36,
                     complex_data: False,
                     bits_per_sample: 1,
                     thread_id: 3,
                     station_id: 65532,
                     edv: 42,
                     validity_mask_length: 60,
                     sync_pattern: 0xacabfeed,
                     validity_mask: [576460752303423489]>
>>> assert myenhancedheader['validity_mask'] == 2**59 + 1
>>> assert (myenhancedheader.validity == [True]+[False]*58+[True]).all()
>>> myenhancedheader.validity = [True]*8
>>> myenhancedheader['validity_mask']
array([255], dtype=uint64)

Note

If you have implemented support for a new EDV that is widely used, we encourage you to make a pull request to Baseband’s GitHub repository, as well as to register it (if it is not already registered) with the VDIF consortium!

Replacing an Existing EDV

Above, we mentioned that VDIFHeader’s metaclass is designed to prevent accidental overwriting of existing entries in VDIF_HEADER_CLASSES, so attempting to assign two header classes to the same EDV results in an exception. There are situations such as the one above, however, where we’d like to replace one header with another.

To get VDIFHeader to use VDIFHeader4Enhanced when edv=4, we can manually insert it in the dictionary (keeping a copy of the original dict so we can undo this later):

>>> original_header_classes = vdif.header.VDIF_HEADER_CLASSES.copy()
>>> vdif.header.VDIF_HEADER_CLASSES[4] = VDIFHeader4Enhanced

Of course, we should then be sure that its _edv attribute is correct:

>>> VDIFHeader4Enhanced._edv = 4

VDIFHeader will now return instances of VDIFHeader4Enhanced when reading headers with edv = 4:

>>> myheader = vdif.header.VDIFHeader.fromvalues(
...     edv=4, seconds=14363767, nchan=1,
...     station=65532, bps=2, complex_data=False,
...     thread_id=3, validity=[True]*60)
>>> assert isinstance(myheader, VDIFHeader4Enhanced)

Note

Failing to modify _edv in the class definition will lead to an EDV mismatch when verify is called during header initialization.

This can also be used to override VDIFHeader’s behavior even for EDVs that are supported by Baseband, which may prove useful when reading data with corrupted or mislabelled headers. To illustrate this, we attempt to read in a corrupted VDIF file originally from the Dominion Radio Astrophysical Observatory. This file can be imported from the baseband data directory:

>>> from baseband.data import SAMPLE_DRAO_CORRUPT

Naively opening the file with

>>> fh = vdif.open(SAMPLE_DRAO_CORRUPT, 'rs')  

will lead to an AssertionError. This is because while the headers of the file use EDV=0, they deviate from that EDV standard by storing additional information: an “eud2” parameter in word 5, which is related to the sample time. Furthermore, the bits_per_sample setting is incorrect (it should be 3 rather than 4; the number is defined such that a one-bit sample has a bits_per_sample code of 0). Finally, though not an error, the thread_id in word 3 defines two parts, link and slot, which reflect the data acquisition computer node that wrote the data to disk.

To accommodate these changes, we design an alternate header. We first pop the EDV = 0 entry from VDIF_HEADER_CLASSES:

>>> vdif.header.VDIF_HEADER_CLASSES.pop(0)
<class 'baseband.vdif.header.VDIFHeader0'>

We then define a replacement class:

>>> class DRAOVDIFHeader(vdif.header.VDIFHeader0):
...     """DRAO VDIF Header
...
...     An extension of EDV=0 which uses the thread_id to store link
...     and slot numbers, and adds a user keyword (illegal in EDV0,
...     but whatever) that identifies data taken at the same time.
...
...     The header also corrects 'bits_per_sample' to be properly bps-1.
...     """
...
...     _header_parser = (vdif.header.VDIFHeader0._header_parser
...                       | vlbi.header.HeaderParser((
...                           ('link', (3, 16, 4)),
...                           ('slot', (3, 20, 6)),
...                           ('eud2', (5, 0, 32)))))
...
...     def verify(self):
...         pass  # this is a hack, don't bother with verification...
...
...     @classmethod
...     def fromfile(cls, fh, edv=0, verify=False):
...         self = super(DRAOVDIFHeader, cls).fromfile(fh, edv=0,
...                                                    verify=False)
...         # Correct wrong bps
...         self.mutable = True
...         self['bits_per_sample'] = 3
...         return self

We override verify because VDIFHeader0’s verify function checks that word 5 contains no data. We also override the fromfile class method such that the bits_per_sample property is reset to its proper value whenever a header is read from file.
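The link and slot entries in the parser above occupy the same bits of word 3 as the standard 10-bit thread_id (which starts at bit 16): link takes the lowest 4 bits and slot the remaining 6. The sketch below shows that relation with hypothetical helper functions and a hypothetical thread_id value; it is not part of Baseband.

```python
def split_thread_id(thread_id):
    """Split a 10-bit thread_id into (link, slot)."""
    link = thread_id & 0xF           # word 3, bits 16-19
    slot = (thread_id >> 4) & 0x3F   # word 3, bits 20-25
    return link, slot


def join_thread_id(link, slot):
    """Inverse of split_thread_id."""
    if not (0 <= link < 16 and 0 <= slot < 64):
        raise ValueError("link needs 4 bits and slot needs 6 bits")
    return (slot << 4) | link


# Hypothetical value, for illustration only.
thread_id = join_thread_id(link=2, slot=3)
print(thread_id, split_thread_id(thread_id))
```

Because the fields overlap, setting link or slot also changes thread_id, and vice versa.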

We can now read in the corrupt file by manually reading in the header, then the payload, of each frame:

>>> fh = vdif.open(SAMPLE_DRAO_CORRUPT, 'rb')
>>> header0 = DRAOVDIFHeader.fromfile(fh)
>>> header0['eud2'] == 667235140
True
>>> header0['link'] == 2
True
>>> payload0 = vdif.payload.VDIFPayload.fromfile(fh, header0)
>>> payload0.shape == (header0.samples_per_frame, header0.nchan)
True
>>> fh.close()

Reading a frame using VDIFFrame will still fail, since its _header_class is VDIFHeader, and so VDIFHeader.fromfile, rather than the function we defined, is used to read in headers. If we wanted to use VDIFFrame, we would need to set

VDIFFrame._header_class = DRAOVDIFHeader

before using baseband.vdif.open(), so that headers are read using DRAOVDIFHeader.fromfile.

A more elegant solution that is compatible with baseband.vdif.base.VDIFStreamReader without hacking baseband.vdif.frame.VDIFFrame involves modifying the bits-per-sample code within __init__(). Let’s remove our previous custom class, and define a replacement:

>>> vdif.header.VDIF_HEADER_CLASSES.pop(0)
<class '__main__.DRAOVDIFHeader'>
>>> class DRAOVDIFHeaderEnhanced(vdif.header.VDIFHeader0):
...     """DRAO VDIF Header
...
...     An extension of EDV=0 which uses the thread_id to store link and slot
...     numbers, and adds a user keyword (illegal in EDV0, but whatever) that
...     identifies data taken at the same time.
...
...     The header also corrects 'bits_per_sample' to be properly bps-1.
...     """
...     _header_parser = (vdif.header.VDIFHeader0._header_parser
...                       | vlbi.header.HeaderParser((
...                           ('link', (3, 16, 4)),
...                           ('slot', (3, 20, 6)),
...                           ('eud2', (5, 0, 32)))))
...
...     def __init__(self, words, edv=None, verify=True, **kwargs):
...         super(DRAOVDIFHeaderEnhanced, self).__init__(
...                 words, verify=False, **kwargs)
...         self.mutable = True
...         self['bits_per_sample'] = 3
...
...     def verify(self):
...         pass

If we had the whole corrupt file, this might be enough to use the stream reader without further modification. It turns out, though, that the frame numbers are not monotonic and that the station ID changes between frames as well, so one would be better off making a new copy. Here, we can at least now read frames:

>>> fh2 = vdif.open(SAMPLE_DRAO_CORRUPT, 'rb')
>>> frame0 = fh2.read_frame()
>>> np.all(frame0.data == payload0.data)
True
>>> fh2.close()
>>> vdif.header.VDIF_HEADER_CLASSES = original_header_classes

Reading frames using VDIFFileReader.read_frame will now work as well, but reading frame sets using VDIFFileReader.read_frameset will still fail. This is because the frame and thread numbers that function relies on are meaningless for these headers, and grouping threads together using the link, slot and eud2 values should be manually performed by the user.
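One possible way to do such grouping by hand is sketched below. It assumes that frames sharing an eud2 value belong together and that (link, slot) gives a sensible order within a group; both are assumptions of this sketch rather than documented properties of the DRAO format. Plain dictionaries with hypothetical values stand in for headers here; header instances can be indexed by key in the same way.

```python
from collections import defaultdict


def group_by_eud2(headers):
    """Group header-like mappings by 'eud2', ordered by (link, slot)."""
    groups = defaultdict(list)
    for header in headers:
        groups[header['eud2']].append(header)
    return {eud2: sorted(group, key=lambda h: (h['link'], h['slot']))
            for eud2, group in groups.items()}


# Hypothetical header values, for illustration only.
headers = [
    {'eud2': 100, 'link': 2, 'slot': 1},
    {'eud2': 101, 'link': 1, 'slot': 0},
    {'eud2': 100, 'link': 1, 'slot': 0},
]
framesets = group_by_eud2(headers)
for eud2, group in framesets.items():
    print(eud2, [(h['link'], h['slot']) for h in group])
```

For real data, one would read each frame with read_frame, keep the frame alongside its header, and group the pairs in the same manner.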