Baseband

Welcome to the Baseband documentation! Baseband is a package affiliated with the Astropy project for reading and writing VLBI and other radio baseband files, with the aim of simplifying and streamlining data conversion and standardization. It provides:

  • File input/output objects for supported radio baseband formats, enabling selective decoding of data into Numpy arrays, and encoding user-defined arrays into baseband formats. Supported formats are listed under Specific file formats.

  • The ability to read from and write to an ordered sequence of files as if it was a single file.

If you used this package in your research, please cite it via DOI 10.5281/zenodo.1214268.

Overview

Installation

Requirements

Baseband requires:

  • Python 3

  • Astropy

  • Numpy

Installing Baseband

To install Baseband with pip, run:

pip3 install baseband

Note

To prevent pip from potentially updating Numpy and Astropy, include the --no-deps flag:

pip3 install baseband --no-deps

Obtaining Source Code

The source code and latest development version of Baseband can be found on its GitHub repo. You can get your own clone using:

git clone git@github.com:mhvk/baseband.git

Of course, it is even better to fork it on GitHub, and then clone your own repository, so that you can more easily contribute!

Running Code without Installing

As Baseband is purely Python, it can be used without being built or installed, by appending the directory it is located in to the PYTHONPATH environment variable. Alternatively, you can use sys.path within Python to append the path:

import sys
sys.path.append(BASEBAND_PATH)

where BASEBAND_PATH is the directory you downloaded or cloned Baseband into.

Installing Source Code

If you want Baseband to be more broadly available, either to all users on a system, or within, say, a virtual environment, use setup.py in the root directory by calling:

python3 setup.py install

For general information on setup.py, see its documentation. Many of the setup.py options are inherited from Astropy (specifically, from the Astropy affiliated package manager) and are described further in Astropy's installation documentation.

Testing the Installation

The root directory setup.py can also be used to test if Baseband can successfully be run on your system:

python3 setup.py test

or, inside of Python:

import baseband
baseband.test()

These tests require pytest to be installed. Further documentation can be found in Astropy's documentation on running tests.

Building Documentation

Note

As with Astropy, building the documentation is unnecessary unless you are writing new documentation or do not have internet access, as Baseband’s documentation is available online at baseband.readthedocs.io.

The Baseband documentation can likewise be built using setup.py from the root directory:

python3 setup.py build_docs

This requires Sphinx (and its dependencies) to be installed.

Getting Started with Baseband

This quickstart tutorial is meant to help the reader hit the ground running with Baseband. For more detail, including writing to files, see Using Baseband.

For installation instructions, please see Installing Baseband.

When using Baseband, we will typically also use numpy, astropy.units, and astropy.time.Time. Let’s import all of these:

>>> import baseband
>>> import numpy as np
>>> import astropy.units as u
>>> from astropy.time import Time

Opening Files

For this tutorial, we’ll use two sample files:

>>> from baseband.data import SAMPLE_VDIF, SAMPLE_MARK5B

The first file is a VDIF one created from EVN/VLBA observations of the Black Widow pulsar PSR B1957+20, while the second is a Mark 5B file from EVN/WSRT observations of the same pulsar.

To open the VDIF file:

>>> fh_vdif = baseband.open(SAMPLE_VDIF)

Opening the Mark 5B file is slightly more involved, as not all required metadata is stored in the file itself:

>>> fh_m5b = baseband.open(SAMPLE_MARK5B, nchan=8, sample_rate=32*u.MHz,
...                        ref_time=Time('2014-06-13 12:00:00'))

Here, we’ve manually passed in as keywords the number of channels, the sample rate (number of samples per channel per second) as an astropy.units.Quantity, and a reference time within 500 days of the start of the observation as an astropy.time.Time. That last keyword is needed to properly read timestamps from the Mark 5B file.

baseband.open tries to open files using all available formats, returning whichever is successful. If you know the format of your file, you can pass its name with the format keyword, or directly use its format opener (for VDIF, it is baseband.vdif.open). Also, the baseband.file_info function can help determine the format and any missing information needed by baseband.open - see Inspecting Files.

Do you have a sequence of files you want to read in? You can pass a list of filenames to baseband.open, and it will open them up as if they were a single file! See Reading or Writing to a Sequence of Files.

Reading Files

Radio baseband files are generally composed of blocks of binary data, or payloads, stored alongside corresponding metadata, or headers. Each header and payload combination is known as a data frame, and most formats feature files composed of a long series of frames.
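Schematically, then, locating a given sample reduces to arithmetic on frame sizes. The toy sketch below uses numbers matching the sample VDIF file discussed later (32-byte headers, 5000-byte payloads, 20000 samples per frame); it is an illustration only, not Baseband's internal code:

```python
HEADER_NBYTES = 32         # header size per frame
PAYLOAD_NBYTES = 5000      # payload size per frame
SAMPLES_PER_FRAME = 20000  # samples encoded in one payload

def locate(sample_index):
    """Return (frame number, sample within frame, byte offset of the frame)."""
    frame_nr, sample_in_frame = divmod(sample_index, SAMPLES_PER_FRAME)
    return frame_nr, sample_in_frame, frame_nr * (HEADER_NBYTES + PAYLOAD_NBYTES)

print(locate(25000))  # → (1, 5000, 5032): sample 25000 sits in the second frame
```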

Baseband file objects are frame-reading wrappers around Python file objects, and have the same interface, including seek for seeking to different parts of the file, tell for reporting the file pointer’s current position, and read for reading data. The main difference is that Baseband file objects read and navigate in units of samples.

Let’s read some samples from the VDIF file:

>>> data = fh_vdif.read(3)
>>> data  
array([[-1.      ,  1.      ,  1.      , -1.      , -1.      , -1.      ,
         3.316505,  3.316505],
       [-1.      ,  1.      , -1.      ,  1.      ,  1.      ,  1.      ,
         3.316505,  3.316505],
       [ 3.316505,  1.      , -1.      , -1.      ,  1.      ,  3.316505,
        -3.316505,  3.316505]], dtype=float32)
>>> data.shape
(3, 8)

Baseband decodes binary data into ndarray objects. Notice we input 3, and received an array of shape (3, 8); this is because there are 8 VDIF threads. Threads and channels represent different components of the data such as polarizations or frequency sub-bands, and the collection of all components at one point in time is referred to as a complete sample. Baseband reads in units of complete samples, and works with sample rates in units of complete samples per second (including with the Mark 5B example above). Like an ndarray, calling fh_vdif.shape returns the shape of the entire dataset:

>>> fh_vdif.shape
(40000, 8)

The first axis represents time, and all additional axes represent the shape of a complete sample. A labelled version of the complete sample shape is given by:

>>> fh_vdif.sample_shape
SampleShape(nthread=8)

Baseband extracts basic properties and header metadata from opened files. Notably, the start and end times of the file are given by:

>>> fh_vdif.start_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>
>>> fh_vdif.stop_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>
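These numbers are mutually consistent: 40000 complete samples at 32 MHz span 1.25 ms, which is exactly stop_time minus start_time. A quick back-of-the-envelope check in plain Python:

```python
n_samples = 40000           # fh_vdif.shape[0]
sample_rate_khz = 32000.0   # 32 MHz, expressed in kHz so the result is in ms

duration_ms = n_samples / sample_rate_khz
print(duration_ms)  # → 1.25, matching stop_time - start_time
```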

For an overview of the file, we can either print fh_vdif itself, or use the info method:

>>> fh_vdif
<VDIFStreamReader name=... offset=3
    sample_rate=32.0 MHz, samples_per_frame=20000,
    sample_shape=SampleShape(nthread=8),
    bps=2, complex_data=False, edv=3, station=65532,
    start_time=2014-06-16T05:56:07.000000000>
>>> fh_vdif.info
Stream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True

checks:  decodable: True
         continuous: no obvious gaps

File information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)

Seeking is also done in units of complete samples, which is equivalent to seeking in timesteps. Let’s move forward 100 complete samples:

>>> fh_vdif.seek(100)
100

Seeking from the end or current position is also possible, using the same syntax as for typical file objects. It is also possible to seek in units of time:

>>> fh_vdif.seek(-1000, 2)    # Seek 1000 samples from end.
39000
>>> fh_vdif.seek(10*u.us, 1)    # Seek 10 us from current position.
39320

fh_vdif.tell returns the current offset in samples or in time:

>>> fh_vdif.tell()
39320
>>> fh_vdif.tell(unit=u.us)    # Time since start of file.
<Quantity 1228.75 us>
>>> fh_vdif.tell(unit='time')
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001228750>
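The offsets above can be reproduced with plain arithmetic, since time offsets are just converted to complete samples using the sample rate (a sanity check, not how Baseband implements seek):

```python
sample_rate = 32e6   # Hz, i.e. complete samples per second
n_samples = 40000

pos = n_samples - 1000                # fh_vdif.seek(-1000, 2): 1000 samples before end
pos += round(10e-6 * sample_rate)     # fh_vdif.seek(10*u.us, 1): 10 us = 320 samples
print(pos)                            # → 39320
print(pos / (sample_rate / 1e6))      # offset in microseconds → 1228.75
```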

Finally, we close both files:

>>> fh_vdif.close()
>>> fh_m5b.close()

Using Baseband

For most file formats, one can simply import baseband and use baseband.open to access the file. This gives one a filehandle from which one can read decoded samples:

>>> import baseband
>>> from baseband.data import SAMPLE_DADA
>>> fh = baseband.open(SAMPLE_DADA)
>>> fh.read(3)
array([[ -38.-38.j,  -38.-38.j],
       [ -38.-38.j,  -40. +0.j],
       [-105.+60.j,   85.-15.j]], dtype=complex64)
>>> fh.close()

For other file formats, a bit more information is needed. Below, we cover the basics of inspecting files, reading from and writing to files, converting from one format to another, and diagnosing problems. We assume that Baseband as well as NumPy and the Astropy units module have been imported:

>>> import baseband
>>> import numpy as np
>>> import astropy.units as u

Inspecting Files

Baseband allows you to quickly determine basic properties of a file, including what format it is, using the baseband.file_info function. For instance, it shows that the sample VDIF file that comes with Baseband is very short (sample files can all be found in the baseband.data module):

>>> import baseband.data
>>> baseband.file_info(baseband.data.SAMPLE_VDIF)
Stream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True

checks:  decodable: True
         continuous: no obvious gaps

File information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)

The same function will also tell you when more information is needed. For instance, for Mark 5B files one needs the number of channels used, as well as (roughly) when the data were taken:

>>> baseband.file_info(baseband.data.SAMPLE_MARK5B)
File information:
format = mark5b
number_of_frames = 4
frame_rate = 6400.0 Hz
bps = 2
complex_data = False
readable = False

missing:  nchan: needed to determine sample shape, frame rate, decode data.
          kday, ref_time: needed to infer full times.

>>> from astropy.time import Time
>>> baseband.file_info(baseband.data.SAMPLE_MARK5B, nchan=8, ref_time=Time('2014-01-01'))
Stream information:
start_time = 2014-06-13T05:30:01.000000000
stop_time = 2014-06-13T05:30:01.000625000
sample_rate = 32.0 MHz
shape = (20000, 8)
format = mark5b
bps = 2
complex_data = False
verify = fix
readable = True

checks:  decodable: True
         continuous: no obvious gaps

File information:
number_of_frames = 4
frame_rate = 6400.0 Hz
samples_per_frame = 5000
sample_shape = (8,)

The information is gleaned from info properties on the various file and stream readers (see below).

Note

The one format for which file_info works a bit differently is GSB, as this format requires separate time-stamp and raw data files. Only the timestamp file can be inspected usefully.

Reading Files

Opening Files

As shown at the very start, files can be opened with the general baseband.open function. This will try to determine the file type using file_info, load the corresponding baseband module, and then open the file using that module’s master input/output function.

Generally, if one knows the file type, one might as well work with the corresponding module directly. For instance, to explicitly use the DADA reader to open the sample DADA file included in Baseband, one can use the DADA module’s open function:

>>> from baseband import dada
>>> from baseband.data import SAMPLE_DADA
>>> fh = dada.open(SAMPLE_DADA, 'rs')
>>> fh.read(3)
array([[ -38.-38.j,  -38.-38.j],
       [ -38.-38.j,  -40. +0.j],
       [-105.+60.j,   85.-15.j]], dtype=complex64)
>>> fh.close()

In general, file I/O and data manipulation use the same syntax across all file formats. When opening Mark 4 and Mark 5B files, however, some additional arguments may need to be passed (as was the case above for inspecting a Mark 5B file, and indeed this is a good way to find out what is needed). Notes on such features and quirks of individual formats can be found in the API entries of their open functions, and within the Specific file format documentation.

For the rest of this section, we will stick to VDIF files.

Decoding Data and the Sample File Pointer

By giving the openers a 'rs' flag, which is the default, we open files in “stream reader” mode, where a file is accessed as if it were a stream of samples. For VDIF, open will then return an instance of VDIFStreamReader, which wraps a raw data file with methods to decode the binary data frames and to seek to and read data samples. To decode the first 12 samples into an ndarray, we would use the read method:

>>> from baseband import vdif
>>> from baseband.data import SAMPLE_VDIF
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> d = fh.read(12)
>>> type(d)
<... 'numpy.ndarray'>
>>> d.shape
(12, 8)
>>> d[:, 0].astype(int)    # First thread.
array([-1, -1,  3, -1,  1, -1,  3, -1,  1,  3, -1,  1])

As discussed in detail in the VDIF section, VDIF files are sequences of data frames, each of which is comprised of a header (which holds information like the time at which the data was taken) and a payload, or block of data. Multiple concurrent time streams can be stored within a single frame; each of these is called a “channel”. Moreover, groups of channels can be stored over multiple frames, each of which is called a “thread”. Our sample file is an “8-thread, single-channel file” (8 concurrent time streams with 1 stream per frame), and in the example above, fh.read decoded the first 12 samples from all 8 threads, mapping thread number to the second axis of the decoded data array. Reading files with multiple threads and channels will produce 3-dimensional arrays.

fh includes shape, size and ndim, which give the shape, total number of elements, and dimensionality of the file’s entire dataset if it were decoded into an array. The number of complete samples - the set of samples from all available threads and channels for one point in time - in the file is given by the first element of shape:

>>> fh.shape    # Shape of all data from the file in decoded array form.
(40000, 8)
>>> fh.shape[0] # Number of complete samples.
40000
>>> fh.size
320000
>>> fh.ndim
2

The shape of a single complete sample, including names indicating the meaning of shape dimensions, is retrievable using:

>>> fh.sample_shape
SampleShape(nthread=8)

By default, dimensions of length unity are squeezed, or removed from the sample shape. To retain them, we can pass squeeze=False to open:

>>> fhns = vdif.open(SAMPLE_VDIF, 'rs', squeeze=False)
>>> fhns.sample_shape    # Sample shape now keeps channel dimension.
SampleShape(nthread=8, nchan=1)
>>> fhns.ndim            # fh.shape and fh.ndim also change with squeezing.
3
>>> d2 = fhns.read(12)
>>> d2.shape             # Decoded data has channel dimension.
(12, 8, 1)
>>> fhns.close()

Basic information about the file is obtained either from fh.info or simply from fh itself:

>>> fh.info
Stream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True

checks:  decodable: True
         continuous: no obvious gaps

File information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)

>>> fh
<VDIFStreamReader name=... offset=12
    sample_rate=32.0 MHz, samples_per_frame=20000,
    sample_shape=SampleShape(nthread=8),
    bps=2, complex_data=False, edv=3, station=65532,
    start_time=2014-06-16T05:56:07.000000000>

Not coincidentally, the first is identical to what we found above using file_info.

The filehandle itself also shows the offset, the current location of the sample file pointer. Above, it is at 12 since we have read in 12 (complete) samples. If we called fh.read(12) again, we would get the next 12 samples. If we instead called fh.read(), it would read from the pointer’s current position to the end of the file. If we wanted all the data in one array, we would move the file pointer back to the start of the file, using fh.seek, before reading:

>>> fh.seek(0)      # Seek to sample 0.  Seek returns its offset in counts.
0
>>> d_complete = fh.read()
>>> d_complete.shape
(40000, 8)

We can also move the pointer with respect to the end of file by passing 2 as a second argument:

>>> fh.seek(-100, 2)    # Second arg is 0 (start of file) by default.
39900
>>> d_end = fh.read(100)
>>> np.array_equal(d_complete[-100:], d_end)
True

-100 means 100 samples before the end of file, so d_end is equal to the last 100 entries of d_complete. Baseband only keeps the most recently accessed data frame in memory, making it possible to analyze (normally large) files through selective decoding using seek and read.

Note

As with file pointers in general, fh.seek will not return an error if one seeks beyond the end of file. Attempting to read beyond the end of file, however, will result in an EOFError.

To determine where the pointer is located, we use fh.tell():

>>> fh.tell()
40000
>>> fh.close()

Caution should be used when decoding large blocks of data using fh.read. For typical files, the resulting arrays are far too large to hold in memory.
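A common pattern is therefore to loop through the file in frame-sized chunks, accumulating a reduction rather than storing all decoded data. The sketch below uses a hypothetical stand-in class with the same read/shape interface as a stream reader, so it runs without an actual baseband file:

```python
import numpy as np

class FakeReader:
    """Hypothetical stand-in mimicking a stream reader's read/shape/offset."""
    def __init__(self, data):
        self._data = data
        self.shape = data.shape
        self.offset = 0

    def read(self, count):
        out = self._data[self.offset:self.offset + count]
        self.offset += len(out)
        return out

fh = FakeReader(np.arange(40000.).reshape(10000, 4))
chunk = 2500   # complete samples per chunk, e.g. a multiple of samples_per_frame
power = 0.0
while fh.offset < fh.shape[0]:
    d = fh.read(chunk)
    power += (d ** 2).sum()   # accumulate a statistic instead of keeping d around

print(np.isclose(power, (np.arange(40000.) ** 2).sum()))  # → True
```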

Seeking and Telling in Time With the Sample Pointer

We can use seek and tell with units of time rather than samples. To do this with tell, we can pass an appropriate astropy.units.Unit object to its optional unit parameter:

>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> fh.seek(40000)
40000
>>> fh.tell(unit=u.ms)
<Quantity 1.25 ms>

Passing the string 'time' reports the pointer’s location in absolute time:

>>> fh.tell(unit='time')
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>

We can also pass an absolute astropy.time.Time, or a positive or negative time difference TimeDelta or astropy.units.Quantity to seek. If the offset is a Time object, the second argument to seek is ignored:

>>> from astropy.time import Time, TimeDelta
>>> fh.seek(TimeDelta(-5e-4, format='sec'), 2)  # Seek -0.5 ms from end.
24000
>>> fh.seek(0.25*u.ms, 1)  # Seek 0.25 ms from current position.
32000
>>> # Seek to specific time.
>>> fh.seek(Time('2014-06-16T05:56:07.001125'))
36000

We can retrieve the time of the first sample in the file using start_time, the time immediately after the last sample using stop_time, and the time of the pointer’s current location (equivalent to fh.tell(unit='time')) using time:

>>> fh.start_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>
>>> fh.stop_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>
>>> fh.time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001125000>
>>> fh.close()

Extracting Header Information

The first header of the file is stored as the header0 attribute of the stream reader object; it gives direct access to header properties via keyword lookup:

>>> with vdif.open(SAMPLE_VDIF, 'rs') as fh:
...     header0 = fh.header0
>>> header0['frame_length']
629

The full list of keywords is available by printing out header0:

>>> header0
<VDIFHeader3 invalid_data: False,
             legacy_mode: False,
             seconds: 14363767,
             _1_30_2: 0,
             ref_epoch: 28,
             frame_nr: 0,
             vdif_version: 1,
             lg2_nchan: 0,
             frame_length: 629,
             complex_data: False,
             bits_per_sample: 1,
             thread_id: 1,
             station_id: 65532,
             edv: 3,
             sampling_unit: True,
             sampling_rate: 16,
             sync_pattern: 0xacabfeed,
             loif_tuning: 859832320,
             _7_28_4: 15,
             dbe_unit: 2,
             if_nr: 0,
             subband: 1,
             sideband: True,
             major_rev: 1,
             minor_rev: 5,
             personality: 131>

A number of derived properties, such as the time (as a Time object), are also available through the header object:

>>> header0.time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>

These are listed in the API for each header class. For example, the sample VDIF file’s headers are of class:

>>> type(header0)
<class 'baseband.vdif.header.VDIFHeader3'>

and so its attributes can be found here.

Reading Specific Components of the Data

By default, fh.read() returns complete samples, i.e. with all available threads, polarizations or channels. If we were only interested in decoding a subset of the complete sample, we can select specific components by passing indexing objects to the subset keyword in open. For example, if we only wanted thread 3 of the sample VDIF file:

>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=3)
>>> fh.sample_shape
()
>>> d = fh.read(20000)
>>> d.shape
(20000,)
>>> fh.subset
(3,)
>>> fh.close()

Since by default data are squeezed, one obtains a data stream with just a single dimension. If one would like to keep all information, one has to pass squeeze=False and also make subset a list (or slice):

>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=[3], squeeze=False)
>>> fh.sample_shape
SampleShape(nthread=1, nchan=1)
>>> d = fh.read(20000)
>>> d.shape
(20000, 1, 1)
>>> fh.close()

Data with multi-dimensional samples can be subset by passing a tuple of indexing objects with the same dimensional ordering as the (possibly squeezed) sample shape; in the case of the sample VDIF with squeeze=False, this is threads, then channels. For example, if we wished to select threads 1 and 3, and channel 0:

>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=([1, 3], 0), squeeze=False)
>>> fh.sample_shape
SampleShape(nthread=2)
>>> fh.close()

Generally, subset accepts any object that can be used to index a numpy.ndarray, including advanced indexing (as done above, with subset=([1, 3], 0)). If possible, slices should be used instead of lists of integers, since slices return a view rather than a copy, avoiding unnecessary processing and memory allocation. (An exception to this is VDIF threads, where the subset is used to selectively read specific threads, and thus is not used for actual slicing of the data.)
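The view-versus-copy distinction is standard NumPy behavior, which can be checked directly (a quick illustration only; as noted, for VDIF threads the subset is applied while decoding rather than by slicing an array):

```python
import numpy as np

data = np.arange(12).reshape(4, 3)
sliced = data[:, 0:2]    # basic slice: a view onto the same memory
fancy = data[:, [0, 1]]  # advanced (list) index: a fresh copy

print(np.shares_memory(data, sliced))  # → True
print(np.shares_memory(data, fancy))   # → False
```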

Writing to Files and Format Conversion

Writing to a File

To write data to disk, we again use open. Writing data in a particular format requires both the header and data samples. For modifying an existing file, we have both the old header and old data handy.

As a simple example, let’s read in the 8-thread, single-channel sample VDIF file and rewrite it as a single-thread, 8-channel one, which, for example, may be necessary for compatibility with DSPSR:

>>> import baseband.vdif as vdif
>>> from baseband.data import SAMPLE_VDIF
>>> fr = vdif.open(SAMPLE_VDIF, 'rs')
>>> fw = vdif.open('test_vdif.vdif', 'ws',
...                sample_rate=fr.sample_rate,
...                samples_per_frame=fr.samples_per_frame // 8,
...                nthread=1, nchan=fr.sample_shape.nthread,
...                complex_data=fr.complex_data, bps=fr.bps,
...                edv=fr.header0.edv, station=fr.header0.station,
...                time=fr.start_time)

The minimal parameters needed to generate a file are listed under the documentation for each format’s open, though comprehensive lists can be found in the documentation for each format’s stream writer class (e.g. for VDIF, it’s under VDIFStreamWriter). In practice we specify as many relevant header properties as are available to obtain a particular file structure. If we possess the exact first header of the file, it can simply be passed to open via the header keyword. In the example above, though, we manually switch the values of nthread and nchan. Because VDIF EDV = 3 requires each frame’s payload to contain 5000 bytes, and nchan is now a factor of 8 larger, we decrease samples_per_frame, the number of complete (i.e. all threads and channels included) samples per frame, by a factor of 8.
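The factor of 8 can be checked explicitly: with a fixed 5000-byte payload, the number of complete samples per frame is the payload size in bits divided by the bits per complete sample:

```python
payload_nbytes = 5000   # fixed payload size for VDIF EDV 3 frames
bps = 2                 # bits per elementary sample

def samples_per_frame(nchan):
    """Complete samples per frame = payload bits // bits per complete sample."""
    return payload_nbytes * 8 // (bps * nchan)

print(samples_per_frame(nchan=1))  # per-thread frames of the original file → 20000
print(samples_per_frame(nchan=8))  # frames of the rewritten file → 2500
```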

Encoding samples and writing data to file is done by passing data arrays into fw’s write method. The first dimension of the arrays is sample number, and the remaining dimensions must be as given by fw.sample_shape:

>>> fw.sample_shape
SampleShape(nchan=8)

In this case, the required dimensions are the same as the arrays from fr.read. We can thus write the data to file using:

>>> while fr.tell() < fr.shape[0]:
...     fw.write(fr.read(fr.samples_per_frame))
>>> fr.close()
>>> fw.close()

For our sample file, we could simply have written

fw.write(fr.read())

instead of the loop, but for large files, reading and writing should be done in smaller chunks to minimize memory usage. Baseband stores only the data frame or frame set being read or written to in memory.

We can check the validity of our new file by re-opening it:

>>> fr = vdif.open(SAMPLE_VDIF, 'rs')
>>> fh = vdif.open('test_vdif.vdif', 'rs')
>>> fh.sample_shape
SampleShape(nchan=8)
>>> np.all(fr.read() == fh.read())
True
>>> fr.close()
>>> fh.close()

Note

One can also use the top-level open function for writing, with the file format passed in via its format argument.

File Format Conversion

It is often preferable to convert data from one file format to another that offers wider compatibility, or better fits the structure of the data. As an example, we convert the sample Mark 4 data to VDIF.

Since we don’t have a VDIF header handy, we pass the relevant Mark 4 header values into vdif.open to create one:

>>> import baseband.mark4 as mark4
>>> from baseband.data import SAMPLE_MARK4
>>> fr = mark4.open(SAMPLE_MARK4, 'rs', ntrack=64, decade=2010)
>>> spf = 640       # fanout * 160 = 640 invalid samples per Mark 4 frame
>>> fw = vdif.open('m4convert.vdif', 'ws', sample_rate=fr.sample_rate,
...                samples_per_frame=spf, nthread=1,
...                nchan=fr.sample_shape.nchan,
...                complex_data=fr.complex_data, bps=fr.bps,
...                edv=1, time=fr.start_time)

We choose edv = 1 since it’s the simplest VDIF EDV whose header includes a sampling rate. The concept of threads does not exist in Mark 4, so the file effectively has nthread = 1. As discussed in the Mark 4 documentation, the data at the start of each frame is effectively overwritten by the header and are represented by invalid samples in the stream reader. We set samples_per_frame to 640 so that each section of invalid data is captured in a single frame.

We now write the data to file, manually flagging each invalid data frame:

>>> while fr.tell() < fr.shape[0]:
...     d = fr.read(fr.samples_per_frame)
...     fw.write(d[:640], valid=False)
...     fw.write(d[640:])
>>> fr.close()
>>> fw.close()

Lastly, we check our new file:

>>> fr = mark4.open(SAMPLE_MARK4, 'rs', ntrack=64, decade=2010)
>>> fh = vdif.open('m4convert.vdif', 'rs')
>>> np.all(fr.read() == fh.read())
True
>>> fr.close()
>>> fh.close()

For file format conversion in general, we have to consider how to properly scale our data to make the best use of the dynamic range of the new encoded format. For VLBI formats like VDIF, Mark 4 and Mark 5B, samples of the same size have the same scale, which is why we did not have to rescale our data when writing 2-bits-per-sample Mark 4 data to a 2-bits-per-sample VDIF file. Rescaling is necessary, though, to convert DADA or GSB to VDIF. For examples of rescaling, see the baseband/tests/test_conversion.py file.
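As a rough illustration of the kind of rescaling involved, one might normalize the decoded samples to unit standard deviation before re-encoding. This is a minimal sketch under that simple assumption, not the actual conversion code; the correct scale factor depends on the encodings involved (see the test file mentioned above):

```python
import numpy as np

rng = np.random.default_rng(42)
raw = rng.normal(scale=33.0, size=(4096, 2))  # stand-in for decoded DADA-like data

# Bring the data to unit rms so a 2-bit encoder's thresholds are appropriate.
scaled = raw / raw.std()
print(round(float(scaled.std()), 6))  # → 1.0
```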

Reading or Writing to a Sequence of Files

Data from one continuous observation is sometimes spread over a sequence of files. Baseband includes the sequentialfile module for reading in a sequence as if it were one contiguous file. This module is invoked when a list, tuple or filename template is passed to, e.g., baseband.open or baseband.vdif.open, making the syntax for handling multiple files nearly identical to that for single ones.

As an example, we write the data from the sample VDIF file baseband/data/sample.vdif into a sequence of two files and then read the files back in. We first load the required data:

>>> from baseband import vdif
>>> from baseband.data import SAMPLE_VDIF
>>> import numpy as np
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> d = fh.read()

We then create a sequence of filenames:

>>> filenames = ["seqvdif_{0}".format(i) for i in range(2)]

When passing filenames to open, we must also pass file_size, the file size in bytes, in addition to the usual kwargs for writing a file. Since we wish to split the sample file in two, and the file consists of two framesets, we set file_size to the byte size of one frameset (we could have equivalently set it to fh.fh_raw.seek(0, 2) // 2):

>>> file_size = 8 * fh.header0.frame_nbytes
>>> fw = vdif.open(filenames, 'ws', header0=fh.header0,
...                file_size=file_size, sample_rate=fh.sample_rate,
...                nthread=fh.sample_shape.nthread)
>>> fw.write(d)
>>> fw.close()    # This implicitly closes the underlying raw files.

Note

file_size sets the maximum size a file can reach before the writer writes to the next one, so setting file_size to a larger value than above will lead to the two files having different sizes. By default, file_size=None, meaning it can be arbitrarily large, in which case only one file will be created.
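For reference, the file_size chosen above can also be derived from the header by hand: VDIF's frame_length field counts units of 8 bytes (the sample file's headers have frame_length = 629, i.e. a 32-byte header plus a 5000-byte payload), and one frameset contains one frame per thread:

```python
frame_length = 629   # fh.header0['frame_length'], in units of 8 bytes
nthread = 8          # frames per frameset: one per thread

frame_nbytes = frame_length * 8      # → 5032 bytes per frame
file_size = nthread * frame_nbytes   # bytes in one frameset → 40256
print(frame_nbytes, file_size)
```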

We now read the sequence and confirm their contents are identical to those of the sample file:

>>> fr = vdif.open(filenames, 'rs', sample_rate=fh.sample_rate)
>>> fr.header0.time == fh.header0.time
True
>>> np.all(fr.read() == d)
True
>>> fr.close()

When reading, the filename sequence must be ordered in time.

We can also open the second file on its own and confirm it contains the second frameset of the sample file:

>>> fsf = vdif.open(filenames[1], mode='rs', sample_rate=fh.sample_rate)
>>> fh.seek(fh.shape[0] // 2)    # Seek to start of second frameset.
20000
>>> fsf.header0.time == fh.time
True
>>> np.all(fsf.read() == fh.read())
True
>>> fsf.close()

In situations where the file_size is known, but not the total number of files to write, one may use the FileNameSequencer class to create an iterable without a user-defined size. The class is initialized with a template string that can be formatted with keywords, and an optional header that can either be an actual header or a dict with the relevant keywords. The template may also contain the special keyword ‘{file_nr}’, which is equal to the indexing value (instead of a header entry).

As an example, let us create a sequencer:

>>> from baseband.helpers import sequentialfile as sf
>>> filenames = sf.FileNameSequencer('f.edv{edv:d}.{file_nr:03d}.vdif',
...                                  header=fh.header0)

Indexing the sequencer using square brackets returns a filename:

>>> filenames[0]
'f.edv3.000.vdif'
>>> filenames[42]
'f.edv3.042.vdif'
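Under the hood, such templates behave essentially like standard Python format strings, with header values and the file number supplied as keywords. A simplified sketch of that behavior (not the actual implementation):

```python
template = 'f.edv{edv:d}.{file_nr:03d}.vdif'

def filename(index, header):
    # Merge header keywords with the special file_nr keyword.
    return template.format(file_nr=index, **header)

print(filename(0, {'edv': 3}))   # → f.edv3.000.vdif
print(filename(42, {'edv': 3}))  # → f.edv3.042.vdif
```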

The sequencer has extracted the EDV from the header we passed in, and the file number from the index. We can use the sequencer to write a VDIF file sequence:

>>> fw = vdif.open(filenames, 'ws', header0=fh.header0,
...                file_size=file_size, sample_rate=fh.sample_rate,
...                nthread=fh.sample_shape.nthread)
>>> d = np.concatenate([d, d, d])
>>> fw.write(d)
>>> fw.close()

This creates 6 files:

>>> import glob
>>> len(glob.glob("f.edv*.vdif"))
6

We can read the file sequence using the same sequencer. In reading mode, the sequencer determines the number of files by finding the largest file available that fits the template:

>>> fr = vdif.open(filenames, 'rs', sample_rate=fh.sample_rate)
>>> fr.header0.time == fh.header0.time
True
>>> np.all(fr.read() == d)
True
>>> fr.close()
>>> fh.close()  # Close sample file as well.

Because DADA and GUPPI data are usually stored in file sequences with names derived from header values (e.g., ‘puppi_58132_J1810+1744_2176.0010.raw’), their format openers have template support built in. For usage details, please see the API entries for baseband.dada.open and baseband.guppi.open.
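
Under the hood, such templates are plain str.format substitution. As an illustrative sketch (the header values below are made up; the template keywords are modelled on the GUPPI-style filename above):

```python
# Hypothetical GUPPI-style filename template; keyword values are
# illustrative, not taken from a real file.
template = 'puppi_{stt_imjd}_{src_name}_{scannum}.{file_nr:04d}.raw'
header = {'stt_imjd': 58132, 'src_name': 'J1810+1744', 'scannum': 2176}

# file_nr is the sequence index; the rest comes from the header.
filenames = [template.format(file_nr=i, **header) for i in range(2)]
print(filenames[0])  # puppi_58132_J1810+1744_2176.0000.raw
```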

Diagnosing problems with baseband files

Little is more annoying than starting a very long analysis script only to find that the reader crashed with an error near the end. Unfortunately, while there is only one way for a file to be read successfully, there are many ways for reading to fail. Some problems, though, can be caught by inspecting files up front. To see what would show up for a file that is missing a frame, we first construct one:

>>> from astropy.time import Time
>>> from baseband import vdif
>>> fc = vdif.open('corrupt.vdif', 'ws', edv=1, nthread=2,
...                bps=8, samples_per_frame=16,
...                time=Time('J2010'), sample_rate=16*u.kHz)
>>> fc.write(np.zeros((8000, 2)))
>>> fc.fh_raw.seek(-100, 1)
47900
>>> fc.write(np.zeros((8000, 2)))
>>> fc.close()

Here, rewinding the internal raw file pointer a bit to simulate “missing bytes” is an implementation detail that one should not rely on!

Now check its info:

>>> fh = baseband.vdif.open('corrupt.vdif', 'rs', verify=True)
>>> fh.info.readable
False
>>> fh.info
Stream information:
start_time = 2009-12-31T23:58:53.816000000
stop_time = 2009-12-31T23:58:54.816000000
sample_rate = 0.016 MHz
shape = (16000, 2)
format = vdif
bps = 8
complex_data = False
verify = True
readable = False

checks:  decodable: True
         continuous: False

errors:  continuous: While reading at 7968: AssertionError()

warnings:  number_of_frames: file contains non-integer number (1997.9166666666667) of frames

File information:
edv = 1
thread_ids = [0, 1]
frame_rate = 1000.0 Hz
samples_per_frame = 16
sample_shape = (2, 1)
>>> fh.close()

Note that the error is reported for a position earlier than the one we corrupted: internally, baseband reads one frame ahead, since a corrupted frame typically means something is bad in the preceding data as well.

This particular problem is not bad, since the VDIF reader can deal with missing frames. Indeed, when one opens the file with the default verify='fix', one gets:

>>> fh = baseband.vdif.open('corrupt.vdif', 'rs')
>>> fh.info
Stream information:
start_time = 2009-12-31T23:58:53.816000000
stop_time = 2009-12-31T23:58:54.816000000
sample_rate = 0.016 MHz
shape = (16000, 2)
format = vdif
bps = 8
complex_data = False
verify = fix
readable = True

checks:  decodable: True
         continuous: fixable gaps

warnings:  number_of_frames: file contains non-integer number (1997.9166666666667) of frames
           continuous: While reading at 7968: problem loading frame set 498. Thread(s) [1] missing; set to invalid.

File information:
edv = 1
thread_ids = [0, 1]
frame_rate = 1000.0 Hz
samples_per_frame = 16
sample_shape = (2, 1)
>>> fh.close()

Glossary

channel

A single component of the complete sample, or a stream thereof. Channels typically represent one frequency sub-band, the output from a single antenna, or (for channelized data) one spectral or Fourier channel, i.e., one part of a Fourier spectrum.

complete sample

The set of all component samples (i.e., from all threads, polarizations, channels, etc.) for one point in time. Its dimensions are given by the sample shape.

component

One individual thread and channel, or one polarization and channel, etc. Component samples each occupy one element in decoded data arrays. A component sample is composed of one elementary sample if it is real, and two if it is complex.

data frame

A block of time-sampled data, or payload, accompanied by a header. “Frame” for short.

data frameset

In the VDIF format, the set of all data frames representing the same segment of time. Each data frame consists of sets of channels from different threads.

elementary sample

The smallest subdivision of a complete sample, i.e. the real / imaginary part of one component of a complete sample.

header

Metadata accompanying a data frame.

payload

The data within a data frame.

sample

Data from one point in time. Complete samples contain samples from all components, while elementary samples are one part of one component.

sample rate

Rate of complete samples.

sample shape

The lengths of the dimensions of the complete sample.

squeezing

The removal of any dimensions of length unity from decoded data.

stream

Timeseries of samples; may refer to all of, or a subsection of, the dataset.

subset

A subset of a complete sample, in particular one defined by the user for selective decoding.

thread

A collection of channels from the complete sample, or a stream thereof. For VDIF, each thread is carried by a separate (set of) data frame(s).

Specific File Formats

Baseband’s code is subdivided into its supported file formats, and the following sections contain format specifications, usage notes, troubleshooting help and APIs for each.

VDIF

The VLBI Data Interchange Format (VDIF) was introduced in 2009 to standardize VLBI data transfer and storage. Detailed specifications are found in VDIF’s specification document.

File Structure

A VDIF file is composed of data frames. Each has a header of eight 32-bit words (32 bytes; the exception is the “legacy VDIF” format, which is four words, or 16 bytes, long), and a payload that ranges from 32 bytes to ~134 megabytes. Both are little-endian. The first four words of a VDIF header hold the same information in all VDIF files, but the last four words hold optional user-defined data. The layout of these four words is specified by the file’s extended-data version, or EDV. More detailed information on the header can be found in the tutorial for supporting a new VDIF EDV.
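
The fixed first words can be unpacked with plain bit masks. Below is a minimal sketch of the first two header words, following the layout in the VDIF specification (the function name is ours, for illustration; Baseband's VDIFHeader does all of this, and much more, for you):

```python
import struct

def parse_vdif_words(raw):
    """Decode the first two 32-bit words of a VDIF header (little-endian).

    Sketch of the bit layout from the VDIF specification.
    """
    w0, w1 = struct.unpack('<2I', raw[:8])
    return {
        'invalid': bool(w0 >> 31),        # word 0, bit 31
        'legacy': bool((w0 >> 30) & 1),   # word 0, bit 30
        'seconds': w0 & 0x3fffffff,       # seconds from reference epoch
        'ref_epoch': (w1 >> 24) & 0x3f,   # half-years since 2000
        'frame_nr': w1 & 0xffffff,        # frame number within second
    }

# Round-trip check with hand-constructed words.
raw = struct.pack('<2I', 123456, (29 << 24) | 42)
print(parse_vdif_words(raw))
```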

A data frame may carry one or multiple channels, and a stream of data frames all carrying the same (set of) channels is known as a thread and denoted by its thread ID. The collection of frames representing the same time segment (and all possible thread IDs) is called a data frameset (or just “frameset”).

Strict time and thread ID ordering of frames in the stream, while considered part of VDIF best practices, is not mandated, and cannot be guaranteed during data transmission over the internet.

Usage Notes

This section covers reading and writing VDIF files with Baseband; general usage can be found under the Using Baseband section. For situations in which one is unsure of a file’s format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the small sample file baseband/data/sample.vdif, and the numpy, astropy.units, and baseband.vdif modules:

>>> import numpy as np
>>> from baseband import vdif
>>> import astropy.units as u
>>> from baseband.data import SAMPLE_VDIF

Simple reading and writing of VDIF files can be done entirely using open. Opening in binary mode provides a normal file reader, extended with methods to read a VDIFFrameSet data container storing a frame set, or a VDIFFrame storing a single frame:

>>> fh = vdif.open(SAMPLE_VDIF, 'rb')
>>> fs = fh.read_frameset()
>>> fs.data.shape
(20000, 8, 1)
>>> fr = fh.read_frame()
>>> fr.data.shape
(20000, 1)
>>> fh.close()

(As with other formats, fr.data is a read-only property of the frame.)

Opening in stream mode wraps the low-level routines such that reading and writing is in units of samples. It also provides access to header information:

>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> fh
<VDIFStreamReader name=... offset=0
    sample_rate=32.0 MHz, samples_per_frame=20000,
    sample_shape=SampleShape(nthread=8),
    bps=2, complex_data=False, edv=3, station=65532,
    start_time=2014-06-16T05:56:07.000000000>
>>> d = fh.read(12)
>>> d.shape
(12, 8)
>>> d[:, 0].astype(int)  # first thread
array([-1, -1,  3, -1,  1, -1,  3, -1,  1,  3, -1,  1])
>>> fh.close()

Setting up a file for writing requires quite a bit of header information. Not coincidentally, what is given by the reader above suffices:

>>> from astropy.time import Time
>>> fw = vdif.open('try.vdif', 'ws', sample_rate=32*u.MHz,
...                samples_per_frame=20000, nchan=1, nthread=2,
...                complex_data=False, bps=2, edv=3, station=65532,
...                time=Time('2014-06-16T05:56:07.000000000'))
>>> with vdif.open(SAMPLE_VDIF, 'rs', subset=[1, 3]) as fh:
...    d = fh.read(20000)  # Get some data to write
>>> fw.write(d)
>>> fw.close()
>>> fh = vdif.open('try.vdif', 'rs')
>>> d2 = fh.read(12)
>>> np.all(d[:12] == d2)
True
>>> fh.close()

Here is a simple example to copy a VDIF file. We use the sort=False option to ensure the frames are written exactly in the same order, so the files should be identical:

>>> with vdif.open(SAMPLE_VDIF, 'rb') as fr, vdif.open('try.vdif', 'wb') as fw:
...     while True:
...         try:
...             fw.write_frameset(fr.read_frameset(sort=False))
...         except EOFError:
...             break

For small files, one could just do:

>>> with vdif.open(SAMPLE_VDIF, 'rs') as fr, \
...         vdif.open('try.vdif', 'ws', header0=fr.header0,
...                   sample_rate=fr.sample_rate,
...                   nthread=fr.sample_shape.nthread) as fw:
...     fw.write(fr.read())

This copies everything to memory, though, and some header information is lost.

Troubleshooting

In situations where the VDIF files being handled are corrupted or modified in an unusual way, using open will likely lead to an exception being raised or to unexpected behavior. In such cases, it may still be possible to read in the data. Below, we provide a few solutions and workarounds to do so.

Note

This list is certainly incomplete. If you have an issue (solved or otherwise) you believe should be on this list, please e-mail the contributors.

AssertionError when checking EDV in header verify function

All VDIF header classes (other than VDIFLegacyHeader) check, using their verify function, that the EDV read from file matches the class EDV. If the two do not match, the following line

assert self.edv is None or self.edv == self['edv']

returns an AssertionError. If this occurs because the VDIF EDV is not yet supported by Baseband, support can be added by implementing a custom header class. If the EDV is supported, but the header deviates from the format found in the VLBI.org EDV registry, the best solution is to create a custom header class, then override the subclass selector in VDIFHeader. Tutorials for doing either can be found here.
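
Generically, the subclass-selection machinery is a registry keyed on EDV. The sketch below uses illustrative names, not Baseband's actual internals (see the linked tutorials for those):

```python
# Illustrative EDV -> header-class registry with a subclass selector,
# mimicking (in miniature) how a class is picked for each EDV.
HEADER_CLASSES = {}

def register_edv(edv):
    def decorator(cls):
        cls.edv = edv
        HEADER_CLASSES[edv] = cls
        return cls
    return decorator

class BaseHeader:
    edv = None  # subclasses get a specific EDV via the decorator

    def __init__(self, words):
        self.words = words
        # The check that raises AssertionError for mismatched EDVs.
        assert self.edv is None or self.edv == words['edv']

    @classmethod
    def from_words(cls, words):
        # Subclass selector: dispatch on the EDV found in the words.
        return HEADER_CLASSES.get(words['edv'], cls)(words)

@register_edv(3)
class Header3(BaseHeader):
    pass

print(type(BaseHeader.from_words({'edv': 3})).__name__)  # Header3
```

Registering a custom class for an unsupported EDV then amounts to adding one more entry to the registry.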

EOFError encountered in _get_frame_rate when reading

When the sample rate is not given by the user and cannot be deduced from header information (for EDV = 1 or 3, the sample rate is stored in the header), Baseband tries to determine the frame rate using the private method _get_frame_rate in VDIFStreamReader (and then multiplies by the samples per frame to obtain the sample rate). This function raises EOFError if the file contains less than one second of data, or is corrupt. In either case, the file can still be opened by explicitly passing the sample rate to open via the sample_rate keyword.
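
The idea behind the frame-rate determination is straightforward: scan headers until the seconds count increments; the frame rate is then the largest frame number seen, plus one. A simplified sketch over (seconds, frame number) pairs (illustrative code, not Baseband's actual implementation):

```python
def get_frame_rate(headers):
    """Infer frames per second from (seconds, frame_nr) header pairs.

    Sketch of the idea behind _get_frame_rate: scan frames within the
    first second and return max(frame_nr) + 1.
    """
    first_second = headers[0][0]
    max_frame_nr = 0
    for seconds, frame_nr in headers:
        if seconds != first_second:
            return max_frame_nr + 1
        max_frame_nr = max(max_frame_nr, frame_nr)
    raise EOFError('file contains less than one second of data')

# 1000 frames in second 5, then second 6 begins.
headers = [(5, n) for n in range(1000)] + [(6, 0)]
print(get_frame_rate(headers))  # 1000
```

The sample rate is then this frame rate multiplied by the samples per frame.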

Reference/API

baseband.vdif Package

VLBI Data Interchange Format (VDIF) reader/writer

For the VDIF specification, see https://vlbi.org/vlbi-standards/vdif/

Functions

open(name[, mode])

Open VDIF file(s) for reading or writing.

Classes

VDIFFrame(header, payload[, valid, verify])

Representation of a VDIF data frame, consisting of a header and payload.

VDIFFrameSet(frames[, header0])

Representation of a set of VDIF frames, combining different threads.

VDIFHeader(words[, edv, verify])

VDIF Header, supporting different Extended Data Versions.

VDIFPayload(words[, header, nchan, bps, …])

Container for decoding and encoding VDIF payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.vdif.frame.VDIFFrame, baseband.vdif.frame.VDIFFrameSet, baseband.vdif.header.VDIFHeader, baseband.vdif.payload.VDIFPayload

baseband.vdif.header Module

Definitions for VLBI VDIF Headers.

Implements a VDIFHeader class used to store header words, and decode/encode the information therein.

For the VDIF specification, see https://www.vlbi.org/vdif

Classes

VDIFHeader(words[, edv, verify])

VDIF Header, supporting different Extended Data Versions.

VDIFBaseHeader(words[, edv, verify])

Base for non-legacy VDIF headers that use 8 32-bit words.

VDIFSampleRateHeader(words[, edv, verify])

Base for VDIF headers that include the sample rate (EDV = 1, 3, 4).

VDIFLegacyHeader(words[, edv, verify])

Legacy VDIF header that uses only 4 32-bit words.

VDIFHeader0(words[, edv, verify])

VDIF Header for EDV=0.

VDIFHeader1(words[, edv, verify])

VDIF Header for EDV=1.

VDIFHeader2(words[, edv, verify])

VDIF Header for EDV=2.

VDIFHeader3(words[, edv, verify])

VDIF Header for EDV=3.

VDIFMark5BHeader(words[, edv, verify])

Mark 5B over VDIF (EDV=0xab).

Variables

VDIF_HEADER_CLASSES

Dict for storing VDIF header class definitions, indexed by their EDV.

Class Inheritance Diagram

Inheritance diagram of baseband.vdif.header.VDIFHeader, baseband.vdif.header.VDIFBaseHeader, baseband.vdif.header.VDIFSampleRateHeader, baseband.vdif.header.VDIFLegacyHeader, baseband.vdif.header.VDIFHeader0, baseband.vdif.header.VDIFHeader1, baseband.vdif.header.VDIFHeader2, baseband.vdif.header.VDIFHeader3, baseband.vdif.header.VDIFMark5BHeader

baseband.vdif.payload Module

Definitions for VLBI VDIF payloads.

Implements a VDIFPayload class used to store payload words, and decode to or encode from a data array.

See the VDIF specification page for payload specifications.

Functions

init_luts()

Sets up the look-up tables for levels as a function of input byte.

decode_1bit(words)

decode_2bit(words)

Decodes data stored using 2 bits per sample.

decode_4bit(words)

Decodes data stored using 4 bits per sample.

encode_1bit(values)

Encodes values using 1 bit per sample, packing the result into bytes.

encode_2bit(values)

Encodes values using 2 bits per sample, packing the result into bytes.

encode_4bit(values)

Encodes values using 4 bits per sample, packing the result into bytes.

Classes

VDIFPayload(words[, header, nchan, bps, …])

Container for decoding and encoding VDIF payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.vdif.payload.VDIFPayload

baseband.vdif.frame Module

Definitions for VLBI VDIF frames and frame sets.

Implements a VDIFFrame class that can be used to hold a header and a payload, providing access to the values encoded in both. Also, define a VDIFFrameSet class that combines a set of frames from different threads.

For the VDIF specification, see https://www.vlbi.org/vdif

Classes

VDIFFrame(header, payload[, valid, verify])

Representation of a VDIF data frame, consisting of a header and payload.

VDIFFrameSet(frames[, header0])

Representation of a set of VDIF frames, combining different threads.

Class Inheritance Diagram

Inheritance diagram of baseband.vdif.frame.VDIFFrame, baseband.vdif.frame.VDIFFrameSet

baseband.vdif.file_info Module

The VDIFFileReaderInfo property.

Includes information about threads and frame sets.

Classes

VDIFFileReaderInfo([parent])

Class Inheritance Diagram

Inheritance diagram of baseband.vdif.file_info.VDIFFileReaderInfo

baseband.vdif.base Module
Functions

open(name[, mode])

Open VDIF file(s) for reading or writing.

Classes

VDIFFileReader(fh_raw)

Simple reader for VDIF files.

VDIFFileWriter(fh_raw)

Simple writer for VDIF files.

VDIFStreamBase(fh_raw, header0[, …])

Base for VDIF streams.

VDIFStreamReader(fh_raw[, sample_rate, …])

VLBI VDIF format reader.

VDIFStreamWriter(fh_raw[, header0, …])

VLBI VDIF format writer.

Class Inheritance Diagram

Inheritance diagram of baseband.vdif.base.VDIFFileReader, baseband.vdif.base.VDIFFileWriter, baseband.vdif.base.VDIFStreamBase, baseband.vdif.base.VDIFStreamReader, baseband.vdif.base.VDIFStreamWriter

MARK 5B

The Mark 5B format is the output format of the Mark 5B disk-based VLBI data system. It is described in its design specifications.

File Structure

Each data frame consists of a header of four 32-bit words (16 bytes) followed by a payload of 2500 32-bit words (10000 bytes). The header contains a sync word, frame number, and timestamp (accurate to 1 ms), as well as user-specified data; see Sec. 1 of the design specifications for details. The payload supports \(2^n\) bit streams, for \(0 \leq n \leq 5\), and the first sample of each stream corresponds precisely to the header time. Elementary samples may be 1 or 2 bits in size, with the latter being stored in two successive bit streams. The number of channels is equal to the number of bit streams divided by the number of bits per elementary sample (Baseband currently only supports files where all bit streams are active). Files begin at a header (unlike for Mark 4), and an integer number of frames fits within 1 second.
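
The samples per frame follow directly from these numbers; for example, for a file with 8 channels of 2-bit samples (like the sample file used in the next section):

```python
payload_bits = 2500 * 32    # 2500 32-bit words per payload
nchan, bps = 8, 2           # 8 channels, 2 bits per sample
samples_per_frame = payload_bits // (nchan * bps)
print(samples_per_frame)  # 5000
```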

The Mark 5B system also outputs files with the active bit-stream mask, number of frames per second, and observational metadata (Sec. 1.3 of the design specifications). Baseband does not yet use these files, and instead requires the user to specify, for example, the sample rate.

Usage

This section covers reading and writing Mark 5B files with Baseband; general usage can be found under the Using Baseband section. For situations in which one is unsure of a file’s format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the small sample file baseband/data/sample.m5b, and the numpy, astropy.units, astropy.time.Time, and baseband.mark5b modules:

>>> import numpy as np
>>> import astropy.units as u
>>> from astropy.time import Time
>>> from baseband import mark5b
>>> from baseband.data import SAMPLE_MARK5B

Opening a Mark 5B file with open in binary mode provides a normal file reader extended with methods to read a Mark5BFrame. The number of channels, kiloday (thousands of MJD) and number of bits per sample must all be passed when using read_frame:

>>> fb = mark5b.open(SAMPLE_MARK5B, 'rb', kday=56000, nchan=8)
>>> frame = fb.read_frame()
>>> frame.shape
(5000, 8)
>>> fb.close()

Our sample file has 2-bit component samples, which is also the default for read_frame, so it does not need to be passed. Also, we may pass a reference Time object within 500 days of the observation start time to ref_time, rather than kday.
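
Resolving the header's truncated day count against kday or ref_time is simple arithmetic: the header stores the MJD modulo 1000, and the reference picks the kiloday that brings the two within 500 days of each other. An illustrative sketch (Baseband's Mark5BHeader handles this internally):

```python
def resolve_mjd(jday, ref_mjd):
    """Resolve a truncated (MJD mod 1000) day count against a reference
    MJD, assuming the two are within 500 days of each other.
    """
    kday = round((ref_mjd - jday) / 1000) * 1000
    return kday + jday

# MJD 56821 is 2014-06-13; the header would store day 821.
print(resolve_mjd(821, 56800))  # 56821
```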

Opening as a stream wraps the low-level routines such that reading and writing is in units of samples. It also provides access to header information. Here, we also must provide nchan, sample_rate, and ref_time or kday:

>>> fh = mark5b.open(SAMPLE_MARK5B, 'rs', sample_rate=32*u.MHz, nchan=8,
...                  ref_time=Time('2014-06-13 12:00:00'))
>>> fh
<Mark5BStreamReader name=... offset=0
    sample_rate=32.0 MHz, samples_per_frame=5000,
    sample_shape=SampleShape(nchan=8), bps=2,
    start_time=2014-06-13T05:30:01.000000000>
>>> header0 = fh.header0    # To be used for writing, below.
>>> d = fh.read(10000)
>>> d.shape
(10000, 8)
>>> d[0, :3]    
array([-3.316505, -1.      ,  1.      ], dtype=float32)
>>> fh.close()

When writing to file, we again need to pass in sample_rate and nchan, though time can either be passed explicitly or inferred from the header:

>>> fw = mark5b.open('test.m5b', 'ws', header0=header0,
...                  sample_rate=32*u.MHz, nchan=8)
>>> fw.write(d)
>>> fw.close()
>>> fh = mark5b.open('test.m5b', 'rs', sample_rate=32*u.MHz,
...                  kday=57000, nchan=8)
>>> np.all(fh.read() == d)
True
>>> fh.close()

Reference/API

baseband.mark5b Package

Mark5B VLBI data reader.

Code inspired by Walter Brisken’s mark5access. See https://github.com/demorest/mark5access.

Also, for the Mark5B design, see https://www.haystack.mit.edu/tech/vlbi/mark5/mark5_memos/019.pdf

Functions

open(name[, mode])

Open Mark5B file(s) for reading or writing.

Classes

Mark5BFrame(header, payload[, valid, verify])

Representation of a Mark 5B frame, consisting of a header and payload.

Mark5BHeader(words[, kday, ref_time, verify])

Decoder/encoder of a Mark5B Frame Header.

Mark5BPayload(words[, nchan, bps, complex_data])

Container for decoding and encoding Mark 5B payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.mark5b.frame.Mark5BFrame, baseband.mark5b.header.Mark5BHeader, baseband.mark5b.payload.Mark5BPayload

baseband.mark5b.header Module

Definitions for VLBI Mark5B Headers.

Implements a Mark5BHeader class used to store header words, and decode/encode the information therein.

For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/Mark%205B%20users%20manual.pdf

Classes

Mark5BHeader(words[, kday, ref_time, verify])

Decoder/encoder of a Mark5B Frame Header.

Variables

CRC16

CRC polynomial used for Mark 5B Headers, as a check on the time code.

crc16(stream)

Cyclic Redundancy Check.

Class Inheritance Diagram

Inheritance diagram of baseband.mark5b.header.Mark5BHeader

baseband.mark5b.payload Module

Definitions for VLBI Mark 5B payloads.

Implements a Mark5BPayload class used to store payload words, and decode to or encode from a data array.

For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/Mark%205B%20users%20manual.pdf

Functions

init_luts()

Set up the look-up tables for levels as a function of input byte.

decode_1bit(words)

decode_2bit(words)

encode_1bit(values)

Encodes values using 1 bit per sample, packing the result into bytes.

encode_2bit(values)

Generic encoder for data stored using two bits.

Classes

Mark5BPayload(words[, nchan, bps, complex_data])

Container for decoding and encoding Mark 5B payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.mark5b.payload.Mark5BPayload

baseband.mark5b.frame Module

Definitions for VLBI Mark 5B frames.

Implements a Mark5BFrame class that can be used to hold a header and a payload, providing access to the values encoded in both.

For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/Mark%205B%20users%20manual.pdf

Classes

Mark5BFrame(header, payload[, valid, verify])

Representation of a Mark 5B frame, consisting of a header and payload.

Class Inheritance Diagram

Inheritance diagram of baseband.mark5b.frame.Mark5BFrame

baseband.mark5b.file_info Module

The Mark5BFileReaderInfo property.

Includes information about what is needed to calculate times.

Classes

Mark5BFileReaderInfo([parent])

Class Inheritance Diagram

Inheritance diagram of baseband.mark5b.file_info.Mark5BFileReaderInfo

baseband.mark5b.base Module
Functions

open(name[, mode])

Open Mark5B file(s) for reading or writing.

Classes

Mark5BFileReader(fh_raw[, kday, ref_time, …])

Simple reader for Mark 5B files.

Mark5BFileWriter(fh_raw)

Simple writer for Mark 5B files.

Mark5BStreamBase(fh_raw, header0[, …])

Base for Mark 5B streams.

Mark5BStreamReader(fh_raw[, sample_rate, …])

VLBI Mark 5B format reader.

Mark5BStreamWriter(fh_raw[, header0, …])

VLBI Mark 5B format writer.

Class Inheritance Diagram

Inheritance diagram of baseband.mark5b.base.Mark5BFileReader, baseband.mark5b.base.Mark5BFileWriter, baseband.mark5b.base.Mark5BStreamBase, baseband.mark5b.base.Mark5BStreamReader, baseband.mark5b.base.Mark5BStreamWriter

MARK 4

The Mark 4 format is the output format of the MIT Haystack Observatory’s Mark 4 VLBI magnetic tape-based data acquisition system, and one output format of its successor, the Mark 5A hard drive-based system. The format’s specification is in the Mark IIIA/IV/VLBA design specifications.

Baseband currently only supports files that have been parity-stripped and corrected for barrel roll and data modulation.

File Structure

Mark 4 files contain up to 64 concurrent data “tracks”. Tracks are divided into 22500-bit “tape frames”, each of which consists of a 160-bit header followed by a 19840-bit payload. The header includes a timestamp (accurate to 1.25 ms), track ID, sideband, and fan-out/in factor (see below); the details of these can be found in Secs. 2.1.1 - 2.1.3 of the design specifications. The payload of each track consists of a single bit stream. When recording 2-bit elementary samples, the data is split into two tracks, with one carrying the sign bit, and the other the magnitude bit.

The header takes the place of the first 160 bits of payload data, so that the first sample occurs fanout * 160 sample times after the header time. This means that a Mark 4 stream is not contiguous in time. The length of one frame ranges from 1.25 ms to 160 ms in octave steps (which ensures an integer number of frames falls within 1 minute), setting the maximum sample rate per track to 18 megabits/track/s.
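
These numbers are easy to check: at the maximum rate, one 22500-bit tape frame lasts exactly 1.25 ms, and with fan-out 4 (as in the sample file used below) the first 640 samples of each channel are overwritten by header bits:

```python
track_bits = 22500            # bits per tape frame per track
max_rate = 18_000_000         # maximum bits/track/s
print(track_bits / max_rate)  # 0.00125, i.e., 1.25 ms

fanout = 4
print(fanout * 160)           # 640 invalid samples at the frame start
```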

Data from a single channel may be distributed to multiple tracks - “fan-out” - or multiple channels fed to one track - “fan-in”. Fan-out is used when sampling at rates higher than 18 megabits/track/s. Baseband currently only supports tracks using fan-out (“longitudinal data format”).

Baseband reconstructs the tracks into channels (reconstituting 2-bit data from two tracks into a single channel if necessary) and combines tape frame headers into a single data frame header.
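
As a toy model of that reconstruction (real Mark 4 decoding also involves sign/magnitude recombination and look-up tables), de-interleaving a fanned-out channel is essentially a numpy reshape: with fan-out 4, successive samples of one channel come from its four tracks in turn.

```python
import numpy as np

fanout = 4
channel = np.arange(12)                 # one channel's samples in time order
tracks = channel.reshape(-1, fanout).T  # distribute round-robin over 4 tracks
print(tracks[0])                        # track 0 carries samples 0, 4, 8

reconstructed = tracks.T.ravel()        # de-interleave back into time order
print(np.array_equal(reconstructed, channel))  # True
```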

Usage

This section covers reading and writing Mark 4 files with Baseband; general usage can be found under the Using Baseband section. For situations in which one is unsure of a file’s format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the small sample file baseband/data/sample.m4, and the numpy, astropy.units, astropy.time.Time, and baseband.mark4 modules:

>>> import numpy as np
>>> import astropy.units as u
>>> from astropy.time import Time
>>> from baseband import mark4
>>> from baseband.data import SAMPLE_MARK4

Opening a Mark 4 file with open in binary mode provides a normal file reader, extended with methods to read a Mark4Frame. Mark 4 files generally do not start (or end) at a frame boundary, so in binary mode one has to find the first header using find_header (which will also determine the number of Mark 4 tracks, if not given explicitly). Since Mark 4 files do not store the full time information, one must pass either the decade the data was taken, or an equivalent reference Time object:
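
The reason the decade (or ref_time) is needed is that the header stores only the final digit of the year (the bcd_unit_year field shown below, 0x4 for 2014). Resolving it can be sketched as follows (illustrative code, not Baseband's actual implementation):

```python
def resolve_year(unit_year, ref_year):
    """Pick the year that ends in the stored digit and lies closest to
    a reference year (valid when the two are within a few years).
    """
    base = (ref_year // 10) * 10 + unit_year
    # Candidate years sharing the same final digit, bracketing ref_year.
    return min((base - 10, base, base + 10), key=lambda y: abs(y - ref_year))

print(resolve_year(4, 2013))  # 2014
```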

>>> fb = mark4.open(SAMPLE_MARK4, 'rb', decade=2010)
>>> fb.find_header()  # Locate first header and determine ntrack.
<Mark4Header bcd_headstack1: [0x3344]*64,
             bcd_headstack2: [0x1122]*64,
             headstack_id: [0, ..., 1],
             bcd_track_id: [0x2, ..., 0x33],
             fan_out: [0, ..., 3],
             magnitude_bit: [False, ..., True],
             lsb_output: [True]*64,
             converter_id: [0, ..., 7],
             time_sync_error: [False]*64,
             internal_clock_error: [False]*64,
             processor_time_out_error: [False]*64,
             communication_error: [False]*64,
             _1_11_1: [False]*64,
             _1_10_1: [False]*64,
             track_roll_enabled: [False]*64,
             sequence_suspended: [False]*64,
             system_id: [108]*64,
             _1_0_1_sync: [False]*64,
             sync_pattern: [0xffffffff]*64,
             bcd_unit_year: [0x4]*64,
             bcd_day: [0x167]*64,
             bcd_hour: [0x7]*64,
             bcd_minute: [0x38]*64,
             bcd_second: [0x12]*64,
             bcd_fraction: [0x475]*64,
             crc: [0xea6, ..., 0x212]>
>>> fb.ntrack
64
>>> fb.tell()
2696
>>> frame = fb.read_frame()
>>> frame.shape
(80000, 8)
>>> frame.header.time
<Time object: scale='utc' format='yday' value=2014:167:07:38:12.47500>
>>> fb.close()

Opening in stream mode automatically finds the first frame, and wraps the low-level routines such that reading and writing is in units of samples. It also provides access to header information. Here we pass a reference Time object within 4 years of the observation start time to ref_time, rather than a decade:

>>> fh = mark4.open(SAMPLE_MARK4, 'rs', ref_time=Time('2013:100:23:00:00'))
>>> fh
<Mark4StreamReader name=... offset=0
    sample_rate=32.0 MHz, samples_per_frame=80000,
    sample_shape=SampleShape(nchan=8), bps=2,
    start_time=2014-06-16T07:38:12.47500>
>>> d = fh.read(6400)
>>> d.shape
(6400, 8)
>>> d[635:645, 0].astype(int)  # first channel
array([ 0,  0,  0,  0,  0, -1,  1,  3,  1, -1])
>>> fh.close()

As mentioned in the File Structure section, because the header takes the place of the first 160 samples of each track, the first payload sample occurs fanout * 160 sample times after the header time. The stream reader includes these overwritten samples as invalid data (zeros, by default):

>>> np.array_equal(d[:640], np.zeros((640,) + d.shape[1:]))
True

When writing to file, we need to pass in the sample rate in addition to decade. The number of tracks can be inferred from the header:

>>> fw = mark4.open('sample_mark4_segment.m4', 'ws', header0=frame.header,
...                 sample_rate=32*u.MHz, decade=2010)
>>> fw.write(frame.data)
>>> fw.close()
>>> fh = mark4.open('sample_mark4_segment.m4', 'rs',
...                 sample_rate=32.*u.MHz, decade=2010)
>>> np.all(fh.read(80000) == frame.data)
True
>>> fh.close()

Note that above we had to pass in the sample rate even when opening the file for reading; this is because there is only a single frame in the file, and hence the sample rate cannot be inferred automatically.

Reference/API

baseband.mark4 Package

Mark 4 VLBI data reader.

Code inspired by Walter Brisken’s mark5access. See https://github.com/demorest/mark5access.

The format itself is described in detail in https://www.haystack.mit.edu/tech/vlbi/mark5/docs/230.3.pdf

Functions

open(name[, mode])

Open Mark4 file(s) for reading or writing.

Classes

Mark4Frame(header, payload[, valid, verify])

Representation of a Mark 4 frame, consisting of a header and payload.

Mark4Header(words[, ntrack, decade, …])

Decoder/encoder of a Mark 4 Header, containing all streams.

Mark4Payload(words[, header, nchan, bps, fanout])

Container for decoding and encoding Mark 4 payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.mark4.frame.Mark4Frame, baseband.mark4.header.Mark4Header, baseband.mark4.payload.Mark4Payload

baseband.mark4.header Module

Definitions for VLBI Mark 4 Headers.

Implements a Mark4Header class used to store header words, and decode/encode the information therein.

For the specification of tape Mark 4 format, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/230.3.pdf

A little bit on the disk representation is at https://ui.adsabs.harvard.edu/abs/2003ASPC..306..123W

Functions

stream2words(stream[, track])

Convert a stream of integers to uint32 header words.

words2stream(words)

Convert a set of uint32 header words to a stream of integers.

Classes

Mark4TrackHeader(words[, decade, ref_time, …])

Decoder/encoder of a Mark 4 Track Header.

Mark4Header(words[, ntrack, decade, …])

Decoder/encoder of a Mark 4 Header, containing all streams.

Variables

CRC12

CRC polynomial used for Mark 4 Headers.

crc12(stream)

Cyclic Redundancy Check for a bitstream.

Class Inheritance Diagram

Inheritance diagram of baseband.mark4.header.Mark4TrackHeader, baseband.mark4.header.Mark4Header

baseband.mark4.payload Module

Definitions for VLBI Mark 4 payloads.

Implements a Mark4Payload class used to store payload words, and decode to or encode from a data array.

For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/230.3.pdf

Functions

reorder32(x)

Reorder 32-track bits to bring signs & magnitudes together.

reorder64(x)

Reorder 64-track bits to bring signs & magnitudes together.

init_luts()

Set up the look-up tables for levels as a function of input byte.

decode_8chan_2bit_fanout4(frame)

Decode payload for 8 channels using 2 bits, fan-out 4 (64 tracks).

encode_8chan_2bit_fanout4(values)

Encode payload for 8 channels using 2 bits, fan-out 4 (64 tracks).

Classes

Mark4Payload(words[, header, nchan, bps, fanout])

Container for decoding and encoding Mark 4 payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.mark4.payload.Mark4Payload

baseband.mark4.frame Module

Definitions for VLBI Mark 4 frames.

Implements a Mark4Frame class that combines a Mark4Header and a Mark4Payload, and can decode to or encode from a data array.

For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/230.3.pdf

Classes

Mark4Frame(header, payload[, valid, verify])

Representation of a Mark 4 frame, consisting of a header and payload.

Class Inheritance Diagram

Inheritance diagram of baseband.mark4.frame.Mark4Frame

baseband.mark4.file_info Module

The Mark4FileReaderInfo property.

Includes information about what is needed to calculate times, the number of tracks, and the offset of the first header.

Classes

Mark4FileReaderInfo([parent])

Standardized information on Mark 4 file readers.

Class Inheritance Diagram

Inheritance diagram of baseband.mark4.file_info.Mark4FileReaderInfo

baseband.mark4.base Module
Functions

open(name[, mode])

Open Mark4 file(s) for reading or writing.

Classes

Mark4FileReader(fh_raw[, ntrack, decade, …])

Simple reader for Mark 4 files.

Mark4FileWriter(fh_raw)

Simple writer for Mark 4 files.

Mark4StreamBase(fh_raw, header0[, …])

Base for Mark 4 streams.

Mark4StreamReader(fh_raw[, sample_rate, …])

VLBI Mark 4 format reader.

Mark4StreamWriter(fh_raw[, header0, …])

VLBI Mark 4 format writer.

Class Inheritance Diagram

Inheritance diagram of baseband.mark4.base.Mark4FileReader, baseband.mark4.base.Mark4FileWriter, baseband.mark4.base.Mark4StreamBase, baseband.mark4.base.Mark4StreamReader, baseband.mark4.base.Mark4StreamWriter

DADA

Distributed Acquisition and Data Analysis (DADA) format data files contain a single data frame: an ASCII header, typically 4096 bytes long, followed by a payload. The format is defined by the DADA software specification and by actual usage.
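This layout can be sketched in a few lines of plain Python. The keyword values below are illustrative rather than taken from a real observation, and real files may use a different padding byte:

```python
# Sketch of the DADA file layout: an ASCII header padded to HDR_SIZE
# bytes, followed directly by the binary payload.  Keyword values are
# hypothetical, and the padding byte may differ in real files.
header_text = (
    "HEADER       DADA\n"
    "HDR_VERSION  1.0\n"
    "HDR_SIZE     4096\n"
)
header = header_text.encode("ascii").ljust(4096, b"\0")
payload = bytes(16)          # stand-in for the encoded samples
frame = header + payload
```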

Usage

This section covers reading and writing DADA files with Baseband; general usage is covered in the Using Baseband section. For situations in which one is unsure of a file’s format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the sample file baseband/data/sample.dada, and the astropy.units and baseband.dada modules:

>>> from baseband import dada
>>> import astropy.units as u
>>> from baseband.data import SAMPLE_DADA

Single files can be opened with open in binary mode. DADA files typically consist of just a single header and payload, and can be read into a single DADAFrame.

>>> fb = dada.open(SAMPLE_DADA, 'rb')
>>> frame = fb.read_frame()
>>> frame.shape
(16000, 2, 1)
>>> frame[:3].squeeze()
array([[ -38.-38.j,  -38.-38.j],
       [ -38.-38.j,  -40. +0.j],
       [-105.+60.j,   85.-15.j]], dtype=complex64)
>>> fb.close()

Since the files can be quite large, the payload is mapped (with numpy.memmap), so that if one accesses part of the data, only the corresponding parts of the encoded payload are loaded into memory (since the sample file is encoded using 8 bits, the above example thus loads 12 bytes into memory).
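The byte count follows directly from the sample shape. As a quick check, using the shape of the frame read above:

```python
# Bytes loaded when decoding frame[:3] of the sample DADA file:
# 3 complete samples x 2 polarizations x 1 channel, complex data
# encoded with 8 bits per real/imaginary component.
nsample, npol, nchan = 3, 2, 1
ncomponent = 2                   # complex: real + imaginary
bps = 8                          # bits per component
nbytes = nsample * npol * nchan * ncomponent * bps // 8
print(nbytes)  # 12
```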

Opening in stream mode wraps the low-level routines such that reading and writing is in units of samples, and provides access to header information:

>>> fh = dada.open(SAMPLE_DADA, 'rs')
>>> fh
<DADAStreamReader name=... offset=0
    sample_rate=16.0 MHz, samples_per_frame=16000,
    sample_shape=SampleShape(npol=2), bps=8,
    start_time=2013-07-02T01:39:20.000>
>>> d = fh.read(10000)
>>> d.shape
(10000, 2)
>>> d[:3]
array([[ -38.-38.j,  -38.-38.j],
       [ -38.-38.j,  -40. +0.j],
       [-105.+60.j,   85.-15.j]], dtype=complex64)
>>> fh.close()

Files can also be set up for writing as a stream:

>>> from astropy.time import Time
>>> fw = dada.open('{utc_start}.{obs_offset:016d}.000000.dada', 'ws',
...                sample_rate=16*u.MHz, samples_per_frame=5000,
...                npol=2, nchan=1, bps=8, complex_data=True,
...                time=Time('2013-07-02T01:39:20.000'))
>>> fw.write(d)
>>> fw.close()
>>> import os
>>> [f for f in sorted(os.listdir('.')) if f.startswith('2013')]
['2013-07-02-01:39:20.0000000000000000.000000.dada',
 '2013-07-02-01:39:20.0000000000020000.000000.dada']
>>> fr = dada.open('2013-07-02-01:39:20.{obs_offset:016d}.000000.dada', 'rs')
>>> d2 = fr.read()
>>> (d == d2).all()
True
>>> fr.close()

Here, we have used an even smaller payload size to show how one can write multiple files. DADA data are typically stored in sequences of files. If one passes a time-ordered list or tuple of filenames to open, it uses sequentialfile.open to access the sequence. If, as above, one passes a template string, open uses DADAFileNameSequencer to create and use a filename sequencer. (See API links for further details.)
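The filenames produced above follow directly from the template: OBS_OFFSET counts bytes from the start of the observation, so each successive file starts at a multiple of the frame size in bytes. A quick sketch, with the frame size computed from the writer parameters used above:

```python
# Frame size in bytes for 5000 complex, 8-bit, 2-polarization samples.
samples_per_frame, npol, nchan = 5000, 2, 1
ncomponent, bps = 2, 8
frame_nbytes = samples_per_frame * npol * nchan * ncomponent * bps // 8
template = '2013-07-02-01:39:20.{obs_offset:016d}.000000.dada'
names = [template.format(obs_offset=i * frame_nbytes) for i in range(2)]
print(names[1])  # 2013-07-02-01:39:20.0000000000020000.000000.dada
```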

Further details

DADA Headers

The specification of “Distributed Acquisition and Data Analysis” (DADA) headers is part of the DADA software specification. In particular, its appendix B.3 defines expected header keywords, which we reproduce below. We separate those for which the meaning has been taken from comments in an actual DADA header from Effelsberg, as well as additional keywords found in that header that do not appear in the specification.

Keyword

Description

Primary (from appendix B.3 [Default])

HEADER

name of the header [DADA]

HDR_VERSION

version of the header [1.0]

HDR_SIZE

size of the header in bytes [4096]

INSTRUMENT

name of the instrument

PRIMARY

host name of the primary node on which the data were acquired

HOSTNAME

host name of the machine on which data were written

FILE_NAME

full path of the file to which data were written

FILE_SIZE

requested size of data files

FILE_NUMBER

number of data files written prior to this one

OBS_ID

the identifier for the observations

UTC_START

rising edge of the first sample (yyyy-mm-dd-hh:mm:ss)

MJD_START

the MJD of the first sample in the observation

OBS_OFFSET

the number of bytes from the start of the observation

OBS_OVERLAP

the amount by which neighbouring files overlap

Secondary (description from Effelsberg sample file)

TELESCOPE

name of the telescope

SOURCE

source name

FREQ

observation frequency

BW

bandwidth in MHz (-ve lower sb)

NPOL

number of polarizations observed

NBIT

number of bits per sample

NDIM

dimension of samples (2=complex, 1=real)

TSAMP

sampling interval in microseconds

RA

J2000 Right ascension of the source (hh:mm:ss.ss)

DEC

J2000 Declination of the source (ddd:mm:ss.s)

Other (found in Effelsberg sample file)

PIC_VERSION

Version of the PIC FPGA Software [1.0]

RECEIVER

frontend receiver

SECONDARY

secondary host name

NCHAN

number of channels here

RESOLUTION

a parameter that is unclear

DSB

(no description)

Reference/API

baseband.dada Package

Distributed Acquisition and Data Analysis (DADA) format reader/writer.

Functions

open(name[, mode])

Open DADA file(s) for reading or writing.

Classes

DADAFrame(header, payload[, valid, verify])

Representation of a DADA file, consisting of a header and payload.

DADAHeader(*args[, verify, mutable])

DADA baseband file format header.

DADAPayload(words[, header, sample_shape, …])

Container for decoding and encoding DADA payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.dada.frame.DADAFrame, baseband.dada.header.DADAHeader, baseband.dada.payload.DADAPayload

baseband.dada.header Module

Definitions for DADA pulsar baseband headers.

Implements a DADAHeader class used to store header definitions, and read & write these from files.

The DADA headers are described in the DADA software specification, at http://psrdada.sourceforge.net/manuals/Specification.pdf

See also DADA Headers.

Classes

DADAHeader(*args[, verify, mutable])

DADA baseband file format header.

Class Inheritance Diagram

Inheritance diagram of baseband.dada.header.DADAHeader

baseband.dada.payload Module

Payload for DADA format.

Classes

DADAPayload(words[, header, sample_shape, …])

Container for decoding and encoding DADA payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.dada.payload.DADAPayload

baseband.dada.frame Module
Classes

DADAFrame(header, payload[, valid, verify])

Representation of a DADA file, consisting of a header and payload.

Class Inheritance Diagram

Inheritance diagram of baseband.dada.frame.DADAFrame

baseband.dada.base Module
Functions

open(name[, mode])

Open DADA file(s) for reading or writing.

Classes

DADAFileNameSequencer(template[, header])

List-like generator of DADA filenames using a template.

DADAFileReader(fh_raw)

Simple reader for DADA files.

DADAFileWriter(fh_raw)

Simple writer/mapper for DADA files.

DADAStreamBase(fh_raw, header0[, squeeze, …])

Base for DADA streams.

DADAStreamReader(fh_raw[, squeeze, subset, …])

DADA format reader.

DADAStreamWriter(fh_raw, header0[, squeeze])

DADA format writer.

Class Inheritance Diagram

Inheritance diagram of baseband.dada.base.DADAFileNameSequencer, baseband.dada.base.DADAFileReader, baseband.dada.base.DADAFileWriter, baseband.dada.base.DADAStreamBase, baseband.dada.base.DADAStreamReader, baseband.dada.base.DADAStreamWriter

GUPPI

The GUPPI format is the output of the Green Bank Ultimate Pulsar Processing Instrument and any clones operating at other telescopes, such as PUPPI at the Arecibo Observatory. Baseband specifically supports GUPPI data taken in baseband mode; its implementation is based on DSPSR’s. While general format specifications can be found on Paul Demorest’s site, some of the header information may be invalid or not applicable, particularly for older files.

Baseband currently only supports 8-bit elementary samples.

File Structure

Each GUPPI file contains multiple (typically 128) frames, with each frame consisting of an ASCII header composed of 80-character entries, followed by a binary payload (or “block”). The header’s length is variable, but always ends with “END” followed by 77 spaces.
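The 80-character convention can be illustrated with plain Python. This is a simplified sketch: the exact value formatting (quoting, alignment) in real GUPPI headers may differ:

```python
def card(key, value):
    # One 80-character header entry, FITS-like "KEY     = value" layout
    # (a simplified sketch; real GUPPI value formatting may differ).
    return "{:<8}= {}".format(key, value).ljust(80)

end_card = "END" + " " * 77   # the entry that terminates every header
print(len(card("NBITS", 8)), len(end_card))  # 80 80
```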

How samples are stored in the payload depends on whether or not the file is channels-first. A channels-first payload stores each channel’s stream in a contiguous data block, while a non-channels-first one groups the components of a complete sample together (as for other formats). In either case, within each channel, polarization samples from the same point in time are stored adjacent to one another. At the end of each channel’s data is a section of overlap samples identical to the first samples in the next payload. Baseband retains these redundant samples when reading individual GUPPI frames, but removes them when reading files as a stream.
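The two layouts are easiest to see with a toy numpy array; the shapes below are illustrative, not taken from a real file:

```python
import numpy as np

nchan, nsample, npol = 4, 6, 2
# Channels-first: each channel's block of (time, polarization) samples is
# contiguous, with the polarizations of one time sample stored adjacently.
stored = np.arange(nchan * nsample * npol).reshape(nchan, nsample, npol)
# Rearranged to the (sample, polarization, channel) order of a data array:
data = stored.transpose(1, 2, 0)
print(data.shape)  # (6, 2, 4)
```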

Usage

This section covers reading and writing GUPPI files with Baseband; general usage is covered in the Using Baseband section. For situations in which one is unsure of a file’s format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the sample PUPPI file baseband/data/sample_puppi.raw, and the astropy.units and baseband.guppi modules:

>>> from baseband import guppi
>>> import astropy.units as u
>>> from baseband.data import SAMPLE_PUPPI

Single files can be opened with open in binary mode, which provides a normal file reader, but extended with methods to read a GUPPIFrame:

>>> fb = guppi.open(SAMPLE_PUPPI, 'rb')
>>> frame = fb.read_frame()
>>> frame.shape
(1024, 2, 4)
>>> frame[:3, 0, 1]    
array([-32.-10.j, -15.-14.j,   9.-13.j], dtype=complex64)
>>> fb.close()

Since the files can be quite large, the payload is mapped (with numpy.memmap), so that if one accesses part of the data, only the corresponding parts of the encoded payload are loaded into memory (since the sample file is encoded using 8 bits, the above example thus loads 6 bytes into memory).

Opening in stream mode wraps the low-level routines such that reading and writing is in units of samples, and provides access to header information:

>>> fh = guppi.open(SAMPLE_PUPPI, 'rs')
>>> fh
<GUPPIStreamReader name=... offset=0
    sample_rate=250.0 Hz, samples_per_frame=960,
    sample_shape=SampleShape(npol=2, nchan=4), bps=8,
    start_time=2018-01-14T14:11:33.000>
>>> d = fh.read()
>>> d.shape
(3840, 2, 4)
>>> d[:3, 0, 1]    
array([-32.-10.j, -15.-14.j,   9.-13.j], dtype=complex64)
>>> fh.close()

Note that fh.samples_per_frame represents the number of samples per frame excluding overlap samples, since the stream reader works on a linearly increasing sequence of samples. Frames themselves have access to the overlap, and fh.header0.samples_per_frame returns the number of samples per frame including overlap.
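For the sample file read above, the relation between the two values is simple arithmetic (the frame and stream shapes come from the examples above):

```python
# Values from the sample PUPPI file used above.
frame_len = 1024    # fh.header0.samples_per_frame: includes overlap
stream_len = 960    # fh.samples_per_frame: overlap removed
overlap = frame_len - stream_len
nframes = 4         # frames in the sample file
print(overlap, nframes * stream_len)  # 64 3840
```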

Files can also be set up for writing as a stream. The overlap must be zero when writing (so we set samples_per_frame to its stream reader value from above):

>>> from astropy.time import Time
>>> fw = guppi.open('puppi_test.{file_nr:04d}.raw', 'ws',
...                 frames_per_file=2, sample_rate=250*u.Hz,
...                 samples_per_frame=960, pktsize=1024,
...                 time=Time(58132.59135416667, format='mjd'),
...                 npol=2, nchan=4)
>>> fw.write(d)
>>> fw.close()
>>> fr = guppi.open('puppi_test.{file_nr:04d}.raw', 'rs')
>>> d2 = fr.read()
>>> (d == d2).all()
True
>>> fr.close()

Here we show how to write a sequence of files by passing a string template to open, which prompts it to create and use a filename sequencer generated with GUPPIFileNameSequencer. One may also pass a time-ordered list or tuple of filenames to open. Unlike when writing DADA files, which have one frame per file, we specify the number of frames in one file using frames_per_file. Note that typically one does not have to pass PKTSIZE, the UDP data packet size (set by the observing mode), but the sample file has small enough frames that the default of 8192 bytes is too large. Baseband only uses PKTSIZE to double-check the sample offset of the frame, so PKTSIZE must be set to a value such that each payload, excluding overlap samples, contains an integer number of packets. (See API links for further details on how to read and write file sequences.)
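For the writer above, one can verify that the chosen PKTSIZE satisfies this constraint:

```python
# Payload size excluding overlap, for the stream writer above:
samples_per_frame, npol, nchan = 960, 2, 4
ncomponent, bps = 2, 8             # complex data, 8 bits per component
payload_nbytes = samples_per_frame * npol * nchan * ncomponent * bps // 8
pktsize = 1024
print(payload_nbytes, payload_nbytes // pktsize)  # 15360 15
```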

Reference/API

baseband.guppi Package

Green Bank Ultimate Pulsar Processing Instrument (GUPPI) format reader/writer.

Functions

open(name[, mode])

Open GUPPI file(s) for reading or writing.

Classes

GUPPIFrame(header, payload[, valid, verify])

Representation of a GUPPI file, consisting of a header and payload.

GUPPIHeader(*args[, verify, mutable])

GUPPI baseband file format header.

GUPPIPayload(words[, header, sample_shape, …])

Container for decoding and encoding GUPPI payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.guppi.frame.GUPPIFrame, baseband.guppi.header.GUPPIHeader, baseband.guppi.payload.GUPPIPayload

baseband.guppi.header Module

Definitions for GUPPI headers.

Implements a GUPPIHeader class that reads & writes FITS-like headers from file.

Classes

GUPPIHeader(*args[, verify, mutable])

GUPPI baseband file format header.

Class Inheritance Diagram

Inheritance diagram of baseband.guppi.header.GUPPIHeader

baseband.guppi.payload Module

Payload for GUPPI format.

Classes

GUPPIPayload(words[, header, sample_shape, …])

Container for decoding and encoding GUPPI payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.guppi.payload.GUPPIPayload

baseband.guppi.frame Module
Classes

GUPPIFrame(header, payload[, valid, verify])

Representation of a GUPPI file, consisting of a header and payload.

Class Inheritance Diagram

Inheritance diagram of baseband.guppi.frame.GUPPIFrame

baseband.guppi.file_info Module

The GuppiFileReaderInfo property.

Overrides what can be gotten from the first header.

Classes

GUPPIFileReaderInfo([parent])

Class Inheritance Diagram

Inheritance diagram of baseband.guppi.file_info.GUPPIFileReaderInfo

baseband.guppi.base Module
Functions

open(name[, mode])

Open GUPPI file(s) for reading or writing.

Classes

GUPPIFileNameSequencer(template[, header])

List-like generator of GUPPI filenames using a template.

GUPPIFileReader(fh_raw)

Simple reader for GUPPI files.

GUPPIFileWriter(fh_raw)

Simple writer/mapper for GUPPI files.

GUPPIStreamBase(fh_raw, header0[, squeeze, …])

Base for GUPPI streams.

GUPPIStreamReader(fh_raw[, squeeze, subset, …])

GUPPI format reader.

GUPPIStreamWriter(fh_raw, header0[, squeeze])

GUPPI format writer.

Class Inheritance Diagram

Inheritance diagram of baseband.guppi.base.GUPPIFileNameSequencer, baseband.guppi.base.GUPPIFileReader, baseband.guppi.base.GUPPIFileWriter, baseband.guppi.base.GUPPIStreamBase, baseband.guppi.base.GUPPIStreamReader, baseband.guppi.base.GUPPIStreamWriter

GSB

The GMRT software backend (GSB) file format is the standard output of the initial correlator of the Giant Metrewave Radio Telescope (GMRT). The GSB design is described by Roy et al. (2010, Exper. Astron. 28:25-60) with further specifications and operating procedures given on the relevant GMRT/GSB pages.

File Structure

A GSB dataset consists of an ASCII file with a sequence of headers, and one or more accompanying binary data files. Each line in the header and its corresponding data comprise a data frame, though these do not have explicit divisions in the data files.

Baseband currently supports two forms of GSB data: rawdump, for storing real-valued raw voltage timestreams, and phased, for storing complex pre-channelized data from the GMRT in phased array baseband mode.

Data in rawdump format is stored in a binary file representing the voltage stream from one polarization of a single dish. Each such file is accompanied by a header file which contains GPS timestamps, in the form:

YYYY MM DD HH MM SS 0.SSSSSSSSS

In the default rawdump observing setup, samples are recorded at a rate of 33.3333… megasamples per second (Msps). Each sample is 4 bits in size, and two samples are grouped into bytes such that the oldest sample occupies the least significant bits. Each frame consists of 4 megabytes of data, or \(2^{23}\) samples; as such, the timespan of one frame is exactly 0.25165824 s.
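The packing of two 4-bit samples per byte can be sketched as follows. The example byte is arbitrary, and real decoding maps the 4-bit patterns to voltage levels via look-up tables:

```python
byte = 0b1011_0011            # arbitrary example byte
older = byte & 0x0F           # oldest sample: least significant bits
newer = (byte >> 4) & 0x0F    # next sample: most significant bits
print(older, newer)           # 3 11
```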

Data in phased format is normally spread over four binary files and one accompanying header file. The binary files come in two pairs, one for each polarization, with the files in each pair containing the first and second halves of the data of each frame.

When recording GSB in phased array voltage beam (i.e. baseband) mode, the “raw”, or pre-channelized, sample rate is either 33.3333… Msps at 8 bits per sample or 66.6666… Msps at 4 bits per sample (in the latter case, sample bit-ordering is the same as for rawdump). Channelization via fast Fourier transform sets the channelized complete sample rate to the raw rate divided by \(2N_\mathrm{F}\), where \(N_\mathrm{F}\) is the number of Fourier channels (either 256 or 512). The timespan of one frame is 0.25165824 s, and one frame is 8 megabytes in size, for either raw sample rate.

The phased header’s structure is:

<PC TIME> <GPS TIME> <SEQ NUMBER> <MEM BLOCK>

where <PC TIME> and <GPS TIME> are the less accurate computer-based and exact GPS-based timestamps, respectively, with the same format as the rawdump timestamp; <SEQ NUMBER> is the frame number; and <MEM BLOCK> a redundant modulo-8 shared memory block number.
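Such a line can be split into its fields with plain Python; the field values below are hypothetical:

```python
# Split a phased-format timestamp line into its fields
# (hypothetical example values).
line = ("2015 04 27 18 45 00 0.000000240 "
        "2015 04 27 18 45 00 0.000000240 0000000010 7")
fields = line.split()
pc_time, gps_time = fields[:7], fields[7:14]
seq_nr, mem_block = int(fields[14]), int(fields[15])
print(seq_nr, mem_block)  # 10 7
```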

Usage Notes

This section covers reading and writing GSB files with Baseband; general usage is covered in the Using Baseband section. While Baseband features the general baseband.open and baseband.file_info functions, these cannot read GSB binary files without the accompanying timestamp file (at which point it is obvious the files are GSB). baseband.file_info, however, can be used on the timestamp file to determine if it is in rawdump or phased format.

The examples below use the sample files in the baseband/data/gsb/ directory, and the numpy, astropy.units and baseband.gsb modules:

>>> import numpy as np
>>> import astropy.units as u
>>> from baseband import gsb
>>> from baseband.data import (
...     SAMPLE_GSB_RAWDUMP, SAMPLE_GSB_RAWDUMP_HEADER,
...     SAMPLE_GSB_PHASED, SAMPLE_GSB_PHASED_HEADER)

A single timestamp file can be opened with open in text mode:

>>> ft = gsb.open(SAMPLE_GSB_RAWDUMP_HEADER, 'rt')
>>> ft.read_timestamp()
<GSBRawdumpHeader gps: 2015 04 27 18 45 00 0.000000240>
>>> ft.close()

Reading payloads requires either the samples per frame or the sample rate. For phased data, the sample rate is:

sample_rate = raw_sample_rate / (2 * nchan)

where the raw sample rate is the pre-channelized one, and nchan the number of Fourier channels. The samples per frame for both rawdump and phased data is:

samples_per_frame = timespan_of_frame * sample_rate

Note

Since the number of samples per frame is an integer while neither the frame timespan nor the sample rate is, it is better to calculate samples_per_frame directly rather than by multiplying timespan_of_frame by sample_rate, in order to avoid rounding issues.

Alternatively, if the size of the frame buffer and the frame rate are known, the former can be used to determine samples_per_frame, and the latter used to determine sample_rate by inverting the above equation.

If samples_per_frame is not given, Baseband assumes it is the equivalent of 4 megabytes of data for rawdump, or 8 megabytes if phased. If sample_rate is not given, it is calculated from samples_per_frame assuming timespan_of_frame = 0.25165824 (see File Structure above).
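As a sketch of the arithmetic, using the default rawdump setup and a 512-channel, 4-bit phased setup:

```python
# Rawdump: samples_per_frame from the 4 MiB frame size (4-bit real samples).
rawdump_spf = 4 * 2**20 * 8 // 4          # bytes * bits-per-byte / bps

# Phased: channelized sample rate from the raw rate, then round (not
# truncate) to the nearest integer number of samples per frame, since
# floating-point error could make truncation off by one.
raw_sample_rate = 2e8 / 3                  # 66.6666... Msps
sample_rate = raw_sample_rate / (2 * 512)  # complete samples per second
phased_spf = round(sample_rate * 0.25165824)
print(rawdump_spf, phased_spf)  # 8388608 16384
```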

A single payload file can be opened with open in binary mode. Here, for our sample file, we have to take into account that, to keep these files small, they hold only 4 or 8 kilobytes worth of samples per frame (for the default timespan). So, we define their sample rate here, and use that to calculate payload_nbytes, the size of one frame in bytes. Since rawdump samples are 4 bits, payload_nbytes is just samples_per_frame / 2:

>>> rawdump_samples_per_frame = 2**13
>>> payload_nbytes = rawdump_samples_per_frame // 2
>>> fb = gsb.open(SAMPLE_GSB_RAWDUMP, 'rb', payload_nbytes=payload_nbytes,
...               nchan=1, bps=4, complex_data=False)
>>> payload = fb.read_payload()
>>> payload[:4]
array([[ 0.],
       [-2.],
       [-2.],
       [ 0.]], dtype=float32)
>>> fb.close()

(payload_nbytes for phased data is the size of one frame divided by the number of binary files.)

Opening in stream mode allows timestamp and binary files to be read in concert to create data frames, and also wraps the low-level routines such that reading and writing is in units of samples, and provides access to header information.

When opening a rawdump file in stream mode, we pass the timestamp file as the first argument, and the binary file to the raw keyword. As per above, we also pass samples_per_frame:

>>> fh_rd = gsb.open(SAMPLE_GSB_RAWDUMP_HEADER, mode='rs',
...                  raw=SAMPLE_GSB_RAWDUMP,
...                  samples_per_frame=rawdump_samples_per_frame)
>>> fh_rd.header0
<GSBRawdumpHeader gps: 2015 04 27 18 45 00 0.000000240>
>>> dr = fh_rd.read()
>>> dr.shape
(81920,)
>>> dr[:3]
array([ 0., -2., -2.], dtype=float32)
>>> fh_rd.close()

To open a phased fileset in stream mode, we package the binary files into a nested tuple with the format:

((L pol stream 1, L pol stream 2), (R pol stream 1, R pol stream 2))

The nested tuple is passed to raw (note that we again have to pass a non-default samples_per_frame):

>>> phased_samples_per_frame = 2**3
>>> fh_ph = gsb.open(SAMPLE_GSB_PHASED_HEADER, mode='rs',
...                  raw=SAMPLE_GSB_PHASED,
...                  samples_per_frame=phased_samples_per_frame)
>>> header0 = fh_ph.header0     # To be used for writing, below.
>>> dp = fh_ph.read()
>>> dp.shape
(80, 2, 512)
>>> dp[0, 0, :3]    
array([30.+12.j, -1. +8.j,  7.+19.j], dtype=complex64)
>>> fh_ph.close()

To set up a file for writing, we need to pass names for both timestamp and raw files, as well as sample_rate, samples_per_frame, and either the first header or a time object. We first calculate sample_rate:

>>> timespan = 0.25165824 * u.s
>>> rawdump_sample_rate = (rawdump_samples_per_frame / timespan).to(u.MHz)
>>> phased_sample_rate = (phased_samples_per_frame / timespan).to(u.MHz)

To write a rawdump file:

>>> from astropy.time import Time
>>> fw_rd = gsb.open('test_rawdump.timestamp',
...                  mode='ws', raw='test_rawdump.dat',
...                  sample_rate=rawdump_sample_rate,
...                  samples_per_frame=rawdump_samples_per_frame,
...                  time=Time('2015-04-27T13:15:00'))
>>> fw_rd.write(dr)
>>> fw_rd.close()
>>> fh_rd = gsb.open('test_rawdump.timestamp', mode='rs',
...                  raw='test_rawdump.dat',
...                  sample_rate=rawdump_sample_rate,
...                  samples_per_frame=rawdump_samples_per_frame)
>>> np.all(dr == fh_rd.read())
True
>>> fh_rd.close()

To write a phased file, we need to pass a nested tuple of filenames or filehandles:

>>> test_phased_bin = (('test_phased_pL1.dat', 'test_phased_pL2.dat'),
...                    ('test_phased_pR1.dat', 'test_phased_pR2.dat'))
>>> fw_ph = gsb.open('test_phased.timestamp',
...                  mode='ws', raw=test_phased_bin,
...                  sample_rate=phased_sample_rate,
...                  samples_per_frame=phased_samples_per_frame,
...                  header0=header0)
>>> fw_ph.write(dp)
>>> fw_ph.close()
>>> fh_ph = gsb.open('test_phased.timestamp', mode='rs',
...                  raw=test_phased_bin,
...                  sample_rate=phased_sample_rate,
...                  samples_per_frame=phased_samples_per_frame)
>>> np.all(dp == fh_ph.read())
True
>>> fh_ph.close()

Baseband does not use the PC time in the phased header, and, when writing, simply uses the same time for both GPS and PC times. Since the PC time can drift from the GPS time by several tens of milliseconds, test_phased.timestamp will not be identical to SAMPLE_GSB_PHASED, even though we have written the exact same data to file.

Reference/API

baseband.gsb Package

GMRT Software Backend (GSB) data reader.

See http://gmrt.ncra.tifr.res.in/gmrt_hpage/sub_system/gmrt_gsb/index.htm

Functions

open(name[, mode])

Open GSB file(s) for reading or writing.

Classes

GSBFrame(header, payload[, valid, verify])

Frame encapsulating GSB rawdump or phased data.

GSBHeader(words[, mode, nbytes, utc_offset, …])

GSB Header, based on a line from a timestamp file.

GSBPayload(words[, sample_shape, bps, …])

Container for decoding and encoding GSB payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.gsb.frame.GSBFrame, baseband.gsb.header.GSBHeader, baseband.gsb.payload.GSBPayload

baseband.gsb.header Module

Definitions for GSB Headers, using the timestamp files.

Somewhat out-of-date description for phased data: http://gmrt.ncra.tifr.res.in/gmrt_hpage/sub_system/gmrt_gsb/GSB_beam_timestamp_note_v1.pdf and for rawdump data: http://gmrt.ncra.tifr.res.in/gmrt_hpage/sub_system/gmrt_gsb/GSB_rawdump_data_format_v2.pdf

Classes

TimeGSB(val1, val2, scale, precision, …[, …])

GSB header date-time format YYYY MM DD HH MM SS 0.SSSSSSSSS.

GSBHeader(words[, mode, nbytes, utc_offset, …])

GSB Header, based on a line from a timestamp file.

GSBRawdumpHeader(words[, mode, nbytes, …])

GSB rawdump header.

GSBPhasedHeader(words[, mode, nbytes, …])

GSB phased header.

Class Inheritance Diagram

Inheritance diagram of baseband.gsb.header.TimeGSB, baseband.gsb.header.GSBHeader, baseband.gsb.header.GSBRawdumpHeader, baseband.gsb.header.GSBPhasedHeader

baseband.gsb.payload Module

Definitions for GSB payloads.

Implements a GSBPayload class used to store payload blocks, and decode to or encode from a data array.

See http://gmrt.ncra.tifr.res.in/gmrt_hpage/sub_system/gmrt_gsb/index.htm

Classes

GSBPayload(words[, sample_shape, bps, …])

Container for decoding and encoding GSB payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.gsb.payload.GSBPayload

baseband.gsb.frame Module
Classes

GSBFrame(header, payload[, valid, verify])

Frame encapsulating GSB rawdump or phased data.

Class Inheritance Diagram

Inheritance diagram of baseband.gsb.frame.GSBFrame

baseband.gsb.base Module
Functions

open(name[, mode])

Open GSB file(s) for reading or writing.

Classes

GSBTimeStampIO(fh_raw)

Simple reader/writer for GSB time stamp files.

GSBFileReader(fh_raw, payload_nbytes[, …])

Simple reader for GSB data files.

GSBFileWriter(fh_raw)

Simple writer for GSB data files.

GSBStreamBase(fh_ts, fh_raw, header0[, …])

Base for GSB streams.

GSBStreamReader(fh_ts, fh_raw[, …])

GSB format reader.

GSBStreamWriter(fh_ts, fh_raw[, header0, …])

GSB format writer.

Class Inheritance Diagram

Inheritance diagram of baseband.gsb.base.GSBTimeStampIO, baseband.gsb.base.GSBFileReader, baseband.gsb.base.GSBFileWriter, baseband.gsb.base.GSBStreamBase, baseband.gsb.base.GSBStreamReader, baseband.gsb.base.GSBStreamWriter

Core Framework and Utilities

These sections contain usage notes and APIs for the sequential file opener, the core utility functions and classes located in vlbi_base, and the sample data that come with Baseband (mostly used for testing).

Baseband Helpers

Helpers assist with reading and writing all file formats. Currently, they only include the sequentialfile module for reading a sequence of files as a single one.

Sequential File

The sequentialfile module is for reading from and writing to a sequence of files as if they were a single, contiguous one. As for the individual file formats, there is a master sequentialfile.open function to open sequences either for reading or writing. It returns sequential file objects that have read, write, seek, tell, and close methods that work identically to their single-file counterparts. They additionally have memmap methods to read or write to files through numpy.memmap.

It is usually unnecessary to directly access sequentialfile, since it is used by baseband.open and all format openers (except GSB) whenever a sequence of files is passed - see the Using Baseband documentation for details. For finer control of file opening, however, one may manually create a sequentialfile object, then pass it to an opener.

To illustrate, we rewrite the multi-file example from Using Baseband. We first load the required data:

>>> from baseband import vdif
>>> from baseband.data import SAMPLE_VDIF
>>> import numpy as np
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> d = fh.read()

We now create a sequence of filenames and calculate the byte size per file, then pass these to open:

>>> from baseband.helpers import sequentialfile as sf
>>> filenames = ["seqvdif_{0}".format(i) for i in range(2)]
>>> file_size = fh.fh_raw.seek(0, 2) // 2
>>> fwr = sf.open(filenames, mode='w+b', file_size=file_size)

The first argument passed to open must be a time-ordered sequence of filenames in a list, tuple, or other container that raises IndexError when the index is out of bounds. The mode is ‘w+b’ (a requirement of all format openers, in case they use numpy.memmap), and file_size determines the largest size a file may reach before the next one in the sequence is opened for writing. We set file_size such that each file holds exactly one frameset.
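Any object with such indexing semantics works; a minimal, hypothetical container might look like:

```python
class FileNames:
    """Hypothetical list-like filename container for sequentialfile.open."""
    def __init__(self, template, nfiles):
        self.template = template
        self.nfiles = nfiles
    def __getitem__(self, i):
        if not 0 <= i < self.nfiles:
            raise IndexError(i)   # signals the end of the sequence
        return self.template.format(i)

names = FileNames("seqvdif_{0}", 2)
print(names[0], names[1])  # seqvdif_0 seqvdif_1
```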

To write the data, we pass fwr to vdif.open:

>>> fw = vdif.open(fwr, 'ws', header0=fh.header0,
...                sample_rate=fh.sample_rate,
...                nthread=fh.sample_shape.nthread)
>>> fw.write(d)
>>> fw.close()    # This implicitly closes fwr.

To read the sequence and confirm their contents are identical to the sample file’s, we may again use open:

>>> frr = sf.open(filenames, mode='rb')
>>> fr = vdif.open(frr, 'rs', sample_rate=fh.sample_rate)
>>> fr.header0.time == fh.header0.time
True
>>> np.all(fr.read() == d)
True
>>> fr.close()
>>> fh.close()  # Close sample file.

Reference/API

baseband.helpers Package
baseband.helpers.sequentialfile Module
Functions

open(files[, mode, file_size, opener])

Read or write several files as if they were one contiguous one.

Classes

FileNameSequencer(template[, header])

List-like generator of filenames using a template.

SequentialFileBase(files[, mode, opener])

Deal with several files as if they were one contiguous one.

SequentialFileReader(files[, mode, opener])

Read several files as if they were one contiguous one.

SequentialFileWriter(files[, mode, …])

Write several files as if they were one contiguous one.

Class Inheritance Diagram

Inheritance diagram of baseband.helpers.sequentialfile.FileNameSequencer, baseband.helpers.sequentialfile.SequentialFileBase, baseband.helpers.sequentialfile.SequentialFileReader, baseband.helpers.sequentialfile.SequentialFileWriter

VLBI Base

Routines on which the readers and writers for specific VLBI formats are based.

Reference/API

baseband.vlbi_base Package
baseband.vlbi_base.header Module

Base definitions for VLBI Headers, used for VDIF and Mark 5B.

Defines a header class VLBIHeaderBase that can be used to hold the words corresponding to a frame header, providing access to the values encoded in them via a dict-like interface. Definitions for headers are constructed using the HeaderParser class.

Functions

make_parser(word_index, bit_index, bit_length)

Construct a function that converts specific bits from a header.

make_setter(word_index, bit_index, bit_length)

Construct a function that uses a value to set specific bits in a header.

get_default(word_index, bit_index, bit_length)

Return the default value from a header keyword.

Classes

fixedvalue(fget[, doc, lazy])

Property that is fixed for all instances of a class.

ParserDict(method[, name, doc])

Create a lazily evaluated dictionary of parsers, setters, or defaults.

HeaderParser(*args, **kwargs)

Parser & setter for VLBI header keywords.

VLBIHeaderBase(words[, verify])

Base class for all VLBI headers.

Class Inheritance Diagram

Inheritance diagram of baseband.vlbi_base.header.fixedvalue, baseband.vlbi_base.header.ParserDict, baseband.vlbi_base.header.HeaderParser, baseband.vlbi_base.header.VLBIHeaderBase

baseband.vlbi_base.payload Module

Base definitions for VLBI payloads, used for VDIF and Mark 5B.

Defines a payload class VLBIPayloadBase that can be used to hold the words corresponding to a frame payload, providing access to the values encoded in it as a numpy array.

Classes

VLBIPayloadBase(words[, sample_shape, bps, …])

Container for decoding and encoding VLBI payloads.

Class Inheritance Diagram

Inheritance diagram of baseband.vlbi_base.payload.VLBIPayloadBase

baseband.vlbi_base.frame Module

Base definitions for VLBI frames, used for VDIF and Mark 5B.

Defines a frame class VLBIFrameBase that can be used to hold a header and a payload, providing access to the values encoded in both.

Classes

VLBIFrameBase(header, payload[, valid, verify])

Representation of a VLBI data frame, consisting of a header and payload.

Class Inheritance Diagram

Inheritance diagram of baseband.vlbi_base.frame.VLBIFrameBase

baseband.vlbi_base.base Module
Functions

make_opener(fmt, classes[, doc, append_doc])

Create a baseband file opener.

Classes

HeaderNotFoundError

Error in finding a header in a stream.

VLBIFileBase(fh_raw)

VLBI file wrapper, used to add frame methods to a binary data file.

VLBIFileReaderBase(fh_raw)

VLBI wrapped file reader base class.

VLBIStreamBase(fh_raw, header0, sample_rate, …)

VLBI file wrapper, allowing access as a stream of data.

VLBIStreamReaderBase(fh_raw, header0, …)

VLBIStreamWriterBase(fh_raw, header0, …)

Class Inheritance Diagram

Inheritance diagram of baseband.vlbi_base.base.HeaderNotFoundError, baseband.vlbi_base.base.VLBIFileBase, baseband.vlbi_base.base.VLBIFileReaderBase, baseband.vlbi_base.base.VLBIStreamBase, baseband.vlbi_base.base.VLBIStreamReaderBase, baseband.vlbi_base.base.VLBIStreamWriterBase

baseband.vlbi_base.file_info Module

Provide a base class for “info” properties.

Loosely based on DataInfo.

Classes

info_item(attr[, needs, default, doc, …])

Like a lazy property, evaluated only once.

VLBIInfoMeta(name, bases, dct)

VLBIInfoBase([parent])

Container providing a standardized interface to file information.

VLBIFileReaderInfo([parent])

Standardized information on file readers.

VLBIStreamReaderInfo([parent])

Standardized information on stream readers.

Class Inheritance Diagram

Inheritance diagram of baseband.vlbi_base.file_info.info_item, baseband.vlbi_base.file_info.VLBIInfoMeta, baseband.vlbi_base.file_info.VLBIInfoBase, baseband.vlbi_base.file_info.VLBIFileReaderInfo, baseband.vlbi_base.file_info.VLBIStreamReaderInfo

baseband.vlbi_base.encoding Module

Encoders and decoders for generic VLBI data formats.

Functions

encode_1bit_base(values)

Generic encoder for data stored using one bit.

encode_2bit_base(values)

Generic encoder for data stored using two bits.

encode_4bit_base(values)

Generic encoder for data stored using four bits.

decode_8bit(words)

Generic decoder for data stored using 8 bits.

encode_8bit(values)

Encode 8 bit VDIF data.

Variables

OPTIMAL_2BIT_HIGH

Optimal high value for a 2-bit digitizer for which the low value is 1.

TWO_BIT_1_SIGMA

Optimal level between low and high for the above OPTIMAL_2BIT_HIGH.

FOUR_BIT_1_SIGMA

Scaling for four-bit encoding that makes it look like 2 bit.

EIGHT_BIT_1_SIGMA

Scaling for eight-bit encoding that makes it look like 2 bit.

decoder_levels

Levels for data encoded with different numbers of bits.

baseband.vlbi_base.utils Module
Functions

lcm(a, b)

Calculate the least common multiple of a and b.

bcd_decode(value)

bcd_encode(value)

byte_array(pattern)

Convert the pattern to a byte array.

Classes

CRC(polynomial)

Cyclic Redundancy Check.

CRCStack(polynomial)

Cyclic Redundancy Check for a bitstream.

Class Inheritance Diagram

Inheritance diagram of baseband.vlbi_base.utils.CRC, baseband.vlbi_base.utils.CRCStack

Sample Data Files

baseband.data Package

Sample files with baseband data recorded in different formats.

Variables

SAMPLE_AROCHIME_VDIF

VDIF sample from ARO, written by CHIME backend.

SAMPLE_BPS1_VDIF

VDIF sample from Christian Ploetz.

SAMPLE_DADA

DADA sample from Effelsberg, with header adapted to shortened size.

SAMPLE_DRAO_CORRUPT

Corrupted VDIF sample.

SAMPLE_GSB_PHASED

GSB phased sample.

SAMPLE_GSB_PHASED_HEADER

GSB phased header sample.

SAMPLE_GSB_RAWDUMP

GSB rawdump sample.

SAMPLE_GSB_RAWDUMP_HEADER

GSB rawdump header sample.

SAMPLE_MARK4

Mark 4 sample.

SAMPLE_MARK4_16TRACK

Mark 4 sample.

SAMPLE_MARK4_32TRACK

Mark 4 sample.

SAMPLE_MARK4_32TRACK_FANOUT2

Mark 4 sample.

SAMPLE_MARK5B

Mark 5B sample.

SAMPLE_MWA_VDIF

VDIF sample from MWA.

SAMPLE_PUPPI

GUPPI/PUPPI sample, npol=2, nchan=4.

SAMPLE_VDIF

VDIF sample.

SAMPLE_VLBI_VDIF

VDIF sample.

Developer Documentation

The developer documentation features tutorials on supporting new formats or format extensions such as new VDIF EDVs. It also contains instructions for publishing new code releases.

Supporting a New VDIF EDV

Users may encounter VDIF files with unusual headers not currently supported by Baseband. These may either have novel EDV, or they may purport to be a supported EDV but not conform to its formal specification. To handle such situations, Baseband supports implementation of new EDVs and overriding of existing EDVs without the need to modify Baseband’s source code.

The tutorials below assume the following modules have been imported:

>>> import numpy as np
>>> import astropy.units as u
>>> from baseband import vdif, vlbi_base as vlbi

VDIF Headers

Each VDIF frame begins with a 32-byte, or eight 32-bit word, header that is structured as follows:


Schematic of the standard VDIF header, from the VDIF specification release 1.1.1 document, Fig. 3. 32-bit words are labelled on the left, while byte and bit numbers above indicate relative addresses within each word. Subscripts indicate field length in bits.

where the abbreviated labels are

  • \(\mathrm{I}_1\) - invalid data

  • \(\mathrm{L}_1\) - if 1, header is VDIF legacy

  • \(\mathrm{V}_3\) - VDIF version number

  • \(\mathrm{log}_2\mathrm{(\#chns)}_5\) - \(\mathrm{log}_2\) of the number of sub-bands in the frame

  • \(\mathrm{C}_1\) - if 1, complex data

  • \(\mathrm{EDV}_8\) - “extended data version” number; see below

Detailed definitions of terms are found on pages 5 to 7 of the VDIF specification document.

Words 4 - 7 hold optional extended user data, using a layout specified by the EDV, which is itself stored in word 4 of the header. EDV formats can be registered on the VDIF website; Baseband aims to support all registered formats (but does not currently support EDV = 4).

Implementing a New EDV

In this tutorial, we follow the implementation of an EDV=4 header. This is a first, required step toward supporting that format, but it does not suffice on its own: the purpose of this EDV, which is to store the validity of sub-band channels independently within a single data frame rather than relying on the single invalid-data bit, also requires a new frame class. From the EDV=4 specification, we see that we need to add the following to the standard VDIF header:

  • Validity header mask (word 4, bits 16 - 24): integer value between 1 and 64 inclusive indicating the number of validity bits. (This differs from \(\mathrm{log}_2\mathrm{(\#chns)}_5\), since some channels may be unused.)

  • Synchronization pattern (word 5): constant byte sequence 0xACABFEED, for finding the locations of headers in a data stream.

  • Validity mask (words 6 - 7): 64-bit binary mask indicating the validity of sub-bands. Any fraction of 64 sub-bands can be stored in this format, with any unused bands labelled as invalid (0) in the mask. If the number of bands exceeds 64, each bit indicates the validity of a group of sub-bands; see specification for details.

See Sec. 3.1 of the specification for best practices on using the invalid data bit \(\mathrm{I}_1\) in word 0.

In Baseband, a header is parsed using VDIFHeader, which returns a header instance of one of its subclasses, corresponding to the header EDV. This can be seen in the baseband.vdif.header module class inheritance diagram. To support a new EDV, we create a new subclass of baseband.vdif.VDIFHeader:

>>> class VDIFHeader4(vdif.header.VDIFHeader):
...     _edv = 4
...
...     _header_parser = vlbi.header.HeaderParser(
...         (('invalid_data', (0, 31, 1, False)),
...          ('legacy_mode', (0, 30, 1, False)),
...          ('seconds', (0, 0, 30)),
...          ('_1_30_2', (1, 30, 2, 0x0)),
...          ('ref_epoch', (1, 24, 6)),
...          ('frame_nr', (1, 0, 24, 0x0)),
...          ('vdif_version', (2, 29, 3, 0x1)),
...          ('lg2_nchan', (2, 24, 5)),
...          ('frame_length', (2, 0, 24)),
...          ('complex_data', (3, 31, 1)),
...          ('bits_per_sample', (3, 26, 5)),
...          ('thread_id', (3, 16, 10, 0x0)),
...          ('station_id', (3, 0, 16)),
...          ('edv', (4, 24, 8)),
...          ('validity_mask_length', (4, 16, 8, 0)),
...          ('sync_pattern', (5, 0, 32, 0xACABFEED)),
...          ('validity_mask', (6, 0, 64, 0))))

VDIFHeader has a metaclass that ensures that whenever it is subclassed, the subclass definition is inserted into the VDIF_HEADER_CLASSES dictionary using its EDV value as the dictionary key. Methods in VDIFHeader use this dictionary to determine the type of object to return for a particular EDV. How all this works is further discussed in the documentation of the VDIF baseband.vdif.header module.
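The registration mechanism can be sketched independently of Baseband with a small metaclass; the names HEADER_CLASSES, RegisteringMeta, HeaderBase, and Header4 below are illustrative stand-ins, not Baseband's actual internals:

```python
HEADER_CLASSES = {}  # maps EDV -> header class (illustrative registry)


class RegisteringMeta(type):
    """Register each subclass by its _edv attribute, rejecting duplicates."""

    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        edv = dct.get('_edv')
        if edv is not None:
            if edv in HEADER_CLASSES:
                raise ValueError(
                    f"EDV {edv} already registered in HEADER_CLASSES")
            HEADER_CLASSES[edv] = cls


class HeaderBase(metaclass=RegisteringMeta):
    pass


class Header4(HeaderBase):
    _edv = 4


print(HEADER_CLASSES[4] is Header4)  # True
```

A lookup into such a registry is how a factory class can return instances of the subclass matching a given EDV.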

The class must have a private _edv attribute for it to be properly registered in VDIF_HEADER_CLASSES. It must also feature a _header_parser that parses the header words into header properties. For this, we use baseband.vlbi_base.header.HeaderParser. To initialize a header parser, we pass it a tuple of header properties, where each entry follows the syntax:

('property_name', (word_index, bit_index, bit_length, default))

where

  • property_name: name of the header property; this will be the key;

  • word_index: index into the header words for this key;

  • bit_index: index to the starting bit of the part used;

  • bit_length: number of bits used, normally between 1 and 32, but can be 64 for adding two words together; and

  • default: (optional) default value to use in initialization.

For further details, see the documentation of HeaderParser.
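As a rough illustration of what such an entry turns into, here is a plain-Python sketch of a bit-field parser (Baseband's actual make_parser also handles defaults and 64-bit fields spanning two words, which are omitted here):

```python
def make_bit_parser(word_index, bit_index, bit_length):
    """Return a function extracting the given bit field from header words.

    A plain-Python sketch of what a (word_index, bit_index, bit_length)
    entry corresponds to: shift the selected 32-bit word right by
    bit_index and mask off bit_length bits.
    """
    mask = (1 << bit_length) - 1

    def parser(words):
        return (words[word_index] >> bit_index) & mask

    return parser


# Example: 'edv' lives in word 4, bits 24-31 (8 bits).
parse_edv = make_bit_parser(4, 24, 8)
words = [0, 0, 0, 0, 4 << 24, 0, 0, 0]
print(parse_edv(words))  # 4
```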

Once defined, we can use our new header like any other:

>>> myheader = vdif.header.VDIFHeader.fromvalues(
...     edv=4, seconds=14363767, nchan=1, samples_per_frame=1024,
...     station=65532, bps=2, complex_data=False,
...     thread_id=3, validity_mask_length=60,
...     validity_mask=(1 << 59) + 1)
>>> myheader
<VDIFHeader4 invalid_data: False,
             legacy_mode: False,
             seconds: 14363767,
             _1_30_2: 0,
             ref_epoch: 0,
             frame_nr: 0,
             vdif_version: 1,
             lg2_nchan: 0,
             frame_length: 36,
             complex_data: False,
             bits_per_sample: 1,
             thread_id: 3,
             station_id: 65532,
             edv: 4,
             validity_mask_length: 60,
             sync_pattern: 0xacabfeed,
             validity_mask: 576460752303423489>
>>> myheader['validity_mask'] == 2**59 + 1
True

There is an easier means of instantiating the header parser. As can be seen in the class inheritance diagram for the header module, many VDIF headers are subclassed from other VDIFHeader subclasses, namely VDIFBaseHeader and VDIFSampleRateHeader. This is because many EDV specifications share common header values, and so their functions and derived properties should be shared as well. Moreover, header parsers can be appended to one another, which avoids repetition, since the first four words of any VDIF header are the same. Indeed, we can create the same header as above by subclassing VDIFBaseHeader:

>>> class VDIFHeader4Enhanced(vdif.header.VDIFBaseHeader):
...     _edv = 42
...
...     _header_parser = vdif.header.VDIFBaseHeader._header_parser +\
...                      vlbi.header.HeaderParser((
...                             ('validity_mask_length', (4, 16, 8, 0)),
...                             ('sync_pattern', (5, 0, 32, 0xACABFEED)),
...                             ('validity_mask', (6, 0, 64, 0))))
...
...     _properties = vdif.header.VDIFBaseHeader._properties + ('validity',)
...
...     def verify(self):
...         """Basic checks of header integrity."""
...         super(VDIFHeader4Enhanced, self).verify()
...         assert 1 <= self['validity_mask_length'] <= 64
...
...     @property
...     def validity(self):
...         """Validity mask array with proper length.
...
...         If set, writes both ``validity_mask`` and ``validity_mask_length``.
...         """
...         bitmask = np.unpackbits(self['validity_mask'].astype('>u8')
...                                 .view('u1'))[::-1].astype(bool)
...         return bitmask[:self['validity_mask_length']]
...
...     @validity.setter
...     def validity(self, validity):
...         bitmask = np.zeros(64, dtype=bool)
...         bitmask[:len(validity)] = validity
...         self['validity_mask_length'] = len(validity)
...         self['validity_mask'] = np.packbits(bitmask[::-1]).view('>u8')

Here, we set edv = 42 because VDIFHeader’s metaclass is designed to prevent accidental overwriting of existing entries in VDIF_HEADER_CLASSES. If we had used _edv = 4, we would have gotten an exception:

ValueError: EDV 4 already registered in VDIF_HEADER_CLASSES

We shall see how to override header classes in the next section. Except for the EDV, VDIFHeader4Enhanced’s header structure is identical to VDIFHeader4. It also contains a few extra functions to enhance the header’s usability.

The verify function is an optional function that runs upon header initialization to check its veracity. Ours simply checks that the validity mask length is in the allowed range, but we also call the same function in the superclass (VDIFBaseHeader), which checks that the header is not in 4-word “legacy mode”, that the header’s EDV matches that read from the words, that there are eight words, and that the sync pattern matches 0xACABFEED.

The validity_mask is a bit mask, which is not necessarily the easiest to use directly. Hence, we implement a derived validity property that generates a boolean mask of the right length (note that this is not right for cases where the number of channels in the header exceeds 64). We also define a corresponding setter, and add validity to the private _properties attribute, so that we can use validity as a keyword in fromvalues:

>>> myenhancedheader = vdif.header.VDIFHeader.fromvalues(
...     edv=42, seconds=14363767, nchan=1, samples_per_frame=1024,
...     station=65532, bps=2, complex_data=False,
...     thread_id=3, validity=[True]+[False]*58+[True])
>>> myenhancedheader
<VDIFHeader4Enhanced invalid_data: False,
                     legacy_mode: False,
                     seconds: 14363767,
                     _1_30_2: 0,
                     ref_epoch: 0,
                     frame_nr: 0,
                     vdif_version: 1,
                     lg2_nchan: 0,
                     frame_length: 36,
                     complex_data: False,
                     bits_per_sample: 1,
                     thread_id: 3,
                     station_id: 65532,
                     edv: 42,
                     validity_mask_length: 60,
                     sync_pattern: 0xacabfeed,
                     validity_mask: [576460752303423489]>
>>> assert myenhancedheader['validity_mask'] == 2**59 + 1
>>> assert (myenhancedheader.validity == [True]+[False]*58+[True]).all()
>>> myenhancedheader.validity = [True]*8
>>> myenhancedheader['validity_mask']
array([255], dtype=uint64)
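The bit-level round trip performed by the validity property can also be sketched with plain integer operations, independent of numpy and Baseband; this sketch follows the same LSB-first convention as the property above:

```python
def pack_validity(validity):
    """Pack a list of booleans into a (mask, length) pair, LSB first."""
    mask = 0
    for i, valid in enumerate(validity):
        if valid:
            mask |= 1 << i
    return mask, len(validity)


def unpack_validity(mask, length):
    """Recover the boolean list from the mask and its stated length."""
    return [bool((mask >> i) & 1) for i in range(length)]


validity = [True] + [False] * 58 + [True]
mask, length = pack_validity(validity)
print(mask == 2**59 + 1, length)  # True 60
assert unpack_validity(mask, length) == validity
```

This reproduces the values seen above: bits 0 and 59 set give a mask of 2**59 + 1, and eight True values give a mask of 255.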

Note

If you have implemented support for a new EDV that is widely used, we encourage you to make a pull request to Baseband’s GitHub repository, as well as to register it (if it is not already registered) with the VDIF consortium!

Replacing an Existing EDV

Above, we mentioned that VDIFHeader’s metaclass is designed to prevent accidental overwriting of existing entries in VDIF_HEADER_CLASSES, so attempting to assign two header classes to the same EDV results in an exception. There are situations, such as the one above, however, where we’d like to replace one header class with another.

To get VDIFHeader to use VDIFHeader4Enhanced when edv=4, we can manually insert it in the dictionary:

>>> vdif.header.VDIF_HEADER_CLASSES[4] = VDIFHeader4Enhanced

Of course, we should then be sure that its _edv attribute is correct:

>>> VDIFHeader4Enhanced._edv = 4

VDIFHeader will now return instances of VDIFHeader4Enhanced when reading headers with edv = 4:

>>> myheader = vdif.header.VDIFHeader.fromvalues(
...     edv=4, seconds=14363767, nchan=1,
...     station=65532, bps=2, complex_data=False,
...     thread_id=3, validity=[True]*60)
>>> assert isinstance(myheader, VDIFHeader4Enhanced)

Note

Failing to modify _edv in the class definition will lead to an EDV mismatch when verify is called during header initialization.

This can also be used to override VDIFHeader’s behavior even for EDVs that are supported by Baseband, which may prove useful when reading data with corrupted or mislabelled headers. To illustrate this, we attempt to read in a corrupted VDIF file originally from the Dominion Radio Astrophysical Observatory. This file can be imported from the baseband data directory:

>>> from baseband.data import SAMPLE_DRAO_CORRUPT

Naively opening the file with

>>> fh = vdif.open(SAMPLE_DRAO_CORRUPT, 'rs')  

will lead to an AssertionError. This is because, while the file’s headers use EDV=0, they deviate from that EDV standard by storing additional information: an “eud2” parameter in word 5, which is related to the sample time. Furthermore, the bits_per_sample setting is incorrect (it should be 3 rather than 4; the number is defined such that a one-bit sample has a bits_per_sample code of 0). Finally, though not an error, the thread_id in word 3 encodes two quantities, link and slot, which reflect the data acquisition computer node that wrote the data to disk.

To accommodate these changes, we design an alternate header. We first pop the EDV = 0 entry from VDIF_HEADER_CLASSES:

>>> vdif.header.VDIF_HEADER_CLASSES.pop(0)
<class 'baseband.vdif.header.VDIFHeader0'>

We then define a replacement class:

>>> class DRAOVDIFHeader(vdif.header.VDIFHeader0):
...     """DRAO VDIF Header
...
...     An extension of EDV=0 which uses the thread_id to store link
...     and slot numbers, and adds a user keyword (illegal in EDV0,
...     but whatever) that identifies data taken at the same time.
...
...     The header also corrects 'bits_per_sample' to be properly bps-1.
...     """
...
...     _header_parser = vdif.header.VDIFHeader0._header_parser + \
...         vlbi.header.HeaderParser((('link', (3, 16, 4)),
...                                   ('slot', (3, 20, 6)),
...                                   ('eud2', (5, 0, 32))))
...
...     def verify(self):
...         pass  # this is a hack, don't bother with verification...
...
...     @classmethod
...     def fromfile(cls, fh, edv=0, verify=False):
...         self = super(DRAOVDIFHeader, cls).fromfile(fh, edv=0,
...                                                    verify=False)
...         # Correct wrong bps
...         self.mutable = True
...         self['bits_per_sample'] = 3
...         return self

We override verify because VDIFHeader0’s verify function checks that word 5 contains no data. We also override the fromfile class method such that the bits_per_sample property is reset to its proper value whenever a header is read from file.

We can now read in the corrupt file by manually reading in the header, then the payload, of each frame:

>>> fh = vdif.open(SAMPLE_DRAO_CORRUPT, 'rb')
>>> header0 = DRAOVDIFHeader.fromfile(fh)
>>> header0['eud2'] == 667235140
True
>>> header0['link'] == 2
True
>>> payload0 = vdif.payload.VDIFPayload.fromfile(fh, header0)
>>> payload0.shape == (header0.samples_per_frame, header0.nchan)
True
>>> fh.close()

Reading a frame using VDIFFrame will still fail, since its _header_class is VDIFHeader, and so VDIFHeader.fromfile, rather than the function we defined, is used to read in headers. If we wanted to use VDIFFrame, we would need to set

VDIFFrame._header_class = DRAOVDIFHeader

before using baseband.vdif.open(), so that headers are read using DRAOVDIFHeader.fromfile.

A more elegant solution that is compatible with baseband.vdif.base.VDIFStreamReader without hacking baseband.vdif.frame.VDIFFrame involves modifying the bits-per-sample code within __init__(). Let’s remove our previous custom class, and define a replacement:

>>> vdif.header.VDIF_HEADER_CLASSES.pop(0)
<class '__main__.DRAOVDIFHeader'>
>>> class DRAOVDIFHeaderEnhanced(vdif.header.VDIFHeader0):
...     """DRAO VDIF Header
...
...     An extension of EDV=0 which uses the thread_id to store link and slot
...     numbers, and adds a user keyword (illegal in EDV0, but whatever) that
...     identifies data taken at the same time.
...
...     The header also corrects 'bits_per_sample' to be properly bps-1.
...     """
...     _header_parser = vdif.header.VDIFHeader0._header_parser + \
...         vlbi.header.HeaderParser((('link', (3, 16, 4)),
...                                   ('slot', (3, 20, 6)),
...                                   ('eud2', (5, 0, 32))))
...
...     def __init__(self, words, edv=None, verify=True, **kwargs):
...         super(DRAOVDIFHeaderEnhanced, self).__init__(
...                 words, verify=False, **kwargs)
...         self.mutable = True
...         self['bits_per_sample'] = 3
...
...     def verify(self):
...         pass

If we had the whole corrupt file, this might be enough to use the stream reader without further modification. It turns out, though, that the frame numbers are not monotonic and that the station ID changes between frames as well, so one would be better off making a new copy. Here, we can at least now read frames:

>>> fh2 = vdif.open(SAMPLE_DRAO_CORRUPT, 'rb')
>>> frame0 = fh2.read_frame()
>>> np.all(frame0.data == payload0.data)
True
>>> fh2.close()

Reading frames using VDIFFileReader.read_frame will now work as well, but reading frame sets using VDIFFileReader.read_frameset will still fail. This is because the frame and thread numbers on which that function relies are meaningless for these headers; grouping threads together using the link, slot, and eud2 values must be done manually by the user.
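Such manual grouping might be sketched as follows, using dummy header dictionaries in place of real DRAO headers (the link and slot values here are invented for illustration):

```python
from collections import defaultdict

# Dummy stand-ins for headers read frame by frame with read_frame();
# the link/slot combinations below are invented for illustration.
frames = [
    {'link': 2, 'slot': 0, 'eud2': 667235140},
    {'link': 2, 'slot': 1, 'eud2': 667235140},
    {'link': 3, 'slot': 0, 'eud2': 667235140},
]

# Collect frames that share the same (link, slot) combination.
groups = defaultdict(list)
for frame in frames:
    groups[frame['link'], frame['slot']].append(frame)

print(sorted(groups))  # [(2, 0), (2, 1), (3, 0)]
```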

Release Procedure

This procedure is based on Astropy’s, and additionally uses information from the PyPI packaging tutorial.

Prerequisites

To make releases, you will need

  • The twine package.

  • An account on PyPI.

  • Collaborator status on Baseband’s repository at mhvk/baseband to push new branches.

  • An account on Read the Docs that has access to Baseband.

  • Optionally, a GPG signing key associated with your GitHub account. While releases do not need to be signed, we recommend doing so to ensure they are trustworthy. To make a GPG key and associate it with your GitHub account, see the Astropy documentation.

Versioning

Baseband follows the semantic versioning specification:

major.minor.patch

where

  • major number represents backward incompatible API changes.

  • minor number represents feature updates to last major version.

  • patch number represents bugfixes from last minor version.

Major and minor versions have their own release branches on GitHub that end with “x” (e.g. v1.0.x, v1.1.x), while specific releases are tagged commits within their corresponding branch (e.g. v1.1.0 and v1.1.1 are tagged commits within v1.1.x).

Procedure

The first two steps of the release procedure differ between major/minor releases and patch releases. Steps specific to major/minor releases are labelled “m”, and those specific to patch releases are labelled “p”.

1m. Preparing major/minor code for release

We begin in the main development branch (the local equivalent to mhvk/baseband:master). First, check the following:

  • Ensure tests pass. Run the test suite by running python3 setup.py test in the Baseband root directory.

  • Update CHANGES.rst. All merge commits to master since the last release should be documented (except trivial ones such as typo corrections). Since CHANGES.rst is updated for each merge commit, in practice it is only necessary to change the date of the release you are working on from “unreleased” to the current date.

  • Add authors and contributors to AUTHORS.rst. To list contributors, one can use:

    git shortlog -n -s -e
    

    This will also list contributors to astropy-helpers and the astropy template, who should not be added. If in doubt, cross-reference with the authors of pull requests.

Once finished, git add any changes and make a commit:

git commit -m "Finalizing changelog and author list for v<version>"

For major/minor releases, the patch number is 0.

Submit the commit as a pull request to master.

1p. Cherry-pick code for a patch release

We begin by checking out the appropriate release branch:

git checkout v<version branch>.x

Bugfix merge commits are backported to this branch from master by way of git cherry-pick. First, find the SHA hashes of the relevant merge commits in the main development branch. Then, for each:

git cherry-pick -m 1 <SHA-1>

For more information, see Astropy’s documentation.

Once you have cherry-picked, check the following:

  • Ensure tests pass and documentation builds. Run the test suite by running python3 setup.py test, and build documentation by running python3 setup.py build_docs, in the Baseband root directory.

  • Update CHANGES.rst. Typically, merge commits record their changes, including any backported bugfixes, in CHANGES.rst. Cherry-picking should add these records to this branch’s CHANGES.rst, but if not, manually add them before making the commit (and manually remove any changes not relevant to this branch). Also, change the date of the release you are working on from “unreleased” to the current date.

Commit your changes:

git commit -m "Finalizing changelog for v<version>"
2m. Create a new release branch

Still in the main development branch, change the version keyword under the [metadata] section of setup.cfg to:

version = <version>

and make a commit:

git commit -m "Preparing v<version>."

Submit the commit as a pull request to master.

Once the pull request has been merged, make and enter a new release branch:

git checkout -b v<version branch>.x
2p. Append to the release branch

In the release branch, prepare the patch release commit by changing the version keyword under the [metadata] section of setup.cfg to:

version = <version>

then make a new commit:

git commit -m "Preparing v<version>."
3. Tag the release

Tag the commit made in step 2 as:

git tag -s v<version> -m "Tagging v<version>"
4. Clean and package the release

Checkout the tag:

git checkout v<version>

Clean the repository:

git clean -dfx
cd astropy_helpers; git clean -dfx; cd ..

and ensure the repository has the proper permissions:

umask 0022
chmod -R a+Xr .

Finally, package the release’s source code:

python setup.py build sdist
5. Test the release

We now test installing and running Baseband in clean virtual environments, to ensure there are no subtle bugs that come from your customized development environment. Before creating the virtualenvs, we recommend checking whether the $PYTHONPATH environment variable is set. If it is, set it to a null value (in bash, PYTHONPATH=) before proceeding.

To create the environments:

python3 -m venv test_release

Now, for each environment, activate it, navigate to the Baseband root directory, and run the tests:

source <name_of_virtualenv>/bin/activate
cd <baseband_directory>
pip install dist/baseband-<version>.tar.gz
pip install pytest-astropy
cd ~/
python -c 'import baseband; baseband.test()'
deactivate

If the test suite raises any errors (at this point, likely dependency issues), delete the release tag:

git tag -d v<version>

For a major/minor release, delete the v<version branch>.x branch as well. Then, make the necessary changes directly on the main development branch. Once the issues are fixed, repeat steps 2 - 6.

6. (Optional) re-package and sign the release

If the tests succeed, you may optionally re-run the cleaning and packaging code above following the tests:

git clean -dfx
cd astropy_helpers; git clean -dfx; cd ..
umask 0022
chmod -R a+Xr .
python setup.py build sdist

You may optionally sign the source as well:

gpg --detach-sign -a dist/baseband-<version>.tar.gz
7. Publish the release on GitHub

If you are working on a major/minor release, first push the branch to upstream (assuming upstream is mhvk/baseband):

git push upstream v<version branch>.x

Push the tag to GitHub as well:

git push upstream v<version>

Go to the mhvk/baseband Releases section. Here, published releases are shown in blue, and unpublished tags in grey and in a much smaller font. To publish a release, click on the v<version> tag you just pushed, then click “Edit tag” (on the upper right). This takes you to a form where you can customize the release title and description. Leave the title blank, in which case it is set to “v<version>”; you can leave the description blank as well if you wish. Finally, click on “Publish release”. This takes you back to Releases, where you should see your new release in blue.

The Baseband GitHub repo automatically updates Baseband’s Zenodo repository for each published release. Check if your release has made it to Zenodo by clicking the badge in Readme.rst.

8. Build the release wheel for PyPI

To build the release:

python setup.py bdist_wheel --universal
9. (Optional) test uploading the release

PyPI provides a test environment to safely try uploading new releases. To take advantage of this, use:

twine upload --repository-url https://test.pypi.org/legacy/ dist/baseband-<version>*

To test if this was successful, create a new virtualenv as above:

virtualenv --no-site-packages --python=python3 pypitest

Then (pip install pytest-astropy comes first because test.pypi does not contain recent versions of Astropy):

source <name_of_virtualenv>/bin/activate
pip install pytest-astropy
pip install --index-url https://test.pypi.org/simple/ baseband
python -c 'import baseband; baseband.test()'
deactivate
10. Upload to PyPI

Finally, upload the package to PyPI:

twine upload dist/baseband-<version>*
11. Check that Read the Docs has updated

Go to Read the Docs and check that the stable version points to the latest stable release. Each minor release has its own version as well, which should point to its latest patch release.

12m. Clean up master

In the main development branch, add the next major/minor release to CHANGES.rst. Also update the version keyword in setup.cfg to:

version = <next major/minor version>.dev

Make a commit:

git commit -m "Add v<next major/minor version> to the changelog."

Then submit a pull request to master.
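As a sanity check, the `.dev` suffix convention above can be validated with a short script. This is only an illustrative sketch, not part of Baseband's tooling; the regex and the simple `MAJOR.MINOR[.PATCH].dev` form are assumptions based on the examples in this document:

```python
import re

# Hypothetical validator for the development-version placeholder used in
# setup.cfg between releases, e.g. "3.2.dev" (assumption: a plain
# MAJOR.MINOR[.PATCH].dev form with no further suffixes).
DEV_VERSION = re.compile(r"^\d+\.\d+(?:\.\d+)?\.dev$")

def is_dev_version(version):
    """Return True if ``version`` looks like a .dev placeholder."""
    return bool(DEV_VERSION.match(version))

print(is_dev_version("3.2.dev"))   # True: valid development placeholder
print(is_dev_version("3.1.1"))     # False: a released version
```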

12p. Update CHANGES.rst on master

Change the release date of the patch release in CHANGES.rst on master to the current date, then:

git commit -m "Added release date for v<version> to the changelog."

(Alternatively, git cherry-pick the changelog fix from the release branch back to the main development one.)

Then submit a pull request to master.
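The changelog's release headings use ISO dates (e.g. "3.1.1 (2020-04-05)"). If helpful, today's date in that form can be generated with a generic Python one-liner; this is just a convenience sketch, not part of the release tooling:

```python
import datetime

# Today's date in the ISO YYYY-MM-DD form used by CHANGES.rst headings,
# e.g. "3.1.1 (2020-04-05)".
release_date = datetime.date.today().isoformat()
print(release_date)
```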

Project Details

Badges: Powered by Astropy · Zenodo DOI 10.5281/zenodo.1214268 · Travis CI build status (master) · Coveralls coverage · Documentation Status

Authors and Credits


If you used this package in your research, please cite it via DOI 10.5281/zenodo.1214268.

Authors

  • Marten van Kerkwijk (@mhvk)

  • Chenchong Charles Zhu (@cczhu)

Other contributors (alphabetical)

  • Rebecca Lin (@00rebe)

  • Nikhil Mahajan (@theXYZT)

  • Robert Main (@ramain)

  • Dana Simard (@danasimard)

  • George Stein (@georgestein)

If you have contributed to Baseband but are not listed above, please send one of the authors an e-mail, or open a pull request for this page.

Full Changelog

3.1.1 (2020-04-05)

Bug Fixes
  • Mark 5B is fixed so that writing files is now also possible on big-endian architectures.

3.1 (2020-01-23)

Bug Fixes
  • Frame rates are now calculated correctly also for Mark 4 data in which the first frame is the last within a second. [#341]

  • Fixed a bug where a VDIF header was not found correctly if the file pointer was very close to the start of a header already. [#346]

  • In VDIF header verification, include that the implied payload must have non-negative size. [#348]

  • Mark 4 now checks by default (verify=True) that frames are ordered correctly. [#349]

  • find_header will now always check that the frame corresponding to a header is complete (i.e., fits within the file). [#354]

  • The count argument to .read() is no longer changed in place, making it safe to pass in array scalars or dimensionless quantities. [#373]

Other Changes and Additions
  • The Mark 4, Mark 5B, and VDIF stream readers are now able to replace missing pieces of files with zeros using verify='fix'. This is also the new default; use verify=True for the old behaviour of raising an error on any inconsistency. [#357]

  • The VDIFFileReader gained a new get_thread_ids() method, which will scan through frames to determine the threads present in the file. This is now used inside VDIFStreamReader and, combined with the above, allows reading of files that have missing threads in their first frame set. [#361]

  • The stream reader info now also checks whether streams are continuous by reading the first and last sample, allowing a simple way to check whether the file will likely pose problems before possibly spending a lot of time reading it. [#364]

  • Much faster localization of Mark 5B frames. [#351]

  • VLBI file readers have gained a new method locate_frames that finds frame starts near the current location. [#354]

  • For VLBI file readers, find_header now raises an exception if no frame is found (rather than return None).

  • The Mark 4 file reader’s locate_frame has been deprecated. Its functionality is replaced by locate_frames and find_header. [#354]

  • Custom stream readers can now override only part of reading a given frame and testing that it is the right one. [#355]

  • The HeaderParser class was refactored and simplified, making setting keys faster. [#356]

  • info now also provides the number of frames in a file. [#364]

3.0 (2019-08-28)

  • This version only supports Python 3.

New Features
  • File information now includes whether a file can be read and decoded. The readable() method on stream readers also includes whether the data in a file can be decoded. [#316]

Bug Fixes
  • Empty GUPPI headers can now be created without having to pass in verify=False. This is needed for astropy 3.2, which initializes an empty header in its revamped .fromstring method. [#314]

  • VDIF multichannel headers and payloads are now forced to have power-of-two bits per sample. [#315]

  • Bits per complete sample for VDIF payloads are now calculated correctly also for non power-of-two bits per sample. [#315]

  • Guppi raw file info now presents the correct sample rate, corrected for overlap. [#319]

  • All headers now check that samples_per_frame are set to possible numbers. [#325]

  • Getting .info on closed files no longer leads to an error (though no information can be retrieved). [#326]

Other Changes and Additions
  • Increased speed of VDIF stream reading by removing redundant verification. Reduces the overhead for verification for VDIF CHIME data from 50% (factor 1.5) to 13%. [#321]

2.0 (2018-12-12)

  • VDIF and Mark 5B readers and writers now support 1 bit per sample. [#277, #278]

Bug Fixes
  • VDIF reader will now properly ignore corrupt last frames. [#273]

  • The Mark 5B reader is now more robust against headers not being parsed correctly in Mark5BFileReader.find_header. [#275]

  • All stream readers now have a proper dtype attribute, rather than a corresponding np.float32 or np.complex64. [#280]

  • GUPPI stream readers no longer emit warnings on not quite FITS compliant headers. [#283]

Other Changes and Additions
  • Added release procedure to the documentation. [#268]

1.2 (2018-07-27)

New Features
Other Changes and Additions

1.1.1 (2018-07-24)

Bug Fixes

1.1 (2018-06-06)

New Features
  • Added a new baseband.file_info function, which can be used to inspect data files. [#200]

  • Added a general file opener, baseband.open, which for a set of formats will check whether the file is of that format, and then load it using the corresponding module. [#198]

  • Allow users to pass a verify keyword to file openers reading streams. [#233]

  • Added support for the GUPPI format. [#212]

  • Enabled baseband.dada.open to read streams where the last frame has an incomplete payload. [#228]

API Changes
  • In analogy with Mark 5B, VDIF header time getting and setting now requires a frame rate rather than a sample rate. [#217, #218]

  • DADA and GUPPI now support passing either a start_time or offset (in addition to time) to set the start time in the header. [#240]

Bug Fixes
Other Changes and Additions
  • The baseband.data module with sample data files now has an explicit entry in the documentation. [#198]

  • Increased speed of VLBI stream reading by changing the way header sync patterns are stored, and removing redundant verification steps. VDIF sequential decode is now 5 - 10% faster (depending on the number of threads). [#241]

1.0.1 (2018-06-04)

Bug Fixes
  • Fixed a bug in baseband.dada.open where passing a squeeze setting was ignored when also passing header keywords in ‘ws’ mode. [#211]

  • Raise an exception rather than return incorrect times for Mark 5B files in which the fractional seconds are not set. [#216]

Other Changes and Additions
  • Fixed broken links and typos in the documentation. [#211]

1.0.0 (2018-04-09)

  • Initial release.

Licenses

Baseband License

Baseband is licensed under the GNU General Public License v3.0. The full text of the license can be found in LICENSE under Baseband’s root directory.

Reference/API

baseband Package

Radio baseband I/O.

Functions

file_info(name[, format])

Get format and other information from a baseband file.

open(name[, mode, format])

Open a baseband file (or sequence of files) for reading or writing.

test(**kwargs)

Run the tests for the package.