Baseband¶
Welcome to the Baseband documentation! Baseband is a package affiliated with the Astropy project for reading and writing VLBI and other radio baseband files, with the aim of simplifying and streamlining data conversion and standardization. It provides:

- File input/output objects for supported radio baseband formats, enabling selective decoding of data into Numpy arrays, and encoding of user-defined arrays into baseband formats. Supported formats are listed under specific file formats.
- The ability to read from and write to an ordered sequence of files as if it were a single file.
If you used this package in your research, please cite it via DOI 10.5281/zenodo.1214268.
Overview¶
Installation¶
Installing Baseband¶
To install Baseband with pip, run:
pip3 install baseband
Note
To prevent pip from potentially updating Numpy and Astropy, include the --no-deps flag.
Obtaining Source Code¶
The source code and latest development version of Baseband can be found on its GitHub repo. You can get your own clone using:
git clone git@github.com:mhvk/baseband.git
Of course, it is even better to fork it on GitHub, and then clone your own repository, so that you can more easily contribute!
Running Code without Installing¶
As Baseband is purely Python, it can be used without being built or installed, by appending the directory it is located in to the PYTHONPATH environment variable. Alternatively, you can use sys.path within Python to append the path:
import sys
sys.path.append(BASEBAND_PATH)
where BASEBAND_PATH is the directory you downloaded or cloned Baseband into.
Installing Source Code¶
If you want Baseband to be more broadly available, either to all users on a system or within, say, a virtual environment, use setup.py in the root directory by calling:
python3 setup.py install
For general information on setup.py, see its documentation. Many of the setup.py options are inherited from Astropy (specifically, from Astropy's affiliated package manager) and are described further in Astropy's installation documentation.
Testing the Installation¶
The root directory setup.py can also be used to test if Baseband runs successfully on your system:
python3 setup.py test
or, inside of Python:
import baseband
baseband.test()
These tests require pytest to be installed. Further documentation can be found on the Astropy running tests documentation .
Building Documentation¶
Note
As with Astropy, building the documentation is unnecessary unless you are writing new documentation or do not have internet access, as Baseband’s documentation is available online at baseband.readthedocs.io.
The Baseband documentation can also be built using setup.py from the root directory:
python3 setup.py build_docs
This requires Sphinx (and its dependencies) to be installed.
Getting Started with Baseband¶
This quickstart tutorial is meant to help the reader hit the ground running with Baseband. For more detail, including writing to files, see Using Baseband.
For installation instructions, please see Installing Baseband.
When using Baseband, we typically will also use numpy, astropy.units, and astropy.time.Time. Let's import all of these:
>>> import baseband
>>> import numpy as np
>>> import astropy.units as u
>>> from astropy.time import Time
Opening Files¶
For this tutorial, we’ll use two sample files:
>>> from baseband.data import SAMPLE_VDIF, SAMPLE_MARK5B
The first file is a VDIF one created from EVN/VLBA observations of Black Widow pulsar PSR B1957+20, while the second is a Mark 5B from EVN/WSRT observations of the same pulsar.
To open the VDIF file:
>>> fh_vdif = baseband.open(SAMPLE_VDIF)
Opening the Mark 5B file is slightly more involved, as not all required metadata is stored in the file itself:
>>> fh_m5b = baseband.open(SAMPLE_MARK5B, nchan=8, sample_rate=32*u.MHz,
... ref_time=Time('2014-06-13 12:00:00'))
Here, we've manually passed in as keywords the number of channels, the sample rate (number of samples per channel per second) as an astropy.units.Quantity, and a reference time within 500 days of the start of the observation as an astropy.time.Time. That last keyword is needed to properly read timestamps from the Mark 5B file.
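Why a reference time within 500 days suffices can be sketched with a little arithmetic (the MJD values below are illustrative, not read from the sample file):

```python
# Mark 5B headers record only the last three digits of the Modified
# Julian Date ("jjj"), so the full date is ambiguous modulo 1000 days.
# A reference time within 500 days resolves the ambiguity:
ref_mjd = 56821      # MJD of the reference time (here, 2014-06-13)
jjj = 821            # day digits as stored in a frame header
kday = round((ref_mjd - jjj) / 1000) * 1000   # thousands of MJD
assert kday == 56000
assert kday + jjj == 56821                    # full MJD recovered
```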
baseband.open tries to open files using all available formats, returning whichever is successful. If you know the format of your file, you can pass its name with the format keyword, or directly use its format opener (for VDIF, it is baseband.vdif.open). Also, the baseband.file_info function can help determine the format and any missing information needed by baseband.open; see Inspecting Files.
Do you have a sequence of files you want to read in? You can pass a list of filenames to baseband.open, and it will open them up as if they were a single file! See Reading or Writing to a Sequence of Files.
Reading Files¶
Radio baseband files are generally composed of blocks of binary data, or payloads, stored alongside corresponding metadata, or headers. Each header and payload combination is known as a data frame, and most formats feature files composed of a long series of frames.
Baseband file objects are frame-reading wrappers around Python file objects, and have the same interface, including seek for seeking to different parts of the file, tell for reporting the file pointer's current position, and read for reading data. The main difference is that Baseband file objects read and navigate in units of samples.
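For comparison, this is exactly the byte-oriented behaviour of an ordinary Python file object (a sketch using an in-memory buffer as a stand-in for a file):

```python
import io

# A plain Python file object navigates in bytes; Baseband stream
# readers offer the same seek/tell/read interface, but in samples.
f = io.BytesIO(b"abcdefgh")    # stand-in for a binary file
f.seek(2)                      # move the pointer to byte 2
assert f.tell() == 2           # report the current position
assert f.read(3) == b"cde"     # read 3 bytes from the pointer
```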
Let’s read some samples from the VDIF file:
>>> data = fh_vdif.read(3)
>>> data
array([[-1. , 1. , 1. , -1. , -1. , -1. ,
3.316505, 3.316505],
[-1. , 1. , -1. , 1. , 1. , 1. ,
3.316505, 3.316505],
[ 3.316505, 1. , -1. , -1. , 1. , 3.316505,
-3.316505, 3.316505]], dtype=float32)
>>> data.shape
(3, 8)
Baseband decodes binary data into ndarray objects. Notice we input 3, and received an array of shape (3, 8); this is because there are 8 VDIF threads. Threads and channels represent different components of the data, such as polarizations or frequency sub-bands, and the collection of all components at one point in time is referred to as a complete sample. Baseband reads in units of complete samples, and works with sample rates in units of complete samples per second (including with the Mark 5B example above). As with an ndarray, fh_vdif.shape returns the shape of the entire dataset:
>>> fh_vdif.shape
(40000, 8)
The first axis represents time, and all additional axes represent the shape of a complete sample. A labelled version of the complete sample shape is given by:
>>> fh_vdif.sample_shape
SampleShape(nthread=8)
Baseband extracts basic properties and header metadata from opened files. Notably, the start and end times of the file are given by:
>>> fh_vdif.start_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>
>>> fh_vdif.stop_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>
For an overview of the file, we can either print fh_vdif itself or use its info property:
>>> fh_vdif
<VDIFStreamReader name=... offset=3
sample_rate=32.0 MHz, samples_per_frame=20000,
sample_shape=SampleShape(nthread=8),
bps=2, complex_data=False, edv=3, station=65532,
start_time=2014-06-16T05:56:07.000000000>
>>> fh_vdif.info
Stream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: no obvious gaps
File information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)
Seeking is also done in units of complete samples, which is equivalent to seeking in timesteps. Let’s move forward 100 complete samples:
>>> fh_vdif.seek(100)
100
Seeking from the end or current position is also possible, using the same syntax as for typical file objects. It is also possible to seek in units of time:
>>> fh_vdif.seek(-1000, 2) # Seek 1000 samples from end.
39000
>>> fh_vdif.seek(10*u.us, 1) # Seek 10 us from current position.
39320
fh_vdif.tell returns the current offset in samples or, optionally, in time:
>>> fh_vdif.tell()
39320
>>> fh_vdif.tell(unit=u.us) # Time since start of file.
<Quantity 1228.75 us>
>>> fh_vdif.tell(unit='time')
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001228750>
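The offsets and times reported by tell are related through the sample rate, so the numbers above can be checked by hand:

```python
# Complete-sample offsets map to elapsed time via the sample rate.
sample_rate_hz = 32e6            # 32 MHz, complete samples per second
offset = 39320                   # current sample offset, as from tell()
elapsed_us = offset / sample_rate_hz * 1e6
assert abs(elapsed_us - 1228.75) < 1e-9   # matches tell(unit=u.us)
```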
Finally, we close both files:
>>> fh_vdif.close()
>>> fh_m5b.close()
Using Baseband¶
For most file formats, one can simply import baseband and use baseband.open
to access the file. This gives one a filehandle from which one can read
decoded samples:
>>> import baseband
>>> from baseband.data import SAMPLE_DADA
>>> fh = baseband.open(SAMPLE_DADA)
>>> fh.read(3)
array([[ -38.-38.j, -38.-38.j],
[ -38.-38.j, -40. +0.j],
[-105.+60.j, 85.-15.j]], dtype=complex64)
>>> fh.close()
For other file formats, a bit more information is needed. Below, we cover the basics of inspecting files, reading from and writing to files, converting from one format to another, and diagnosing problems. We assume that Baseband as well as NumPy and the Astropy units module have been imported:
>>> import baseband
>>> import numpy as np
>>> import astropy.units as u
Inspecting Files¶
Baseband allows you to quickly determine basic properties of a file, including
what format it is, using the baseband.file_info
function. For instance, it
shows that the sample VDIF file that comes with Baseband is very short (sample
files can all be found in the baseband.data
module):
>>> import baseband.data
>>> baseband.file_info(baseband.data.SAMPLE_VDIF)
Stream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: no obvious gaps
File information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)
The same function will also tell you when more information is needed. For instance, for Mark 5B files one needs the number of channels used, as well as (roughly) when the data were taken:
>>> baseband.file_info(baseband.data.SAMPLE_MARK5B)
File information:
format = mark5b
number_of_frames = 4
frame_rate = 6400.0 Hz
bps = 2
complex_data = False
readable = False
missing: nchan: needed to determine sample shape, frame rate, decode data.
kday, ref_time: needed to infer full times.
>>> from astropy.time import Time
>>> baseband.file_info(baseband.data.SAMPLE_MARK5B, nchan=8, ref_time=Time('2014-01-01'))
Stream information:
start_time = 2014-06-13T05:30:01.000000000
stop_time = 2014-06-13T05:30:01.000625000
sample_rate = 32.0 MHz
shape = (20000, 8)
format = mark5b
bps = 2
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: no obvious gaps
File information:
number_of_frames = 4
frame_rate = 6400.0 Hz
samples_per_frame = 5000
sample_shape = (8,)
The information is gleaned from info
properties on the various file and
stream readers (see below).
Note
The one format for which file_info
works a bit differently is
GSB, as this format requires separate time-stamp and raw data
files. Only the timestamp file can be inspected usefully.
Reading Files¶
Opening Files¶
As shown at the very start, files can be opened with the general baseband.open function. This will try to determine the file type using file_info, load the corresponding baseband module, and then open the file using that module's master input/output function.
Generally, if one knows the file type, one might as well work with the corresponding module directly. For instance, to explicitly use the DADA reader to open the sample DADA file included in Baseband, one can use the DADA module's open function:
>>> from baseband import dada
>>> from baseband.data import SAMPLE_DADA
>>> fh = dada.open(SAMPLE_DADA, 'rs')
>>> fh.read(3)
array([[ -38.-38.j, -38.-38.j],
[ -38.-38.j, -40. +0.j],
[-105.+60.j, 85.-15.j]], dtype=complex64)
>>> fh.close()
In general, file I/O and data manipulation use the same syntax across all file
formats. When opening Mark 4 and Mark 5B files, however, some additional
arguments may need to be passed (as was the case above for inspecting a Mark
5B file, and indeed this is a good way to find out what is needed).
Notes on such features and quirks of individual formats can be
found in the API entries of their open
functions, and within the
Specific file format documentation.
For the rest of this section, we will stick to VDIF files.
Decoding Data and the Sample File Pointer¶
By giving the openers a 'rs' flag, which is the default, we open files in "stream reader" mode, where a file is accessed as if it were a stream of samples. For VDIF, open will then return an instance of VDIFStreamReader, which wraps a raw data file with methods to decode the binary data frames and seek to and read data samples. To decode the first 12 samples into an ndarray, we would use the read method:
>>> from baseband import vdif
>>> from baseband.data import SAMPLE_VDIF
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> d = fh.read(12)
>>> type(d)
<... 'numpy.ndarray'>
>>> d.shape
(12, 8)
>>> d[:, 0].astype(int) # First thread.
array([-1, -1, 3, -1, 1, -1, 3, -1, 1, 3, -1, 1])
As discussed in detail in the VDIF section, VDIF files are
sequences of data frames, each of which is comprised of a header (which
holds information like the time at which the data was taken) and a
payload, or block of data. Multiple concurrent time streams can be
stored within a single frame; each of these is called a “channel”.
Moreover, groups of channels can be stored over multiple frames, each of which
is called a “thread”. Our sample file is an “8-thread, single-channel
file” (8 concurrent time streams with 1 stream per frame), and in the example
above, fh.read decoded the first 12 samples from all 8 threads, mapping
thread number to the second axis of the decoded data array. Reading files with
multiple threads and channels will produce 3-dimensional arrays.
fh includes shape, size and ndim, which give the shape, total number of elements, and dimensionality of the file's entire dataset as if it were decoded into an array. The number of complete samples (the set of samples from all available threads and channels for one point in time) in the file is given by the first element of shape:
>>> fh.shape # Shape of all data from the file in decoded array form.
(40000, 8)
>>> fh.shape[0] # Number of complete samples.
40000
>>> fh.size
320000
>>> fh.ndim
2
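As for an ndarray, these attributes are mutually consistent:

```python
import math

# shape, size and ndim relate exactly as they do for an ndarray.
shape = (40000, 8)                  # complete samples x threads
assert math.prod(shape) == 320000   # fh.size
assert len(shape) == 2              # fh.ndim
assert shape[0] == 40000            # number of complete samples
```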
The shape of a single complete sample, including names indicating the meaning of shape dimensions, is retrievable using:
>>> fh.sample_shape
SampleShape(nthread=8)
By default, dimensions of length unity are squeezed, or removed, from the sample shape. To retain them, we can pass squeeze=False to open:
>>> fhns = vdif.open(SAMPLE_VDIF, 'rs', squeeze=False)
>>> fhns.sample_shape # Sample shape now keeps channel dimension.
SampleShape(nthread=8, nchan=1)
>>> fhns.ndim # fh.shape and fh.ndim also change with squeezing.
3
>>> d2 = fhns.read(12)
>>> d2.shape # Decoded data has channel dimension.
(12, 8, 1)
>>> fhns.close()
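The squeezing behaviour matches NumPy's, except that the leading (time) axis is never removed. A minimal sketch with plain NumPy:

```python
import numpy as np

# Decoded data with all dimensions kept: (samples, threads, channels).
data = np.zeros((12, 8, 1))
# Squeeze length-1 axes, but never the leading (time) axis:
axes = tuple(i for i, n in enumerate(data.shape[1:], start=1) if n == 1)
assert data.squeeze(axis=axes).shape == (12, 8)
```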
Basic information about the file is obtained either from fh.info or simply from fh itself:
>>> fh.info
Stream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: no obvious gaps
File information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)
>>> fh
<VDIFStreamReader name=... offset=12
sample_rate=32.0 MHz, samples_per_frame=20000,
sample_shape=SampleShape(nthread=8),
bps=2, complex_data=False, edv=3, station=65532,
start_time=2014-06-16T05:56:07.000000000>
Not coincidentally, the first is identical to what we found above using file_info.
The filehandle itself also shows the offset, the current location of the sample file pointer. Above, it is at 12 since we have read in 12 (complete) samples. If we called fh.read(12) again, we would get the next 12 samples. If we instead called fh.read(), it would read from the pointer's current position to the end of the file. If we wanted all the data in one array, we would move the file pointer back to the start of the file, using fh.seek, before reading:
>>> fh.seek(0) # Seek to sample 0. Seek returns its offset in counts.
0
>>> d_complete = fh.read()
>>> d_complete.shape
(40000, 8)
We can also move the pointer with respect to the end of file by passing 2
as a second argument:
>>> fh.seek(-100, 2) # Second arg is 0 (start of file) by default.
39900
>>> d_end = fh.read(100)
>>> np.array_equal(d_complete[-100:], d_end)
True
-100 means 100 samples before the end of file, so d_end is equal to the last 100 entries of d_complete. Baseband only keeps the most recently accessed data frame in memory, making it possible to analyze (normally large) files through selective decoding using seek and read.
Note
As with file pointers in general, fh.seek will not return an error if one seeks beyond the end of file. Attempting to read beyond the end of file, however, will result in an EOFError.
To determine where the pointer is located, we use fh.tell():
>>> fh.tell()
40000
>>> fh.close()
Caution should be used when decoding large blocks of data using fh.read.
For typical files, the resulting arrays are far too large to hold in memory.
Seeking and Telling in Time With the Sample Pointer¶
We can use seek and tell with units of time rather than samples. To do this with tell, we can pass an appropriate astropy.units.Unit object to its optional unit parameter:
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> fh.seek(40000)
40000
>>> fh.tell(unit=u.ms)
<Quantity 1.25 ms>
Passing the string 'time'
reports the pointer’s location in absolute time:
>>> fh.tell(unit='time')
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>
We can also pass an absolute astropy.time.Time, or a positive or negative time difference TimeDelta or astropy.units.Quantity, to seek. If the offset is a Time object, the second argument to seek is ignored:
>>> from astropy.time.core import TimeDelta
>>> from astropy.time import Time
>>> fh.seek(TimeDelta(-5e-4, format='sec'), 2) # Seek -0.5 ms from end.
24000
>>> fh.seek(0.25*u.ms, 1) # Seek 0.25 ms from current position.
32000
>>> # Seek to specific time.
>>> fh.seek(Time('2014-06-16T05:56:07.001125'))
36000
We can retrieve the time of the first sample in the file using start_time, the time immediately after the last sample using stop_time, and the time of the pointer's current location (equivalent to fh.tell(unit='time')) using time:
>>> fh.start_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>
>>> fh.stop_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>
>>> fh.time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001125000>
>>> fh.close()
Extracting Header Information¶
The first header of the file is stored as the header0
attribute of the
stream reader object; it gives direct access to header properties via keyword
lookup:
>>> with vdif.open(SAMPLE_VDIF, 'rs') as fh:
... header0 = fh.header0
>>> header0['frame_length']
629
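The frame_length value can be turned into byte sizes by hand: VDIF counts frame length in 8-byte words, header included, and a non-legacy VDIF header is 32 bytes:

```python
frame_length = 629                  # in 8-byte words, from the header
frame_nbytes = frame_length * 8     # 5032 bytes for the whole frame
payload_nbytes = frame_nbytes - 32  # minus the 32-byte header
assert payload_nbytes == 5000
```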
The full list of keywords is available by printing out header0:
>>> header0
<VDIFHeader3 invalid_data: False,
legacy_mode: False,
seconds: 14363767,
_1_30_2: 0,
ref_epoch: 28,
frame_nr: 0,
vdif_version: 1,
lg2_nchan: 0,
frame_length: 629,
complex_data: False,
bits_per_sample: 1,
thread_id: 1,
station_id: 65532,
edv: 3,
sampling_unit: True,
sampling_rate: 16,
sync_pattern: 0xacabfeed,
loif_tuning: 859832320,
_7_28_4: 15,
dbe_unit: 2,
if_nr: 0,
subband: 1,
sideband: True,
major_rev: 1,
minor_rev: 5,
personality: 131>
A number of derived properties, such as the time (as a Time
object), are also available through the header object:
>>> header0.time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>
These are listed in the API for each header class. For example, the sample VDIF file’s headers are of class:
>>> type(header0)
<class 'baseband.vdif.header.VDIFHeader3'>
and so its attributes can be found in the API entry for VDIFHeader3.
Reading Specific Components of the Data¶
By default, fh.read()
returns complete samples, i.e. with all
available threads, polarizations or channels. If we were only interested in
decoding a subset of the complete sample, we can select specific
components by passing indexing objects to the subset
keyword in open. For
example, if we only wanted thread 3 of the sample VDIF file:
>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=3)
>>> fh.sample_shape
()
>>> d = fh.read(20000)
>>> d.shape
(20000,)
>>> fh.subset
(3,)
>>> fh.close()
Since by default data are squeezed, one obtains a data stream with just a single dimension. If one would like to keep all information, one has to pass squeeze=False and also make subset a list (or slice):
>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=[3], squeeze=False)
>>> fh.sample_shape
SampleShape(nthread=1, nchan=1)
>>> d = fh.read(20000)
>>> d.shape
(20000, 1, 1)
>>> fh.close()
Data with multi-dimensional samples can be subset by passing a tuple of indexing objects with the same dimensional ordering as the (possibly squeezed) sample shape; in the case of the sample VDIF with squeeze=False, this is threads, then channels. For example, if we wished to select threads 1 and 3, and channel 0:
>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=([1, 3], 0), squeeze=False)
>>> fh.sample_shape
SampleShape(nthread=2)
>>> fh.close()
Generally, subset accepts any object that can be used to index a numpy.ndarray, including advanced indexing (as done above, with subset=([1, 3], 0)). If possible, slices should be used instead of lists of integers, since indexing with slices returns a view rather than a copy and thus avoids unnecessary processing and memory allocation. (An exception to this is VDIF threads, where the subset is used to selectively read specific threads, and thus is not used for actual slicing of the data.)
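The view-versus-copy distinction is the standard NumPy one, and can be checked directly:

```python
import numpy as np

d = np.zeros((20000, 8))
view = d[:, 3:4]           # basic slicing: a view onto d
fancy = d[:, [1, 3]]       # advanced indexing: a new copy
assert np.shares_memory(view, d)
assert not np.shares_memory(fancy, d)
```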
Writing to Files and Format Conversion¶
Writing to a File¶
To write data to disk, we again use open. Writing data in a particular format requires both the header and data samples. For modifying an existing file, we have both the old header and old data handy.
As a simple example, let's read in the 8-thread, single-channel sample VDIF file and rewrite it as a single-thread, 8-channel one, which, for example, may be necessary for compatibility with DSPSR:
>>> import baseband.vdif as vdif
>>> from baseband.data import SAMPLE_VDIF
>>> fr = vdif.open(SAMPLE_VDIF, 'rs')
>>> fw = vdif.open('test_vdif.vdif', 'ws',
... sample_rate=fr.sample_rate,
... samples_per_frame=fr.samples_per_frame // 8,
... nthread=1, nchan=fr.sample_shape.nthread,
... complex_data=fr.complex_data, bps=fr.bps,
... edv=fr.header0.edv, station=fr.header0.station,
... time=fr.start_time)
The minimal parameters needed to generate a file are listed under the documentation for each format's open, though comprehensive lists can be found in the documentation for each format's stream writer class (e.g. for VDIF, under VDIFStreamWriter). In practice, we specify as many relevant header properties as are available to obtain a particular file structure. If we possess the exact first header of the file, it can simply be passed to open via the header keyword. In the example above, though, we manually switch the values of nthread and nchan. Because VDIF EDV = 3 requires each frame's payload to contain 5000 bytes, and nchan is now a factor of 8 larger, we decrease samples_per_frame, the number of complete (i.e. all threads and channels included) samples per frame, by a factor of 8.
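The samples_per_frame arithmetic can be checked by hand: the 5000-byte payload holds 5000 * 8 bits, shared among the channels at 2 bits per (real) elementary sample:

```python
payload_nbytes = 5000    # fixed payload size for VDIF EDV 3
bps = 2                  # bits per elementary (real) sample
nchan = 8                # channels per frame in the rewritten file
samples_per_frame = payload_nbytes * 8 // (bps * nchan)
assert samples_per_frame == 2500    # i.e. 20000 // 8
```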
Encoding samples and writing data to file is done by passing data arrays into fw's write method. The first dimension of the arrays is sample number, and the remaining dimensions must be as given by fw.sample_shape:
>>> fw.sample_shape
SampleShape(nchan=8)
In this case, the required dimensions are the same as those of the arrays from fr.read. We can thus write the data to file using:
>>> while fr.tell() < fr.shape[0]:
... fw.write(fr.read(fr.samples_per_frame))
>>> fr.close()
>>> fw.close()
For our sample file, we could simply have written fw.write(fr.read()) instead of the loop, but for large files, reading and writing should be done in smaller chunks to minimize memory usage. Baseband stores only the data frame or frame set being read or written in memory.
We can check the validity of our new file by re-opening it:
>>> fr = vdif.open(SAMPLE_VDIF, 'rs')
>>> fh = vdif.open('test_vdif.vdif', 'rs')
>>> fh.sample_shape
SampleShape(nchan=8)
>>> np.all(fr.read() == fh.read())
True
>>> fr.close()
>>> fh.close()
Note
One can also use the top-level open function for writing, with the file format passed in via its format argument.
File Format Conversion¶
It is often preferable to convert data from one file format to another that offers wider compatibility, or better fits the structure of the data. As an example, we convert the sample Mark 4 data to VDIF.
Since we don’t have a VDIF header handy, we pass the relevant Mark 4 header
values into vdif.open
to create one:
>>> import baseband.mark4 as mark4
>>> from baseband.data import SAMPLE_MARK4
>>> fr = mark4.open(SAMPLE_MARK4, 'rs', ntrack=64, decade=2010)
>>> spf = 640 # fanout * 160 = 640 invalid samples per Mark 4 frame
>>> fw = vdif.open('m4convert.vdif', 'ws', sample_rate=fr.sample_rate,
... samples_per_frame=spf, nthread=1,
... nchan=fr.sample_shape.nchan,
... complex_data=fr.complex_data, bps=fr.bps,
... edv=1, time=fr.start_time)
We choose edv = 1 since it is the simplest VDIF EDV whose header includes a sampling rate. The concept of threads does not exist in Mark 4, so the file effectively has nthread = 1. As discussed in the Mark 4 documentation, the data at the start of each frame are effectively overwritten by the header and are represented by invalid samples in the stream reader. We set samples_per_frame to 640 so that each section of invalid data is captured in a single frame.
We now write the data to file, manually flagging each invalid data frame:
>>> while fr.tell() < fr.shape[0]:
... d = fr.read(fr.samples_per_frame)
... fw.write(d[:640], valid=False)
... fw.write(d[640:])
>>> fr.close()
>>> fw.close()
Lastly, we check our new file:
>>> fr = mark4.open(SAMPLE_MARK4, 'rs', ntrack=64, decade=2010)
>>> fh = vdif.open('m4convert.vdif', 'rs')
>>> np.all(fr.read() == fh.read())
True
>>> fr.close()
>>> fh.close()
For file format conversion in general, we have to consider how to properly
scale our data to make the best use of the dynamic range of the new encoded
format. For VLBI formats like VDIF, Mark 4 and Mark 5B, samples of the same
size have the same scale, which is why we did not have to rescale our data when
writing 2-bits-per-sample Mark 4 data to a 2-bits-per-sample VDIF file.
Rescaling is necessary, though, to convert DADA or GSB to VDIF. For examples
of rescaling, see the baseband/tests/test_conversion.py
file.
Reading or Writing to a Sequence of Files¶
Data from one continuous observation are sometimes spread over a sequence of files. Baseband includes the sequentialfile module for reading such a sequence as if it were one contiguous file. This module is called when a list, tuple or filename template is passed to e.g. baseband.open or baseband.vdif.open, making the syntax for handling multiple files nearly identical to that for single ones.
As an example, we write the data from the sample VDIF file
baseband/data/sample.vdif
into a sequence of two files and then read the
files back in. We first load the required data:
>>> from baseband import vdif
>>> from baseband.data import SAMPLE_VDIF
>>> import numpy as np
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> d = fh.read()
We then create a sequence of filenames:
>>> filenames = ["seqvdif_{0}".format(i) for i in range(2)]
When passing filenames to open, we must also pass file_size, the maximum file size in bytes, in addition to the usual keyword arguments for writing a file. Since we wish to split the sample file in two, and the file consists of two framesets, we set file_size to the byte size of one frameset (we could equivalently have set it to fh.fh_raw.seek(0, 2) // 2):
>>> file_size = 8 * fh.header0.frame_nbytes
>>> fw = vdif.open(filenames, 'ws', header0=fh.header0,
... file_size=file_size, sample_rate=fh.sample_rate,
... nthread=fh.sample_shape.nthread)
>>> fw.write(d)
>>> fw.close()  # This implicitly closes the underlying raw files.
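The file_size arithmetic above can be verified by hand (frame_length = 629 is the value shown in the sample file's first header):

```python
frame_length = 629               # 8-byte words per frame, from header0
frame_nbytes = frame_length * 8  # 5032 bytes per frame
file_size = 8 * frame_nbytes     # one 8-thread frameset
assert file_size == 40256
```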
Note
file_size sets the maximum size a file can reach before the writer moves on to the next one, so setting file_size to a larger value than above will lead to the two files having different sizes. By default, file_size=None, meaning files can be arbitrarily large, in which case only one file will be created.
We now read the sequence and confirm their contents are identical to those of the sample file:
>>> fr = vdif.open(filenames, 'rs', sample_rate=fh.sample_rate)
>>> fr.header0.time == fh.header0.time
True
>>> np.all(fr.read() == d)
True
>>> fr.close()
When reading, the filename sequence must be ordered in time.
We can also open the second file on its own and confirm it contains the second frameset of the sample file:
>>> fsf = vdif.open(filenames[1], mode='rs', sample_rate=fh.sample_rate)
>>> fh.seek(fh.shape[0] // 2) # Seek to start of second frameset.
20000
>>> fsf.header0.time == fh.time
True
>>> np.all(fsf.read() == fh.read())
True
>>> fsf.close()
In situations where the file_size is known, but not the total number of files to write, one may use the FileNameSequencer class to create an iterable without a user-defined size. The class is initialized with a template string that can be formatted with keywords, and an optional header that can either be an actual header or a dict with the relevant keywords. The template may also contain the special keyword '{file_nr}', which is equal to the indexing value (instead of a header entry).
As an example, let us create a sequencer:
>>> from baseband.helpers import sequentialfile as sf
>>> filenames = sf.FileNameSequencer('f.edv{edv:d}.{file_nr:03d}.vdif',
... header=fh.header0)
Indexing the sequencer using square brackets returns a filename:
>>> filenames[0]
'f.edv3.000.vdif'
>>> filenames[42]
'f.edv3.042.vdif'
The sequencer has extracted the EDV from the header we passed in, and the file number from the index. We can use the sequencer to write a VDIF file sequence:
>>> fw = vdif.open(filenames, 'ws', header0=fh.header0,
... file_size=file_size, sample_rate=fh.sample_rate,
... nthread=fh.sample_shape.nthread)
>>> d = np.concatenate([d, d, d])
>>> fw.write(d)
>>> fw.close()
This creates 6 files:
>>> import glob
>>> len(glob.glob("f.edv*.vdif"))
6
We can read the file sequence using the same sequencer. In reading mode, the sequencer determines the number of files by finding the largest file available that fits the template:
>>> fr = vdif.open(filenames, 'rs', sample_rate=fh.sample_rate)
>>> fr.header0.time == fh.header0.time
True
>>> np.all(fr.read() == d)
True
>>> fr.close()
>>> fh.close() # Close sample file as well.
Because DADA and GUPPI data are usually stored in file sequences with names derived from header values (e.g. 'puppi_58132_J1810+1744_2176.0010.raw'), their format openers have template support built in. For usage details, please see the API entries for baseband.dada.open and baseband.guppi.open.
Diagnosing problems with baseband files¶
Little is more annoying than starting a very long analysis script only to find the reader crashed with an error near the end. Unfortunately, while there is only one way to succeed, there are many ways to fail. Some failures, though, can be found in advance by inspecting files. To see what would show up for a file that misses a frame, we first construct one:
>>> from astropy.time import Time
>>> from baseband import vdif
>>> fc = vdif.open('corrupt.vdif', 'ws', edv=1, nthread=2,
... bps=8, samples_per_frame=16,
... time=Time('J2010'), sample_rate=16*u.kHz)
>>> fc.write(np.zeros((8000, 2)))
>>> fc.fh_raw.seek(-100, 1)
47900
>>> fc.write(np.zeros((8000, 2)))
>>> fc.close()
Here, rewinding the internal raw file pointer a bit to simulate “missing bytes” is an implementation detail that one should not rely on!
Now check its info:
>>> fh = vdif.open('corrupt.vdif', 'rs', verify=True)
>>> fh.info.readable
False
>>> fh.info
Stream information:
start_time = 2009-12-31T23:58:53.816000000
stop_time = 2009-12-31T23:58:54.816000000
sample_rate = 0.016 MHz
shape = (16000, 2)
format = vdif
bps = 8
complex_data = False
verify = True
readable = False
checks: decodable: True
continuous: False
errors: continuous: While reading at 7968: AssertionError()
warnings: number_of_frames: file contains non-integer number (1997.9166666666667) of frames
File information:
edv = 1
thread_ids = [0, 1]
frame_rate = 1000.0 Hz
samples_per_frame = 16
sample_shape = (2, 1)
>>> fh.close()
In detail, the error is given for a position earlier than the one we corrupted because, internally, baseband reads a frame ahead: a corrupted frame typically means something before it is bad as well.
This particular problem is not bad, since the VDIF reader can deal with missing frames. Indeed, when one opens the file with the default verify='fix', one gets:
>>> fh = vdif.open('corrupt.vdif', 'rs')
>>> fh.info
Stream information:
start_time = 2009-12-31T23:58:53.816000000
stop_time = 2009-12-31T23:58:54.816000000
sample_rate = 0.016 MHz
shape = (16000, 2)
format = vdif
bps = 8
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: fixable gaps
warnings: number_of_frames: file contains non-integer number (1997.9166666666667) of frames
continuous: While reading at 7968: problem loading frame set 498. Thread(s) [1] missing; set to invalid.
File information:
edv = 1
thread_ids = [0, 1]
frame_rate = 1000.0 Hz
samples_per_frame = 16
sample_shape = (2, 1)
>>> fh.close()
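The pattern above suggests a cheap guard for long analysis scripts: check the info before committing to a full read. A sketch, assuming (as the printouts above show) that the info object exposes `readable` and, when problems occur, `errors`:

```python
# Fail fast before a long analysis if a baseband stream is not
# readable. Assumes `fh.info` exposes `readable` and `errors`
# attributes, as the info printouts above suggest.
def ensure_readable(fh):
    info = fh.info
    if not info.readable:
        raise OSError(f"stream not readable; errors: {info.errors}")
    return info

# Hypothetical usage:
#     with vdif.open('data.vdif', 'rs', verify=True) as fh:
#         ensure_readable(fh)
#         data = fh.read()
```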
Glossary¶
- channel
A single component of the complete sample, or a stream thereof. A channel typically represents one frequency sub-band, the output from a single antenna, or (for channelized data) one spectral or Fourier channel, i.e., one part of a Fourier spectrum.
- complete sample
Set of all component samples, i.e., from all threads, polarizations, channels, etc., for one point in time. Its dimensions are given by the sample shape.
- component
One individual thread and channel, or one polarization and channel, etc. Component samples each occupy one element in decoded data arrays. A component sample is composed of one elementary sample if it is real, and two if it is complex.
- data frame
A block of time-sampled data, or payload, accompanied by a header. “Frame” for short.
- data frameset
In the VDIF format, the set of all data frames representing the same segment of time. Each data frame consists of sets of channels from different threads.
- elementary sample
The smallest subdivision of a complete sample, i.e. the real / imaginary part of one component of a complete sample.
- header
Metadata accompanying a data frame.
- payload
The data within a data frame.
- sample
Data from one point in time. Complete samples contain samples from all components, while elementary samples are one part of one component.
- sample rate
Rate of complete samples.
- sample shape
The lengths of the dimensions of the complete sample.
- squeezing
The removal of any dimensions of length unity from decoded data.
- stream
Timeseries of samples; may refer to all of, or a subsection of, the dataset.
- subset
A subset of a complete sample, in particular one defined by the user for selective decoding.
- thread
A collection of channels from the complete sample, or a stream thereof. For VDIF, each thread is carried by a separate (set of) data frame(s).
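The sample-related terms above map directly onto the axes of decoded data arrays. A small illustration (the specific shapes are just an example):

```python
import numpy as np

# 12 complete samples, each with sample shape (2 threads, 1 channel).
data = np.zeros((12, 2, 1))
sample_shape = data.shape[1:]   # shape of one complete sample: (2, 1)
squeezed = data.squeeze()       # "squeezing" drops length-1 dimensions
# squeezed.shape -> (12, 2)
```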
Specific File Formats¶
Baseband’s code is subdivided into its supported file formats, and the following sections contain format specifications, usage notes, troubleshooting help and APIs for each.
VDIF¶
The VLBI Data Interchange Format (VDIF) was introduced in 2009 to standardize VLBI data transfer and storage. Detailed specifications are found in VDIF’s specification document.
File Structure¶
A VDIF file is composed of data frames. Each has a header of eight 32-bit words (32 bytes; the exception is the “legacy VDIF” format, which is four words, or 16 bytes, long), and a payload that ranges from 32 bytes to ~134 megabytes. Both are little-endian. The first four words of a VDIF header hold the same information in all VDIF files, but the last four words hold optional user-defined data. The layout of these four words is specified by the file’s extended-data version, or EDV. More detailed information on the header can be found in the tutorial for supporting a new VDIF EDV.
A data frame may carry one or multiple channels, and a stream of data frames all carrying the same (set of) channels is known as a thread and denoted by its thread ID. The collection of frames representing the same time segment (and all possible thread IDs) is called a data frameset (or just “frameset”).
Strict time and thread ID ordering of frames in the stream, while considered part of VDIF best practices, is not mandated, and cannot be guaranteed during data transmission over the internet.
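To make the word layout concrete, here is a sketch of decoding the first two little-endian 32-bit header words with the standard library. The bit fields used (invalid and legacy flags, seconds from epoch, reference epoch, frame number) follow the VDIF specification and are an assumption of this sketch; Baseband users normally never need to do this by hand:

```python
# Sketch: extract basic fields from the first 8 bytes of a VDIF
# header, assuming the bit layout of the VDIF specification.
import struct

def parse_word01(raw):
    word0, word1 = struct.unpack('<2I', raw[:8])  # little-endian words
    return {
        'invalid': bool(word0 >> 31),        # invalid-data flag
        'legacy': bool((word0 >> 30) & 1),   # legacy (4-word) header?
        'seconds': word0 & 0x3fffffff,       # seconds from ref. epoch
        'ref_epoch': (word1 >> 24) & 0x3f,   # half-years since 2000
        'frame_nr': word1 & 0xffffff,        # frame within the second
    }
```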
Usage Notes¶
This section covers reading and writing VDIF files with Baseband; general usage can be found under the Using Baseband section. For situations in which one is unsure of a file's format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the small sample file baseband/data/sample.vdif, and the numpy, astropy.units, and baseband.vdif modules:
>>> import numpy as np
>>> from baseband import vdif
>>> import astropy.units as u
>>> from baseband.data import SAMPLE_VDIF
Simple reading and writing of VDIF files can be done entirely using open. Opening in binary mode provides a normal file reader, extended with methods to read a VDIFFrameSet data container for storing a frame set, as well as a VDIFFrame for storing a single frame:
>>> fh = vdif.open(SAMPLE_VDIF, 'rb')
>>> fs = fh.read_frameset()
>>> fs.data.shape
(20000, 8, 1)
>>> fr = fh.read_frame()
>>> fr.data.shape
(20000, 1)
>>> fh.close()
(As with other formats, fr.data is a read-only property of the frame.)
Opening in stream mode wraps the low-level routines such that reading and writing is in units of samples. It also provides access to header information:
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> fh
<VDIFStreamReader name=... offset=0
sample_rate=32.0 MHz, samples_per_frame=20000,
sample_shape=SampleShape(nthread=8),
bps=2, complex_data=False, edv=3, station=65532,
start_time=2014-06-16T05:56:07.000000000>
>>> d = fh.read(12)
>>> d.shape
(12, 8)
>>> d[:, 0].astype(int) # first thread
array([-1, -1, 3, -1, 1, -1, 3, -1, 1, 3, -1, 1])
>>> fh.close()
Setting up a file for writing requires quite a bit of header information. Not coincidentally, what is given by the reader above suffices:
>>> from astropy.time import Time
>>> fw = vdif.open('try.vdif', 'ws', sample_rate=32*u.MHz,
... samples_per_frame=20000, nchan=1, nthread=2,
... complex_data=False, bps=2, edv=3, station=65532,
... time=Time('2014-06-16T05:56:07.000000000'))
>>> with vdif.open(SAMPLE_VDIF, 'rs', subset=[1, 3]) as fh:
... d = fh.read(20000) # Get some data to write
>>> fw.write(d)
>>> fw.close()
>>> fh = vdif.open('try.vdif', 'rs')
>>> d2 = fh.read(12)
>>> np.all(d[:12] == d2)
True
>>> fh.close()
Here is a simple example that copies a VDIF file. We use the sort=False option to ensure the frames are written in exactly the same order, so the files should be identical:
>>> with vdif.open(SAMPLE_VDIF, 'rb') as fr, vdif.open('try.vdif', 'wb') as fw:
... while True:
... try:
... fw.write_frameset(fr.read_frameset(sort=False))
... except EOFError:
... break
For small files, one could just do:
>>> with vdif.open(SAMPLE_VDIF, 'rs') as fr, \
... vdif.open('try.vdif', 'ws', header0=fr.header0,
... sample_rate=fr.sample_rate,
... nthread=fr.sample_shape.nthread) as fw:
... fw.write(fr.read())
This copies everything to memory, though, and some header information is lost.
Troubleshooting¶
In situations where the VDIF files being handled are corrupted or modified
in an unusual way, using open will likely lead to an
exception being raised or to unexpected behavior. In such cases, it may still
be possible to read in the data. Below, we provide a few solutions and
workarounds to do so.
Note
This list is certainly incomplete. If you have an issue (solved or otherwise) you believe should be on this list, please e-mail the contributors.
AssertionError when checking EDV in header verify function¶
All VDIF header classes (other than VDIFLegacyHeader) check, using their verify function, that the EDV read from file matches the class EDV. If they do not match, the line
assert self.edv is None or self.edv == self['edv']
raises an AssertionError. If this occurs because the VDIF EDV is not yet supported by Baseband, support can be added by implementing a custom header class. If the EDV is supported, but the header deviates from the format found in the VLBI.org EDV registry, the best solution is to create a custom header class, then override the subclass selector in VDIFHeader. Tutorials for doing either can be found here.
EOFError encountered in _get_frame_rate when reading¶
When the sample rate is not input by the user and cannot be deduced from the header (only some EDVs, such as 1 and 3, store the sample rate in the header), Baseband tries to determine the frame rate using the private method _get_frame_rate in VDIFStreamReader (and then multiplies by the samples per frame to obtain the sample rate). This function raises EOFError if the file contains less than one second of data or is corrupt. In either case, the file can still be opened by explicitly passing the sample rate to open via the sample_rate keyword.
Reference/API¶
baseband.vdif Package¶
VLBI Data Interchange Format (VDIF) reader/writer
For the VDIF specification, see https://vlbi.org/vlbi-standards/vdif/
Classes¶
- Representation of a VDIF data frame, consisting of a header and payload.
- Representation of a set of VDIF frames, combining different threads.
- VDIF Header, supporting different Extended Data Versions.
- Container for decoding and encoding VDIF payloads.
Class Inheritance Diagram¶
baseband.vdif.header Module¶
Definitions for VLBI VDIF Headers.
Implements a VDIFHeader class used to store header words, and decode/encode the information therein.
For the VDIF specification, see https://www.vlbi.org/vdif
Classes¶
- VDIF Header, supporting different Extended Data Versions.
- Base for non-legacy VDIF headers that use 8 32-bit words.
- Base for VDIF headers that include the sample rate (EDV = 1, 3, 4).
- Legacy VDIF header that uses only 4 32-bit words.
- VDIF Header for EDV = 0.
- VDIF Header for EDV = 1.
- VDIF Header for EDV = 2.
- VDIF Header for EDV = 3.
- Mark 5B over VDIF (EDV = 0xab).
Variables¶
- Dict for storing VDIF header class definitions, indexed by their EDV.
Class Inheritance Diagram¶
baseband.vdif.payload Module¶
Definitions for VLBI VDIF payloads.
Implements a VDIFPayload class used to store payload words, and decode to or encode from a data array.
See the VDIF specification page for payload specifications.
Functions¶
- Sets up the look-up tables for levels as a function of input byte.
- Decodes data stored using 2 bits per sample.
- Decodes data stored using 4 bits per sample.
- Encodes values using 1 bit per sample, packing the result into bytes.
- Encodes values using 2 bits per sample, packing the result into bytes.
- Encodes values using 4 bits per sample, packing the result into bytes.
Classes¶
- Container for decoding and encoding VDIF payloads.
Class Inheritance Diagram¶
baseband.vdif.frame Module¶
Definitions for VLBI VDIF frames and frame sets.
Implements a VDIFFrame class that can be used to hold a header and a payload, providing access to the values encoded in both. Also, define a VDIFFrameSet class that combines a set of frames from different threads.
For the VDIF specification, see https://www.vlbi.org/vdif
Classes¶
- Representation of a VDIF data frame, consisting of a header and payload.
- Representation of a set of VDIF frames, combining different threads.
Class Inheritance Diagram¶
baseband.vdif.file_info Module¶
The VDIFFileReaderInfo property.
Includes information about threads and frame sets.
Classes¶
Class Inheritance Diagram¶
baseband.vdif.base Module¶
Classes¶
- Simple reader for VDIF files.
- Simple writer for VDIF files.
- Base for VDIF streams.
- VLBI VDIF format reader.
- VLBI VDIF format writer.
Class Inheritance Diagram¶
MARK 5B¶
The Mark 5B format is the output format of the Mark 5B disk-based VLBI data system. It is described in its design specifications.
File Structure¶
Each data frame consists of a header of four 32-bit words (16 bytes) followed by a payload of 2500 32-bit words (10000 bytes). The header contains a sync word, frame number, and timestamp (accurate to 1 ms), as well as user-specified data; see Sec. 1 of the design specifications for details. The payload supports 2^n bit streams, for 0 ≤ n ≤ 5, and the first sample of each stream corresponds precisely to the header time. Elementary samples may be 1 or 2 bits in size, with the latter stored in two successive bit streams. The number of channels equals the number of bit streams divided by the number of bits per elementary sample (Baseband currently only supports files in which all bit streams are active). Files begin at a header (unlike for Mark 4), and an integer number of frames fit within 1 second.
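The payload bookkeeping above can be checked with a little arithmetic; the nchan and bps values below are those of the 2-bit, 8-channel sample file used in the next section:

```python
# Mark 5B payload arithmetic: 2500 32-bit words per payload, and
# nchan = (number of bit streams) / (bits per elementary sample).
payload_bits = 2500 * 32            # 10000 bytes = 80000 bits
nchan, bps = 8, 2                   # 16 active bit streams, 2-bit samples
samples_per_frame = payload_bits // (nchan * bps)
# samples_per_frame -> 5000 complete samples per frame
```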
The Mark 5B system also outputs files with the active bit-stream mask, the number of frames per second, and observational metadata (Sec. 1.3 of the design specifications). Baseband does not yet use these files, and instead requires the user to specify, for example, the sample rate.
Usage¶
This section covers reading and writing Mark 5B files with Baseband; general usage can be found under the Using Baseband section. For situations in which one is unsure of a file's format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the small sample file baseband/data/sample.m5b, and the numpy, astropy.units, astropy.time.Time, and baseband.mark5b modules:
>>> import numpy as np
>>> import astropy.units as u
>>> from astropy.time import Time
>>> from baseband import mark5b
>>> from baseband.data import SAMPLE_MARK5B
Opening a Mark 5B file with open in binary mode provides a normal file reader extended with methods to read a Mark5BFrame. The number of channels, kiloday (thousands of MJD), and number of bits per sample must all be passed when using read_frame:
>>> fb = mark5b.open(SAMPLE_MARK5B, 'rb', kday=56000, nchan=8)
>>> frame = fb.read_frame()
>>> frame.shape
(5000, 8)
>>> fb.close()
Our sample file has 2-bit component samples, which is also the default for read_frame, so it does not need to be passed. Also, we may pass a reference Time object within 500 days of the observation start time to ref_time, rather than kday.
Opening as a stream wraps the low-level routines such that reading and writing is in units of samples. It also provides access to header information. Here, we also must provide nchan, sample_rate, and ref_time or kday:
>>> fh = mark5b.open(SAMPLE_MARK5B, 'rs', sample_rate=32*u.MHz, nchan=8,
... ref_time=Time('2014-06-13 12:00:00'))
>>> fh
<Mark5BStreamReader name=... offset=0
sample_rate=32.0 MHz, samples_per_frame=5000,
sample_shape=SampleShape(nchan=8), bps=2,
start_time=2014-06-13T05:30:01.000000000>
>>> header0 = fh.header0 # To be used for writing, below.
>>> d = fh.read(10000)
>>> d.shape
(10000, 8)
>>> d[0, :3]
array([-3.316505, -1. , 1. ], dtype=float32)
>>> fh.close()
When writing to file, we again need to pass in sample_rate and nchan, though the time can either be passed explicitly or inferred from the header:
>>> fw = mark5b.open('test.m5b', 'ws', header0=header0,
... sample_rate=32*u.MHz, nchan=8)
>>> fw.write(d)
>>> fw.close()
>>> fh = mark5b.open('test.m5b', 'rs', sample_rate=32*u.MHz,
... kday=57000, nchan=8)
>>> np.all(fh.read() == d)
True
>>> fh.close()
Reference/API¶
baseband.mark5b Package¶
Mark5B VLBI data reader.
Code inspired by Walter Brisken’s mark5access. See https://github.com/demorest/mark5access.
Also, for the Mark5B design, see https://www.haystack.mit.edu/tech/vlbi/mark5/mark5_memos/019.pdf
Classes¶
- Representation of a Mark 5B frame, consisting of a header and payload.
- Decoder/encoder of a Mark 5B Frame Header.
- Container for decoding and encoding Mark 5B payloads.
Class Inheritance Diagram¶
baseband.mark5b.header Module¶
Definitions for VLBI Mark5B Headers.
Implements a Mark5BHeader class used to store header words, and decode/encode the information therein.
For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/Mark%205B%20users%20manual.pdf
Classes¶
- Decoder/encoder of a Mark 5B Frame Header.
Variables¶
- CRC polynomial used for Mark 5B Headers, as a check on the time code.
- Cyclic Redundancy Check.
Class Inheritance Diagram¶
baseband.mark5b.payload Module¶
Definitions for VLBI Mark 5B payloads.
Implements a Mark5BPayload class used to store payload words, and decode to or encode from a data array.
For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/Mark%205B%20users%20manual.pdf
Functions¶
- Set up the look-up tables for levels as a function of input byte.
- Encodes values using 1 bit per sample, packing the result into bytes.
- Generic encoder for data stored using two bits.
Classes¶
- Container for decoding and encoding Mark 5B payloads.
Class Inheritance Diagram¶
baseband.mark5b.frame Module¶
Definitions for VLBI Mark 5B frames.
Implements a Mark5BFrame class that can be used to hold a header and a payload, providing access to the values encoded in both.
For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/Mark%205B%20users%20manual.pdf
Classes¶
- Representation of a Mark 5B frame, consisting of a header and payload.
Class Inheritance Diagram¶
baseband.mark5b.file_info Module¶
The Mark5BFileReaderInfo property.
Includes information about what is needed to calculate times.
Classes¶
Class Inheritance Diagram¶
baseband.mark5b.base Module¶
Classes¶
- Simple reader for Mark 5B files.
- Simple writer for Mark 5B files.
- Base for Mark 5B streams.
- VLBI Mark 5B format reader.
- VLBI Mark 5B format writer.
Class Inheritance Diagram¶
MARK 4¶
The Mark 4 format is the output format of the MIT Haystack Observatory’s Mark 4 VLBI magnetic tape-based data acquisition system, and one output format of its successor, the Mark 5A hard drive-based system. The format’s specification is in the Mark IIIA/IV/VLBA design specifications.
Baseband currently only supports files that have been parity-stripped and corrected for barrel roll and data modulation.
File Structure¶
Mark 4 files contain up to 64 concurrent data “tracks”. Tracks are divided into 22500-bit “tape frames”, each of which consists of a 160-bit header followed by a 19840-bit payload. The header includes a timestamp (accurate to 1.25 ms), track ID, sideband, and fan-out/in factor (see below); the details of these can be found in 2.1.1 - 2.1.3 in the design specifications. The payload consists of a 1-bit stream. When recording 2-bit elementary samples, the data is split into two tracks, with one carrying the sign bit, and the other the magnitude bit.
The header takes the place of the first 160 bits of payload data, so that the
first sample occurs fanout * 160
sample times after the header time. This
means that a Mark 4 stream is not contiguous in time. The length of
one frame ranges from 1.25 ms to 160 ms in octave steps (which ensures an
integer number of frames falls within 1 minute), setting the maximum sample
rate per track to 18 megabits/track/s.
Data from a single channel may be distributed to multiple tracks - “fan-out” - or multiple channels fed to one track - “fan-in”. Fan-out is used when sampling at rates higher than 18 megabits/track/s. Baseband currently only supports tracks using fan-out (“longitudinal data format”).
Baseband reconstructs the tracks into channels (reconstituting 2-bit data from two tracks into a single channel if necessary) and combines tape frame headers into a single data frame header.
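The timing consequences of the header overwriting payload bits can be sketched numerically; the fan-out and sample-rate values below are those of the 64-track sample file used in the next section:

```python
# Mark 4 bookkeeping: the header replaces the first 160 bits per
# track, so the first valid sample is fanout * 160 sample times
# after the header time.
fanout = 4                       # fan-out of the 64-track sample file
invalid_samples = fanout * 160   # -> 640 samples marked invalid
samples_per_frame = 80000
sample_rate = 32e6               # 32 MHz
frame_length_ms = samples_per_frame / sample_rate * 1e3  # about 2.5 ms,
# within the allowed 1.25 ms to 160 ms range of frame lengths
```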
Usage¶
This section covers reading and writing Mark 4 files with Baseband; general usage can be found under the Using Baseband section. For situations in which one is unsure of a file's format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the small sample file baseband/data/sample.m4, and the numpy, astropy.units, astropy.time.Time, and baseband.mark4 modules:
>>> import numpy as np
>>> import astropy.units as u
>>> from astropy.time import Time
>>> from baseband import mark4
>>> from baseband.data import SAMPLE_MARK4
Opening a Mark 4 file with open in binary mode provides a normal file reader, extended with methods to read a Mark4Frame. Mark 4 files generally do not start (or end) at a frame boundary, so in binary mode one has to find the first header using find_header (which will also determine the number of Mark 4 tracks, if not given explicitly). Since Mark 4 files do not store the full time information, one must pass either the decade the data was taken, or an equivalent reference Time object:
>>> fb = mark4.open(SAMPLE_MARK4, 'rb', decade=2010)
>>> fb.find_header() # Locate first header and determine ntrack.
<Mark4Header bcd_headstack1: [0x3344]*64,
bcd_headstack2: [0x1122]*64,
headstack_id: [0, ..., 1],
bcd_track_id: [0x2, ..., 0x33],
fan_out: [0, ..., 3],
magnitude_bit: [False, ..., True],
lsb_output: [True]*64,
converter_id: [0, ..., 7],
time_sync_error: [False]*64,
internal_clock_error: [False]*64,
processor_time_out_error: [False]*64,
communication_error: [False]*64,
_1_11_1: [False]*64,
_1_10_1: [False]*64,
track_roll_enabled: [False]*64,
sequence_suspended: [False]*64,
system_id: [108]*64,
_1_0_1_sync: [False]*64,
sync_pattern: [0xffffffff]*64,
bcd_unit_year: [0x4]*64,
bcd_day: [0x167]*64,
bcd_hour: [0x7]*64,
bcd_minute: [0x38]*64,
bcd_second: [0x12]*64,
bcd_fraction: [0x475]*64,
crc: [0xea6, ..., 0x212]>
>>> fb.ntrack
64
>>> fb.tell()
2696
>>> frame = fb.read_frame()
>>> frame.shape
(80000, 8)
>>> frame.header.time
<Time object: scale='utc' format='yday' value=2014:167:07:38:12.47500>
>>> fb.close()
Opening in stream mode automatically finds the first frame, and wraps the low-level routines such that reading and writing is in units of samples. It also provides access to header information. Here we pass a reference Time object within 4 years of the observation start time to ref_time, rather than a decade:
>>> fh = mark4.open(SAMPLE_MARK4, 'rs', ref_time=Time('2013:100:23:00:00'))
>>> fh
<Mark4StreamReader name=... offset=0
sample_rate=32.0 MHz, samples_per_frame=80000,
sample_shape=SampleShape(nchan=8), bps=2,
start_time=2014-06-16T07:38:12.47500>
>>> d = fh.read(6400)
>>> d.shape
(6400, 8)
>>> d[635:645, 0].astype(int) # first channel
array([ 0, 0, 0, 0, 0, -1, 1, 3, 1, -1])
>>> fh.close()
As mentioned in the File Structure section, because the header
takes the place of the first 160 samples of each track, the first payload
sample occurs fanout * 160
sample times after the header time. The stream
reader includes these overwritten samples as invalid data (zeros, by default):
>>> np.array_equal(d[:640], np.zeros((640,) + d.shape[1:]))
True
When writing to file, we need to pass in the sample rate in addition to decade. The number of tracks can be inferred from the header:
>>> fw = mark4.open('sample_mark4_segment.m4', 'ws', header0=frame.header,
... sample_rate=32*u.MHz, decade=2010)
>>> fw.write(frame.data)
>>> fw.close()
>>> fh = mark4.open('sample_mark4_segment.m4', 'rs',
... sample_rate=32.*u.MHz, decade=2010)
>>> np.all(fh.read(80000) == frame.data)
True
>>> fh.close()
Note that above we had to pass in the sample rate even when opening the file for reading; this is because there is only a single frame in the file, and hence the sample rate cannot be inferred automatically.
Reference/API¶
baseband.mark4 Package¶
Mark 4 VLBI data reader.
Code inspired by Walter Brisken’s mark5access. See https://github.com/demorest/mark5access.
The format itself is described in detail in https://www.haystack.mit.edu/tech/vlbi/mark5/docs/230.3.pdf
Classes¶
- Representation of a Mark 4 frame, consisting of a header and payload.
- Decoder/encoder of a Mark 4 Header, containing all streams.
- Container for decoding and encoding Mark 4 payloads.
Class Inheritance Diagram¶
baseband.mark4.header Module¶
Definitions for VLBI Mark 4 Headers.
Implements a Mark4Header class used to store header words, and decode/encode the information therein.
For the specification of tape Mark 4 format, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/230.3.pdf
A little bit on the disk representation is at https://ui.adsabs.harvard.edu/abs/2003ASPC..306..123W
Functions¶
- Convert a stream of integers to uint32 header words.
- Convert a set of uint32 header words to a stream of integers.
Classes¶
- Decoder/encoder of a Mark 4 Track Header.
- Decoder/encoder of a Mark 4 Header, containing all streams.
Variables¶
- CRC polynomial used for Mark 4 Headers.
- Cyclic Redundancy Check for a bitstream.
Class Inheritance Diagram¶
baseband.mark4.payload Module¶
Definitions for VLBI Mark 4 payloads.
Implements a Mark4Payload class used to store payload words, and decode to or encode from a data array.
For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/230.3.pdf
Functions¶
- Reorder 32-track bits to bring signs & magnitudes together.
- Reorder 64-track bits to bring signs & magnitudes together.
- Set up the look-up tables for levels as a function of input byte.
- Decode payload for 8 channels using 2 bits, fan-out 4 (64 tracks).
- Encode payload for 8 channels using 2 bits, fan-out 4 (64 tracks).
Classes¶
- Container for decoding and encoding Mark 4 payloads.
Class Inheritance Diagram¶
baseband.mark4.frame Module¶
Definitions for VLBI Mark 4 frames.
Implements a Mark4Frame class that can be used to hold a header and a payload, providing access to the values encoded in both.
For the specification, see https://www.haystack.mit.edu/tech/vlbi/mark5/docs/230.3.pdf
Classes¶
- Representation of a Mark 4 frame, consisting of a header and payload.
Class Inheritance Diagram¶
baseband.mark4.file_info Module¶
The Mark4FileReaderInfo property.
Includes information about what is needed to calculate times, the number of tracks, and the offset of the first header.
Classes¶
- Standardized information on Mark 4 file readers.
Class Inheritance Diagram¶
baseband.mark4.base Module¶
Classes¶
- Simple reader for Mark 4 files.
- Simple writer for Mark 4 files.
- Base for Mark 4 streams.
- VLBI Mark 4 format reader.
- VLBI Mark 4 format writer.
Class Inheritance Diagram¶
DADA¶
Distributed Acquisition and Data Analysis (DADA) format data files contain a single data frame, consisting of an ASCII header of typically 4096 bytes followed by a payload. DADA is defined by its software specification and by actual usage.
Usage¶
This section covers reading and writing DADA files with Baseband; general usage is covered in the Using Baseband section. For situations in which one is unsure of a file's format, Baseband features the general baseband.open and baseband.file_info functions, which are also discussed in Using Baseband. The examples below use the sample file baseband/data/sample.dada, and the astropy.units and baseband.dada modules:
>>> from baseband import dada
>>> import astropy.units as u
>>> from baseband.data import SAMPLE_DADA
Single files can be opened with open in binary mode. DADA files typically consist of just a single header and payload, and can be read into a single DADAFrame:
>>> fb = dada.open(SAMPLE_DADA, 'rb')
>>> frame = fb.read_frame()
>>> frame.shape
(16000, 2, 1)
>>> frame[:3].squeeze()
array([[ -38.-38.j, -38.-38.j],
[ -38.-38.j, -40. +0.j],
[-105.+60.j, 85.-15.j]], dtype=complex64)
>>> fb.close()
Since the files can be quite large, the payload is mapped (with numpy.memmap), so that if one accesses part of the data, only the corresponding parts of the encoded payload are loaded into memory (since the sample file is encoded using 8 bits, the above example thus loads 12 bytes into memory).
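The 12-byte figure follows directly from the sample layout; as a quick check:

```python
# Bytes touched when decoding 3 complex samples from the sample file:
# 2 polarizations, real + imaginary parts, 8 bits per elementary sample.
nsample, npol, bps = 3, 2, 8
components = 2                  # complex data: real and imaginary parts
nbytes = nsample * npol * components * bps // 8
# nbytes -> 12 bytes loaded from the memory-mapped payload
```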
Opening in stream mode wraps the low-level routines such that reading and writing is in units of samples, and provides access to header information:
>>> fh = dada.open(SAMPLE_DADA, 'rs')
>>> fh
<DADAStreamReader name=... offset=0
sample_rate=16.0 MHz, samples_per_frame=16000,
sample_shape=SampleShape(npol=2), bps=8,
start_time=2013-07-02T01:39:20.000>
>>> d = fh.read(10000)
>>> d.shape
(10000, 2)
>>> d[:3]
array([[ -38.-38.j, -38.-38.j],
[ -38.-38.j, -40. +0.j],
[-105.+60.j, 85.-15.j]], dtype=complex64)
>>> fh.close()
To set up a file for writing as a stream is possible as well:
>>> from astropy.time import Time
>>> fw = dada.open('{utc_start}.{obs_offset:016d}.000000.dada', 'ws',
... sample_rate=16*u.MHz, samples_per_frame=5000,
... npol=2, nchan=1, bps=8, complex_data=True,
... time=Time('2013-07-02T01:39:20.000'))
>>> fw.write(d)
>>> fw.close()
>>> import os
>>> [f for f in sorted(os.listdir('.')) if f.startswith('2013')]
['2013-07-02-01:39:20.0000000000000000.000000.dada',
'2013-07-02-01:39:20.0000000000020000.000000.dada']
>>> fr = dada.open('2013-07-02-01:39:20.{obs_offset:016d}.000000.dada', 'rs')
>>> d2 = fr.read()
>>> (d == d2).all()
True
>>> fr.close()
Here, we have used an even smaller payload size to show how one can define multiple files. DADA data are typically stored in sequences of files. If one passes a time-ordered list or tuple of filenames to open, it uses sequentialfile.open to access the sequence. If, as above, one passes a template string, open uses DADAFileNameSequencer to create and use a filename sequencer. (See the API links for further details.)
Further details¶
DADA Headers¶
The specification of “Distributed Acquisition and Data Analysis”
(DADA) headers is part of the DADA software specification. In
particular, its appendix B.3 defines expected header keywords, which
we reproduce below. We separate those for which the meaning has been
taken from comments in an actual DADA header
from Effelsberg, as well as additional keywords found in that header
that do not appear in the specification.
Keyword | Description
---|---
Primary (from appendix B.3 [Default]) |
HEADER | name of the header [DADA]
HDR_VERSION | version of the header [1.0]
HDR_SIZE | size of the header in bytes [4096]
INSTRUMENT | name of the instrument
PRIMARY | host name of the primary node on which the data were acquired
HOSTNAME | host name of the machine on which the data were written
FILE_NAME | full path of the file to which the data were written
FILE_SIZE | requested size of data files
FILE_NUMBER | number of data files written prior to this one
OBS_ID | the identifier for the observation
UTC_START | rising edge of the first sample (yyyy-mm-dd-hh:mm:ss)
MJD_START | the MJD of the first sample in the observation
OBS_OFFSET | the number of bytes from the start of the observation
OBS_OVERLAP | the amount by which neighbouring files overlap
Secondary (descriptions from Effelsberg sample file) |
TELESCOPE | name of the telescope
SOURCE | source name
FREQ | observation frequency
BW | bandwidth in MHz (negative for lower sideband)
NPOL | number of polarizations observed
NBIT | number of bits per sample
NDIM | dimension of samples (2 = complex, 1 = real)
TSAMP | sampling interval in microseconds
RA | J2000 right ascension of the source (hh:mm:ss.ss)
DEC | J2000 declination of the source (ddd:mm:ss.s)
Other (found in Effelsberg sample file) |
PIC_VERSION | version of the PIC FPGA software [1.0]
RECEIVER | front-end receiver
SECONDARY | secondary host name
NCHAN | number of channels
RESOLUTION | a parameter that is unclear
DSB | (no description)
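Since a DADA header is plain ASCII with one keyword-value pair per line, the keywords above can be extracted with simple string handling. A minimal sketch, using a hypothetical header fragment rather than a real file (a real parser would also handle comments and the fixed header size):

```python
# Minimal DADA-style header parse: split each line into keyword and value.
header_text = """\
HEADER       DADA
HDR_VERSION  1.0
HDR_SIZE     4096
TELESCOPE    EFFELSBERG
NBIT         8
"""
header = {}
for line in header_text.splitlines():
    parts = line.split(None, 1)  # keyword, then the rest of the line
    if len(parts) == 2:
        header[parts[0]] = parts[1].strip()
print(header['HDR_SIZE'])  # 4096
```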
Reference/API¶
baseband.dada Package¶
Distributed Acquisition and Data Analysis (DADA) format reader/writer.
Classes¶
|
Representation of a DADA file, consisting of a header and payload. |
|
DADA baseband file format header. |
|
Container for decoding and encoding DADA payloads. |
Class Inheritance Diagram¶
baseband.dada.header Module¶
Definitions for DADA pulsar baseband headers.
Implements a DADAHeader class used to store header definitions in a FITS header, and read & write these from files.
The DADA headers are described in the DADA software specification, at http://psrdada.sourceforge.net/manuals/Specification.pdf
See also DADA Headers.
Classes¶
|
DADA baseband file format header. |
Class Inheritance Diagram¶
baseband.dada.payload Module¶
Payload for DADA format.
Classes¶
|
Container for decoding and encoding DADA payloads. |
Class Inheritance Diagram¶
baseband.dada.frame Module¶
Classes¶
|
Representation of a DADA file, consisting of a header and payload. |
Class Inheritance Diagram¶
baseband.dada.base Module¶
Classes¶
|
List-like generator of DADA filenames using a template. |
|
Simple reader for DADA files. |
|
Simple writer/mapper for DADA files. |
|
Base for DADA streams. |
|
DADA format reader. |
|
DADA format writer. |
Class Inheritance Diagram¶
GUPPI¶
The GUPPI format is the output of the Green Bank Ultimate Pulsar Processing Instrument and of clones operating at other telescopes, such as PUPPI at the Arecibo Observatory. Baseband specifically supports GUPPI data taken in baseband mode; its implementation is based on that of DSPSR. While general format specifications can be found on Paul Demorest’s site, some of the header information may be invalid or not applicable, particularly for older files.
Baseband currently only supports 8-bit elementary samples.
File Structure¶
Each GUPPI file contains multiple (typically 128) frames, with each frame consisting of an ASCII header composed of 80-character entries, followed by a binary payload (or “block”). The header’s length is variable, but always ends with “END” followed by 77 spaces.
How samples are stored in the payload depends on whether or not it is channels-first. A channels-first payload stores each channel’s stream in a contiguous data block, while a non-channels-first one groups the components of a complete sample together (as with other formats). In either case, within each channel, polarization samples from the same point in time are stored adjacent to one another. At the end of each channel’s data is a section of overlap samples identical to the first samples in the next payload. Baseband retains these redundant samples when reading individual GUPPI frames, but removes them when reading files as a stream.
Usage¶
This section covers reading and writing GUPPI files with Baseband; general
usage is covered in the Using Baseband section. For
situations in which one is unsure of a file’s format, Baseband features the
general baseband.open
and baseband.file_info
functions, which are also
discussed in Using Baseband. The examples below use
the sample PUPPI file baseband/data/sample_puppi.raw, and the
astropy.units and baseband.guppi modules:
>>> from baseband import guppi
>>> import astropy.units as u
>>> from baseband.data import SAMPLE_PUPPI
Single files can be opened with open
in binary mode, which
provides a normal file reader, but extended with methods to read a
GUPPIFrame
:
>>> fb = guppi.open(SAMPLE_PUPPI, 'rb')
>>> frame = fb.read_frame()
>>> frame.shape
(1024, 2, 4)
>>> frame[:3, 0, 1]
array([-32.-10.j, -15.-14.j, 9.-13.j], dtype=complex64)
>>> fb.close()
Since the files can be quite large, the payload is mapped (with
numpy.memmap
), so that if one accesses part of the data, only the
corresponding parts of the encoded payload are loaded into memory (since the
sample file is encoded using 8 bits, the above example thus loads 6 bytes into
memory).
Opening in stream mode wraps the low-level routines such that reading and writing is in units of samples, and provides access to header information:
>>> fh = guppi.open(SAMPLE_PUPPI, 'rs')
>>> fh
<GUPPIStreamReader name=... offset=0
sample_rate=250.0 Hz, samples_per_frame=960,
sample_shape=SampleShape(npol=2, nchan=4), bps=8,
start_time=2018-01-14T14:11:33.000>
>>> d = fh.read()
>>> d.shape
(3840, 2, 4)
>>> d[:3, 0, 1]
array([-32.-10.j, -15.-14.j, 9.-13.j], dtype=complex64)
>>> fh.close()
Note that fh.samples_per_frame
represents the number of samples per frame
excluding overlap samples, since the stream reader works on a linearly
increasing sequence of samples. Frames themselves have access to the overlap,
and fh.header0.samples_per_frame
returns the number of samples per frame
including overlap.
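The numbers in the examples above are mutually consistent: each frame holds 1024 samples including overlap, while the stream reader advances by 960 unique samples per frame, implying 64 overlap samples; four frames then yield the 3840 samples read above.

```python
# Overlap bookkeeping for the sample file used above.
header_samples_per_frame = 1024  # fh.header0.samples_per_frame (incl. overlap)
stream_samples_per_frame = 960   # fh.samples_per_frame (excl. overlap)
overlap = header_samples_per_frame - stream_samples_per_frame
n_frames = 4                     # implied by the 3840 samples read above
total_samples = n_frames * stream_samples_per_frame
print(overlap, total_samples)  # 64 3840
```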
Setting up a file for writing as a stream is possible as well. The overlap must be zero when writing, so we set samples_per_frame to its stream-reader value from above:
>>> from astropy.time import Time
>>> fw = guppi.open('puppi_test.{file_nr:04d}.raw', 'ws',
... frames_per_file=2, sample_rate=250*u.Hz,
... samples_per_frame=960, pktsize=1024,
... time=Time(58132.59135416667, format='mjd'),
... npol=2, nchan=4)
>>> fw.write(d)
>>> fw.close()
>>> fr = guppi.open('puppi_test.{file_nr:04d}.raw', 'rs')
>>> d2 = fr.read()
>>> (d == d2).all()
True
>>> fr.close()
Here we show how to write a sequence of files by passing a string template
to open, which prompts it to create and use a filename
sequencer generated with GUPPIFileNameSequencer. One
may also pass a time-ordered list or tuple of filenames to
open. Unlike when writing DADA files, which have one frame
per file, we specify the number of frames per file using frames_per_file.
Note that typically one does not have to pass PKTSIZE
, the UDP data packet
size (set by the observing mode), but the sample file has small enough frames
that the default of 8192 bytes is too large. Baseband only uses PKTSIZE
to
double-check the sample offset of the frame, so PKTSIZE
must be set to a
value such that each payload, excluding overlap samples, contains an integer
number of packets. (See API links for further details on how to read and
write file sequences.)
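The packet-size constraint can be checked with simple arithmetic. For the sample file above (8-bit complex data, npol=2, nchan=4, 960 samples per frame excluding overlap), the payload is 15360 bytes, which holds an integer number of 1024-byte packets but not of the default 8192-byte ones:

```python
# Check whether a given pktsize divides the per-frame payload (excl. overlap).
samples_per_frame = 960   # excluding overlap
npol, nchan = 2, 4
bytes_per_sample = 2      # 8-bit real + 8-bit imaginary components
payload_nbytes = samples_per_frame * npol * nchan * bytes_per_sample
print(payload_nbytes)          # 15360
print(payload_nbytes % 1024)   # 0    -> pktsize=1024 is acceptable
print(payload_nbytes % 8192)   # 7168 -> the default of 8192 is not
```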
Reference/API¶
baseband.guppi Package¶
Green Bank Ultimate Pulsar Processing Instrument (GUPPI) format reader/writer.
Classes¶
|
Representation of a GUPPI file, consisting of a header and payload. |
|
GUPPI baseband file format header. |
|
Container for decoding and encoding GUPPI payloads. |
Class Inheritance Diagram¶
baseband.guppi.header Module¶
Definitions for GUPPI headers.
Implements a GUPPIHeader class that reads & writes FITS-like headers from file.
Classes¶
|
GUPPI baseband file format header. |
Class Inheritance Diagram¶
baseband.guppi.payload Module¶
Payload for GUPPI format.
Classes¶
|
Container for decoding and encoding GUPPI payloads. |
Class Inheritance Diagram¶
baseband.guppi.frame Module¶
Classes¶
|
Representation of a GUPPI file, consisting of a header and payload. |
Class Inheritance Diagram¶
baseband.guppi.file_info Module¶
The GuppiFileReaderInfo property.
Overrides what can be gotten from the first header.
Classes¶
|
Class Inheritance Diagram¶
baseband.guppi.base Module¶
Classes¶
|
List-like generator of GUPPI filenames using a template. |
|
Simple reader for GUPPI files. |
|
Simple writer/mapper for GUPPI files. |
|
Base for GUPPI streams. |
|
GUPPI format reader. |
|
GUPPI format writer. |
Class Inheritance Diagram¶
GSB¶
The GMRT software backend (GSB) file format is the standard output of the initial correlator of the Giant Metrewave Radio Telescope (GMRT). The GSB design is described by Roy et al. (2010, Exper. Astron. 28:25-60) with further specifications and operating procedures given on the relevant GMRT/GSB pages.
File Structure¶
A GSB dataset consists of an ASCII file with a sequence of headers, and one or more accompanying binary data files. Each line in the header file, together with its corresponding data, comprises a data frame, though the frames have no explicit divisions in the data files.
Baseband currently supports two forms of GSB data: rawdump, for storing real-valued raw voltage timestreams, and phased, for storing complex pre-channelized data from the GMRT in phased array baseband mode.
Data in rawdump format is stored in a binary file representing the voltage stream from one polarization of a single dish. Each such file is accompanied by a header file which contains GPS timestamps, in the form:
YYYY MM DD HH MM SS 0.SSSSSSSSS
In the default rawdump observing setup, samples are recorded at a rate of 33.3333… megasamples per second (Msps). Each sample is 4 bits in size, and two samples are grouped into bytes such that the older sample occupies the four least significant bits. Each frame consists of 4 megabytes of data, or \(2^{23}\) samples; as such, the timespan of one frame is exactly 0.25165824 s.
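The packing can be illustrated with plain bit operations; the nibble values here are shown unsigned purely for illustration (actual decoding also maps the 4-bit patterns to signed sample levels):

```python
# Two 4-bit samples per byte; the older sample sits in the low nibble.
def unpack_rawdump_byte(b):
    older = b & 0x0F          # least significant four bits
    newer = (b >> 4) & 0x0F   # most significant four bits
    return older, newer

print(unpack_rawdump_byte(0x21))  # (1, 2): sample 1 recorded before sample 2
```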
Data in phased format is normally spread over four binary files and one accompanying header file. The binary files come in two pairs, one for each polarization, with each pair containing the first and second halves of each frame’s data.
When recording GSB in phased array voltage beam (i.e. baseband) mode, the “raw”, or pre-channelized, sample rate is either 33.3333… Msps at 8 bits per sample or 66.6666… Msps at 4 bits per sample (in the latter case, sample bit-ordering is the same as for rawdump). Channelization via fast Fourier transform sets the channelized complete sample rate to the raw rate divided by \(2N_\mathrm{F}\), where \(N_\mathrm{F}\) is the number of Fourier channels (either 256 or 512). The timespan of one frame is 0.25165824 s, and one frame is 8 megabytes in size, for either raw sample rate.
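The rate relation above can be evaluated exactly with rational arithmetic; for instance, for the 33.3333… Msps raw rate and 512 Fourier channels:

```python
from fractions import Fraction

raw_sample_rate = Fraction(100_000_000, 3)   # 33.3333... Msps, exact
n_fourier = 512                              # number of Fourier channels
sample_rate = raw_sample_rate / (2 * n_fourier)
print(float(sample_rate))  # ~32552.08 complete samples per second
```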
The phased header’s structure is:
<PC TIME> <GPS TIME> <SEQ NUMBER> <MEM BLOCK>
where <PC TIME>
and <GPS TIME>
are the less accurate computer-based
and exact GPS-based timestamps, respectively, with the same format as the
rawdump timestamp; <SEQ NUMBER>
is the frame number; and <MEM BLOCK>
a redundant modulo-8 shared memory block number.
Usage Notes¶
This section covers reading and writing GSB files with Baseband; general usage
is covered in the Using Baseband section. While
Baseband features the general baseband.open
and baseband.file_info
functions, these cannot read GSB binary files without the accompanying
timestamp file (at which point it is obvious the files are GSB).
baseband.file_info
, however, can be used on the timestamp file to determine
if it is in rawdump or phased format.
The examples below use the sample files in the baseband/data/gsb/
directory, and the numpy, astropy.units and baseband.gsb modules:
>>> import numpy as np
>>> import astropy.units as u
>>> from baseband import gsb
>>> from baseband.data import (
... SAMPLE_GSB_RAWDUMP, SAMPLE_GSB_RAWDUMP_HEADER,
... SAMPLE_GSB_PHASED, SAMPLE_GSB_PHASED_HEADER)
A single timestamp file can be opened with open
in text
mode:
>>> ft = gsb.open(SAMPLE_GSB_RAWDUMP_HEADER, 'rt')
>>> ft.read_timestamp()
<GSBRawdumpHeader gps: 2015 04 27 18 45 00 0.000000240>
>>> ft.close()
Reading payloads requires the samples per frame or sample rate. For phased data, the sample rate is:
sample_rate = raw_sample_rate / (2 * nchan)
where the raw sample rate is the pre-channelized one, and nchan
the number
of Fourier channels. The samples per frame for both rawdump and phased is:
samples_per_frame = timespan_of_frame * sample_rate
Note
Since the number of samples per frame is an integer while neither the
frame timespan nor the sample rate is, it is better to calculate
samples_per_frame directly rather than by multiplying
timespan_of_frame by sample_rate, to avoid rounding issues.
Alternatively, if the size of the frame buffer and the frame rate are known, the
former can be used to determine samples_per_frame
, and the latter used to
determine sample_rate
by inverting the above equation.
If samples_per_frame
is not given, Baseband assumes it is the equivalent of
4 megabytes of data for rawdump, or 8 megabytes if phased. If sample_rate
is not given, it is calculated from samples_per_frame
assuming
timespan_of_frame = 0.25165824
(see File Structure above).
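Following the note above, samples_per_frame is best derived directly from the frame size; the sample rate then follows exactly. A sketch for the default rawdump setup (4-megabyte frames, 4 bits per sample), using exact fractions to avoid rounding:

```python
from fractions import Fraction

frame_nbytes = 4 * 2**20                    # 4 megabytes per rawdump frame
bits_per_sample = 4
samples_per_frame = frame_nbytes * 8 // bits_per_sample   # integer: 2**23
timespan = Fraction(25165824, 100_000_000)                # 0.25165824 s, exact
sample_rate = samples_per_frame / timespan                # samples per second
print(samples_per_frame, float(sample_rate) / 1e6)  # 8388608 and ~33.33 Msps
```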
A single payload file can be opened with open
in binary mode.
Here, for our sample file, we have to take into account that, to keep
the files small, their frames have been reduced to only 4 or 8
kilobytes worth of samples (for the default timespan). So, we
define the samples per frame here, and use that to calculate payload_nbytes,
the size of one frame in bytes. Since rawdump samples are 4 bits,
payload_nbytes is just samples_per_frame / 2:
>>> rawdump_samples_per_frame = 2**13
>>> payload_nbytes = rawdump_samples_per_frame // 2
>>> fb = gsb.open(SAMPLE_GSB_RAWDUMP, 'rb', payload_nbytes=payload_nbytes,
... nchan=1, bps=4, complex_data=False)
>>> payload = fb.read_payload()
>>> payload[:4]
array([[ 0.],
[-2.],
[-2.],
[ 0.]], dtype=float32)
>>> fb.close()
(payload_nbytes
for phased data is the size of one frame divided by the
number of binary files.)
Opening in stream mode allows timestamp and binary files to be read in concert to create data frames, and also wraps the low-level routines such that reading and writing is in units of samples, and provides access to header information.
When opening a rawdump file in stream mode, we pass the timestamp file as the
first argument, and the binary file to the raw
keyword. As per above, we
also pass samples_per_frame
:
>>> fh_rd = gsb.open(SAMPLE_GSB_RAWDUMP_HEADER, mode='rs',
... raw=SAMPLE_GSB_RAWDUMP,
... samples_per_frame=rawdump_samples_per_frame)
>>> fh_rd.header0
<GSBRawdumpHeader gps: 2015 04 27 18 45 00 0.000000240>
>>> dr = fh_rd.read()
>>> dr.shape
(81920,)
>>> dr[:3]
array([ 0., -2., -2.], dtype=float32)
>>> fh_rd.close()
To open a phased fileset in stream mode, we package the binary files into a nested tuple with the format:
((L pol stream 1, L pol stream 2), (R pol stream 1, R pol stream 2))
The nested tuple is passed to raw (note that we again have to pass a
non-default samples_per_frame):
>>> phased_samples_per_frame = 2**3
>>> fh_ph = gsb.open(SAMPLE_GSB_PHASED_HEADER, mode='rs',
... raw=SAMPLE_GSB_PHASED,
... samples_per_frame=phased_samples_per_frame)
>>> header0 = fh_ph.header0 # To be used for writing, below.
>>> dp = fh_ph.read()
>>> dp.shape
(80, 2, 512)
>>> dp[0, 0, :3]
array([30.+12.j, -1. +8.j, 7.+19.j], dtype=complex64)
>>> fh_ph.close()
To set up a file for writing, we need to pass names for both
timestamp and raw files, as well as sample_rate
, samples_per_frame
, and
either the first header or a time
object. We first calculate
sample_rate
:
>>> timespan = 0.25165824 * u.s
>>> rawdump_sample_rate = (rawdump_samples_per_frame / timespan).to(u.MHz)
>>> phased_sample_rate = (phased_samples_per_frame / timespan).to(u.MHz)
To write a rawdump file:
>>> from astropy.time import Time
>>> fw_rd = gsb.open('test_rawdump.timestamp',
... mode='ws', raw='test_rawdump.dat',
... sample_rate=rawdump_sample_rate,
... samples_per_frame=rawdump_samples_per_frame,
... time=Time('2015-04-27T13:15:00'))
>>> fw_rd.write(dr)
>>> fw_rd.close()
>>> fh_rd = gsb.open('test_rawdump.timestamp', mode='rs',
... raw='test_rawdump.dat',
... sample_rate=rawdump_sample_rate,
... samples_per_frame=rawdump_samples_per_frame)
>>> np.all(dr == fh_rd.read())
True
>>> fh_rd.close()
To write a phased file, we need to pass a nested tuple of filenames or filehandles:
>>> test_phased_bin = (('test_phased_pL1.dat', 'test_phased_pL2.dat'),
... ('test_phased_pR1.dat', 'test_phased_pR2.dat'))
>>> fw_ph = gsb.open('test_phased.timestamp',
... mode='ws', raw=test_phased_bin,
... sample_rate=phased_sample_rate,
... samples_per_frame=phased_samples_per_frame,
... header0=header0)
>>> fw_ph.write(dp)
>>> fw_ph.close()
>>> fh_ph = gsb.open('test_phased.timestamp', mode='rs',
... raw=test_phased_bin,
... sample_rate=phased_sample_rate,
... samples_per_frame=phased_samples_per_frame)
>>> np.all(dp == fh_ph.read())
True
>>> fh_ph.close()
Baseband does not use the PC time in the phased header, and, when writing,
simply uses the same time for both GPS and PC times. Since the PC time can
drift from the GPS time by several tens of milliseconds,
test_phased.timestamp
will not be identical to SAMPLE_GSB_PHASED
, even
though we have written the exact same data to file.
Reference/API¶
baseband.gsb Package¶
GMRT Software Backend (GSB) data reader.
See http://gmrt.ncra.tifr.res.in/gmrt_hpage/sub_system/gmrt_gsb/index.htm
Classes¶
|
Frame encapsulating GSB rawdump or phased data. |
|
GSB Header, based on a line from a timestamp file. |
|
Container for decoding and encoding GSB payloads. |
Class Inheritance Diagram¶
baseband.gsb.header Module¶
Definitions for GSB Headers, using the timestamp files.
Somewhat out-of-date description for phased data: http://gmrt.ncra.tifr.res.in/gmrt_hpage/sub_system/gmrt_gsb/GSB_beam_timestamp_note_v1.pdf and for rawdump data: http://gmrt.ncra.tifr.res.in/gmrt_hpage/sub_system/gmrt_gsb/GSB_rawdump_data_format_v2.pdf
Classes¶
|
GSB header date-time format |
|
GSB Header, based on a line from a timestamp file. |
|
GSB rawdump header. |
|
GSB phased header. |
Class Inheritance Diagram¶
baseband.gsb.payload Module¶
Definitions for GSB payloads.
Implements a GSBPayload class used to store payload blocks, and decode to or encode from a data array.
See http://gmrt.ncra.tifr.res.in/gmrt_hpage/sub_system/gmrt_gsb/index.htm
Classes¶
|
Container for decoding and encoding GSB payloads. |
Class Inheritance Diagram¶
baseband.gsb.base Module¶
Classes¶
|
Simple reader/writer for GSB time stamp files. |
|
Simple reader for GSB data files. |
|
Simple writer for GSB data files. |
|
Base for GSB streams. |
|
GSB format reader. |
|
GSB format writer. |
Class Inheritance Diagram¶
Core Framework and Utilities¶
These sections contain APIs and usage notes for the sequential file opener,
the API for the set of core utility functions and classes located in
vlbi_base
, and sample data that come with baseband (mostly
used for testing).
Baseband Helpers¶
Helpers assist with reading and writing all file formats. Currently,
they only include the sequentialfile
module
for reading a sequence of files as a single one.
Sequential File¶
The sequentialfile module is for reading from and writing
to a sequence of files as if they were a single, contiguous one. As with
file formats, there is a master sequentialfile.open function to open sequences
either for reading or writing. It returns sequential file objects that have
read, write, seek, tell, and close methods that work identically to
their single-file counterparts. They additionally have memmap
methods to read or write to files through numpy.memmap.
It is usually unnecessary to directly access sequentialfile
,
since it is used by baseband.open
and all format openers (except GSB)
whenever a sequence of files is passed - see the Using Baseband
documentation for details. For finer control of
file opening, however, one may manually create a
sequentialfile
object, then pass it to an opener.
To illustrate, we rewrite the multi-file example from Using Baseband. We first load the required data:
>>> from baseband import vdif
>>> from baseband.data import SAMPLE_VDIF
>>> import numpy as np
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> d = fh.read()
We now create a sequence of filenames and calculate the byte size per file,
then pass these to open
:
>>> from baseband.helpers import sequentialfile as sf
>>> filenames = ["seqvdif_{0}".format(i) for i in range(2)]
>>> file_size = fh.fh_raw.seek(0, 2) // 2
>>> fwr = sf.open(filenames, mode='w+b', file_size=file_size)
The first argument passed to open must be a time-ordered sequence of
filenames in a list, tuple, or other container that raises IndexError when
the index is out of bounds. The mode is ‘w+b’, opened for writing with
reading enabled (a requirement of all format openers in case they use
numpy.memmap), and file_size determines the largest size a file may reach
before the next one in the sequence is opened for writing. We set file_size
such that each file holds exactly one frameset.
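The bookkeeping behind such a sequence is straightforward: a global byte offset maps to a file index and an offset within that file. A minimal sketch, assuming all files except possibly the last share the same file_size:

```python
def locate(offset, file_size):
    """Map a global byte offset to (file index, offset within that file)."""
    return divmod(offset, file_size)

print(locate(0, 4096))     # (0, 0): start of the first file
print(locate(5000, 4096))  # (1, 904): 904 bytes into the second file
```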
To write the data, we pass fwr
to vdif.open
:
>>> fw = vdif.open(fwr, 'ws', header0=fh.header0,
... sample_rate=fh.sample_rate,
... nthread=fh.sample_shape.nthread)
>>> fw.write(d)
>>> fw.close() # This implicitly closes fwr.
To read the sequence and confirm their contents are identical to the sample
file’s, we may again use open
:
>>> frr = sf.open(filenames, mode='rb')
>>> fr = vdif.open(frr, 'rs', sample_rate=fh.sample_rate)
>>> fr.header0.time == fh.header0.time
True
>>> np.all(fr.read() == d)
True
>>> fr.close()
>>> fh.close() # Close sample file.
Reference/API¶
baseband.helpers Package¶
baseband.helpers.sequentialfile Module¶
Functions¶
|
Read or write several files as if they were one contiguous one. |
Classes¶
|
List-like generator of filenames using a template. |
|
Deal with several files as if they were one contiguous one. |
|
Read several files as if they were one contiguous one. |
|
Write several files as if they were one contiguous one. |
Class Inheritance Diagram¶
VLBI Base¶
Routines on which the readers and writers for specific VLBI formats are based.
Reference/API¶
baseband.vlbi_base Package¶
baseband.vlbi_base.header Module¶
Base definitions for VLBI Headers, used for VDIF and Mark 5B.
Defines a header class VLBIHeaderBase that can be used to hold the words corresponding to a frame header, providing access to the values encoded in via a dict-like interface. Definitions for headers are constructed using the HeaderParser class.
Functions¶
|
Construct a function that converts specific bits from a header. |
|
Construct a function that uses a value to set specific bits in a header. |
|
Return the default value from a header keyword. |
Classes¶
|
Property that is fixed for all instances of a class. |
|
Create a lazily evaluated dictionary of parsers, setters, or defaults. |
|
Parser & setter for VLBI header keywords. |
|
Base class for all VLBI headers. |
Class Inheritance Diagram¶
baseband.vlbi_base.payload Module¶
Base definitions for VLBI payloads, used for VDIF and Mark 5B.
Defines a payload class VLBIPayloadBase that can be used to hold the words corresponding to a frame payload, providing access to the values encoded in it as a numpy array.
Classes¶
|
Container for decoding and encoding VLBI payloads. |
Class Inheritance Diagram¶
baseband.vlbi_base.frame Module¶
Base definitions for VLBI frames, used for VDIF and Mark 5B.
Defines a frame class VLBIFrameBase that can be used to hold a header and a payload, providing access to the values encoded in both.
Classes¶
|
Representation of a VLBI data frame, consisting of a header and payload. |
Class Inheritance Diagram¶
baseband.vlbi_base.base Module¶
Functions¶
|
Create a baseband file opener. |
Classes¶
Error in finding a header in a stream. |
|
|
VLBI file wrapper, used to add frame methods to a binary data file. |
|
VLBI wrapped file reader base class. |
|
VLBI file wrapper, allowing access as a stream of data. |
|
|
|
Class Inheritance Diagram¶
baseband.vlbi_base.file_info Module¶
Provide a base class for “info” properties.
Loosely based on DataInfo
.
Classes¶
|
Like a lazy property, evaluated only once. |
|
|
|
Container providing a standardized interface to file information. |
|
Standardized information on file readers. |
|
Standardized information on stream readers. |
Class Inheritance Diagram¶
baseband.vlbi_base.encoding Module¶
Encoders and decoders for generic VLBI data formats.
Functions¶
|
Generic encoder for data stored using one bit. |
|
Generic encoder for data stored using two bits. |
|
Generic encoder for data stored using four bits. |
|
Generic decoder for data stored using 8 bits. |
|
Encode 8 bit VDIF data. |
Variables¶
Optimal high value for a 2-bit digitizer for which the low value is 1. |
|
Optimal level between low and high for the above OPTIMAL_2BIT_HIGH. |
|
Scaling for four-bit encoding that makes it look like 2 bit. |
|
Scaling for eight-bit encoding that makes it look like 2 bit. |
|
Levels for data encoded with different numbers of bits. |
baseband.vlbi_base.utils Module¶
Functions¶
|
Calculate the least common multiple of a and b. |
|
|
|
|
|
Convert the pattern to a byte array. |
Classes¶
|
Cyclic Redundancy Check. |
|
Cyclic Redundancy Check for a bitstream. |
Class Inheritance Diagram¶
Sample Data Files¶
baseband.data Package¶
Sample files with baseband data recorded in different formats.
Variables¶
VDIF sample from ARO, written by CHIME backend. |
|
VDIF sample from Christian Ploetz. |
|
DADA sample from Effelsberg, with header adapted to shortened size. |
|
Corrupted VDIF sample. |
|
GSB phased sample. |
|
GSB phased header sample. |
|
GSB rawdump sample. |
|
GSB rawdump header sample. |
|
Mark 4 sample. |
|
Mark 4 sample. |
|
Mark 4 sample. |
|
Mark 4 sample. |
|
Mark 5B sample. |
|
VDIF sample from MWA. |
|
GUPPI/PUPPI sample, npol=2, nchan=4. |
|
VDIF sample. |
|
VDIF sample. |
Developer Documentation¶
The developer documentation features tutorials on supporting new formats or format extensions such as new VDIF EDVs. It also contains instructions for publishing new code releases.
Supporting a New VDIF EDV¶
Users may encounter VDIF files with unusual headers not currently supported by Baseband. These may either have novel EDV, or they may purport to be a supported EDV but not conform to its formal specification. To handle such situations, Baseband supports implementation of new EDVs and overriding of existing EDVs without the need to modify Baseband’s source code.
The tutorials below assume the following modules have been imported:
>>> import numpy as np
>>> import astropy.units as u
>>> from baseband import vdif, vlbi_base as vlbi
VDIF Headers¶
Each VDIF frame begins with a 32-byte, or eight 32-bit word, header that is structured as follows:

Schematic of the standard 32-bit VDIF header, from VDIF specification release 1.1.1 document, Fig. 3. 32-bit words are labelled on the left, while byte and bit numbers above indicate relative addresses within each word. Subscripts indicate field length in bits.¶
where the abbreviated labels are
\(\mathrm{I}_1\) - invalid data
\(\mathrm{L}_1\) - if 1, header is VDIF legacy
\(\mathrm{V}_3\) - VDIF version number
\(\mathrm{log}_2\mathrm{(\#chns)}_5\) - \(\mathrm{log}_2\) of the number of sub-bands in the frame
\(\mathrm{C}_1\) - if 1, complex data
\(\mathrm{EDV}_8\) - “extended data version” number; see below
Detailed definitions of terms are found on pages 5 to 7 of the VDIF specification document.
Words 4 - 7 hold optional extended user data, using a layout specified by the EDV, which is itself given in word 4 of the header. EDV formats can be registered on the VDIF website; Baseband aims to support all registered formats (but does not currently support EDV = 4).
Implementing a New EDV¶
In this tutorial, we follow the implementation of an EDV=4 header. This would be a first and required step to support that format, but it does not suffice on its own: the format also needs a new frame class, since the purpose of this EDV is to store the validity of sub-band channels independently within a single data frame, rather than relying on the single invalid-data bit. From the EDV=4 specification, we see that we need to add the following to the standard VDIF header:
Validity header mask (word 4, bits 16 - 24): integer value between 1 and 64 inclusive indicating the number of validity bits. (This is different from \(\mathrm{log}_2\mathrm{(\#chns)}_5\), since some channels can be unused.)
Synchronization pattern (word 5): constant byte sequence
0xACABFEED
, for finding the locations of headers in a data stream.Validity mask (words 6 - 7): 64-bit binary mask indicating the validity of sub-bands. Any fraction of 64 sub-bands can be stored in this format, with any unused bands labelled as invalid (
0
) in the mask. If the number of bands exceeds 64, each bit indicates the validity of a group of sub-bands; see specification for details.
See Sec. 3.1 of the specification for best practices on using the invalid data bit \(\mathrm{I}_1\) in word 0.
In Baseband, a header is parsed using VDIFHeader
,
which returns a header instance of one of its subclasses, corresponding to the
header EDV. This can be seen in the baseband.vdif.header
module class
inheritance diagram. To support a new EDV, we create a new subclass to
baseband.vdif.VDIFHeader
:
>>> class VDIFHeader4(vdif.header.VDIFHeader):
... _edv = 4
...
... _header_parser = vlbi.header.HeaderParser(
... (('invalid_data', (0, 31, 1, False)),
... ('legacy_mode', (0, 30, 1, False)),
... ('seconds', (0, 0, 30)),
... ('_1_30_2', (1, 30, 2, 0x0)),
... ('ref_epoch', (1, 24, 6)),
... ('frame_nr', (1, 0, 24, 0x0)),
... ('vdif_version', (2, 29, 3, 0x1)),
... ('lg2_nchan', (2, 24, 5)),
... ('frame_length', (2, 0, 24)),
... ('complex_data', (3, 31, 1)),
... ('bits_per_sample', (3, 26, 5)),
... ('thread_id', (3, 16, 10, 0x0)),
... ('station_id', (3, 0, 16)),
... ('edv', (4, 24, 8)),
... ('validity_mask_length', (4, 16, 8, 0)),
... ('sync_pattern', (5, 0, 32, 0xACABFEED)),
... ('validity_mask', (6, 0, 64, 0))))
VDIFHeader
has a metaclass that ensures that
whenever it is subclassed, the subclass definition is inserted into the
VDIF_HEADER_CLASSES
dictionary using
its EDV value as the dictionary key. Methods in
VDIFHeader
use this dictionary to determine
the type of object to return for a particular EDV. How all this works is
further discussed in the documentation of the VDIF
baseband.vdif.header
module.
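The registration mechanism described above can be sketched with a small metaclass. This is an illustration of the pattern, not Baseband's actual implementation:

```python
# Registry of header classes keyed by EDV, filled in automatically when a
# subclass is created, with protection against accidental overwriting.
HEADER_CLASSES = {}

class RegisteringMeta(type):
    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        edv = dct.get('_edv')
        if edv is not None:
            if edv in HEADER_CLASSES:
                raise ValueError(f"EDV {edv} already registered")
            HEADER_CLASSES[edv] = cls

class HeaderBase(metaclass=RegisteringMeta):
    _edv = None  # base class itself is not registered

class Header4(HeaderBase):
    _edv = 4     # registered under key 4 on class creation

print(HEADER_CLASSES[4] is Header4)  # True
```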
The class must have a private _edv
attribute for it to properly be
registered in VDIF_HEADER_CLASSES
. It must
also feature a _header_parser
that reads these words to return header
properties. For this, we use
baseband.vlbi_base.header.HeaderParser
. To initialize a header parser,
we pass it a tuple of header properties, where each entry follows the
syntax:
('property_name', (word_index, bit_index, bit_length, default))
where
property_name
: name of the header property; this will be the key;word_index
: index into the header words for this key;bit_index
: index to the starting bit of the part used;bit_length
: number of bits used, normally between 1 and 32, but can be 64 for adding two words together; anddefault
: (optional) default value to use in initialization.
For further details, see the documentation of
HeaderParser
.
Once defined, we can use our new header like any other:
>>> myheader = vdif.header.VDIFHeader.fromvalues(
... edv=4, seconds=14363767, nchan=1, samples_per_frame=1024,
... station=65532, bps=2, complex_data=False,
... thread_id=3, validity_mask_length=60,
... validity_mask=(1 << 59) + 1)
>>> myheader
<VDIFHeader4 invalid_data: False,
legacy_mode: False,
seconds: 14363767,
_1_30_2: 0,
ref_epoch: 0,
frame_nr: 0,
vdif_version: 1,
lg2_nchan: 0,
frame_length: 36,
complex_data: False,
bits_per_sample: 1,
thread_id: 3,
station_id: 65532,
edv: 4,
validity_mask_length: 60,
sync_pattern: 0xacabfeed,
validity_mask: 576460752303423489>
>>> myheader['validity_mask'] == 2**59 + 1
True
There is an easier means of instantiating the header parser. As can be seen in the
class inheritance diagram for the header
module, many VDIF
headers are subclassed from other VDIFHeader
subclasses, namely VDIFBaseHeader
and
VDIFSampleRateHeader
. This is because many
EDV specifications share common header values, and so their functions and
derived properties should be shared as well. Moreover, header parsers can be
appended to one another, which saves repetitious coding because the first four
words of any VDIF header are the same. Indeed, we can create the same header
as above by subclassing VDIFBaseHeader
:
>>> class VDIFHeader4Enhanced(vdif.header.VDIFBaseHeader):
... _edv = 42
...
... _header_parser = vdif.header.VDIFBaseHeader._header_parser +\
... vlbi.header.HeaderParser((
... ('validity_mask_length', (4, 16, 8, 0)),
... ('sync_pattern', (5, 0, 32, 0xACABFEED)),
... ('validity_mask', (6, 0, 64, 0))))
...
... _properties = vdif.header.VDIFBaseHeader._properties + ('validity',)
...
... def verify(self):
... """Basic checks of header integrity."""
... super(VDIFHeader4Enhanced, self).verify()
... assert 1 <= self['validity_mask_length'] <= 64
...
... @property
... def validity(self):
... """Validity mask array with proper length.
...
... If set, writes both ``validity_mask`` and ``validity_mask_length``.
... """
... bitmask = np.unpackbits(self['validity_mask'].astype('>u8')
... .view('u1'))[::-1].astype(bool)
... return bitmask[:self['validity_mask_length']]
...
... @validity.setter
... def validity(self, validity):
... bitmask = np.zeros(64, dtype=bool)
... bitmask[:len(validity)] = validity
... self['validity_mask_length'] = len(validity)
... self['validity_mask'] = np.packbits(bitmask[::-1]).view('>u8')
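The unpackbits/packbits round trip that the validity property relies on can be checked in isolation; a minimal sketch using the same mask value as above:

```python
import numpy as np

# Validity mask with bits 0 and 59 set, as in the example above.
mask = np.array([2**59 + 1], dtype='>u8')

# Big-endian bytes unpack most-significant bit first, so reversing
# the unpacked array puts bit 0 at index 0.
bits = np.unpackbits(mask.view('u1'))[::-1].astype(bool)
assert bits[0] and bits[59] and bits.sum() == 2

# Packing the reversed bits recovers the original integer.
assert np.packbits(bits[::-1]).view('>u8')[0] == 2**59 + 1
```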
Here, we set _edv = 42 because VDIFHeader’s
metaclass is designed to prevent accidental overwriting of existing
entries in VDIF_HEADER_CLASSES. If we had used
_edv = 4, we would have gotten an exception:
ValueError: EDV 4 already registered in VDIF_HEADER_CLASSES
We shall see how to override header classes in the next section. Except for
the EDV, VDIFHeader4Enhanced’s header structure is identical
to VDIFHeader4. It also contains a few extra functions to enhance the
header’s usability.
The verify
function is an optional function that runs upon header
initialization to check its veracity. Ours simply checks that the
validity mask length is in the allowed range, but we also call the same function
in the superclass (VDIFBaseHeader
), which
checks that the header is not in 4-word “legacy mode”, that the header’s
EDV matches that read from the words, that there are eight words, and
that the sync pattern matches 0xACABFEED
.
The validity_mask is a bit mask, which is not necessarily the easiest to
use directly. Hence, we implement a derived validity property that generates
a boolean mask of the right length (note that this is not correct for cases
where the number of channels in the header exceeds 64). We also define a
corresponding setter, and add it to the private _properties attribute,
so that we can use validity as a keyword in fromvalues:
>>> myenhancedheader = vdif.header.VDIFHeader.fromvalues(
... edv=42, seconds=14363767, nchan=1, samples_per_frame=1024,
... station=65532, bps=2, complex_data=False,
... thread_id=3, validity=[True]+[False]*58+[True])
>>> myenhancedheader
<VDIFHeader4Enhanced invalid_data: False,
legacy_mode: False,
seconds: 14363767,
_1_30_2: 0,
ref_epoch: 0,
frame_nr: 0,
vdif_version: 1,
lg2_nchan: 0,
frame_length: 36,
complex_data: False,
bits_per_sample: 1,
thread_id: 3,
station_id: 65532,
edv: 42,
validity_mask_length: 60,
sync_pattern: 0xacabfeed,
validity_mask: [576460752303423489]>
>>> assert myenhancedheader['validity_mask'] == 2**59 + 1
>>> assert (myenhancedheader.validity == [True]+[False]*58+[True]).all()
>>> myenhancedheader.validity = [True]*8
>>> myenhancedheader['validity_mask']
array([255], dtype=uint64)
Note
If you have implemented support for a new EDV that is widely used, we encourage you to make a pull request to Baseband’s GitHub repository, as well as to register it (if it is not already registered) with the VDIF consortium!
Replacing an Existing EDV¶
Above, we mentioned that VDIFHeader
’s
metaclass is designed to prevent accidental overwriting of existing
entries in VDIF_HEADER_CLASSES
, so attempting
to assign two header classes to the same EDV results in an exception. There
are situations such as the one above, however, where we’d like to replace
one header with another.
To get VDIFHeader
to use VDIFHeader4Enhanced
when edv=4
, we can manually insert it in the dictionary:
>>> vdif.header.VDIF_HEADER_CLASSES[4] = VDIFHeader4Enhanced
Of course, we should then be sure that its _edv
attribute is correct:
>>> VDIFHeader4Enhanced._edv = 4
VDIFHeader
will now return instances of
VDIFHeader4Enhanced
when reading headers with edv = 4
:
>>> myheader = vdif.header.VDIFHeader.fromvalues(
... edv=4, seconds=14363767, nchan=1,
... station=65532, bps=2, complex_data=False,
... thread_id=3, validity=[True]*60)
>>> assert isinstance(myheader, VDIFHeader4Enhanced)
Note
Failing to modify _edv
in the class definition will lead to an
EDV mismatch when verify
is called during header initialization.
This can also be used to override VDIFHeader
’s
behavior even for EDVs that are supported by Baseband, which may
prove useful when reading data with corrupted or mislabelled headers. To
illustrate this, we attempt to read in a corrupted VDIF file originally
from the Dominion Radio Astrophysical Observatory. This file can be
imported from the baseband data directory:
>>> from baseband.data import SAMPLE_DRAO_CORRUPT
Naively opening the file with
>>> fh = vdif.open(SAMPLE_DRAO_CORRUPT, 'rs')
will lead to an AssertionError. This is because, while the headers of the
file use EDV=0, the file deviates from that EDV standard by storing additional
information: an “eud2” parameter in word 5, which is related to the sample time.
Furthermore, the bits_per_sample
setting is incorrect (it should be 3 rather
than 4 – the number is defined such that a one-bit sample has a
bits_per_sample
code of 0). Finally, though not an error, the
thread_id
in word 3 defines two parts, link
and slot
, which
reflect the data acquisition computer node that wrote the data to disk.
To accommodate these changes, we design an alternate header. We first
pop the EDV = 0 entry from VDIF_HEADER_CLASSES
:
>>> vdif.header.VDIF_HEADER_CLASSES.pop(0)
<class 'baseband.vdif.header.VDIFHeader0'>
We then define a replacement class:
>>> class DRAOVDIFHeader(vdif.header.VDIFHeader0):
... """DRAO VDIF Header
...
... An extension of EDV=0 which uses the thread_id to store link
... and slot numbers, and adds a user keyword (illegal in EDV0,
... but whatever) that identifies data taken at the same time.
...
... The header also corrects 'bits_per_sample' to be properly bps-1.
... """
...
... _header_parser = vdif.header.VDIFHeader0._header_parser + \
... vlbi.header.HeaderParser((('link', (3, 16, 4)),
... ('slot', (3, 20, 6)),
... ('eud2', (5, 0, 32))))
...
... def verify(self):
... pass # this is a hack, don't bother with verification...
...
... @classmethod
... def fromfile(cls, fh, edv=0, verify=False):
... self = super(DRAOVDIFHeader, cls).fromfile(fh, edv=0,
... verify=False)
... # Correct wrong bps
... self.mutable = True
... self['bits_per_sample'] = 3
... return self
We override verify
because VDIFHeader0
’s
verify
function checks that word 5 contains no data. We also override
the fromfile
class method such that the bits_per_sample
property
is reset to its proper value whenever a header is read from file.
We can now read in the corrupt file by manually reading in the header, then the payload, of each frame:
>>> fh = vdif.open(SAMPLE_DRAO_CORRUPT, 'rb')
>>> header0 = DRAOVDIFHeader.fromfile(fh)
>>> header0['eud2'] == 667235140
True
>>> header0['link'] == 2
True
>>> payload0 = vdif.payload.VDIFPayload.fromfile(fh, header0)
>>> payload0.shape == (header0.samples_per_frame, header0.nchan)
True
>>> fh.close()
Reading a frame using VDIFFrame
will still fail,
since its _header_class
is VDIFHeader
,
and so VDIFHeader.fromfile
,
rather than the function we defined, is used to read in headers. If we
wanted to use VDIFFrame
, we would need to set
VDIFFrame._header_class = DRAOVDIFHeader
before using baseband.vdif.open()
, so that headers are read
using DRAOVDIFHeader.fromfile
.
A more elegant solution that is compatible with baseband.vdif.base.VDIFStreamReader
without hacking baseband.vdif.frame.VDIFFrame
involves modifying the
bits-per-sample code within __init__()
. Let’s remove our previous custom
class, and define a replacement:
>>> vdif.header.VDIF_HEADER_CLASSES.pop(0)
<class '__main__.DRAOVDIFHeader'>
>>> class DRAOVDIFHeaderEnhanced(vdif.header.VDIFHeader0):
... """DRAO VDIF Header
...
... An extension of EDV=0 which uses the thread_id to store link and slot
... numbers, and adds a user keyword (illegal in EDV0, but whatever) that
... identifies data taken at the same time.
...
... The header also corrects 'bits_per_sample' to be properly bps-1.
... """
... _header_parser = vdif.header.VDIFHeader0._header_parser + \
... vlbi.header.HeaderParser((('link', (3, 16, 4)),
... ('slot', (3, 20, 6)),
... ('eud2', (5, 0, 32))))
...
... def __init__(self, words, edv=None, verify=True, **kwargs):
... super(DRAOVDIFHeaderEnhanced, self).__init__(
... words, verify=False, **kwargs)
... self.mutable = True
... self['bits_per_sample'] = 3
...
... def verify(self):
... pass
If we had the whole corrupt file, this might be enough to use the stream reader without further modification. It turns out, though, that the frame numbers are not monotonic and that the station ID changes between frames as well, so one would be better off making a new copy. Here, we can at least now read frames:
>>> fh2 = vdif.open(SAMPLE_DRAO_CORRUPT, 'rb')
>>> frame0 = fh2.read_frame()
>>> np.all(frame0.data == payload0.data)
True
>>> fh2.close()
Reading frames using VDIFFileReader.read_frame
will now work as well, but
reading frame sets using VDIFFileReader.read_frameset
will still fail.
This is because the frame and thread numbers that function relies on
are meaningless for these headers, and grouping threads together using
the link
, slot
and eud2
values should be manually performed
by the user.
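Such manual grouping could be sketched as follows (purely illustrative; with real DRAO frames one would iterate over frames read via read_frame and use their header items):

```python
from collections import defaultdict

def group_by_node_and_time(headers):
    """Group header-like mappings by their (link, slot, eud2) values.

    'headers' is any iterable of mappings with these keys; this is a
    hypothetical helper, not part of Baseband.
    """
    groups = defaultdict(list)
    for header in headers:
        groups[header['link'], header['slot'], header['eud2']].append(header)
    return groups

# Stand-in header dicts using the eud2 and link values seen above.
headers = [{'link': 2, 'slot': 0, 'eud2': 667235140},
           {'link': 2, 'slot': 0, 'eud2': 667235140},
           {'link': 1, 'slot': 3, 'eud2': 667235140}]
len(group_by_node_and_time(headers))  # 2 distinct (link, slot, eud2) groups
```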
Release Procedure¶
This procedure is based on Astropy’s, and additionally uses information from the PyPI packaging tutorial.
Prerequisites¶
To make releases, you will need:
- The twine package.
- An account on PyPI.
- Collaborator status on Baseband’s repository at mhvk/baseband to push new branches.
- An account on Read the Docs that has access to Baseband.
- Optionally, a GPG signing key associated with your GitHub account. While releases do not need to be signed, we recommend doing so to ensure they are trustworthy. To make a GPG key and associate it with your GitHub account, see the Astropy documentation.
Versioning¶
Baseband follows the semantic versioning specification:
major.minor.patch
where
- the major number represents backward incompatible API changes;
- the minor number represents feature updates to the last major version;
- the patch number represents bugfixes since the last minor version.
Major and minor versions have their own release branches on GitHub that end
with “x” (eg. v1.0.x
, v1.1.x
), while specific releases are tagged
commits within their corresponding branch (eg. v1.1.0
and v1.1.1
are
tagged commits within v1.1.x
).
Procedure¶
The first two steps of the release procedure are different for major and minor releases than they are for patch releases. Steps specifically for major/minor releases are labelled “m”, and patch ones are labelled “p”.
1m. Preparing major/minor code for release¶
We begin in the main development branch (the local equivalent to
mhvk/baseband:master
). First, check the following:
- Ensure tests pass. Run the test suite by running python3 setup.py test in the Baseband root directory.
- Update CHANGES.rst. All merge commits to master since the last release should be documented (except trivial ones such as typo corrections). Since CHANGES.rst is updated for each merge commit, in practice it is only necessary to change the date of the release you are working on from “unreleased” to the current date.
- Add authors and contributors to AUTHORS.rst. To list contributors, one can use:
git shortlog -n -s -e
This will also list contributors to astropy-helpers and the astropy template, who should not be added. If in doubt, cross-reference with the authors of pull requests.
Once finished, git add
any changes and make a commit:
git commit -m "Finalizing changelog and author list for v<version>"
For major/minor releases, the patch number is 0
.
Submit the commit as a pull request to master.
1p. Cherry-pick code for a patch release¶
We begin by checking out the appropriate release branch:
git checkout v<version branch>.x
Bugfix merge commits are backported to this branch from master by way of git
cherry-pick
. First, find the SHA hashes of the relevant merge commits in the
main development branch. Then, for each:
git cherry-pick -m 1 <SHA-1>
For more information, see Astropy’s documentation.
Once you have cherry-picked, check the following:
- Ensure tests pass and documentation builds. Run the test suite by running python3 setup.py test, and build documentation by running python3 setup.py build_docs, in the Baseband root directory.
- Update CHANGES.rst. Typically, merge commits record their changes, including any backported bugfixes, in CHANGES.rst. Cherry-picking should add these records to this branch’s CHANGES.rst, but if not, manually add them before making the commit (and manually remove any changes not relevant to this branch). Also, change the date of the release you are working on from “unreleased” to the current date.
Commit your changes:
git commit -m "Finalizing changelog for v<version>"
2m. Create a new release branch¶
Still in the main development branch, change the version keyword under the
[metadata] section of setup.cfg to:
version = <version>
and make a commit:
git commit -m "Preparing v<version>."
Submit the commit as a pull request to master.
Once the pull request has been merged, make and enter a new release branch:
git checkout -b v<version branch>.x
2p. Append to the release branch¶
In the release branch, prepare the patch release commit by changing the
version keyword under the [metadata] section of setup.cfg to:
version = <version>
then make a new commit:
git commit -m "Preparing v<version>."
4. Clean and package the release¶
Checkout the tag:
git checkout v<version>
Clean the repository:
git clean -dfx
cd astropy_helpers; git clean -dfx; cd ..
and ensure the repository has the proper permissions:
umask 0022
chmod -R a+Xr .
Finally, package the release’s source code:
python setup.py build sdist
5. Test the release¶
We now test installing and running Baseband in clean virtual environments, to
ensure there are no subtle bugs that come from your customized development
environment. Before creating the virtualenvs, we recommend checking if the
$PYTHONPATH
environmental variable is set. If it is, set it to a null
value (in bash, PYTHONPATH=
) before proceeding.
To create the environments:
python3 -m venv test_release
Now, for each environment, activate it, navigate to the Baseband root directory, and run the tests:
source <name_of_virtualenv>/bin/activate
cd <baseband_directory>
pip install dist/baseband-<version>.tar.gz
pip install pytest-astropy
cd ~/
python -c 'import baseband; baseband.test()'
deactivate
If the test suite raises any errors (at this point, likely dependency issues), delete the release tag:
git tag -d v<version>
For a major/minor release, delete the v<version branch>.x
branch as well.
Then, make the necessary changes directly on the main development branch. Once
the issues are fixed, repeat steps 2 - 6.
If the tests succeed, you may optionally re-run the cleaning and packaging code above following the tests:
git clean -dfx
cd astropy_helpers; git clean -dfx; cd ..
umask 0022
chmod -R a+Xr .
python setup.py build sdist
You may optionally sign the source as well:
gpg --detach-sign -a dist/baseband-<version>.tar.gz
7. Publish the release on GitHub¶
If you are working a major/minor release, first push the branch to upstream
(assuming upstream is mhvk/baseband
):
git push upstream v<version branch>.x
Push the tag to GitHub as well:
git push upstream v<version>
Go to the mhvk/baseband
Releases section. Here, published releases are
shown in blue, and unpublished tags in grey and in a much smaller font. To
publish a release, click on the v<version>
tag you just pushed, then click
“Edit tag” (on the upper right). This takes you to a form where you can
customize the release title and description. Leave the title blank, in
which case it is set to “v<version>”; you can leave the description blank as well
if you wish. Finally, click on “Publish release”. This takes you back to
Releases, where you should see our new release in blue.
The Baseband GitHub repo automatically updates Baseband’s Zenodo repository for each published release.
Check if your release has made it to Zenodo by clicking the badge in
Readme.rst
.
9. (Optional) test uploading the release¶
PyPI provides a test environment to safely try uploading new releases. To take advantage of this, use:
twine upload --repository-url https://test.pypi.org/legacy/ dist/baseband-<version>*
To test if this was successful, create a new virtualenv as above:
virtualenv --no-site-packages --python=python3 pypitest
Then (pip install pytest-astropy
comes first because test.pypi
does not
contain recent versions of Astropy):
source <name_of_virtualenv>/bin/activate
pip install pytest-astropy
pip install --index-url https://test.pypi.org/simple/ baseband
python -c 'import baseband; baseband.test()'
deactivate
11. Check if Readthedocs has updated¶
Go to Read the Docs and check that the
stable
version points to the latest stable release. Each minor release has
its own version as well, which should be pointing to its latest patch release.
12m. Clean up master¶
In the main development branch, add the next major/minor release to
CHANGES.rst
. Also update the version
keyword in setup.cfg
to:
version = <next major/minor version>.dev
Make a commit:
git commit -m "Add v<next major/minor version> to the changelog."
Then submit a pull request to master.
12p. Update CHANGES.rst on master¶
Change the release date of the patch release in CHANGES.rst
on master to
the current date, then:
git commit -m "Added release date for v<version> to the changelog."
(Alternatively, git cherry-pick
the changelog fix from the release branch
back to the main development one.)
Then submit a pull request to master.
Project Details¶
Authors and Credits¶
If you used this package in your research, please cite it via DOI 10.5281/zenodo.1214268.
Authors¶
Marten van Kerkwijk (@mhvk)
Chenchong Charles Zhu (@cczhu)
Other contributors (alphabetical)¶
Rebecca Lin (@00rebe)
Nikhil Mahajan (@theXYZT)
Robert Main (@ramain)
Dana Simard (@danasimard)
George Stein (@georgestein)
If you have contributed to Baseband but are not listed above, please send one of the authors an e-mail, or open a pull request for this page.
Full Changelog¶
3.1 (unreleased)¶
Bug Fixes¶
- Frame rates are now calculated correctly also for Mark 4 data in which the first frame is the last within a second. [#341]
- Fixed a bug where a VDIF header was not found correctly if the file pointer was very close to the start of a header already. [#346]
- In VDIF header verification, include that the implied payload must have non-negative size. [#348]
- Mark 4 now checks by default (verify=True) that frames are ordered correctly. [#349]
- find_header will now always check that the frame corresponding to a header is complete (i.e., fits within the file). [#354]
- The count argument to .read() no longer is changed in-place, making it safe to pass in array scalars or dimensionless quantities. [#373]
Other Changes and Additions¶
- The Mark 4, Mark 5B, and VDIF stream readers are now able to replace missing pieces of files with zeros using verify='fix'. This is also the new default; use verify=True for the old behaviour of raising an error on any inconsistency. [#357]
- The VDIFFileReader gained a new get_thread_ids() method, which will scan through frames to determine the threads present in the file. This is now used inside VDIFStreamReader and, combined with the above, allows reading of files that have missing threads in their first frame set. [#361]
- The stream reader info now also checks whether streams are continuous by reading the first and last sample, allowing a simple way to check whether the file will likely pose problems before possibly spending a lot of time reading it. [#364]
- Much faster localization of Mark 5B frames. [#351]
- VLBI file readers have gained a new method locate_frames that finds frame starts near the current location. [#354]
- For VLBI file readers, find_header now raises an exception if no frame is found (rather than returning None).
- The Mark 4 file reader’s locate_frame has been deprecated. Its functionality is replaced by locate_frames and find_header. [#354]
- Custom stream readers can now override only part of reading a given frame and testing that it is the right one. [#355]
- The HeaderParser class was refactored and simplified, making setting keys faster. [#356]
- info now also provides the number of frames in a file. [#364]
3.0 (2019-08-28)¶
This version only supports python3.
New Features¶
File information now includes whether a file can be read and decoded. The
readable()
method on stream readers also includes whether the data in a file can be decoded. [#316]
Bug Fixes¶
- Empty GUPPI headers can now be created without having to pass in verify=False. This is needed for astropy 3.2, which initializes an empty header in its revamped .fromstring method. [#314]
- VDIF multichannel headers and payloads are now forced to have power-of-two bits per sample. [#315]
- Bits per complete sample for VDIF payloads are now calculated correctly also for non power-of-two bits per sample. [#315]
- Guppi raw file info now presents the correct sample rate, corrected for overlap. [#319]
- All headers now check that samples_per_frame are set to possible numbers. [#325]
- Getting .info on closed files no longer leads to an error (though no information can be retrieved). [#326]
Other Changes and Additions¶
Increased speed of VDIF stream reading by removing redundant verification. Reduces the overhead for verification for VDIF CHIME data from 50% (factor 1.5) to 13%. [#321]
2.0 (2018-12-12)¶
VDIF and Mark 5B readers and writers now support 1 bit per sample. [#277, #278]
Bug Fixes¶
- VDIF reader will now properly ignore corrupt last frames. [#273]
- Mark5B reader more robust against headers not being parsed correctly in Mark5BFileReader.find_header. [#275]
- All stream readers now have a proper dtype attribute, not a corresponding np.float32 or np.complex64. [#280]
- GUPPI stream readers no longer emit warnings on not quite FITS compliant headers. [#283]
Other Changes and Additions¶
Added release procedure to the documentation. [#268]
1.2 (2018-07-27)¶
New Features¶
- Expanded support for accessing sequences of files to VLBI format openers and baseband.open. Enabled baseband.guppi.open to open file sequences using string templates, as with baseband.dada.open. [#254]
- Created baseband.helpers.sequentialfile.FileNameSequencer, a general-purpose filename sequencer that can be passed to any format opener. [#253]
Other Changes and Additions¶
Moved the Getting Started section to “Using Baseband”, and created a new quickstart tutorial under Getting Started to better assist new users. [#260]
1.1.1 (2018-07-24)¶
Bug Fixes¶
- Ensure gsb times can be decoded with astropy-dev (which is to become astropy 3.1). [#249]
- Fixed rounding error when encoding 4-bit data using baseband.vlbi_base.encoding.encode_4bit_base. [#250]
- Added GUPPI/PUPPI to the list of file formats used by baseband.open and baseband.file_info. [#251]
1.1 (2018-06-06)¶
New Features¶
- Added a new baseband.file_info function, which can be used to inspect data files. [#200]
- Added a general file opener, baseband.open, which for a set of formats will check whether the file is of that format, and then load it using the corresponding module. [#198]
- Allow users to pass a verify keyword to file openers reading streams. [#233]
- Added support for the GUPPI format. [#212]
- Enabled baseband.dada.open to read streams where the last frame has an incomplete payload. [#228]
API Changes¶
- In analogy with Mark 5B, VDIF header time getting and setting now requires a frame rate rather than a sample rate. [#217, #218]
- DADA and GUPPI now support passing either a start_time or offset (in addition to time) to set the start time in the header. [#240]
Bug Fixes¶
Other Changes and Additions¶
- The baseband.data module with sample data files now has an explicit entry in the documentation. [#198]
- Increased speed of VLBI stream reading by changing the way header sync patterns are stored, and removing redundant verification steps. VDIF sequential decode is now 5 - 10% faster (depending on the number of threads). [#241]
1.0.1 (2018-06-04)¶
Bug Fixes¶
- Fixed a bug in baseband.dada.open where passing a squeeze setting is ignored when also passing header keywords in ‘ws’ mode. [#211]
- Raise an exception rather than return incorrect times for Mark 5B files in which the fractional seconds are not set. [#216]
Other Changes and Additions¶
Fixed broken links and typos in the documentation. [#211]
1.0.0 (2018-04-09)¶
Initial release.
Licenses¶
Baseband License¶
Baseband is licensed under the GNU General Public License v3.0. The full text
of the license can be found in LICENSE
under Baseband’s root directory.