Using Baseband¶
For most file formats, one can simply import baseband and use baseband.open
to access the file. This gives one a filehandle from which one can read
decoded samples:
>>> import baseband
>>> from baseband.data import SAMPLE_DADA
>>> fh = baseband.open(SAMPLE_DADA)
>>> fh.read(3)
array([[ -38.-38.j, -38.-38.j],
[ -38.-38.j, -40. +0.j],
[-105.+60.j, 85.-15.j]], dtype=complex64)
>>> fh.close()
For other file formats, a bit more information is needed. Below, we cover the basics of inspecting files, reading from and writing to files, converting from one format to another, and diagnosing problems. We assume that Baseband as well as numpy and the astropy.units module have been imported:
>>> import baseband
>>> import numpy as np
>>> import astropy.units as u
Inspecting Files¶
Baseband allows you to quickly determine basic properties of a file, including
what format it is, using the baseband.file_info
function. For instance, it
shows that the sample VDIF file that comes with Baseband is very short (sample
files can all be found in the baseband.data
module):
>>> import baseband.data
>>> baseband.file_info(baseband.data.SAMPLE_VDIF)
VDIFStream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: no obvious gaps
VDIFFile information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)
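The quantities reported are mutually consistent, which can be checked with plain arithmetic on the numbers printed above (no file access needed):

```python
# Quick consistency check of the numbers reported by file_info.
sample_rate = 32.0e6          # 32.0 MHz
samples_per_frame = 20000
nthread = 8
number_of_framesets = 2

frame_rate = sample_rate / samples_per_frame            # 1600.0 Hz
number_of_frames = number_of_framesets * nthread        # 16
duration = number_of_framesets * samples_per_frame / sample_rate
print(frame_rate, number_of_frames, duration)           # 1600.0 16 0.00125
```

The duration of 0.00125 s is exactly the difference between the reported stop_time and start_time.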
The same function will also tell you when more information is needed. For instance, for Mark 5B files one needs the number of channels used, as well as (roughly) when the data were taken:
>>> baseband.file_info(baseband.data.SAMPLE_MARK5B)
Mark5BFile information:
format = mark5b
number_of_frames = 4
frame_rate = 6400.0 Hz
bps = 2
complex_data = False
readable = False
missing: nchan: needed to determine sample shape, frame rate, decode data.
kday, ref_time: needed to infer full times.
>>> from astropy.time import Time
>>> baseband.file_info(baseband.data.SAMPLE_MARK5B, nchan=8, ref_time=Time('2014-01-01'))
Mark5BStream information:
start_time = 2014-06-13T05:30:01.000000000
stop_time = 2014-06-13T05:30:01.000625000
sample_rate = 32.0 MHz
shape = (20000, 8)
format = mark5b
bps = 2
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: no obvious gaps
Mark5BFile information:
number_of_frames = 4
frame_rate = 6400.0 Hz
samples_per_frame = 5000
sample_shape = (8,)
The information is gleaned from info
properties on the various file and
stream readers (see below).
Note
The one format for which file_info
works a bit differently is
GSB, as this format requires separate time-stamp and raw data
files. Only the timestamp file can be inspected usefully.
Reading Files¶
Opening Files¶
As shown at the very start, files can be opened with the general
baseband.open
function. This will try to determine the file type using
file_info
, load the corresponding baseband module, and then open
the file using that module’s master input/output function.
Generally, if one knows the file type, one might as well work with the
corresponding module directly. For instance, to explicitly use the DADA
reader to open the sample DADA file included in Baseband, one can use the DADA
module’s open
function:
>>> from baseband import dada
>>> from baseband.data import SAMPLE_DADA
>>> fh = dada.open(SAMPLE_DADA, 'rs')
>>> fh.read(3)
array([[ -38.-38.j, -38.-38.j],
[ -38.-38.j, -40. +0.j],
[-105.+60.j, 85.-15.j]], dtype=complex64)
>>> fh.close()
In general, file I/O and data manipulation use the same syntax across all file
formats. When opening Mark 4 and Mark 5B files, however, some additional
arguments may need to be passed (as was the case above for inspecting a Mark
5B file, and indeed this is a good way to find out what is needed).
Notes on such features and quirks of individual formats can be
found in the API entries of their open
functions, and within the
Specific file format documentation.
For the rest of this section, we will stick to VDIF files.
Decoding Data and the Sample File Pointer¶
By giving the openers a 'rs'
flag, which is the default, we open files in
“stream reader” mode, where a file is accessed as if it were a stream of
samples. For VDIF, open
will then return an instance of
VDIFStreamReader
, which wraps a raw data file with
methods to decode the binary data frames and seek to and read data
samples. To decode the first 12 samples into a ndarray
, we would
use the read
method:
>>> from baseband import vdif
>>> from baseband.data import SAMPLE_VDIF
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> d = fh.read(12)
>>> type(d)
<... 'numpy.ndarray'>
>>> d.shape
(12, 8)
>>> d[:, 0].astype(int) # First thread.
array([-1, -1, 3, -1, 1, -1, 3, -1, 1, 3, -1, 1])
As discussed in detail in the VDIF section, VDIF files are
sequences of data frames, each of which is comprised of a header (which
holds information like the time at which the data was taken) and a
payload, or block of data. Multiple concurrent time streams can be
stored within a single frame; each of these is called a “channel”.
Moreover, groups of channels can be stored over multiple frames, each of which
is called a “thread”. Our sample file is an “8-thread, single-channel
file” (8 concurrent time streams with 1 stream per frame), and in the example
above, fh.read
decoded the first 12 samples from all 8 threads, mapping
thread number to the second axis of the decoded data array. Reading files with
multiple threads and channels will produce 3-dimensional arrays.
fh
includes shape
, size
and ndim
, which give the shape, total
number of elements and dimensionality of the file’s entire dataset if it was
decoded into an array. The number of complete samples - the set of samples
from all available threads and channels for one point in time - in the file is
given by the first element in shape
:
>>> fh.shape # Shape of all data from the file in decoded array form.
(40000, 8)
>>> fh.shape[0] # Number of complete samples.
40000
>>> fh.size
320000
>>> fh.ndim
2
The shape of a single complete sample, including names indicating the meaning of shape dimensions, is retrievable using:
>>> fh.sample_shape
SampleShape(nthread=8)
By default, dimensions of length unity are squeezed, or removed from the
sample shape. To retain them, we can pass squeeze=False
to
open
:
>>> fhns = vdif.open(SAMPLE_VDIF, 'rs', squeeze=False)
>>> fhns.sample_shape # Sample shape now keeps channel dimension.
SampleShape(nthread=8, nchan=1)
>>> fhns.ndim # fh.shape and fh.ndim also change with squeezing.
3
>>> d2 = fhns.read(12)
>>> d2.shape # Decoded data has channel dimension.
(12, 8, 1)
>>> fhns.close()
Basic information about the file is obtained either from fh.info
or simply from fh
itself:
>>> fh.info
VDIFStream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: no obvious gaps
VDIFFile information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)
>>> fh
<VDIFStreamReader name=... offset=12
sample_rate=32.0 MHz, samples_per_frame=20000,
sample_shape=SampleShape(nthread=8),
bps=2, complex_data=False, edv=3, station=65532,
start_time=2014-06-16T05:56:07.000000000>
Not coincidentally, the first is identical to what we found above using file_info
.
The filehandle itself also shows the offset
, the current location of the
sample file pointer. Above, it is at 12
since we have read in 12 (complete)
samples. If we called fh.read(12)
again, we would get the next 12 samples.
If we instead called fh.read()
, it would read from the pointer’s current
position to the end of the file. If we wanted all the data in one array, we
would move the file pointer back to the start of file, using fh.seek
,
before reading:
>>> fh.seek(0) # Seek to sample 0. Seek returns its offset in counts.
0
>>> d_complete = fh.read()
>>> d_complete.shape
(40000, 8)
We can also move the pointer with respect to the end of file by passing 2
as a second argument:
>>> fh.seek(-100, 2) # Second arg is 0 (start of file) by default.
39900
>>> d_end = fh.read(100)
>>> np.array_equal(d_complete[-100:], d_end)
True
-100
means 100 samples before the end of file, so d_end
is equal to
the last 100 entries of d_complete
. Baseband only keeps the most recently
accessed data frame in memory, making it possible to analyze (normally large)
files through selective decoding using seek
and read
.
Note
As with file pointers in general, fh.seek
will not return an error if
one seeks beyond the end of file. Attempting to read beyond
the end of file, however, will result in an EOFError
.
To determine where the pointer is located, we use fh.tell()
:
>>> fh.tell()
40000
>>> fh.close()
Caution should be used when decoding large blocks of data using fh.read
.
For typical files, the resulting arrays are far too large to hold in memory.
Seeking and Telling in Time With the Sample Pointer¶
We can use seek
and tell
with units of time rather than samples. To do
this with tell
, we can pass an appropriate astropy.units.Unit
object to
its optional unit
parameter:
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> fh.seek(40000)
40000
>>> fh.tell(unit=u.ms)
<Quantity 1.25 ms>
Passing the string 'time'
reports the pointer’s location in absolute time:
>>> fh.tell(unit='time')
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>
We can also pass an absolute astropy.time.Time
, or a positive or negative time
difference TimeDelta
or astropy.units.Quantity
to seek
.
If the offset is a Time
object, the second argument to seek is
ignored:
>>> from astropy.time.core import TimeDelta
>>> from astropy.time import Time
>>> fh.seek(TimeDelta(-5e-4, format='sec'), 2) # Seek -0.5 ms from end.
24000
>>> fh.seek(0.25*u.ms, 1) # Seek 0.25 ms from current position.
32000
>>> # Seek to specific time.
>>> fh.seek(Time('2014-06-16T05:56:07.001125'))
36000
We can retrieve the time of the first sample in the file using start_time
,
the time immediately after the last sample using stop_time
, and the time
of the pointer’s current location (equivalent to fh.tell(unit='time')
)
using time
:
>>> fh.start_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>
>>> fh.stop_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>
>>> fh.time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001125000>
>>> fh.close()
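Since a sample offset is just the elapsed time multiplied by the sample rate, the offsets returned by the time-based seeks above can be reproduced with plain arithmetic:

```python
sample_rate = 32.0e6   # 32 MHz, as reported by fh.info
# TimeDelta of -5e-4 s from the end of the 40000-sample file:
end_seek = 40000 + round(-5e-4 * sample_rate)
# 0.25 ms onwards from that position:
rel_seek = end_seek + round(0.25e-3 * sample_rate)
# absolute time 1.125 ms after start_time:
abs_seek = round(0.001125 * sample_rate)
print(end_seek, rel_seek, abs_seek)   # 24000 32000 36000
```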
Extracting Header Information¶
The first header of the file is stored as the header0
attribute of the
stream reader object; it gives direct access to header properties via keyword
lookup:
>>> with vdif.open(SAMPLE_VDIF, 'rs') as fh:
... header0 = fh.header0
>>> header0['frame_length']
629
The full list of keywords is available by printing out header0
:
>>> header0
<VDIFHeader3 invalid_data: False,
legacy_mode: False,
seconds: 14363767,
_1_30_2: 0,
ref_epoch: 28,
frame_nr: 0,
vdif_version: 1,
lg2_nchan: 0,
frame_length: 629,
complex_data: False,
bits_per_sample: 1,
thread_id: 1,
station_id: 65532,
edv: 3,
sampling_unit: True,
sampling_rate: 16,
sync_pattern: 0xacabfeed,
loif_tuning: 859832320,
_7_28_4: 15,
dbe_unit: 2,
if_nr: 0,
subband: 1,
sideband: True,
major_rev: 1,
minor_rev: 5,
personality: 131>
A number of derived properties, such as the time (as a Time
object), are also available through the header object:
>>> header0.time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>
These are listed in the API for each header class. For example, the sample VDIF file’s headers are of class:
>>> type(header0)
<class 'baseband.vdif.header.VDIFHeader3'>
and so its attributes can be found in the API entry for VDIFHeader3.
Reading Specific Components of the Data¶
By default, fh.read()
returns complete samples, i.e. with all
available threads, polarizations or channels. If we were only interested in
decoding a subset of the complete sample, we can select specific
components by passing indexing objects to the subset
keyword in open. For
example, if we only wanted thread 3 of the sample VDIF file:
>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=3)
>>> fh.sample_shape
()
>>> d = fh.read(20000)
>>> d.shape
(20000,)
>>> fh.subset
(3,)
>>> fh.close()
Since by default data are squeezed, one obtains a data stream with just a
single dimension. If one would like to keep all information, one has to pass
squeeze=False
and also make subset
a list (or slice):
>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=[3], squeeze=False)
>>> fh.sample_shape
SampleShape(nthread=1, nchan=1)
>>> d = fh.read(20000)
>>> d.shape
(20000, 1, 1)
>>> fh.close()
Data with multi-dimensional samples can be subset by passing a tuple
of
indexing objects with the same dimensional ordering as the (possibly squeezed)
sample shape; in the case of the sample VDIF with squeeze=False
, this is
threads, then channels. For example, if we wished to select threads 1 and 3,
and channel 0:
>>> fh = vdif.open(SAMPLE_VDIF, 'rs', subset=([1, 3], 0), squeeze=False)
>>> fh.sample_shape
SampleShape(nthread=2)
>>> fh.close()
Generally, subset
accepts any object that can be used to index a
numpy.ndarray
, including advanced indexing (as done above, with
subset=([1, 3], 0)
). If possible, slices should be used instead
of lists of integers, since indexing with slices returns a view rather
than a copy, avoiding unnecessary processing and memory allocation.
(An exception to this is VDIF threads, where the subset is used to selectively
read specific threads, and thus is not used for actual slicing of the data.)
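The view-versus-copy distinction is easy to demonstrate with plain NumPy arrays:

```python
import numpy as np

data = np.arange(10)
sliced = data[2:8:2]        # slice: a view on the same memory
fancy = data[[2, 4, 6]]     # list of integers: advanced indexing, a copy
print(sliced.base is data)  # True: no data were copied
print(fancy.base is data)   # False: a new array was allocated
```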
Writing to Files and Format Conversion¶
Writing to a File¶
To write data to disk, we again use open
. Writing data in a particular
format requires both the header and data samples. For modifying an existing
file, we have both the old header and old data handy.
As a simple example, let's read in the 8-thread, single-channel sample VDIF file and rewrite it as a single-thread, 8-channel one, which, for example, may be necessary for compatibility with DSPSR:
>>> import baseband.vdif as vdif
>>> from baseband.data import SAMPLE_VDIF
>>> fr = vdif.open(SAMPLE_VDIF, 'rs')
>>> fw = vdif.open('test_vdif.vdif', 'ws',
... sample_rate=fr.sample_rate,
... samples_per_frame=fr.samples_per_frame // 8,
... nthread=1, nchan=fr.sample_shape.nthread,
... complex_data=fr.complex_data, bps=fr.bps,
... edv=fr.header0.edv, station=fr.header0.station,
... time=fr.start_time)
The minimal parameters needed to generate a file are listed under the
documentation for each format’s open
, though comprehensive lists can be
found in the documentation for each format's stream writer class (e.g., for
VDIF, it’s under VDIFStreamWriter
). In practice we
specify as many relevant header properties as available to obtain a particular
file structure. If we possess the exact first header of the file, it can
simply be passed to open
via the header
keyword. In the example above,
though, we manually switch the values of nthread
and nchan
. Because
VDIF EDV = 3 requires each frame’s payload to contain 5000 bytes, and nchan
is now a factor of 8 larger, we decrease samples_per_frame
, the number of
complete (i.e. all threads and channels included) samples per frame, by a
factor of 8.
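The frame-size arithmetic can be checked directly for the real-valued, 2-bits-per-sample data of this file:

```python
# EDV=3 fixes the payload at 5000 bytes; for real-valued data, the
# number of complete samples per frame follows from bps and nchan.
payload_nbytes = 5000
bps = 2
spf_original = payload_nbytes * 8 // (bps * 1)   # 1 channel per frame
spf_rewritten = payload_nbytes * 8 // (bps * 8)  # 8 channels per frame
print(spf_original, spf_rewritten)               # 20000 2500
```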
Encoding samples and writing data to file is done by passing data arrays into
fw
’s write
method. The first
dimension of the arrays is sample number, and the remaining dimensions must be
as given by fw.sample_shape
:
>>> fw.sample_shape
SampleShape(nchan=8)
In this case, the required dimensions are the same as the arrays from
fr.read
. We can thus write the data to file using:
>>> while fr.tell() < fr.shape[0]:
... fw.write(fr.read(fr.samples_per_frame))
>>> fr.close()
>>> fw.close()
For our sample file, we could simply have written
fw.write(fr.read())
instead of the loop, but for large files, reading and writing should be done in smaller chunks to minimize memory usage. Baseband stores only the data frame or frame set being read or written to in memory.
We can check the validity of our new file by re-opening it:
>>> fr = vdif.open(SAMPLE_VDIF, 'rs')
>>> fh = vdif.open('test_vdif.vdif', 'rs')
>>> fh.sample_shape
SampleShape(nchan=8)
>>> np.all(fr.read() == fh.read())
True
>>> fr.close()
>>> fh.close()
Note
One can also use the top-level open
function for writing,
with the file format passed in via its format
argument.
File Format Conversion¶
It is often preferable to convert data from one file format to another that offers wider compatibility, or better fits the structure of the data. As an example, we convert the sample Mark 4 data to VDIF.
Since we don’t have a VDIF header handy, we pass the relevant Mark 4 header
values into vdif.open
to create one:
>>> import baseband.mark4 as mark4
>>> from baseband.data import SAMPLE_MARK4
>>> fr = mark4.open(SAMPLE_MARK4, 'rs', ntrack=64, decade=2010)
>>> spf = 640 # fanout * 160 = 640 invalid samples per Mark 4 frame
>>> fw = vdif.open('m4convert.vdif', 'ws', sample_rate=fr.sample_rate,
... samples_per_frame=spf, nthread=1,
... nchan=fr.sample_shape.nchan,
... complex_data=fr.complex_data, bps=fr.bps,
... edv=1, time=fr.start_time)
We choose edv = 1
since it’s the simplest VDIF EDV whose header includes a
sampling rate. The concept of threads does not exist in Mark 4, so the file
effectively has nthread = 1
. As discussed in the Mark 4
documentation, the data at the start of each frame is effectively
overwritten by the header and are represented by invalid samples in the stream
reader. We set samples_per_frame
to 640
so that each section of
invalid data is captured in a single frame.
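The value of 640 can be derived from the file's parameters; this is a sketch assuming the usual Mark 4 relation ntrack = nchan × bps × fanout:

```python
# Sketch, assuming ntrack = nchan * bps * fanout for Mark 4:
ntrack, nchan, bps = 64, 8, 2
fanout = ntrack // (nchan * bps)
invalid_per_frame = fanout * 160
print(fanout, invalid_per_frame)   # 4 640
```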
We now write the data to file, manually flagging each invalid data frame:
>>> while fr.tell() < fr.shape[0]:
... d = fr.read(fr.samples_per_frame)
... fw.write(d[:640], valid=False)
... fw.write(d[640:])
>>> fr.close()
>>> fw.close()
Lastly, we check our new file:
>>> fr = mark4.open(SAMPLE_MARK4, 'rs', ntrack=64, decade=2010)
>>> fh = vdif.open('m4convert.vdif', 'rs')
>>> np.all(fr.read() == fh.read())
True
>>> fr.close()
>>> fh.close()
For file format conversion in general, we have to consider how to properly
scale our data to make the best use of the dynamic range of the new encoded
format. For VLBI formats like VDIF, Mark 4 and Mark 5B, samples of the same
size have the same scale, which is why we did not have to rescale our data when
writing 2-bits-per-sample Mark 4 data to a 2-bits-per-sample VDIF file.
Rescaling is necessary, though, to convert DADA or GSB to VDIF. For examples
of rescaling, see the baseband/tests/test_conversion.py
file.
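As a minimal illustration (not the library's own conversion code), one simple approach is to normalize the samples to unit standard deviation before encoding, so that the fixed quantization levels of a 2-bit encoder are used sensibly:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for decoded DADA or GSB samples with an arbitrary scale:
d = rng.normal(scale=33.8, size=4096).astype(np.float32)
# Rescale to unit standard deviation before passing to fw.write:
d_scaled = d / d.std()
print(abs(float(d_scaled.std()) - 1.0) < 1e-4)   # True
```

The actual scale factor appropriate for a given conversion depends on the source data; see the tests referenced above for concrete examples.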
Reading or Writing to a Sequence of Files¶
Data from one continuous observation is sometimes spread over a sequence of
files. Baseband includes the sequentialfile
module for
reading in a sequence as if it were one contiguous file. This module is called
when a list, tuple or filename template is passed to e.g. baseband.open
or
baseband.vdif.open
, making the syntax for handling multiple files nearly
identical to that for single ones.
As an example, we write the data from the sample VDIF file
baseband/data/sample.vdif
into a sequence of two files and then read the
files back in. We first load the required data:
>>> from baseband import vdif
>>> from baseband.data import SAMPLE_VDIF
>>> import numpy as np
>>> fh = vdif.open(SAMPLE_VDIF, 'rs')
>>> d = fh.read()
We then create a sequence of filenames:
>>> filenames = ["seqvdif_{0}".format(i) for i in range(2)]
When passing filenames
to open
, we must also pass
file_size
, the file size in bytes, in addition to the usual kwargs
for
writing a file. Since we wish to split the sample file in two, and the file
consists of two framesets, we set file_size
to the byte size of one
frameset (we could have equivalently set it to fh.fh_raw.seek(0, 2) // 2
):
>>> file_size = 8 * fh.header0.frame_nbytes
>>> fw = vdif.open(filenames, 'ws', header0=fh.header0,
... file_size=file_size, sample_rate=fh.sample_rate,
... nthread=fh.sample_shape.nthread)
>>> fw.write(d)
>>> fw.close()  # This also closes the underlying raw files.
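Since a VDIF header stores the frame length in units of 8 bytes (header included), the frame_length value of 629 seen earlier pins down the sizes used here:

```python
# frame_length is in units of 8 bytes, so:
frame_nbytes = 629 * 8          # 5032 = 32-byte header + 5000-byte payload
file_size = 8 * frame_nbytes    # one frameset of 8 single-channel threads
print(frame_nbytes, file_size)  # 5032 40256
```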
Note
file_size
sets the maximum size a file can reach before the
writer writes to the next one, so setting file_size
to a larger
value than above will lead to the two files having different sizes. By
default, file_size=None
, meaning it can be arbitrarily large, in which
case only one file will be created.
We now read the sequence and confirm their contents are identical to those of the sample file:
>>> fr = vdif.open(filenames, 'rs', sample_rate=fh.sample_rate)
>>> fr.header0.time == fh.header0.time
True
>>> np.all(fr.read() == d)
True
>>> fr.close()
When reading, the filename sequence must be ordered in time.
We can also open the second file on its own and confirm it contains the second frameset of the sample file:
>>> fsf = vdif.open(filenames[1], mode='rs', sample_rate=fh.sample_rate)
>>> fh.seek(fh.shape[0] // 2) # Seek to start of second frameset.
20000
>>> abs(fsf.header0.time - fh.time) < 1.*u.ns
True
>>> np.all(fsf.read() == fh.read())
True
>>> fsf.close()
In situations where the file_size
is known, but not the total number of
files to write, one may use the FileNameSequencer
class to create an iterable without a user-defined size. The class is
initialized with a template string that can be formatted with keywords, and an
optional header that can either be an actual header or a dict with the
relevant keywords. The template may also contain the special keyword
‘{file_nr}’, which is equal to the indexing value (instead of a header entry).
As an example, let us create a sequencer:
>>> from baseband.helpers import sequentialfile as sf
>>> filenames = sf.FileNameSequencer('f.edv{edv:d}.{file_nr:03d}.vdif',
... header=fh.header0)
Indexing the sequencer using square brackets returns a filename:
>>> filenames[0]
'f.edv3.000.vdif'
>>> filenames[42]
'f.edv3.042.vdif'
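The substitution the sequencer performs can be emulated with plain str.format, combining header values (here edv=3) with the special file_nr index:

```python
template = 'f.edv{edv:d}.{file_nr:03d}.vdif'
names = [template.format(edv=3, file_nr=i) for i in (0, 42)]
print(names)   # ['f.edv3.000.vdif', 'f.edv3.042.vdif']
```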
The sequencer has extracted the EDV from the header we passed in, and the file number from the index. We can use the sequencer to write a VDIF file sequence:
>>> fw = vdif.open(filenames, 'ws', header0=fh.header0,
... file_size=file_size, sample_rate=fh.sample_rate,
... nthread=fh.sample_shape.nthread)
>>> d = np.concatenate([d, d, d])
>>> fw.write(d)
>>> fw.close()
This creates 6 files:
>>> import glob
>>> len(glob.glob("f.edv*.vdif"))
6
We can read the file sequence using the same sequencer. In reading mode, the sequencer determines the number of files by finding the largest file available that fits the template:
>>> fr = vdif.open(filenames, 'rs', sample_rate=fh.sample_rate)
>>> fr.header0.time == fh.header0.time
True
>>> np.all(fr.read() == d)
True
>>> fr.close()
>>> fh.close() # Close sample file as well.
Because DADA and GUPPI data are usually stored in file sequences with names
derived from header values (e.g., 'puppi_58132_J1810+1744_2176.0010.raw'),
their format openers have template support built in. For usage details, please
see the API entries for baseband.dada.open
and baseband.guppi.open
.
Diagnosing problems with baseband files¶
Little is more annoying than starting a very long analysis script only to find that the reader crashed with an error near the end. Unfortunately, while there is only one way for success, there are many ways for failure. Some, though, can be caught simply by inspecting files. To see what shows up for a file that is missing a frame, we first construct one:
>>> from astropy.time import Time
>>> from baseband import vdif
>>> fc = vdif.open('corrupt.vdif', 'ws', edv=1, nthread=2,
... bps=8, samples_per_frame=16,
... time=Time('J2010'), sample_rate=16*u.kHz)
>>> fc.write(np.zeros((8000, 2)))
>>> fc.fh_raw.seek(-100, 1)
47900
>>> fc.write(np.zeros((8000, 2)))
>>> fc.close()
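The byte counts above can be verified by hand: since a non-legacy VDIF header is 32 bytes, each EDV = 1 frame here consists of a 32-byte header plus 16 one-byte samples, and each of the two threads gets its own frames:

```python
frame_nbytes = 32 + 16 * 8 // 8                # 48 bytes per frame
first_write = (8000 // 16) * 2 * frame_nbytes  # 48000 bytes
print(first_write - 100)                       # 47900, as seek showed
total = (first_write - 100) + first_write      # 95900 bytes on disk
print(total / frame_nbytes)                    # a non-integer frame count
```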
Here, rewinding the internal raw file pointer a bit to simulate “missing bytes” is an implementation detail that one should not rely on!
Now check its info
:
>>> fh = baseband.vdif.open('corrupt.vdif', 'rs', verify=True)
>>> fh.info.readable
False
>>> fh.info
VDIFStream information:
start_time = 2009-12-31T23:58:53.816000000
stop_time = 2009-12-31T23:58:54.816000000
sample_rate = 0.016 MHz
shape = (16000, 2)
format = vdif
bps = 8
complex_data = False
verify = True
readable = False
checks: decodable: True
continuous: False
errors: continuous: While reading at 7968: AssertionError()
warnings: number_of_frames: file contains non-integer number (1997.9166666666667) of frames
VDIFFile information:
edv = 1
thread_ids = [0, 1]
frame_rate = 1000.0 Hz
samples_per_frame = 16
sample_shape = (2, 1)
>>> fh.close()
Note that the error is reported for a position earlier than the one we corrupted: internally, baseband reads one frame ahead, since a corrupted frame usually indicates that something is wrong before it as well.
This particular problem is not bad, since the VDIF reader can deal with
missing frames. Indeed, when one opens the file with the default
verify='fix'
, one gets:
>>> fh = baseband.vdif.open('corrupt.vdif', 'rs')
>>> fh.info
VDIFStream information:
start_time = 2009-12-31T23:58:53.816000000
stop_time = 2009-12-31T23:58:54.816000000
sample_rate = 0.016 MHz
shape = (16000, 2)
format = vdif
bps = 8
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: fixable gaps
warnings: number_of_frames: file contains non-integer number (1997.9166666666667) of frames
continuous: While reading at 7968: problem loading frame set 498. Thread(s) [1] missing; set to invalid.
VDIFFile information:
edv = 1
thread_ids = [0, 1]
frame_rate = 1000.0 Hz
samples_per_frame = 16
sample_shape = (2, 1)
>>> fh.close()