Getting Started with Baseband
This quickstart tutorial is meant to help the reader hit the ground running with Baseband. For more detail, including writing to files, see Using Baseband.
For installation instructions, please see Installing Baseband.
When using Baseband, we typically also use numpy, the astropy.units module, and the Time class from the astropy.time module. Let’s import all of these:
>>> import baseband
>>> import numpy as np
>>> import astropy.units as u
>>> from astropy.time import Time
Opening Files
For this tutorial, we’ll use two sample files:
>>> from baseband.data import SAMPLE_VDIF, SAMPLE_MARK5B
The first is a VDIF file created from EVN/VLBA observations of the Black Widow pulsar, PSR B1957+20, while the second is a Mark 5B file from EVN/WSRT observations of the same pulsar.
To open the VDIF file:
>>> fh_vdif = baseband.open(SAMPLE_VDIF)
Opening the Mark 5B file is slightly more involved, as not all required metadata is stored in the file itself:
>>> fh_m5b = baseband.open(SAMPLE_MARK5B, nchan=8, sample_rate=32*u.MHz,
...                        ref_time=Time('2014-06-13 12:00:00'))
Here, we’ve manually passed in as keywords the number of channels, the sample rate (number of samples per channel per second) as an astropy.units.Quantity, and a reference time within 500 days of the start of the observation as an astropy.time.Time. That last keyword is needed to properly read timestamps from the Mark 5B file.
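To see why a reference time helps, note that a Mark 5B header records only the last three digits of the Modified Julian Date, so the thousands have to come from elsewhere. The sketch below is not Baseband's implementation: to_mjd and resolve_mjd are hypothetical helpers, and the observing date is chosen only for illustration. It shows how a reference date within 500 days pins down the full day number.

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # calendar date of MJD 0


def to_mjd(day):
    """Integer Modified Julian Date of a calendar date."""
    return (day - MJD_EPOCH).days


def resolve_mjd(jday3, ref_mjd):
    """Hypothetical helper: recover a full MJD from its last three digits.

    Picks the multiple of 1000 that brings the result closest to ref_mjd,
    which is unambiguous when the true date is within 500 days of it.
    """
    thousands = round((ref_mjd - jday3) / 1000) * 1000
    return thousands + jday3


ref_mjd = to_mjd(date(2014, 6, 13))   # the reference date passed above
obs_mjd = to_mjd(date(2014, 6, 16))   # an illustrative observing date
header_digits = obs_mjd % 1000        # what a three-digit field would hold
recovered = resolve_mjd(header_digits, ref_mjd)
print(header_digits, recovered)
```

If the reference were more than 500 days off, the nearest multiple of 1000 could be the wrong one, which is why the keyword carries that constraint.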
baseband.open tries to open files using all available formats, returning whichever is successful. If you know the format of your file, you can pass its name with the format keyword, or directly use its format opener (for VDIF, it is baseband.vdif.open). Also, the baseband.file_info function can help determine the format and any missing information needed by baseband.open; see Inspecting Files.
Do you have a sequence of files you want to read in? You can pass a list of filenames to baseband.open, and it will open them up as if they were a single file! See Reading or Writing to a Sequence of Files.
Reading Files
Radio baseband files are generally composed of blocks of binary data, or payloads, stored alongside corresponding metadata, or headers. Each header and payload combination is known as a data frame, and most formats feature files composed of a long series of frames.
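As a rough illustration of this layout, the sketch below splits a byte string into fixed-size frames, each a header followed by a payload. The function name split_frames and the tiny sizes are made up for the example; real formats define their own header and payload sizes, and Baseband takes care of this parsing for us.

```python
def split_frames(raw, header_nbytes, payload_nbytes):
    """Yield (header, payload) byte pairs from a stream of equal-size frames."""
    frame_nbytes = header_nbytes + payload_nbytes
    if len(raw) % frame_nbytes != 0:
        raise ValueError("stream does not contain a whole number of frames")
    for start in range(0, len(raw), frame_nbytes):
        frame = raw[start:start + frame_nbytes]
        yield frame[:header_nbytes], frame[header_nbytes:]


# Two toy frames: a 2-byte header followed by a 4-byte payload.
raw = b"H0" + b"\x00\x01\x02\x03" + b"H1" + b"\x04\x05\x06\x07"
frames = list(split_frames(raw, header_nbytes=2, payload_nbytes=4))
for header, payload in frames:
    print(header, payload.hex())
```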
Baseband file objects are frame-reading wrappers around Python file objects, and have the same interface, including seek for seeking to different parts of the file, tell for reporting the file pointer’s current position, and read for reading data. The main difference is that Baseband file objects read and navigate in units of samples.
Let’s read some samples from the VDIF file:
>>> data = fh_vdif.read(3)
>>> data
array([[-1.      ,  1.      ,  1.      , -1.      , -1.      , -1.      ,
         3.316505,  3.316505],
       [-1.      ,  1.      , -1.      ,  1.      ,  1.      ,  1.      ,
         3.316505,  3.316505],
       [ 3.316505,  1.      , -1.      , -1.      ,  1.      ,  3.316505,
        -3.316505,  3.316505]], dtype=float32)
>>> data.shape
(3, 8)
Baseband decodes binary data into ndarray objects. Notice we input 3, and received an array of shape (3, 8); this is because there are 8 VDIF threads. Threads and channels represent different components of the data such as polarizations or frequency sub-bands, and the collection of all components at one point in time is referred to as a complete sample. Baseband reads in units of complete samples, and works with sample rates in units of complete samples per second (including with the Mark 5B example above). As with an ndarray, fh_vdif.shape gives the shape of the entire dataset:
>>> fh_vdif.shape
(40000, 8)
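One way to picture a complete sample, using plain Python lists rather than Baseband itself: if each thread delivers its own time series, a complete sample is the set of values all threads hold at the same time step. The toy numbers and variable names below are purely illustrative.

```python
# Three toy threads, each holding four time steps of data.
threads = [
    [1.0, -1.0, 1.0, 1.0],    # thread 0
    [-1.0, -1.0, 1.0, -1.0],  # thread 1
    [1.0, 1.0, -1.0, -1.0],   # thread 2
]

# Transpose so that time runs along the first axis: each row is now one
# complete sample, holding one value per thread.
complete_samples = [list(step) for step in zip(*threads)]

shape = (len(complete_samples), len(complete_samples[0]))
print(shape)                 # (time steps, threads)
print(complete_samples[:2])  # the first two complete samples
```

Asking for two samples here returns two rows of three values, in the same way that reading 3 samples from the 8-thread VDIF file returned an array of shape (3, 8).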
The first axis represents time, and all additional axes represent the shape of a complete sample. A labelled version of the complete sample shape is given by:
>>> fh_vdif.sample_shape
SampleShape(nthread=8)
Baseband extracts basic properties and header metadata from opened files. Notably, the start and end times of the file are given by:
>>> fh_vdif.start_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.000000000>
>>> fh_vdif.stop_time
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001250000>
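These times are consistent with the shape shown earlier and the file's sample rate of 32 MHz (reported in the overview that follows): the span between start and stop is the number of complete samples divided by the sample rate. A quick check in plain Python, with the numbers copied from the tutorial output and variable names of our own choosing:

```python
from fractions import Fraction

n_samples = 40000             # first axis of fh_vdif.shape
sample_rate_hz = 32_000_000   # 32 MHz, in complete samples per second

# Exact rational arithmetic avoids any floating-point rounding.
duration_s = Fraction(n_samples, sample_rate_hz)
duration_us = duration_s * 1_000_000
print(float(duration_s))      # duration in seconds
print(float(duration_us))     # duration in microseconds
```

The result, 1.25 ms, matches the difference between stop_time and start_time.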
For an overview of the file, we can either print fh_vdif itself, or inspect its info attribute:
>>> fh_vdif
<VDIFStreamReader name=... offset=3
    sample_rate=32.0 MHz, samples_per_frame=20000,
    sample_shape=SampleShape(nthread=8),
    bps=2, complex_data=False, edv=3, station=65532,
    start_time=2014-06-16T05:56:07.000000000>
>>> fh_vdif.info
VDIFStream information:
start_time = 2014-06-16T05:56:07.000000000
stop_time = 2014-06-16T05:56:07.001250000
sample_rate = 32.0 MHz
shape = (40000, 8)
format = vdif
bps = 2
complex_data = False
verify = fix
readable = True
checks: decodable: True
continuous: no obvious gaps
VDIFFile information:
edv = 3
number_of_frames = 16
thread_ids = [0, 1, 2, 3, 4, 5, 6, 7]
number_of_framesets = 2
frame_rate = 1600.0 Hz
samples_per_frame = 20000
sample_shape = (8, 1)
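The entries in this overview are related by simple arithmetic, which can serve as a handy sanity check. The sketch below uses only numbers copied from the output above; the variable names are our own. It assumes that a frameset is the group of frames, one per thread, covering the same span of time.

```python
sample_rate_hz = 32_000_000   # sample_rate = 32.0 MHz
samples_per_frame = 20000
number_of_frames = 16
n_threads = 8                 # number of entries in thread_ids

# Each frame covers samples_per_frame time steps for one thread.
frame_rate_hz = sample_rate_hz / samples_per_frame

# One frameset holds one frame from every thread (assumption stated above).
number_of_framesets = number_of_frames // n_threads

# Total number of complete samples, i.e. the first axis of the shape.
n_samples = number_of_framesets * samples_per_frame

print(frame_rate_hz, number_of_framesets, (n_samples, n_threads))
```

The three derived values agree with frame_rate, number_of_framesets and shape as reported in the overview.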
Seeking is also done in units of complete samples, which is equivalent to seeking in timesteps. Let’s move forward 100 complete samples:
>>> fh_vdif.seek(100)
100
Seeking from the end or current position is also possible, using the same syntax as for typical file objects. It is also possible to seek in units of time:
>>> fh_vdif.seek(-1000, 2) # Seek 1000 samples from end.
39000
>>> fh_vdif.seek(10*u.us, 1) # Seek 10 us from current position.
39320
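The offsets returned above follow from the usual whence convention (0 for start, 1 for current position, 2 for end) combined with a conversion from time to samples. The function resolve_seek below is a hypothetical stand-in written for illustration, not Baseband's code; it takes time offsets as a number of microseconds to avoid depending on astropy, and rounding to the nearest sample is an assumption of the sketch.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 32_000_000  # 32 MHz, from the file overview
N_SAMPLES = 40000            # total number of complete samples


def resolve_seek(offset, whence, current, offset_us=False):
    """Hypothetical helper: return the new position in complete samples.

    offset is a sample count, or microseconds if offset_us is True.
    whence follows io conventions: 0 = start, 1 = current, 2 = end.
    """
    if offset_us:
        # Convert microseconds to samples; rounding is this sketch's choice.
        offset = round(Fraction(offset) * SAMPLE_RATE_HZ / 1_000_000)
    base = {0: 0, 1: current, 2: N_SAMPLES}[whence]
    return base + offset


pos = resolve_seek(100, 0, current=0)                   # like seek(100)
pos = resolve_seek(-1000, 2, current=pos)               # like seek(-1000, 2)
print(pos)
pos = resolve_seek(10, 1, current=pos, offset_us=True)  # like seek(10*u.us, 1)
print(pos)
```

At 32 MHz, 10 microseconds correspond to 320 samples, which is why the last seek moves from 39000 to 39320.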
fh_vdif.tell returns the current offset in samples or in time:
>>> fh_vdif.tell()
39320
>>> fh_vdif.tell(unit=u.us) # Time since start of file.
<Quantity 1228.75 us>
>>> fh_vdif.tell(unit='time')
<Time object: scale='utc' format='isot' value=2014-06-16T05:56:07.001228750>
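Converting a sample offset into elapsed or absolute time is the reverse operation. The following is a rough check of the values above in plain Python; the variable names are ours, and the timestamp is formatted by hand because the standard datetime type only resolves microseconds.

```python
from fractions import Fraction

sample_rate_hz = 32_000_000   # 32 MHz
offset_samples = 39320        # value returned by fh_vdif.tell()

elapsed_s = Fraction(offset_samples, sample_rate_hz)
elapsed_us = float(elapsed_s * 1_000_000)
print(elapsed_us)             # microseconds since the start of the file

# The file starts on a whole second (05:56:07.000000000), so the
# fractional part of the absolute time is just the elapsed nanoseconds.
elapsed_ns = int(elapsed_s * 1_000_000_000)  # a whole number in this case
timestamp = f"2014-06-16T05:56:07.{elapsed_ns:09d}"
print(timestamp)
```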
Finally, we close both files:
>>> fh_vdif.close()
>>> fh_m5b.close()