Flagging GMRT data
Introduction
The aim of the programs described below is to allow flagging of bad
data points in GMRT interferometer data. Data editing itself is done
within the AIPS environment with the task UVFLG. The identification
of bad data is done outside the AIPS environment with more sophisticated
algorithms than currently available within AIPS. For more information
about the AIPS tasks, please use the AIPS verbs HELP and EXPLAIN.
The data flagging pipeline for GMRT data is set up as follows:
- 1. Export the data from AIPS (UVFND)
- 2. Search for bad data and create a list of data to be flagged (program BORG)
- 3. Convert the flaglist into an AIPS runfile that invokes the AIPS task UVFLG
AIPS runfiles
AIPS runfiles are an important way to take work out of the hands of the user.
For a description, type 'HELP RUN' in AIPS. An AIPS runfile can be modified with
any text editor. These are the important points to remember about runfiles:
- Before starting AIPS type (case-sensitive): setenv FITSDIR $PWD
- Also: setenv RUNFIL $PWD
- The first line must be a comment (start with *)
- The filename must be in upper case, and its extension must be the AIPS user number in
extended hexadecimal (base 36), padded to three characters.
- Keep the runfile in the directory from which AIPS is started (that is, the directory
in which you type 'aips tv=local'). There are other ways to do this, but this is the easiest.
- The commands in the runfile RUNME.06D (a runfile of user number 229) are executed by typing
'RUN RUNME' at the AIPS prompt.
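The naming rule above can be sketched in a few lines of Python. This is only an illustration and not part of the flagging programs: the function name runfile_name is made up here, and the sketch assumes that the extension is the user number written in base 36 (digits 0-9 followed by A-Z) and zero-padded to three characters, which is consistent with the example 06D for user number 229.

```python
import string

# Digits 0-9 followed by A-Z: the 36 symbols of a base-36 number.
DIGITS = string.digits + string.ascii_uppercase


def runfile_name(stem, user_number):
    """Return the runfile name for a stem and an AIPS user number.

    Hypothetical helper: assumes a three-character, zero-padded,
    base-36 extension and an upper-case file name.
    """
    if not 0 <= user_number < 36 ** 3:
        raise ValueError("user number does not fit in three base-36 digits")
    ext = ""
    n = user_number
    for _ in range(3):  # always emit exactly three characters
        n, rem = divmod(n, 36)
        ext = DIGITS[rem] + ext
    return f"{stem.upper()}.{ext}"


print(runfile_name("runme", 229))
```

For user number 229 this gives RUNME.06D, the name used in the example above.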
Steps 1 and 3 mentioned above are carried out by (large) AIPS
runfiles that will be generated by programs described below.
Step 1: exporting data from AIPS
The data is written into ASCII tables with the AIPS task UVFND (STOKES='CORR').
The big advantage is that this human-readable format makes the procedure
very transparent. There are two disadvantages. Neither of these is a fundamental
problem though.
1. UVFND will not print more than approximately 32000 lines at a time.
2. The floating-point format for amplitude is fixed to ff.fff. This is sufficient for
calibrators before flux calibration, but the target source must be flux calibrated
first.
As a result, it is necessary to complete flagging on the calibrators,
do the flux calibration, and then search the source data for bad
points. This is only necessary because of the poor numerical format in
the output of UVFND. None of the algorithms that detect bad data
assume flux calibration.
Frequency channels are considered separately. All of the UV data for a
particular source (calibrator or target source) and a particular
channel will be stored in a single data file. The flagging procedure
is the same whether a single channel or a hundred channels are considered.
A hundred channels will require a hundred ASCII files per source.
The extraction of the data is fully automated. The program that
generates the required runfile is called mkuvfnd. This program
requires the output of the AIPS task LISTR, with the keyword
OPTYPE='SCAN'. The required output should be copied from the AIPS
terminal into a file. Note that before running LISTR you must create
an NX table with AIPS task INDXR. An example input file for mkuvfnd
is MRK527.listr. DO NOT include the part of the LISTR output with the
RA and DEC of the source in this file!
The program mkuvfnd takes four arguments in the following prescribed order:
- 1. Source name (in the example this may be any one of '3C48', '2223-052', or 'MRK527')
- 2. Start frequency channel
- 3. End frequency channel
- 4. Filename with the output of LISTR.
The output should be redirected to an AIPS runfile name, e.g. GET3C48.06D for AIPS user 229.
The proper calling sequence to extract the data for 3C48 from channels
34 to 56 is then
mkuvfnd 3C48 34 56 MRK527.listr > GET3C48.06D
It is highly recommended to start with a single channel (set start and
end channels to the same value). The output of the program
mkuvfnd is here redirected to the AIPS runfile
GET3C48.06D. Within AIPS, set the inname etc. to the correct UV data
set with the verb GETNAME. Then type
run GET3C48
This will generate 23 files in the current directory with the names
3C48.34, 3C48.35, ... , 3C48.56, one for each of the channels 34
through 56 of 3C48. If the dataset is large, selecting and writing
the data may take a long time.
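After the runfile has finished, it can be useful to check that every expected file was actually written. The following Python sketch lists the file names implied by the naming scheme above (source name, a dot, the channel number) and reports which ones are absent. The helper names expected_files and missing_files are made up for this illustration, and the sketch only looks for plain, uncompressed file names.

```python
from pathlib import Path


def expected_files(source, first_channel, last_channel):
    """File names <source>.<channel> for an inclusive channel range."""
    if last_channel < first_channel:
        raise ValueError("end channel must not be smaller than start channel")
    return [f"{source}.{ch}" for ch in range(first_channel, last_channel + 1)]


def missing_files(source, first_channel, last_channel, directory="."):
    """Expected file names that do not exist in the given directory."""
    base = Path(directory)
    return [name
            for name in expected_files(source, first_channel, last_channel)
            if not (base / name).exists()]


names = expected_files("3C48", 34, 56)
print(len(names), "files expected:", names[0], "...", names[-1])
print("missing:", missing_files("3C48", 34, 56))
```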
This procedure should be repeated for each source, so
mkuvfnd 2223-052 34 56 MRK527.listr > GET2223.06D
mkuvfnd MRK527 34 56 MRK527.listr > GETM527.06D
Note that AIPS puts strong restrictions on the number of characters in filenames.
Step 2: search for bad data
It is assumed that the data were extracted according to Step 1. These
data will be searched for irregularities in a number of ways. The
program that does this is called borg (baseline offset
recognition gadget). This program takes arguments in the following prescribed order
(further explained below):
- 1. data filename
- 2. polarization (RR or LL)
- 3. minimum amplitude threshold
- 4. clip level
- 5. list of antennas that may be skipped
- 6. verbosity level
The borg program may run in several modes, depending on the values
of the input parameters. For the example above (3C48 at 327 MHz), the
borg program may be used with the following calling
sequence:
borg 3C48.34 RR 0.070 0.010 " 1 12 " 11
and
borg 3C48.34 LL 0.070 0.010 " 1 12 " 11
This gives an impression of reasonable values. The antennas 1 and 12
(these are the antenna numbers used in AIPS) were known to be dead at the
time of observation. Discarding them saves time. Use " none " (with the quotes)
if all antennas must be checked. The verbosity level (input parameter 6)
sets the amount of screen output. 11 is recommended, and other values are not
discussed here (higher means more screen output).
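Since the same call is needed once per polarization, the command lines can be assembled by a small script. The Python sketch below builds the argument list in the prescribed order and prints a shell-quoted command for RR and LL. It is an illustration only: the helper borg_command is made up here, the thresholds are passed through as text exactly as typed, and the blanks around the antenna list are reproduced from the examples above because it is not known whether borg requires them.

```python
import shlex


def borg_command(data_file, polarization, min_amplitude, clip_level,
                 skip_antennas, verbosity=11):
    """Argument list for one borg call, in the prescribed order.

    min_amplitude and clip_level are strings and are passed on verbatim,
    so that a small value such as "0.0001" is never rounded to zero.
    """
    if polarization not in ("RR", "LL"):
        raise ValueError("polarization must be RR or LL")
    if skip_antennas:
        antennas = " " + " ".join(str(a) for a in skip_antennas) + " "
    else:
        antennas = " none "  # check all antennas
    return ["borg", data_file, polarization, min_amplitude, clip_level,
            antennas, str(verbosity)]


for pol in ("RR", "LL"):
    # shlex.join quotes the antenna list so that it stays a single argument
    print(shlex.join(borg_command("3C48.34", pol, "0.070", "0.010", [1, 12])))
```

The printed lines use single quotes around the antenna list, which the shell treats the same way as the double quotes in the examples.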
The borg program itself may be used to find out which values
for parameters 3 and 4 are reasonable. As an example, we'll try it on
the phase calibrator (2223-052) of the example above. Set parameter 3
to 0.0001 and parameter 4 to 1.
IMPORTANT: If the minimum amplitude threshold is set to a value
less than or equal to 0, the borg program will operate in a different mode,
which is generally useful for faint sources, but not for the calibrators!
To find out which values are appropriate for the phase calibrator, try
borg 2223-052.34 RR 0.0001 1.0 " none " 11
The borg program looks through the data. For each pair of antennas,
it identifies separate scans and finds the modal amplitude for the scan.
The modal amplitude is the amplitude that represents most of the data points
in the scan. It is derived from a frequency histogram. The modal amplitude is
not affected by points with a deviant amplitude, unlike the mean or even the
median amplitude.
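The idea of a modal amplitude taken from a frequency histogram can be illustrated with a short Python sketch. This is not the code used by borg: the bin width of 0.001 is an assumption (chosen to match the ff.fff output format of UVFND), the function name modal_amplitude is made up, and the sample amplitudes are invented to show a scan in which half of the points are corrupted.

```python
from collections import Counter
from statistics import mean, median


def modal_amplitude(amplitudes, bin_width=0.001):
    """Centre of the most populated histogram bin (assumed bin width)."""
    if not amplitudes:
        raise ValueError("no data points")
    # Assign every amplitude to the nearest bin and count the bin populations.
    counts = Counter(round(a / bin_width) for a in amplitudes)
    # Highest count wins; ties are resolved in favour of the lowest bin.
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return best_bin * bin_width


# Invented scan: seven good points near 0.052 and seven deviant points.
scan = [0.052, 0.052, 0.051, 0.052, 0.053, 0.052, 0.052,
        0.20, 0.25, 0.30, 0.22, 0.27, 0.31, 0.24]

print(f"modal  {modal_amplitude(scan):.3f}")
print(f"median {median(scan):.3f}")
print(f"mean   {mean(scan):.3f}")
```

In this invented scan the mean and the median are both pulled well above 0.1 by the deviant points, while the modal amplitude stays at the value shared by the good points.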
Some of the screen output is given here. The statistics of each scan are
printed per baseline. All of this is for one of two polarizations
(RR). The timeranges of the scans are also indicated. Antenna 1 is
dead, so there is virtually no signal in any baseline involving this
antenna. Baseline 2-3 has amplitude 0.052 and r.m.s. 0.00256 (a
significant part of this is due to the limited floating point
accuracy). The "Mean" reported here is actually a mean of points
around the modal value, in order to circumvent numerical problems.
In this case, it was decided to discard all baselines with amplitude
less than 0.010, and to flag any point that deviates by more than 0.009
(approximately 3 sigma) from the modal amplitude. Typically, the lower
amplitude threshold (borg input parameter 3) may be chosen as
half the amplitude of a typical baseline. The present choice is on the
low side, but inspection by 'UVPLT' showed that the quality of the faint
baselines was good. Most baselines are significantly stronger.
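The two criteria can be written down as a short Python sketch for one scan on one baseline. This is an illustration under stated assumptions and not the borg implementation: the function name flag_scan is made up, the sample amplitudes are invented, and "discarding" a weak baseline is taken here to mean that all of its points are flagged.

```python
def flag_scan(amplitudes, modal, min_amplitude, clip_level):
    """Indices of the points to flag in one scan of one baseline.

    Assumption: a scan whose modal amplitude is below min_amplitude is
    flagged completely; otherwise only points further than clip_level
    from the modal amplitude are flagged.
    """
    if modal < min_amplitude:
        return list(range(len(amplitudes)))
    return [i for i, a in enumerate(amplitudes)
            if abs(a - modal) > clip_level]


# Invented scan around the modal amplitude 0.052 of baseline 2-3.
scan = [0.052, 0.051, 0.053, 0.070, 0.052, 0.040]
print(flag_scan(scan, modal=0.052, min_amplitude=0.010, clip_level=0.009))

# A baseline involving a dead antenna: the modal amplitude is below threshold.
print(flag_scan([0.001, 0.002, 0.001], modal=0.001,
                min_amplitude=0.010, clip_level=0.009))
```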
The list of data points to be flagged in polarization RR is now created
by the command
borg 2223-052.34 RR 0.01 0.009 " 1 12 " 11
In fact, the borg program creates two output files. One is the
primary flaglist. It contains points that were rejected by the criteria
set above (input parameters 3 and 4).
In practice, it becomes laborious to run the borg program in this way.
It is much more convenient to make a list of the files that should be
checked (this example will check source 2223-052 in channels 34 through 45).
Then call the program mkchkjob in the following way:
mkchkjob 0.01 0.009 " 1 12 " 11 < filelist.txt > runjob
The "<" and ">" are unix input/output redirection characters.
To start the process, type at the unix prompt:
source runjob
The program will then run for a while. The screen output is mostly the same as
discussed above, but there will be a few extra process-related messages.
It is assumed that the ASCII data files are in the same directory. This procedure is not only
easier to run on many channels, it is also significantly faster.
It is assumed that the datafiles are compressed with gzip, but there is no problem
with files that were not compressed. When a file is no longer required, it will be
compressed with gzip. If this does not work, either gzip is not available on your machine or
the correct path to it is not known.
Inspection of the file runjob shows that a sorting routine is called. You
should not normally have to do this yourself. If you find any files with extension .sort,
and no process is running, the job did not end properly. This may happen, for example, after a
system crash. In such a case, delete ONLY the .sort files and any output files related to the
frequency channel that the .sort files relate to (see header). The data will be checked channel
by channel in the order of the file filelist.txt. Previously checked
channels are unaffected by the crash.
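Leftover .sort files can be located with a few lines of Python before deciding what to remove. The sketch below only lists them and deliberately deletes nothing, because the check that no process is still running, and the look at the header to find the affected channel, must be done by hand. The function name stale_sort_files is made up for this illustration.

```python
from pathlib import Path


def stale_sort_files(directory="."):
    """Sorted list of files with extension .sort in the given directory."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix == ".sort")


for path in stale_sort_files():
    # Listing only: inspect the header of each file before deleting anything.
    print(path.name)
```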