Flagging GMRT data


Introduction

The aim of the programs described below is to allow flagging of bad data points in GMRT interferometer data. Data editing itself is done within the AIPS environment with the task UVFLG. The identification of bad data is done outside the AIPS environment, with more sophisticated algorithms than are currently available within AIPS. For more information about the AIPS tasks, please use the AIPS verbs HELP and EXPLAIN.

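The data flagging pipeline for GMRT data is set up as follows:
1. Exporting the data from AIPS into ASCII files (AIPS task UVFND).
2. Searching the ASCII files for bad data outside AIPS (program borg).
3. Flagging the bad data within AIPS (AIPS task UVFLG).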

AIPS RUNFILES
AIPS runfiles are an important tool for taking work out of the hands of the user. For a description, type 'HELP RUN' in AIPS. An AIPS runfile can be modified with any text editor. These are the important points to remember about runfiles: Steps 1 and 3 mentioned above are carried out by (large) AIPS runfiles that are generated by the programs described below.

Step 1: exporting data from AIPS

The data are written into ASCII tables with the AIPS task UVFND (STOKES='CORR'). The main advantage is that this human-readable format makes the procedure very transparent. There are two disadvantages, although neither of them is a fundamental problem:
1. UVFND will not print more than approximately 32000 lines at a time.
2. The floating-point format for the amplitude is fixed at ff.fff. This is sufficient for calibrators before flux calibration, but the target source must be flux calibrated first.
As a result, it is necessary to complete the flagging of the calibrators, do the flux calibration, and only then search the target source data for bad points. This order is needed only because of the limited numerical format of the UVFND output; none of the algorithms that detect bad data assume flux calibration.
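A quick sketch of why the fixed ff.fff format matters: rounding to three decimals changes a value by up to 0.0005, with an r.m.s. error of 0.001/sqrt(12), about 0.0003, if the rounding errors are uniformly distributed. That is small compared with a calibrator amplitude of a few hundredths, but it can be comparable to the amplitude of a faint target before flux calibration. The amplitudes below are illustrative numbers, not GMRT measurements, and the helper names are hypothetical.

```python
# Illustration only: mimic the fixed "ff.fff" amplitude format of UVFND by
# rounding to three decimal places, and compare the error with the signal.

QUANTUM = 0.001                      # last printed digit of ff.fff
RMS_ROUNDING = QUANTUM / 12 ** 0.5   # r.m.s. of a uniform rounding error


def to_ff_fff(amplitude: float) -> float:
    """Round an amplitude as a fixed three-decimal format would."""
    return round(amplitude, 3)


def relative_rounding_error(amplitude: float) -> float:
    """Fraction of the amplitude changed by three-decimal rounding."""
    return abs(to_ff_fff(amplitude) - amplitude) / amplitude


if __name__ == "__main__":
    # A calibrator-like value, then two progressively fainter ones.
    for amp in (0.0523, 0.0042, 0.0007):
        print(f"{amp:.4f} -> {to_ff_fff(amp):.3f} "
              f"(relative error {relative_rounding_error(amp):.1%})")
```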
Frequency channels are considered separately. All of the UV data for a particular source (calibrator or target source) and a particular channel will be stored in a single data file. The flagging procedure is the same whether a single channel or a hundred channels are considered. A hundred channels will require a hundred ASCII files per source.
The extraction of the data is fully automated. The program that generates the required runfile is called mkuvfnd. This program requires the output of the AIPS task LISTR, run with the keyword OPTYPE='SCAN'. Note that before running LISTR you must create an NX table with the AIPS task INDXR. The required output should be copied from the AIPS terminal into a file; an example input file for mkuvfnd is MRK527.listr. DO NOT include the part of the LISTR output with the RA and DEC of the source in this file!

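The program mkuvfnd takes four arguments in the following prescribed order:
1. the source name (e.g. 3C48),
2. the first channel,
3. the last channel,
4. the name of the file containing the LISTR output (e.g. MRK527.listr).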

The output should be redirected to an AIPS runfile, e.g. GET3C48.06D for AIPS user 229. The proper calling sequence to extract the data for 3C48 from channels 34 to 56 is then
mkuvfnd 3C48 34 56 MRK527.listr > GET3C48.06D
It is highly recommended to start with a single channel (set the start and end channels to the same value). Here the output of mkuvfnd is redirected to the AIPS runfile GET3C48.06D. Within AIPS, set INNAME etc. to the correct UV data set with the verb GETNAME. Then type
run GET3C48
This will generate 23 files in the current directory, named 3C48.34, 3C48.35, ..., 3C48.56, each containing the data for 3C48 in one of the channels 34 through 56. If the data set is large, selecting and writing the data may take a long time.
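Before moving on to Step 2, it can help to check that every expected channel file is present. The following is a minimal sketch, assuming the naming shown above (SOURCE.CHANNEL) and allowing for a file that has since been compressed to SOURCE.CHANNEL.gz. The function names are hypothetical and not part of mkuvfnd.

```python
from pathlib import Path


def channel_files(source: str, first: int, last: int) -> list[str]:
    """Names of the per-channel ASCII files, e.g. 3C48.34 ... 3C48.56."""
    return [f"{source}.{ch}" for ch in range(first, last + 1)]


def missing_files(names: list[str], directory: str = ".") -> list[str]:
    """Return the names for which neither NAME nor NAME.gz exists."""
    d = Path(directory)
    return [n for n in names
            if not (d / n).exists() and not (d / (n + ".gz")).exists()]


if __name__ == "__main__":
    names = channel_files("3C48", 34, 56)
    print(len(names), "files expected:", names[0], "...", names[-1])
    for name in missing_files(names):
        print("missing:", name)
```

Channels 34 through 56 inclusive give 56 - 34 + 1 = 23 files, matching the count above.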
This procedure should be repeated for each source, for example:
mkuvfnd 2223-052 34 56 MRK527.listr > GET2223.06D
mkuvfnd MRK527 34 56 MRK527.listr > GETM527.06D
Note that AIPS puts strong restrictions on the number of characters in filenames.
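The runfile names above end in .06D for AIPS user 229. This is consistent with the AIPS convention of writing the user number in base 36 ("extended hexadecimal": digits 0-9, then A-Z), padded to three characters. Here 229 = 6 × 36 + 13, which gives the digits 6 and D. The sketch below builds such a name for another user number. The helper names are hypothetical, and the upper-case file name simply follows the examples above.

```python
import string

# Digits of base 36 ("extended hexadecimal"): 0-9 followed by A-Z.
EHEX_DIGITS = string.digits + string.ascii_uppercase


def ehex(number: int, width: int = 3) -> str:
    """Write a non-negative integer in base 36, zero-padded to `width`."""
    if number < 0:
        raise ValueError("user numbers are non-negative")
    digits = ""
    while number:
        number, rem = divmod(number, 36)
        digits = EHEX_DIGITS[rem] + digits
    return digits.rjust(width, "0")


def runfile_name(name: str, user: int) -> str:
    """Runfile name in the style used above, e.g. GET3C48.06D for user 229."""
    return f"{name.upper()}.{ehex(user)}"


if __name__ == "__main__":
    print(runfile_name("get3c48", 229))
```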

Step 2: search for bad data

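It is assumed that the data were extracted according to Step 1. These data will be searched for irregularities in a number of ways. The program that does this is called borg (baseline offset recognition gadget). This program takes arguments in the following prescribed order (further explained below):
1. the name of the data file (e.g. 3C48.34),
2. the polarization (e.g. RR or LL),
3. the minimum amplitude threshold,
4. the maximum allowed deviation from the modal amplitude,
5. a quoted list of antennas to skip (e.g. " 1 12 "), or " none ",
6. the verbosity level.
The borg program may run in several modes, depending on the values of the input parameters. For the example above (3C48 at 327 MHz), the borg program may be used with the following calling sequence: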
borg 3C48.34 RR 0.070 0.010 " 1 12 " 11
and
borg 3C48.34 LL 0.070 0.010 " 1 12 " 11

These values give an impression of reasonable settings. Antennas 1 and 12 (these are the antenna numbers used in AIPS) were known to be dead at the time of the observation; skipping them saves time. Use " none " (with the quotes) if all antennas must be checked. The verbosity level (input parameter 6) sets the amount of screen output; higher values give more output. A value of 11 is recommended, and other values are not discussed here.
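To make the parameter order concrete, here is a hypothetical Python parser for the six arguments. It is not borg's own code. The class and function names are invented for illustration, and the parser encodes only what this section states: a quoted antenna list or " none ", and a separate faint-source mode when parameter 3 is less than or equal to 0.

```python
from dataclasses import dataclass


@dataclass
class BorgArgs:
    """Hypothetical container for borg's six positional parameters."""
    datafile: str            # 1: data file, e.g. "3C48.34"
    polarization: str        # 2: e.g. "RR" or "LL"
    min_amplitude: float     # 3: minimum amplitude threshold
    max_deviation: float     # 4: maximum deviation from the modal amplitude
    skip_antennas: set[int]  # 5: antennas to skip, from " 1 12 " or " none "
    verbosity: int           # 6: verbosity level (11 recommended)

    @property
    def faint_source_mode(self) -> bool:
        # A threshold <= 0 selects the mode meant for faint sources.
        return self.min_amplitude <= 0


def parse_antenna_list(text: str) -> set[int]:
    """Turn ' 1 12 ' into {1, 12}; ' none ' gives an empty set."""
    words = text.split()
    if words == ["none"]:
        return set()
    return {int(w) for w in words}


def parse_borg_args(argv: list[str]) -> BorgArgs:
    """Parse the six arguments that follow the program name."""
    if len(argv) != 6:
        raise ValueError(f"expected 6 arguments, got {len(argv)}")
    datafile, pol, amin, maxdev, ants, verb = argv
    return BorgArgs(datafile, pol, float(amin), float(maxdev),
                    parse_antenna_list(ants), int(verb))
```

For example, parse_borg_args(["3C48.34", "RR", "0.070", "0.010", " 1 12 ", "11"]) corresponds to the first command above. The quotes on the command line are what turn the antenna list into a single argument.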

The borg program itself may be used to find out which values for parameters 3 and 4 are reasonable. As an example, we'll try it on the phase calibrator (2223-052) of the example above. Set parameter 3 to 0.0001 and parameter 4 to 1.

IMPORTANT: If the minimum amplitude threshold is set to a value less than or equal to 0, the borg program will operate in a different mode, which is generally useful for faint sources but not for the calibrators!

To find out which values are appropriate for the phase calibrator, try
borg 2223-052.34 RR 0.0001 1.0 " none " 11

The borg program looks through the data. For each pair of antennas (baseline), it identifies separate scans and finds the modal amplitude of each scan. The modal amplitude is the most common amplitude in the scan, i.e. the value around which most of the data points cluster; it is derived from a frequency histogram. Unlike the mean, or even the median amplitude, the modal amplitude is not affected by points with a deviant amplitude.
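The histogram idea can be sketched in a few lines. This is not borg's implementation: the bin width of 0.001 (one ff.fff digit) and the tie-breaking rule are assumptions made for this example. The sample scan is made up. It shows a steady amplitude near 0.052 with three spikes, which pull the mean far away, shift the median slightly, and leave the mode unchanged.

```python
from collections import Counter
from statistics import mean, median


def modal_amplitude(amplitudes, bin_width=0.001):
    """Centre of the most populated histogram bin (lowest bin wins ties)."""
    if not amplitudes:
        raise ValueError("no data points in scan")
    counts = Counter(int(a // bin_width) for a in amplitudes)
    best = max(counts, key=lambda b: (counts[b], -b))
    return (best + 0.5) * bin_width


if __name__ == "__main__":
    # One scan on one baseline: steady ~0.052 plus three deviant points.
    scan = [0.0521, 0.0523, 0.0524, 0.0526, 0.0527, 0.30, 0.31, 0.32]
    print("mean  :", round(mean(scan), 4))
    print("median:", round(median(scan), 4))
    print("mode  :", round(modal_amplitude(scan), 4))
```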

Some of the screen output is given here. The statistics of each scan are printed per baseline, all for one of the two polarizations (RR). The time ranges of the scans are also indicated. Antenna 1 is dead, so there is virtually no signal in any baseline involving this antenna. Baseline 2-3 has amplitude 0.052 and r.m.s. 0.00256 (a significant part of this is due to the limited floating-point accuracy). The "Mean" reported here is actually a mean of the points around the modal value, in order to circumvent numerical problems.
In this case, it was decided to discard all baselines with an amplitude less than 0.010, and to flag any point that deviates by more than 0.009 (approximately 3 sigma) from the modal amplitude. Typically, the lower amplitude threshold (borg input parameter 3) may be chosen as half the amplitude of a typical baseline. The present choice is on the low side, but inspection with the AIPS task UVPLT showed that the quality of the faint baselines was good. Most baselines are significantly stronger.
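The two criteria can be sketched for a single scan on a single baseline. This is a hypothetical illustration, not borg's code. It reuses the histogram estimate of the modal amplitude, with an assumed bin width of 0.001, and reports only whether the baseline falls below the amplitude threshold. It does not decide what "discarding" does in practice.

```python
from collections import Counter


def modal_amplitude(amps, bin_width=0.001):
    """Centre of the most populated histogram bin (lowest bin wins ties)."""
    counts = Counter(int(a // bin_width) for a in amps)
    best = max(counts, key=lambda b: (counts[b], -b))
    return (best + 0.5) * bin_width


def check_scan(amps, min_amplitude, max_deviation, bin_width=0.001):
    """Apply the amplitude threshold and the deviation criterion.

    Returns (below_threshold, flagged_indices). When the modal amplitude
    is below min_amplitude the baseline is marked for discarding and no
    individual points are examined.
    """
    mode = modal_amplitude(amps, bin_width)
    if mode < min_amplitude:
        return True, []
    flagged = [i for i, a in enumerate(amps) if abs(a - mode) > max_deviation]
    return False, flagged


if __name__ == "__main__":
    # Values from the example: threshold 0.010, allowed deviation 0.009.
    scan = [0.0521, 0.0524, 0.0518, 0.0523, 0.0702, 0.0522]
    print(check_scan(scan, 0.010, 0.009))
```

In this made-up scan the modal amplitude is about 0.0525, so only the point at 0.0702 deviates by more than 0.009.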
The list of data points to be flagged in polarization RR is now created by the command
borg 2223-052.34 RR 0.01 0.009 " 1 12 " 11

In fact, the borg program creates two output files. One is the primary flaglist, which contains the points that were rejected by the criteria set above (input parameters 3 and 4).

In practice, it becomes laborious to run the borg program in this way. It is much more convenient to make a list of the files that should be checked (this example will check source 2223-052 in channels 34 through 45). Then call the program mkchkjob in the following way:

mkchkjob 0.01 0.009 " 1 12 " 11 < filelist.txt > runjob
The "<" and ">" are unix input/output redirection characters. To start the process, type at the unix prompt:
source runjob
The program will then run for a while. The screen output is mostly the same as discussed above, but there will be a few extra process-related messages. It is assumed that the ASCII data files are in the same directory. This procedure is not only easier to run on many channels, it is also significantly faster. It is assumed that the data files are compressed with gzip, but files that were not compressed cause no problem. When a file is no longer required, it will be compressed with gzip. If this does not work, either gzip is not available on your machine or its path is not known.
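The following sketch shows how to read a data file that may or may not have been compressed. It is not taken from borg or mkchkjob. It assumes that a compressed file carries the usual .gz suffix (e.g. 3C48.34.gz) and uses Python's standard gzip module. The function name is hypothetical.

```python
import gzip
from pathlib import Path


def open_data_file(name: str, directory: str = "."):
    """Open NAME or NAME.gz as text, preferring the uncompressed file."""
    plain = Path(directory) / name
    if plain.exists():
        return open(plain, "rt")
    compressed = plain.with_name(plain.name + ".gz")
    if compressed.exists():
        # gzip.open in "rt" mode decompresses on the fly and yields text.
        return gzip.open(compressed, "rt")
    raise FileNotFoundError(f"neither {plain} nor {compressed} exists")
```

Both branches return a file object that can be used in a with statement and read line by line, so the code that parses the data does not need to know whether the file was compressed.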

Inspection of the file runjob shows that a sorting routine is called; you should not normally have to do this yourself. If you find any files with the extension .sort while no process is running, the job did not end properly. This may happen, for example, during a system crash. In such a case, delete ONLY the .sort files and any output files related to the frequency channel that the .sort files relate to (see header). The data are checked channel by channel in the order of the file filelist.txt, so channels that were checked before the crash are unaffected.
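After a crash, a small helper can list the leftover .sort files. This is a hypothetical sketch. It only reports, and it assumes that the header identifying the frequency channel is at the top of each .sort file, so it prints the first line of each one. By design it deletes nothing: deciding which output files belong to the aborted channel is left to the user, and the check is meaningful only when no job is running.

```python
from pathlib import Path


def leftover_sort_files(directory: str = ".") -> list[Path]:
    """Return the *.sort files in `directory`, sorted by name."""
    return sorted(Path(directory).glob("*.sort"))


def report(directory: str = ".") -> None:
    """Print each leftover .sort file with its first line (assumed header)."""
    for path in leftover_sort_files(directory):
        with open(path) as f:
            first_line = f.readline().rstrip("\n")
        print(f"{path.name}: first line: {first_line!r}")


if __name__ == "__main__":
    report()
```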