Data file formats

This section provides an overview of the file formats supported by Source. Table 5 lists the supported time-series data file formats. Raster data file formats are listed in Table 6. Several GIS, graphics and other formats that are also recognised by Source are listed in Table 7 but are not otherwise described in this guide.

Note: Formats marked with the ** symbol are among the formats supported by the GDAL library. A complete list of these is provided in the GDAL documentation.

Table 5. Text-based time-series data file formats

File extension	Description
.AR1	Annual stochastic time series
.AWB	AWBM daily time series
.BSB	SWAT BSB time series
.BSM	BoM 6-minute time series
.CDT	Comma delimited time series
.CSV	Comma-separated value
.DAT	F.Chiew time series
.IQQM	IQQM time series
.MRF	MFM monthly rainfall files
.PCP	SWAT daily time series
.SDT	Space delimited time series
.SILO5	SILO 5 time series
.SILO8	SILO 8 time series
.TTS	Tarsier daily time series

Table 6. Text-based raster data file formats

File extension	Description
.ASC**	ESRI ASCII grids
.MWASC	MapWindow ASCII grids
.TAPESG	Grid-based Terrain Analysis Data

Table 7. Other supported file formats

File extension	Description
.FLT	ESRI Binary Raster Interchange format
.JPG	Georeferenced JPEG image (also .JPEG); must have an associated .jgw world file
.MIF	MapInfo Interchange
.SHP**	ESRI shapefiles
.TIF**	GeoTIFF Image (also .TIFF)
.TILE	Tiled Raster Files
.TNE	Tarsier Node Link Network Files
.TRA	Tarsier Raster Files
.TSD	Tarsier Sites Data Files
.ADF**	ArcInfo/ESRI binary grid
.IMG**	ERDAS Imagine

Annual stochastic time series

The .AR1 format contains replicates of annual time-series data generated using the AR(1) stochastic method. The file format is shown in Table 8. This format is not the same as the AR(1) format (.GEN) generated and exported by the Stochastic Climate Library.

Table 8. AR1 data file format

Row	Column (space-separated)
Row	1	2	3..nypr
1	desc
2	nypr	nr
3, 5, ..., 2nr+1	rn
4, 6, ..., 2nr+2	value	value	value

where:

desc is a title describing the collection site

nypr is the number of years per replicate

nr is the number of replicates

rn is the replicate number in the range 1..nr

value is one of the nypr data points per row for the replicate, to three decimal places.
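As an illustration of the layout in Table 8, the following Python sketch reads .AR1 content into a list of replicates. It is a minimal sketch rather than Source's own reader: the function name read_ar1 is hypothetical, and it assumes whitespace-separated tokens, ignores blank lines, and expects each value row to hold exactly nypr values.

```python
from typing import List, Tuple


def read_ar1(text: str) -> Tuple[str, List[List[float]]]:
    """Parse .AR1 content laid out as in Table 8 (sketch; blank lines are ignored)."""
    lines = [line for line in text.splitlines() if line.strip()]
    desc = lines[0].strip()  # row 1: description of the collection site
    nypr, nr = (int(tok) for tok in lines[1].split()[:2])  # row 2: years per replicate, replicates
    replicates: List[List[float]] = []
    for k in range(nr):
        rn = int(lines[2 + 2 * k].split()[0])  # odd row: replicate number
        values = [float(tok) for tok in lines[3 + 2 * k].split()]  # even row: nypr values
        if rn != k + 1 or len(values) != nypr:
            raise ValueError(f"replicate {k + 1} does not match the header")
        replicates.append(values)
    return desc, replicates


desc, reps = read_ar1("Test site\n3 2\n1\n1.000 2.000 3.000\n2\n4.000 5.500 6.250\n")
```

Checking rn against the expected sequence catches files where a value row has wrapped onto several lines, which this sketch does not attempt to support.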

ESRI ASCII grids

The .ASC format is a space-delimited grid file with a six-line header, as shown in Table 9. Header keywords are not case sensitive, and data values are arranged in space-delimited rows and columns that reflect the structure of the grid. Units for cell size depend on the input data and can be either geographic (eg degrees) or projected (eg metres, kilometres). Units are generally determined by the application, with metres (m) being common for most TIME-based applications. For a file format description, refer to:

http://resources.esri.com/help/9.3/arcgisengine/com_cpp/gp_toolref/spatial_analyst_tools/esri_ascii_raster_format.htm

ArcInfo grid coverages can be converted to .ASC files using ESRI’s GRIDASCII command. .ASC files can be imported into ArcGIS using the ASCIIGRID command.

Table 9. .ASC data file format

Row	Column (space-delimited)
Row	1	2	3..n
1	ncols	nc
2	nrows	nr
3	xref	x
4	yref	y
5	cellsize	size
6	nodata_value	sentinel
7..n	value	value	value

where:

nc is the number of columns

nr is the number of rows

xref is either XLLCENTER (centre of the lower-left cell) or XLLCORNER (lower-left corner of the lower-left cell)

yref is either YLLCENTER (centre of the lower-left cell) or YLLCORNER (lower-left corner of the lower-left cell)

(x,y) are the coordinates of the origin, given either as the centre or as the lower-left corner of the lower-left cell

size is the cell side length

sentinel is the value used to mark cells with no data (eg -9999)

value is a data point. There should be nc × nr data points.
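The header and data layout in Table 9 can be read with a short Python sketch such as the one below. The function names are hypothetical. It assumes all six header lines are present, and it follows the ESRI convention that the first data row is the top (northernmost) row of the grid. The second helper shows how the XLLCORNER/XLLCENTER choice affects the origin: a corner origin is shifted by half a cell to obtain the centre of the lower-left cell.

```python
from typing import Dict, List, Optional, Tuple


def read_asc(text: str) -> Tuple[Dict[str, float], List[List[Optional[float]]]]:
    """Parse an ESRI ASCII grid with a six-line header (Table 9); NoData cells become None."""
    lines = text.splitlines()
    header: Dict[str, float] = {}
    for line in lines[:6]:
        key, value = line.split()[:2]
        header[key.lower()] = float(value)  # header keywords are not case sensitive
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = header["nodata_value"]
    tokens = " ".join(lines[6:]).split()
    if len(tokens) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} values, found {len(tokens)}")
    values = [None if float(tok) == nodata else float(tok) for tok in tokens]
    # Row 0 of the result is the first data row, ie the top of the grid.
    grid = [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return header, grid


def origin_cell_centre(header: Dict[str, float]) -> Tuple[float, float]:
    """Centre of the lower-left cell, whichever origin convention the header uses."""
    half = header["cellsize"] / 2
    x = header["xllcenter"] if "xllcenter" in header else header["xllcorner"] + half
    y = header["yllcenter"] if "yllcenter" in header else header["yllcorner"] + half
    return x, y


text = ("ncols 3\nNROWS 2\nxllcorner 100.0\nyllcorner 200.0\n"
        "cellsize 10\nNODATA_value -9999\n1 2 3\n4 -9999 6\n")
header, grid = read_asc(text)
```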

AWBM daily time series

An AWBM daily time-series format file (.AWB) is an ASCII text file containing daily time-series data formatted as shown in Table 10. Dates (the year and month) were optional in the original AWBM file format, but are not optional in the format used in Source.

Table 10. AWB data file format

Row	Column (space-separated)
Row	1	2..ndays+1	ndays+2	ndays+3
1..n	ndays	value	year	month

where:

ndays is the number of days in the month (28..31)

value is the data point corresponding with a given day in the month (ie. ndays columns)

year is the year of observation (four digits)

month is the month of observation (one or two digits).
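A single .AWB row can be expanded into dated observations as sketched below. parse_awb_line is a hypothetical name, and the comparison against calendar.monthrange is an extra sanity check added here, not a requirement stated in the format description.

```python
import calendar
from datetime import date
from typing import List, Tuple


def parse_awb_line(line: str) -> List[Tuple[date, float]]:
    """Expand one .AWB row (ndays, ndays values, year, month) into (date, value) pairs."""
    tokens = line.split()
    ndays = int(tokens[0])
    if len(tokens) != ndays + 3:
        raise ValueError(f"expected {ndays + 3} fields, found {len(tokens)}")
    values = [float(tok) for tok in tokens[1:ndays + 1]]
    year, month = int(tokens[ndays + 1]), int(tokens[ndays + 2])
    # Extra check (assumption): ndays should equal the number of days in that month.
    if calendar.monthrange(year, month)[1] != ndays:
        raise ValueError(f"{year}-{month:02d} does not have {ndays} days")
    return [(date(year, month, day), value) for day, value in enumerate(values, start=1)]


row = "28 " + " ".join(["1.5"] * 28) + " 2001 2"
obs = parse_awb_line(row)
```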

SWAT BSB time series

A .BSB is a line-based, fixed-format file, typically used by applications written in FORTRAN. The header line names the fields, and each subsequent line provides the data for one basin for one time step. The format is shown in Table 11. For more details, refer to the SWAT manual.

Table 11. .BSB data file format

Row	Character positions (space padded)
Row	1..8	10..12	14..21	23..36	38..46
1	SUB	GIS	MON	AREAkm2	PRECIPmm
2..n	id	gis	mon	area	precip

where:

id is the basin identifier (both SUB and the id are text, left-aligned)

gis is the GIS value (integer, right-aligned, eg. "1")

mon is the month of observation (integer, right-aligned, eg. "0")

area is the basin area in square kilometres (real, right aligned, eg "1.14170E+02")

precip is the basin precipitation in millimetres (real, right aligned, eg "1.2000").
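Because Table 11 gives fixed character positions, a .BSB data row can be sliced rather than split on whitespace. This Python sketch uses hypothetical function names; it converts the 1-based, inclusive positions from the table into 0-based slices and skips the header line.

```python
from typing import Dict, List, Union

# 1-based, inclusive character positions taken from Table 11.
BSB_FIELDS = {"id": (1, 8), "gis": (10, 12), "mon": (14, 21), "area": (23, 36), "precip": (38, 46)}

Row = Dict[str, Union[str, int, float]]


def parse_bsb_row(line: str) -> Row:
    """Slice one .BSB data row (rows 2..n) by its fixed character positions."""
    def field(name: str) -> str:
        start, end = BSB_FIELDS[name]
        return line[start - 1:end].strip()

    return {
        "id": field("id"),
        "gis": int(field("gis")),
        "mon": int(field("mon")),
        "area": float(field("area")),      # square kilometres
        "precip": float(field("precip")),  # millimetres
    }


def read_bsb(text: str) -> List[Row]:
    """Skip the SUB/GIS/MON/AREAkm2/PRECIPmm header line and parse the remaining rows."""
    return [parse_bsb_row(line) for line in text.splitlines()[1:] if line.strip()]


row = f"{'1':<8} {1:>3} {0:>8} {'1.14170E+02':>14} {'1.2000':>9}"
parsed = parse_bsb_row(row)
```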

BoM 6-minute time series

A .BSM (also .PLUV) is a fixed-format file, typically supplied by the Australian Bureau of Meteorology for 6-minute pluviograph data. The file has two header lines (record types 1 and 2) followed by an arbitrary number of records of type 3. The formats of record types 1..3 are shown in Table 12, Table 13 and Table 14, respectively.

All fields in .BSM files use fixed spacing when supplied, but Source can also read space-separated values.

Rainfall data points:

Each row of data contains all of the observations for that day;
The number of observations for a day depends on the observation interval. For example, if the observation interval is 6 minutes, there will be 24×60÷6=240 observations (rain_i fields) in each row of data;
Each rain_i field is in FORTRAN format F7.1 (a field width of seven characters with one decimal place);
Assuming that observations are numbered from 1..n, the starting character position of any given rain_i field is 14+7×i (so rain_1 occupies positions 21..27);
The unit of measurement is tenths of a millimetre (eg. a rainfall of 2 mm will be encoded as "20.0").
Values are interpreted as follows:
- 0.0 means there was no rain during the interval.
- a positive non-zero value is the observed rainfall, in tenths of a millimetre, during the interval.
- If there is zero rain for the whole day, no record is written for that day.

Missing data:

A sentinel value of -9999.0 means that no data is available for that interval;
A sentinel value of -8888.0 means that rain may have fallen during the interval but the total is known only for a period of several intervals. This total is entered as a negative value in the last interval of the accumulated period. For example, the following pattern would show that a total of 2 millimetres of rain fell at some time during an 18-minute period (three 6-minute intervals): -8888.0-8888.0 -20.0
If an entire month of data is missing, either no records are written or days filled with missing values (-9999.0) are written. No attempt is made to write dummy records if complete years of data are missing.
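The value rules above can be sketched in Python as follows. decode_rain is a hypothetical helper, and returning (first_interval, n_intervals, mm) records is an assumption made here so that an accumulated total stays a single amount over its period rather than being spread across intervals. The sketch also assumes that a -8888.0 run interrupted by other data, or left open at the end of the row, is treated as missing.

```python
from typing import List, Optional, Tuple

MISSING = -9999.0       # no data for the interval
ACCUMULATING = -8888.0  # rain may have fallen; the total appears later as a negative value


def decode_rain(raw: List[float]) -> List[Tuple[int, int, float]]:
    """Convert raw .BSM interval values (tenths of a mm) into
    (first_interval, n_intervals, rainfall_mm) records with 0-based intervals.

    Missing intervals produce no record. An accumulation run that is not closed
    by a negative total is dropped, ie treated as missing (assumption).
    """
    records: List[Tuple[int, int, float]] = []
    run_start: Optional[int] = None
    for i, value in enumerate(raw):
        if value == ACCUMULATING:
            if run_start is None:
                run_start = i
        elif value == MISSING:
            run_start = None
        elif value < 0:
            # A negative total closes the accumulated period (or covers one interval).
            start = i if run_start is None else run_start
            records.append((start, i - start + 1, -value / 10.0))
            run_start = None
        else:
            run_start = None  # an unclosed run followed by a normal value is dropped
            records.append((i, 1, value / 10.0))
    return records


records = decode_rain([0.0, -8888.0, -8888.0, -20.0, 5.0, -9999.0])
```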

Example file

61078 1
61078 2 WILLIAMTOWN RAAF
61078 19521231 .0 .0 .0 [etc., 240 values]
61078 1953 1 1 .0 .0 .0 [etc., 240 values]
61078 1953 1 3 .0 .2 .0 [etc., 240 values]
61078 1953 115 .0 .0 .2 [etc., 240 values]
61078 1953 118 .0 .0 .0 [etc., 240 values]
61078 1953 212 .0 .0 .0 [etc., 240 values]
61078 1953 213 .0 .0 .0 [etc., 240 values]
61078 1953 214 .0 .0 .0 [etc., 240 values]
61078 19521231 .0 .0 .0 [etc., 240 values]
61078 19521231 .0 .0 .0 [etc., 240 values]

The following notes are taken from the Bureau of Meteorology advice:

All data available in the computer archive are provided. However, very few sites have an uninterrupted historical record with no gaps. Such gaps or missing data may be due to many reasons, from illness of the observer to a broken instrument. A site may have been closed, reopened, upgraded or downgraded during its existence, possibly causing breaks in the record of any particular element.
Final quality control for any element usually occurs once the manuscript records have been received and processed, which may be 6-12 weeks after the end of the month. Thus quality-controlled data will not normally be available immediately, in "real time".

Table 12. .BSM data file format (record type 1)

Row	Character positions (space padded)
Row	1..6	7..15	16	17..n
1	snum	blank	1	blank

where:

snum is the station number

blank is ASCII space characters

Table 13. .BSM data file format (record type 2)

Row	Character positions (space padded)
Row	1..6	7..12	13..16	17..18	19..20	21..n
1..n	snum	blank	year	month	day	{rain_i...}

where:

snum is the station number

year is the year of the observation (four digits)

month is the month of the observation (one or two digits, right-aligned, space padded)

day is the date of the observation (one or two digits, right-aligned, space padded)

rain_i is a rainfall data point, as explained under Rainfall data points above.
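Using the character positions in Table 13 and the 14+7×i rule, a fixed-width daily rainfall record can be parsed as sketched below. The function name is hypothetical, and the sketch handles only the fixed-width layout, not the space-separated variant that Source also accepts. The returned values are still raw (tenths of a millimetre, sentinels included).

```python
from datetime import date
from typing import List, Tuple


def parse_bsm_day(line: str) -> Tuple[str, date, List[float]]:
    """Split one fixed-width .BSM daily record into station, date and raw rain_i values."""
    line = line.rstrip("\r\n")
    snum = line[0:6].strip()                      # positions 1..6
    day = date(int(line[12:16]),                  # year, positions 13..16
               int(line[16:18]),                  # month, positions 17..18
               int(line[18:20]))                  # day, positions 19..20
    n = (len(line) - 20) // 7                     # number of complete F7.1 fields
    # rain_i starts at 1-based position 14+7*i, ie 0-based slice [13+7i, 20+7i).
    rain = [float(line[13 + 7 * i:20 + 7 * i]) for i in range(1, n + 1)]
    return snum, day, rain


line = f"{'61078':<6}" + " " * 6 + "1953" + f"{1:>2}" + f"{15:>2}" + "".join(
    f"{v:7.1f}" for v in [0.0, 2.0, -8888.0])
snum, day, rain = parse_bsm_day(line)
```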

Comma delimited time series

A .CDT comma-delimited time-series format file is an ASCII text file that contains regular (periodic) time-series data. A .CDT file usually has no header line but can, if required, include a single-line header of "Date,Time series 1".

You can use the .CDT format to associate observations with a variety of time interval specifications. Table 15 shows how to structure annual data, Table 16 how to specify daily data aggregated at the monthly level, and Table 17 the more traditional daily time series (one date, one observation). Table 18 explains how to supply data in six-minute format.

Table 15. .CDT data file format (annual time series)

Row	Column (comma-separated)
Row	1	2
1..n	year	value

where:

year is the year of observation (four digits, eg. 2011)

value is the observed value (eg. 9876).

Table 16. .CDT data file format (time series with monthly data)

Row	Column (comma-separated)
Row	1	2
1..n	mm/yyyy	value

where:

mm is the month of observation (two digits, eg. 09)

yyyy is the year of observation (four digits, eg. 2011)

value is the observed value (eg. 2600).

Table 17. .CDT data file format (daily time series with daily data)

Row	Column (comma-separated)
Row	1	2
1..n	date	value

where:

date is the date of observation in ISO format (eg. 2000-12-31)

value is the observed value (eg. 2600).

Table 18. .CDT data file format (six-minute time series)

Row	Column (comma-separated)
Row	1	2	3..n
1..n	date	time	value

where:

date is the date of observation in ISO format (eg. 2000-12-31)

time is the time of observation in hours and minutes (eg 23:48)

value is the observed value (eg. 10).
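The four .CDT layouts in Tables 15 to 18 can be told apart from the shape of the first field, as in the Python sketch below. The function names and the returned layout labels are hypothetical, and the sketch assumes that the six-minute layout has exactly three columns (date, time, value) and that a header, if present, starts with "Date,".

```python
from datetime import date, datetime
from typing import List, Tuple, Union

Stamp = Union[int, date, datetime]


def parse_cdt_line(line: str) -> Tuple[str, Stamp, float]:
    """Classify one .CDT data row by its layout and parse its time-stamp and value."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) == 3:  # Table 18: date, time, value
        stamp = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M")
        return "six-minute", stamp, float(fields[2])
    key, value = fields[0], float(fields[1])
    if "/" in key:  # Table 16: mm/yyyy, represented here by the first day of the month
        return "monthly", datetime.strptime(key, "%m/%Y").date(), value
    if "-" in key:  # Table 17: ISO date
        return "daily", date.fromisoformat(key), value
    return "annual", int(key), value  # Table 15: four-digit year


def read_cdt(text: str) -> List[Tuple[str, Stamp, float]]:
    rows = []
    for n, line in enumerate(text.splitlines()):
        if not line.strip():
            continue
        if n == 0 and line.lower().startswith("date,"):
            continue  # optional single-line header
        rows.append(parse_cdt_line(line))
    return rows


rows = read_cdt("Date,Time series 1\n2011,9876\n09/2011,2600\n2000-12-31,2600\n2000-12-31,23:48,10\n")
```

In practice a single file uses one layout throughout; mixing them here only exercises each branch.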

Comma-separated value

A comma-separated value (.CSV) file is an ASCII text file that can contain data in a variety of representations. When a .CSV file contains regular (periodic) time-series data, there are at least two columns of data: the first contains a time-stamp and the remaining columns contain the data points associated with that time-stamp. The format is shown in Table 19. All columns are separated by commas. Annual data can be entered using the notation 01/yyyy, where yyyy is a year. Header lines in .CSV files are usually optional.

Table 19. .CSV data file format

Row	Column (comma-separated)
Row	1	2..n
1	Date	desc
2..n	date	value

where:

desc is a title for the column (header rows are often optional)

date is a date in ISO 8601 format ("yyyy-MM-dd HH:mm:ss" where " HH:mm:ss" is optional)

value is a data point (eg a real number with one decimal place)
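A minimal Python sketch of reading a time-series .CSV as described in Table 19 follows. It uses the standard csv module; the function names are hypothetical, and detecting a header by a first cell of "Date" and treating an empty cell as a missing value (None) are assumptions, not documented behaviour.

```python
import csv
import io
from datetime import datetime
from typing import List, Optional, Tuple


def parse_stamp(text: str) -> datetime:
    """Parse "yyyy-MM-dd[ HH:mm:ss]", or the 01/yyyy shorthand for annual data."""
    text = text.strip()
    if "/" in text:
        return datetime.strptime(text, "%m/%Y")
    return datetime.fromisoformat(text)


def read_csv_series(text: str) -> Tuple[List[str], List[Tuple[datetime, List[Optional[float]]]]]:
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    names: List[str] = []
    if rows and rows[0][0].strip().lower() == "date":  # optional header row
        names = [c.strip() for c in rows[0][1:]]
        rows = rows[1:]
    data = []
    for r in rows:
        values = [float(c) if c.strip() else None for c in r[1:]]  # empty cell -> None
        data.append((parse_stamp(r[0]), values))
    return names, data


names, data = read_csv_series("Date,Flow\n2000-01-01,1.5\n2000-01-02 06:00:00,2.0\n")
```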

F.Chiew time series

A .DAT is a two-column daily time-series file with the fixed format shown in Table 20. Note that the first two characters in each line are always spaces with the data starting at the third character position.

Table 20. .DAT data file format

Row	Character positions (space padded)
Row	1..2	3..6	7..8	9..10	12..20
1..n	blank	year	month	day	value

where:

blank is ASCII space characters

year is the year of the observation (four digits)

month is the month of the observation (one or two digits, right-aligned, space padded)

day is the date of the observation (one or two digits, right-aligned, space padded)

value is the data point (real, two decimal places, right aligned, eg "1.20").
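The fixed positions in Table 20 translate directly into string slices, as in this hypothetical Python helper. Position 11 (between day and value) is not assigned in the table, so the sketch skips it.

```python
from datetime import date
from typing import Tuple


def parse_dat_line(line: str) -> Tuple[date, float]:
    """Parse one F.Chiew .DAT line using the 1-based positions from Table 20."""
    year = int(line[2:6])        # positions 3..6
    month = int(line[6:8])       # positions 7..8 (right-aligned, space padded)
    day = int(line[8:10])        # positions 9..10 (right-aligned, space padded)
    value = float(line[11:20])   # positions 12..20
    return date(year, month, day), value


line = "  " + "1998" + f"{7:>2}" + f"{4:>2}" + " " + f"{1.2:9.2f}"
obs = parse_dat_line(line)
```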

IQQM time series

An .IQQM time-series format file is an ASCII text file that contains daily, monthly or annual time-series data. The file has a five-line header formatted as shown in Table 21. The header is followed by as many tables as are needed to cover the range delimited by fdate..ldate. The format of each table is shown in Table 22.

Each value is right-justified in 7 character positions with one leading space and one trailing quality indicator. In other words, there are five character positions for digits which are space-filled and right-aligned. The first value in each row (ie the observation for the first day of the month) occupies character positions 5..11. The second value occupies character positions 12..18, the third value positions 19..25, and so on across the row. In months with 31 days, the final value occupies character positions 215..221. The character positions corresponding with non-existent days in a given month are entirely blank. The mtotal and ytotal fields can support up to 8 digits. Both are space-filled, right-aligned in character positions 223..230.

The quality indicators defined by IQQM are summarised in Table 23. At present, Source does not act on these quality indicators.

Missing data points are generally represented as "-1?". A value is also considered to be a missing data point if it is expressed as a negative number and is not followed by either an "n" or "N" quality indicator.

Divider lines consist of ASCII hyphens (0x2D), beginning in character position 5 and ending at position 231.
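The character-position rules above can be illustrated with a Python sketch that decodes the daily values of one IQQM table row. parse_iqqm_row and iqqm_row_total are hypothetical names. The sketch assumes the caller supplies the number of days in the month (for example from calendar.monthrange). It applies the missing-data rule from this section but, like Source, does not otherwise act on the quality indicators.

```python
from typing import List, Optional, Tuple


def parse_iqqm_row(line: str, ndays: int) -> List[Tuple[Optional[float], str]]:
    """Decode the daily values of one IQQM table row as (value, quality) pairs.

    Day d (1-based) occupies character positions 5+7(d-1)..11+7(d-1): one leading
    space, five right-aligned digit positions, then a quality indicator.
    """
    decoded: List[Tuple[Optional[float], str]] = []
    for d in range(ndays):
        field = line[4 + 7 * d:11 + 7 * d].ljust(7)
        number, quality = field[1:6].strip(), field[6].strip()
        value = float(number)
        # Missing: negative and not flagged "n"/"N" (this covers the usual "-1?").
        decoded.append((None if value < 0 and quality not in ("n", "N") else value, quality))
    return decoded


def iqqm_row_total(line: str) -> float:
    """Read the mtotal field, right-aligned in character positions 223..230."""
    return float(line[222:230])


fields = [("3", " "), ("-1", "?"), ("-5", "n")]
line = "Jan " + "".join(" " + f"{v:>5}" + q for v, q in fields)
days = parse_iqqm_row(line, 3)
```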

Example file

Title: Meaningful title     Date:06/08/2001 Time:11:38:25.51
Site : Dead Politically Correct Person's Creek
Type : Flow
Units: ML/d
Date : 01/01/1898 to 30/06/1998      Interval : Daily
Year:1898
     ------------------------------------ ------------------------------------
       01    02    03    04   05    06  ...  28    29    30    31     Total
     ------------------------------------ ------------------------------------
Jan    3     4     3     4    3      4        2    3     2     3       224
Feb    2     3     2     3    2      3        2                        134
Mar    3     22    4     2    2      2        1    2     1     2       84
Apr    1     2     1     2    1      2        1    1     1             37
May    1     1     4     3    53     33       1    1     1     1       143
Jun    1     1     0     1   -1?     7        63   58    52            816
Jul    48    43    40    36   33     30       77   70    63    59      1389
Aug    54    49    46    41   39     35       30   28    26    420     2433
Sep   880   362   282   256  245     215      241  39    36            4414
Oct    35    33    31    31   29     28       22   28    20    17      783
Nov    15    16    15    18   16     15       11   12    11            415
Dec    12    11    11    11   11     10       9     8    9     8       422
----------------------------------------- ------------------------------------
                                                                       11294