Data file formats

Note: This is documentation for version 4.3 of Source. For a different version of Source, select the relevant space by using the Spaces menu in the toolbar above.

This section provides an overview of the file formats supported by Source. Table 1 lists the supported time-series data file formats. Raster data file formats are listed in Table 2. Several GIS, graphics and other formats that are also recognised by Source are listed in Table 3 but are not otherwise described in this guide. Click on the link associated with each file extension to go directly to information about that format.

Note: Formats with the ** symbol are part of the GDAL raster formats. A complete list of these is provided here.

 

Table 1. Text-based time-series data file formats

Table 2. Text-based raster data file formats

Table 3. Other supported file formats

File extension    Description
.SDB              Source Database
.FLT              ESRI Binary Raster Interchange format
.JPG              GEO JPG Image (also .JPEG), and must have an associated .jgw world file
.MIF              MapInfo Interchange
.SHP**            ESRI Shape files
.TIF**            GeoTIFF Image (also .TIFF)
.TILE             Tiled Raster Files
.TNE              Tarsier Node Link Network Files
.TRA              Tarsier Raster Files
.TSD              Tarsier Sites Data Files
.ADF**            ArcINFO/ESRI Binary Grid
.IMG**            ERDAS Imagine

Note: Source will warn you if you import data containing negative numbers. Also, the presence of any zero values in the data stream will hamper your ability to adjust the Y-axis to show log values in the Charting Tool.

Annual stochastic time series

The .AR1 format contains replicates of annual time-series data generated using the AR(1) stochastic method. The file format is shown in Table 4. This format is not the same as the AR(1) format (.GEN) generated and exported by the Stochastic Climate Library.

Table 4. AR1 data file format

Row     Column (space-separated)
        1        2        3..nypr
1       desc
2       nypr     nr
odd     rn
even    value    value    value

where:

desc is a title describing the collection site

nypr is the number of years per replicate

nr is the number of replicates

rn is the replicate number in the range 1..nr

value is one of the nypr data points per row for the replicate, to three decimal places.
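The layout in Table 4 can be read with a few lines of Python. The following is a minimal sketch, not the reader used by Source: the function name read_ar1 and the sample contents are hypothetical, and it assumes that each replicate occupies exactly two rows (the replicate number, then nypr values).

```python
from io import StringIO


def read_ar1(stream):
    """Parse an .AR1 stream into (description, {replicate number: [values]})."""
    # Drop blank lines so the row arithmetic below stays simple.
    lines = [line.strip() for line in stream if line.strip()]
    desc = lines[0]                                   # row 1: site description
    nypr, nr = (int(tok) for tok in lines[1].split()[:2])  # row 2: sizes
    body = lines[2:]
    replicates = {}
    for k in range(nr):
        rn = int(body[2 * k].split()[0])              # replicate number row
        values = [float(tok) for tok in body[2 * k + 1].split()]
        if len(values) != nypr:
            raise ValueError(f"replicate {rn}: expected {nypr} values, got {len(values)}")
        replicates[rn] = values
    return desc, replicates


# Hypothetical sample: 2 replicates of 3 years each.
sample = """Example gauge
3 2
1
10.500 11.250 9.000
2
8.125 12.000 10.750
"""
desc, reps = read_ar1(StringIO(sample))
```

The sketch raises an error when a replicate row does not contain nypr values, which is a convenient way to catch a truncated file early.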

ESRI ASCII grids

The .ASC format is a space-delimited grid file with a six-line header, as shown in Table 5. Values are not case sensitive and are arranged in space-delimited rows and columns, reflecting the structure of the grid. Units for cell side length depend on the input data, and can be either geographic (e.g. degrees) or projected (e.g. metres, kilometres). Units are generally determined by the application, with metres (m) being common for most TIME-based applications. For a file format description, refer to:

http://resources.esri.com/help/9.3/arcgisengine/com_cpp/gp_toolref/spatial_analyst_tools/esri_ascii_raster_format.htm

ArcInfo grid coverages can be converted to .ASC files using ESRI's GRIDASCII command. .ASC files can be imported into ArcGIS using the ASCIIGRID command.

Table 5. .ASC data file format

Row     Column (space-delimited)
        1               2           3..n
1       ncols           nc
2       nrows           nr
3       xref            x
4       yref            y
5       cellsize        size
6       nodata_value    sentinel
7..n    value           value       value

where:

nc is the number of columns

nr is the number of rows

xref is either XLLCENTER (centre of the lower-left cell) or XLLCORNER (lower-left corner of the lower-left cell)

yref is either YLLCENTER (centre of the lower-left cell) or YLLCORNER (lower-left corner of the lower-left cell)

(x,y) are the coordinates of the origin (the centre or the lower-left corner of the lower-left cell, depending on xref and yref)

size is the cell side length

sentinel is a null data string (eg -9999)

value is a data point. There should be nc × nr data points.
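As an illustration of Table 5, the sketch below reads the six header rows and then checks that nc × nr data points follow. It is a minimal example under stated assumptions rather than the importer used by Source: read_asc and the sample grid are hypothetical, header keywords are matched without regard to case, and cells equal to the sentinel are returned as None.

```python
from io import StringIO


def read_asc(stream):
    """Parse an ESRI ASCII grid into (header dict, list of rows)."""
    header = {}
    for _ in range(6):                       # six "keyword value" header rows
        key, value = stream.readline().split()
        header[key.lower()] = float(value)   # keywords compared in lower case
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = header["nodata_value"]

    tokens = stream.read().split()           # remaining rows hold the data
    if len(tokens) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} data points, got {len(tokens)}")

    grid = []
    for r in range(nrows):
        row = [float(tok) for tok in tokens[r * ncols:(r + 1) * ncols]]
        grid.append([None if v == nodata else v for v in row])
    return header, grid


# Hypothetical 3 x 2 grid with one no-data cell.
sample = """ncols 3
nrows 2
xllcorner 0.0
yllcorner 0.0
cellsize 25.0
NODATA_value -9999
1 2 3
4 -9999 6
"""
header, grid = read_asc(StringIO(sample))
```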

AWBM daily time series

An AWBM daily time series format file (.AWB) is an ASCII text file containing daily time series data formatted as shown in Table 6. Dates (the year and month) were optional in the original AWBM file format, but are not optional in the format used in Source.

Table 6. AWB data file format

Row     Column (space-separated)
        1        2..ndays+1    ndays+2    ndays+3
1..n    ndays    value         year       month

where:

ndays is the number of days in the month (28..31)

value is the data point corresponding with a given day in the month (i.e. ndays columns)

year is the year of observation (four digits)

month is the month of observation (one or two digits).
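Each row of an .AWB file therefore carries one month of daily values. The sketch below expands a single row into dated observations. The function name parse_awb_row and the sample row are hypothetical, and the sketch assumes that the first value corresponds to day 1 of the month.

```python
from datetime import date


def parse_awb_row(line):
    """Expand one .AWB row into a list of (date, value) pairs."""
    tokens = line.split()
    ndays = int(tokens[0])                    # column 1: days in the month
    if len(tokens) != ndays + 3:
        raise ValueError(f"expected {ndays + 3} columns, got {len(tokens)}")
    values = [float(tok) for tok in tokens[1:ndays + 1]]  # columns 2..ndays+1
    year = int(tokens[ndays + 1])             # column ndays+2
    month = int(tokens[ndays + 2])            # column ndays+3
    # date() raises ValueError if ndays is too large for this month.
    return [(date(year, month, day), v) for day, v in enumerate(values, start=1)]


# Hypothetical row for February 2011: 27 dry days, then 5.2 on the 28th.
sample = "28 " + " ".join(["0.0"] * 27 + ["5.2"]) + " 2011 2"
observations = parse_awb_row(sample)
```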

SWAT BSB time series

A .BSB file is a line-based, fixed-format file of the kind typically used by applications written in FORTRAN. The SWAT BSB subbasin output file contains summary information for each of the subbasins in a watershed. The reported values for the variables are the total amount or weighted average of all hydrological response units (HRUs) within the subbasin. The format is shown in Table 7. For more details, refer to the SWAT 2012 input/output manual (Arnold et al., 2012).

The .BSB file format specifies data time step numbers, but not dates. When imported into Source via File Data Sources, the user has the opportunity to manually set the correct data start date.

Table 7. .BSB data file format (first 7 data columns only). The .BSB format also includes an 8 line header, which is not shown.

Row     Character positions (space delimited)
        7..10    12..19    21..24    25..34     35..44      45..54       55..64
1       SUB      GIS       MON       AREAkm2    PRECIPmm    SNOMELTmm    PETmm
2..n    id       gis       mon       area       precip      snomelt      pet

where:

id is the basin identifier (4-digit integer, left-aligned, e.g. "1")

gis is the GIS value (8-digit integer, right-aligned, e.g. "1")

mon is the month (or day of year for daily data) of observation (4-digit integer, right-aligned, e.g. "0")

area is the basin area in square kilometres (real, right-aligned, e.g. "1.14170E+02")

precip is the basin precipitation in millimetres (real, right-aligned, e.g. "1.2000")

snomelt is the basin snow melt in millimetres (real, right-aligned, e.g. "0.111E+01")

pet is the basin potential evapotranspiration (PET) in millimetres (real, right-aligned, e.g. "0.900E+01")
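Because the fields are defined by character position, a data row can be split by slicing rather than by searching for delimiters. The sketch below uses the positions from Table 7. It is illustrative only: parse_bsb_row, the FIELDS list and the sample row are hypothetical, the sample leaves character positions 1..6 blank, and the sample values were chosen to fit the field widths.

```python
# (name, first position, last position, converter); positions are 1-based
# and inclusive, as in Table 7.
FIELDS = [
    ("id", 7, 10, int),
    ("gis", 12, 19, int),
    ("mon", 21, 24, int),
    ("area", 25, 34, float),
    ("precip", 35, 44, float),
    ("snomelt", 45, 54, float),
    ("pet", 55, 64, float),
]


def parse_bsb_row(line):
    """Slice one .BSB data row into a dict keyed by field name."""
    row = {}
    for name, first, last, convert in FIELDS:
        text = line[first - 1:last]      # convert to a 0-based Python slice
        row[name] = convert(text.strip())
    return row


# Hypothetical data row built so that every field sits in its columns.
sample = (
    " " * 6
    + f"{1:>4} {1:>8} {0:>4}"
    + f"{'1.1417E+02':>10}{'1.2000':>10}{'0.111E+01':>10}{'0.900E+01':>10}"
)
row = parse_bsb_row(sample)
```

Stripping each slice before conversion means the same code handles left-aligned and right-aligned fields.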

BOM 6 minute time series

A .BSM (also .PLUV) file is a fixed-format file, typically supplied by the Australian Bureau of Meteorology for 6 minute pluviograph data. The file has two header lines (record types 1 and 2) followed by an arbitrary number of records of type 3. The formats of record types 1..3 are shown in Table 8, Table 9 and Table 10, respectively.

All fields in .BSM files use fixed spacing when supplied, but Source can also read space-separated values.

Rainfall data points:

  • Each row of data contains all of the observations for that day;

  • The number of observations for a day depends on the observation interval. For example, if the observation interval is 6 minutes, there will be 24×60÷6=240 observations (raini fields) in each row of data;

  • Each rain field is in FORTRAN format F7.1 (a field width of seven bytes with one decimal place);

  • Assuming that observations are numbered from 1..n, the starting column position of any given raini field can be computed from 14+7×i;

  • The unit of measurement is tenths of a millimetre (eg. a rainfall of 2 mm will be encoded as "20.0").

  • Values are interpreted as follows:

    • 0.0 means there was no rain during the interval.

    • a positive non-zero value is the observed rainfall, in tenths of a millimetre, during the interval.

    • If there is zero rain for the whole day, no record is written for that day.
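The column arithmetic above can be checked with a short sketch. It builds one fixed-width daily record and reads individual observations back using the 14+7×i rule. The function name rain_field_mm and the sample record are hypothetical, the station number is assumed to be right-aligned in its field, and sentinel values are not handled here.

```python
OBS_PER_DAY = 24 * 60 // 6   # 240 six-minute intervals in a day
FIELD_WIDTH = 7              # FORTRAN F7.1


def rain_field_mm(record, i):
    """Return observation i (numbered from 1) in millimetres."""
    start_col = 14 + 7 * i                       # 1-based starting column
    text = record[start_col - 1:start_col - 1 + FIELD_WIDTH]
    return float(text) / 10.0                    # stored in tenths of a mm


# Hypothetical record: 2 mm of rain in the second interval, otherwise dry.
values = [0.0, 20.0] + [0.0] * (OBS_PER_DAY - 2)
record = (
    f"{61078:>6}{'':6}{1953:4d}{1:2d}{3:2d}"     # 20 characters before the data
    + "".join(f"{v:7.1f}" for v in values)
)
second = rain_field_mm(record, 2)
```

With i = 1 the rule gives column 21, which is the first character after the 20-character prefix.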

Missing data:

  • A sentinel value of -9999.0 means that no data is available for that interval;

  • A sentinel value of -8888.0 means that rain may have fallen during the interval but the total is known only for a period of several intervals. This total is entered as a negative value in the last interval of the accumulated period. For example, the following pattern would show that a total of 2 millimetres of rain fell at some time during an 18-minute period: -8888.0-8888.0 -20.0

  • If an entire month of data is missing, either no records are written or days filled with missing values (-9999.0) are written. No attempt is made to write dummy records if complete years of data are missing.
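One way to apply these rules when post-processing a day of values is sketched below. It is an illustration only, not the logic used by Source: decode_intervals is a hypothetical helper that works on values already converted to numbers, reports interval positions as 0-based indices, and does not follow accumulation periods that span more than one day.

```python
MISSING = -9999.0        # no data available for the interval
ACCUMULATING = -8888.0   # rain counted in a later accumulated total


def decode_intervals(raw):
    """Return (values_mm, accumulated) for one day of raw field values.

    values_mm holds rainfall in millimetres, or None where the interval
    has no individual reading. accumulated lists (first, last, total_mm)
    for each accumulated period.
    """
    values_mm, accumulated = [], []
    run_start = None
    for i, v in enumerate(raw):
        if v == ACCUMULATING:
            values_mm.append(None)
            if run_start is None:
                run_start = i                 # first interval of the period
            continue
        if v == MISSING:
            values_mm.append(None)
        elif v < 0:
            # Negative total closes the accumulated period.
            first = i if run_start is None else run_start
            accumulated.append((first, i, -v / 10.0))
            values_mm.append(None)
        else:
            values_mm.append(v / 10.0)        # tenths of a mm to mm
        run_start = None
    return values_mm, accumulated


values_mm, accumulated = decode_intervals(
    [0.0, -8888.0, -8888.0, -20.0, 5.0, -9999.0]
)
```

In this example the three intervals at indices 1 to 3 form one 18-minute accumulated period with a total of 2 mm, matching the pattern described above.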

Example file

61078 1

61078 2 WILLIAMTOWN RAAF

61078 19521231 .0 .0 .0 [etc., 240 values]

61078 1953 1 1 .0 .0 .0 [etc., 240 values]

61078 1953 1 3 .0 .2 .0 [etc., 240 values]

61078 1953 115 .0 .0 .2 [etc., 240 values]

61078 1953 118 .0 .0 .0 [etc., 240 values]

61078 1953 212 .0 .0 .0 [etc., 240 values]

61078 1953 213 .0 .0 .0 [etc., 240 values]

61078 1953 214 .0 .0 .0 [etc., 240 values]

61078 19521231 .0 .0 .0 [etc., 240 values]

61078 19521231 .0 .0 .0 [etc., 240 values]

The following notes are taken from the Bureau of Meteorology advice:

  • All data available in the computer archive are provided. However very few sites have uninterrupted historical record, with no gaps. Such gaps or missing data may be due to many reasons from illness of the observer to a broken instrument. A site may have been closed, reopened, upgraded or downgraded during its existence, possibly causing breaks in the record of any particular element.

  • Final quality control for any element usually occurs once the manuscript records have been received and processed, which may be 6-12 weeks after the end of the month. Thus quality-controlled data will not normally be available immediately, in "real time".

Table 8. .BSM data file format (record type 1)

Row     Character positions (space padded)
        1..6    7..15    16    17..n
1..n    snum    blank    1     blank

where:

snum is the station number

blank is one or more ASCII space characters

Table 9. .BSM data file format (record type 2)

Row     Character positions (space padded)
        1..6    7..12    13..16    17..18    19..20    21..n
1..n    snum    blank    year      month     day       {raini...}

where:

snum is the station number

year is the year of the observation (four digits)

month is the month of the observation (one or two digits, right-aligned, space padded)

day is the date of the observation (one or two digits, right-aligned, space padded)

raini is a rainfall data point, as explained under "Rainfall data points".

Comma delimited time series

A .CDT comma delimited time-series format file is an ASCII text file that contains regular (periodic) time-series data. The file type commonly has no header line but, if required, it can support a single line header of "Date,Time series 1".

You can use the .CDT format to associate observations with a variety of time interval specifications. Table 10 shows how to structure annual data, Table 11 how to specify daily data aggregated at the monthly level, and Table 12 the more traditional daily time series (one date, one observation). Table 13 explains how to supply data in six-minute format.

Table 10. .CDT data file format (annual time series)

Row     Column (comma-separated)
        1       2
1..n    year    value

where:

year is the year of observation (four digits, eg. 2011)

value is the observed value (eg. 9876).
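An annual .CDT file can be read with Python's standard csv module. The sketch below is a minimal example with hypothetical names and sample data. It skips the optional header line by ignoring any row whose first column is not a number.

```python
import csv
from io import StringIO


def read_cdt_annual(stream):
    """Parse an annual .CDT stream into a list of (year, value) pairs."""
    series = []
    for record in csv.reader(stream):
        # Skip blank lines and the optional "Date,Time series 1" header.
        if not record or not record[0].strip().isdigit():
            continue
        series.append((int(record[0]), float(record[1])))
    return series


# Hypothetical sample with the optional header line.
sample = "Date,Time series 1\n2010,9876\n2011,10234.5\n"
series = read_cdt_annual(StringIO(sample))
```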

Table 11. .CDT data file format (time series with monthly data)

Row     Column (comma-separated)
        1          2
1..n    mm/yyyy    value

where:

mm is the month of observation (two digits, eg. 09)

yyyy is the year of observation (four digits, eg. 2011)