Time Series Data Utility - Source.DataUtility.exe

Source.DataUtility.exe is a command line utility for interacting with time series data files. It is distributed with Source and found in the same folder as the other executables.

Description and Rationale

The 'TimeseriesCycleCreator' tool was developed for Run Manager to perform replicate runs. It depends on the TIME timeseries libraries and therefore on Source. Because timeseries formats can change between versions of Source, it is better to ship the tool with Source than to maintain a single static build against one version of Source. The tool has also been used to generate replicates for projects outside Run Manager. For these reasons, its functionality is included in the 'Source.DataUtility' command line tool, which lets users access it without opening the Source user interface.

Functionality

Adding the cycling functionality to 'Source.DataUtility' also extended the existing functionality to work with multiple files within a directory and, optionally, its sub-directories. In turn, the cycling functionality can take advantage of existing DataUtility features, for example cycling only a sub-series (very useful for making sure all timeseries cover the same period), extracting only certain timeseries from the input file(s), and supporting all TIME timeseries formats.

The same functionality is now fully incorporated into Source; a full description is given in 'Replicate Analysis'.

All use cases start with an input time series file (-i argument). This can be in any time series format supported by Source (e.g. .res.csv, .sdb, ...).

From that you can:

  • output to another file, possibly of a different format selected by file extension (-o argument)
  • restrict output to one or more specific time series, by DisplayName (-k argument)
  • extract only part of any given time series (--startTime and --endTime arguments)
  • list the timeseries and metadata contained in the input timeseries file (-l argument)
  • list supported time series file formats (--listFormats argument)
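
The options above can be combined on one command line. The following Python sketch only assembles such an argument list so that it can be inspected or passed to a process launcher. The function name 'build_extract_args' and the file names are hypothetical, the date format passed to --startTime and --endTime is an assumption borrowed from the cycling example further down this page, and the sketch handles a single DisplayName only, because how several names are passed to -k is not described here.

```python
def build_extract_args(input_file, output_file=None, key=None,
                       start_time=None, end_time=None,
                       exe="Source.DataUtility.exe"):
    """Assemble an argument list from the documented options.

    Only flags named on this page are used (-i, -o, -k, --startTime,
    --endTime). Values are passed through unchanged; the date format
    the utility expects is NOT verified here (assumption).
    """
    args = [exe, "-i", input_file]
    if output_file is not None:
        args += ["-o", output_file]        # output format follows the extension
    if key is not None:
        args += ["-k", key]                # one DisplayName only in this sketch
    if start_time is not None:
        args += ["--startTime", start_time]
    if end_time is not None:
        args += ["--endTime", end_time]
    return args


if __name__ == "__main__":
    # Hypothetical file names; the key is the one used in the example below.
    args = build_extract_args(
        "flows.res.csv",
        output_file="subset.res.csv",
        key="Inflow 1>Downstream Flow",
        start_time="01/01/1890",
        end_time="31/12/1899",
    )
    print(args)
```

Building the arguments as a list, rather than as one string, avoids quoting problems with names that contain spaces or '>' characters, such as the key above.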

Example

Suppose a user wants to replicate a data source or recorder from a Source project. The data, stored in 'res.csv' format in an input directory, can be cycled from the Command Prompt, with the results (replicates) written to a given output folder.

An example command is:

C:\Users\USER\Downloads\FOLDER> .\Source.DataUtility.exe --inputDir "D:\Replicate\Test\Demand" --startCycleDate 01/01/1890 --startReportYear 1890 --endReportYear 1899 --inputExt .res.csv --outputExt .res.csv --increment 1 --replicates 10 --outputDir "C:\Users\USER\Desktop\Output" --key "Inflow 1>Downstream Flow"

In the above command, the input time series file in 'res.csv' format, located in the input directory 'D:\Replicate\Test\Demand', is cycled from the start cycle date of 01/01/1890 and reported against dates between 01/01/1890 and 31/12/1899, with an increment of 1 year. The 10 replicates are written to the 'Output' folder given by the output directory. The 'key' identifies the data column in the input 'res.csv' file that is to be cycled.

The output folder will contain 10 folders, one per replicate. For example, the first folder is named '1' and contains the metadata and three columns: the reported date (in yyyy-mm-dd format), the data, and the original data year. For replicate '1', the file starts at the reported date '1890-01-01' with the data from 1890-01-01, and continues in this way until 1899-12-31. For the second replicate, '2', the data from 1891-01-01 is reported against the reported date 1890-01-01, and so on. For the 10th replicate, '10', the data from 1899-01-01 is reported against 1890-01-01, and so on.
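
The relationship between reported years and original data years described above can be sketched in a few lines of Python. This is an illustration only, not the utility's implementation: the function name and parameters are hypothetical, it works on whole years (ignoring leap days and sub-annual time steps), and it assumes that cycling wraps back to the first year of data once the last year is passed, which is not stated on this page.

```python
def original_years(replicate, start_cycle_year, start_report_year,
                   end_report_year, increment,
                   data_first_year, data_last_year):
    """Return (reported_year, original_data_year) pairs for one replicate.

    Assumption: the data is treated as a cycle, so a year past
    data_last_year wraps around to data_first_year.
    """
    span = data_last_year - data_first_year + 1
    # Replicate 1 starts at the start cycle year; each later replicate
    # starts 'increment' years further into the data.
    offset = (replicate - 1) * increment
    pairs = []
    for reported in range(start_report_year, end_report_year + 1):
        raw = start_cycle_year + offset + (reported - start_report_year)
        original = data_first_year + (raw - data_first_year) % span
        pairs.append((reported, original))
    return pairs


if __name__ == "__main__":
    # Values from the example command; the data extent 1890-1899 is assumed.
    for rep in (1, 2, 10):
        first_reported, first_original = original_years(
            rep, 1890, 1890, 1899, 1, 1890, 1899)[0]
        print(rep, first_reported, first_original)
    # prints:
    # 1 1890 1890
    # 2 1890 1891
    # 10 1890 1899
```

With these inputs, replicate '2' reports the 1891 data against 1890, and replicate '10' reports the 1899 data against 1890, matching the description above.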

For more details about the concept and use of the TimeseriesCycleCreator tool, see 'Replicate Analysis'.