TOMS
Developed by Timothy Smith
TOMS (Timeseries Output Management System) is a simple system to link a Source output file to the run configuration that created it.
It is not a plugin; it is an example application to help with model development automation.
It does, however, depend on a plugin to write provenance (e.g. the ExampleProvenance plugin).
This page is divided into the following sections:
- Advantages
- Components and Requirements
- How it works
- How to set up TOMS
- Using TOMS
Advantages
TOMS can keep track of runs that have been made previously. Given a Source output file, it can tell you which scenario in which project was used to create it. It can tell you when it was run, who ran it, which Source version was used, and which inputs were used. It also allows you to associate notes with a run (e.g. "the run we used for the deliverable document v3") to make runs easier to understand.
More importantly, it can reload the project (in whichever Source version you specify) or simply reproduce an output file using a new version of Source.
It can also be used on a server to automatically ensure that every new output file is reproducible, and as a regression test to confirm that new Source versions give the same result for all important runs.
It is easy to set up, and free to use.
Components and Requirements
TOMS has three components and four requirements.
Component 1: the TOMS user interface. This has the same system dependencies as Source, so if you can run Source you should be able to use it.
Component 2: the TOMS test interface. This is a .NET library that can be used by the test framework NUnit (see below) to automate testing of both TOMS itself and the contents of the repository (see below).
Component 3: the TOMS library. This is a .NET library that holds the core functionality of TOMS and can be used if someone wishes to write their own TOMS-like functionality.
Requirement 1: Source. It is likely that the machine that runs TOMS will have several versions of Source installed. At least one version must be installed.
Requirement 2: A Provenance Writer plugin. TOMS isn't magic. In order to rerun a Source run, or load the project that created it, TOMS needs a large quantity of information about that run (the provenance of the run). A Provenance Writer plugin must therefore be installed in Source to write all this information and store it in the repository (see below). An ExampleProvenance plugin that is sufficient for the vast majority of users is shipped with Source. The code is freely available on request if you would like to customise it in the future.
Requirement 3: A Repository. The Provenance data must be stored in an ordered way where TOMS can find it. It is also desirable that this repository ensure the provenance information is not modified, that certain system data is stored, that provenance information can be easily shared between multiple users on multiple machines, and that it is possible to "listen" for changes to the repository. The Example Provenance plugin uses Mercurial version control software to maintain a repository that is in a particular format. Setup is addressed later.
Requirement 4: TeamCity automation software. In order to use TOMS on a server it is necessary to have automation software that can monitor the repository and run TOMS. TeamCity is software used by eWater that performs this process easily.
How it Works
While it is not necessary to know exactly how TOMS works to use it, it can be useful to have an overview of the underlying mechanisms.
When Source is instructed to run a scenario through the user interface it will notify the Provenance Writer plugin at three points.
- Before the scenario is run, the Provenance Writer will store data such as which input set is being used, which data sources are available, which scenario is selected, and so on.
It will also check that a repository is being used and that any changes that might affect the output (such as the contents of data sources) are committed to the repository. Uncommitted changes could be lost, which means provenance cannot be proven. If this is the case, the user will be warned at the end of the run. Note: the warning functionality is currently being improved to allow the user to decide whether they wish to run if provenance cannot be guaranteed.
- After the scenario is run, more information (primarily that generated by the run) will be stored.
A copy of the project file as of the time of the run will also be saved and stored in the repository. Note: this is a very safe way of ensuring that the project file is stored exactly as it was; however, it is potentially very expensive in run time (save time is added to run time on every run) and in repository size (which increases by the size of the project file every time). For this reason, eWater is currently investigating ways to confirm that the project file is unchanged since it was last loaded and, if it is unchanged, to store only that information rather than the project file itself.
- Finally, once all information is gathered, the Provenance file will be written and committed to the repository.
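The three notification points can be pictured as a small recorder object. The sketch below is a minimal Python illustration assuming a hypothetical `ProvenanceRecorder` class with `before_run`, `after_run` and `finish` methods; it is not the ExampleProvenance plugin's actual .NET API, and the JSON field names are invented for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecorder:
    """Hypothetical recorder mirroring the three notification points."""

    run_id: str
    record: dict = field(default_factory=dict)
    issues: list = field(default_factory=list)

    def before_run(self, scenario, input_set, data_sources, uncommitted):
        # Point 1: capture the configuration the run is about to use.
        self.record.update(scenario=scenario, input_set=input_set,
                           data_sources=sorted(data_sources))
        # Data sources with uncommitted changes make provenance unprovable.
        dirty = sorted(set(data_sources) & set(uncommitted))
        if dirty:
            self.issues.append("Uncommitted changes: " + ", ".join(dirty))

    def after_run(self, results_summary, project_copy):
        # Point 2: capture what the run produced and where the project copy went.
        self.record.update(results=results_summary, project_copy=project_copy)

    def finish(self):
        # Point 3: serialise the provenance; a real plugin would also commit it.
        self.record["run_id"] = self.run_id
        self.record["repeatability_issues"] = self.issues or ["None"]
        return json.dumps(self.record, indent=2, sort_keys=True)


rec = ProvenanceRecorder("run-001")
rec.before_run("Scenario 1", "Default Input Set",
               ["Inputs/flow.csv", "Inputs/rain.csv"], uncommitted=["Inputs/flow.csv"])
rec.after_run({"rows": 365}, "Runs/run-001/project")
print(rec.finish())
```

Here the uncommitted data source is reported as a repeatability issue rather than stopping the run, which matches the "warn at the end of the run" behaviour described above.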
When TOMS shows the contents of a directory it goes through multiple steps.
- Firstly it looks at the repository and extracts any information it can from the directory structure.
- Next it queries the repository management software for additional information (such as the date files were written).
- Next it reads its own configuration file to find out any notes (or requests to hide run information) and updates the user interface. The configuration file is a simple .json file.
- Finally it goes through all the Provenance files and extracts the written data, keeping the information that is useful for the summary. This process can be time consuming, so additional information appears gradually in the user interface as it becomes available.
- When TOMS has processed all files it will return control to the user.
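A minimal sketch of the directory scan, assuming each run lives in `Runs/<identifier>/` with a `.json` provenance file (as described in the repository setup) and that the config file uses a hypothetical `{"notes": {...}, "hidden": [...]}` schema. The Mercurial query in the second step is omitted.

```python
import json
from pathlib import Path


def list_runs(repo_root, config_path):
    """Sketch of the TOMS directory scan (layout and config schema assumed)."""
    runs_dir = Path(repo_root) / "Runs"

    # Step 1: the directory structure alone yields the run identifiers.
    run_ids = []
    if runs_dir.is_dir():
        run_ids = sorted(p.name for p in runs_dir.iterdir() if p.is_dir())

    # Step 2 (asking Mercurial for dates and similar) is omitted here.

    # Step 3: notes and hidden runs from the config file (may still be empty).
    cfg = Path(config_path)
    text = cfg.read_text() if cfg.is_file() else ""
    config = json.loads(text) if text.strip() else {}
    notes = config.get("notes", {})
    hidden = set(config.get("hidden", []))

    # Step 4: the slow part, reading every provenance file.
    runs = []
    for run_id in run_ids:
        if run_id in hidden:
            continue
        summary = {"id": run_id, "note": notes.get(run_id, "")}
        for prov in sorted((runs_dir / run_id).glob("*.json")):
            summary["scenario"] = json.loads(prov.read_text()).get("scenario")
        runs.append(summary)
    return runs
```

Reading the provenance files last lets the cheap information (identifiers, notes) be shown first, which is why the user interface fills in gradually.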
When TOMS is asked to add a note or remove (hide) a run
- TOMS stores the information internally.
- A run that is removed is hidden on the local machine.
- When TOMS is next closed down, or when the repository is next changed, the configuration .json file is updated. Note: this means if TOMS crashes it may not update the configuration file.
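Because notes and hidden runs are only held in memory until shutdown or a repository change, the bookkeeping can be sketched as a store with an explicit `flush`. The class name `RunNotes` and the JSON layout are hypothetical; TOMS's real configuration format may differ.

```python
import json
from pathlib import Path


class RunNotes:
    """Hypothetical in-memory store of notes and hidden runs."""

    def __init__(self, config_path):
        self.path = Path(config_path)
        text = self.path.read_text() if self.path.is_file() else ""
        data = json.loads(text) if text.strip() else {}
        self.notes = dict(data.get("notes", {}))
        self.hidden = set(data.get("hidden", []))
        self.dirty = False

    def add_note(self, run_id, text):
        self.notes[run_id] = text
        self.dirty = True  # held in memory only; nothing is written yet

    def hide(self, run_id):
        # Hiding affects this machine's view only; the repository is untouched.
        self.hidden.add(run_id)
        self.dirty = True

    def flush(self):
        """Call on normal shutdown or on a repository change; a crash skips it."""
        if self.dirty:
            payload = {"notes": self.notes, "hidden": sorted(self.hidden)}
            self.path.write_text(json.dumps(payload, indent=2))
            self.dirty = False
```

The `dirty` flag makes the crash caveat concrete: any change made after the last `flush` exists only in memory.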
When TOMS opens a run
- TOMS opens the Provenance file and parses the information it needs
- TOMS creates a new temporary directory.
- TOMS checks out all the files it needs to the new temporary directory.
- TOMS opens the project in the new temporary directory
Note: the temporary directory is not in the repository. This means that the provenance plugins will not operate. Any changes to the project file will not be saved to the repository.
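The open steps can be sketched as follows, assuming a hypothetical provenance layout with an `external_files` list of `relative_path`/`revision` entries. `hg cat -r REV -o PATH FILE` is a standard Mercurial command that writes a file as it was at a given revision; whether TOMS itself restores files this way (rather than through Mercurial.Net) is an assumption.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def plan_checkout(provenance_path):
    """Parse a provenance file and plan one 'hg cat' per recorded file.

    The 'external_files', 'relative_path' and 'revision' keys are hypothetical.
    """
    prov = json.loads(Path(provenance_path).read_text())
    # A fresh temporary directory outside the repository.
    workdir = Path(tempfile.mkdtemp(prefix="toms_run_"))
    commands = []
    for entry in prov["external_files"]:
        rel = entry["relative_path"]
        dest = workdir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        # 'hg cat -r REV -o PATH FILE' writes FILE as it was at revision REV.
        commands.append(["hg", "cat", "-r", str(entry["revision"]), "-o", str(dest), rel])
    return workdir, commands


def run_checkout(commands, repo_root):
    """Execute the planned commands from the repository root (requires hg)."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_root, check=True)
```

Planning and executing are kept separate so the commands can be inspected before anything touches disk; because the files land outside the repository, edits made there cannot reach the repository.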
When TOMS reruns a run
- TOMS opens the Provenance file and parses the information it needs
- TOMS creates a new temporary directory.
- TOMS checks out all the files it needs to the new temporary directory.
- TOMS uses the Source command line runner to run the project and generate an output file
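Rerunning follows the same pattern as opening but ends by invoking the command line runner. In this sketch the checkout and the runner are passed in as hypothetical hooks, because the exact arguments of the Source command line runner are not described on this page; the `project_relative_path` key is also an assumption, and the `.res.csv` check mirrors the output file name used in the TOMS rerun dialog.

```python
import tempfile
from pathlib import Path
from typing import Callable


def rerun(provenance: dict,
          checkout: Callable[[dict, Path], None],
          run_model: Callable[[Path, Path], None],
          output_name: str) -> Path:
    """Rerun a recorded run in a fresh temporary directory (sketch only).

    'checkout' restores the recorded files and 'run_model' wraps the Source
    command line runner; both are hypothetical hooks supplied by the caller.
    """
    if not output_name.endswith(".res.csv"):
        raise ValueError("output file name should end in .res.csv")
    workdir = Path(tempfile.mkdtemp(prefix="toms_rerun_"))
    checkout(provenance, workdir)  # restore the recorded files
    project = workdir / provenance["project_relative_path"]
    output = workdir / output_name
    run_model(project, output)  # produce the new output file
    return output
```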
When TeamCity calls TOMS as a test suite
- TeamCity uses NUnit to ask the TOMS library what "Tests" are available
- The tests are run one by one.
- One set of tests ensures that, for any output that has been checked into the Output directory, the following is true:
- the output file has a run identifier
- the provenance file for the run identifier is present
- The test then reruns the project to regenerate a new output file
- The new output file and the old output file are then compared.
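The final comparison step might look like the sketch below, which compares two CSV outputs cell by cell. The real TOMS tests are NUnit tests in .NET; this Python version only illustrates the logic. The numeric tolerance `rel_tol` is an assumption added for illustration; the actual tests may require an exact match, and `rel_tol=0` gives exact numeric equality.

```python
import csv
import math


def compare_outputs(old_path, new_path, rel_tol=1e-9):
    """Return a list of differences between two CSV outputs (empty if they match)."""
    with open(old_path, newline="") as f_old, open(new_path, newline="") as f_new:
        old_rows, new_rows = list(csv.reader(f_old)), list(csv.reader(f_new))
    diffs = []
    if len(old_rows) != len(new_rows):
        diffs.append(f"row count {len(old_rows)} != {len(new_rows)}")
    for r, (a_row, b_row) in enumerate(zip(old_rows, new_rows)):
        if len(a_row) != len(b_row):
            diffs.append(f"row {r}: column count differs")
            continue
        for c, (a, b) in enumerate(zip(a_row, b_row)):
            if a == b:
                continue
            try:
                # Numeric cells may differ by tiny rounding amounts.
                same = math.isclose(float(a), float(b), rel_tol=rel_tol)
            except ValueError:
                same = False  # non-numeric cells must match exactly
            if not same:
                diffs.append(f"row {r}, col {c}: {a!r} != {b!r}")
    return diffs
```

Returning every difference, rather than stopping at the first, gives a more useful failure message when a new Source version changes results.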
How to Set Up TOMS
Step 1: setting up a Repository
- You will need to install Mercurial. It is recommended to install TortoiseHg.
- You may need a remote Mercurial host. These instructions assume Bitbucket.org.
This is not mandatory, and as it involves storing data remotely, security issues should be kept in mind.
- There are three options for a repository: using an existing remote repository, creating a local repository for testing purposes, or creating a new remote repository.
- To use a remote repository that already exists
- Log in to Bitbucket.org
- Go to the Bitbucket page for the repository; for example https://bitbucket.org/Harakani/provenanceregressiontests
- Get the location of the repository; for example https://Harakani@bitbucket.org/Harakani/provenanceregressiontests
- Use Mercurial clone to create a local clone of the repository in the destination directory.
- To create a local repository for testing purposes
- In HgWorkbench select File→New Repository
- Add the destination directory
- To create a remote repository
- Log in to Bitbucket.org
- Click "Create" (the + symbol)
- Create a new repository
- Use Mercurial clone to create a local clone of the new repository in the destination directory.
- Set up the repository structure
- Before the repository is first used it will be necessary to add some simple structure. Create the following subdirectories and commit the changes
- Inputs - this is the directory where data sources should go. This is especially true of "Reload on Run" files.
- Models - this is where project files should go. As Input Sets are effectively model variants, they also should go here.
- Runs - this is where the ProvenanceWriter will write provenance files. Each run will be given a subdirectory based on its identifier and have a project and a .json file.
Note: this directory is for the ProvenanceWriter and not for users.
- Output - this is where outputs should be stored. When you have an output you want to keep, commit it. This should mean that all uncommitted outputs can be removed at any point.
- This is also a good point at which to create a TomsConfig.json file. Create an empty text file with that name (TomsConfig.json). TOMS will write its configuration to this file when it first runs, but will not commit it.
Note: other names are certainly possible, this is simply a default.
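Creating the structure can be scripted. The sketch below assumes the destination directory is already a Mercurial working copy. Mercurial does not track empty directories, so the sketch adds a placeholder file to each subdirectory (the name `.keep` is an arbitrary, hypothetical choice) and returns `hg add`/`hg commit` commands to run from the repository root rather than running them itself.

```python
from pathlib import Path

SUBDIRS = ("Inputs", "Models", "Runs", "Output")
PLACEHOLDER = ".keep"  # hypothetical name; any small file works


def init_toms_layout(repo_root, config_name="TomsConfig.json"):
    """Create the TOMS directory layout in an existing Mercurial working copy."""
    root = Path(repo_root)
    placeholders = []
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
        # Mercurial does not track empty directories, so add a placeholder file.
        (root / name / PLACEHOLDER).touch(exist_ok=True)
        placeholders.append(f"{name}/{PLACEHOLDER}")
    # Empty config file; TOMS writes to it on first run but does not commit it.
    (root / config_name).touch(exist_ok=True)
    # Commands to run from the repository root to commit the structure.
    commands = [["hg", "add", *placeholders],
                ["hg", "commit", "-m", "Add TOMS repository structure"]]
    return placeholders, commands
```

Only the placeholders are added explicitly, so the empty TomsConfig.json stays uncommitted, consistent with TOMS not committing its configuration.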
Step 2: setting up Source
- Ensure Source is installed on the computer. This should be Source 4.2.7 or later.
- We now need to install the ExampleProvenance plugin
- Open Source
- Select "Plugin Manager" from the Tools menu.
- Click on Browse
- Navigate to the Plugins subdirectory of the directory in which Source was installed.
- Select ExampleProvenance.dll
- Click Open.
- We will also need to install the Mercurial.Net.dll plugin
- Click on Browse
- Select Mercurial.Net.dll
- Click Open
- Click OK
- Source will need to restart, so select "yes" when prompted.
Step 3: setting up TOMS
- Option 1: build TOMS yourself
- The TOMS repository can be found at https://bitbucket.org/ewater/toms
- Apply to Geoff Davis for permission to access the repository.
- Check it out.
- Build in Visual Studio.
- Option 2: just get the assemblies and run straight away.
- From this list download the following into a new directory for TOMS by clicking on the links.
- You will also need to add the following to the directory. You can find them in your Source download directory
- ExampleProvenance.dll
- Mercurial.Net.dll
- Mercurial.Net.xml
- Newtonsoft.Json.dll
- Newtonsoft.Json.xml
- RiverSystem.Dora.API.dll
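If you prefer to script this step, a sketch like the following copies the listed files from a Source directory into the TOMS directory. It searches subdirectories too, on the assumption that some files (such as ExampleProvenance.dll, which lives in the Plugins directory) are not at the top level; the function name `copy_support_files` is hypothetical.

```python
import shutil
from pathlib import Path

# The file list from the step above.
REQUIRED_FILES = (
    "ExampleProvenance.dll", "Mercurial.Net.dll", "Mercurial.Net.xml",
    "Newtonsoft.Json.dll", "Newtonsoft.Json.xml", "RiverSystem.Dora.API.dll",
)


def copy_support_files(source_dir, toms_dir):
    """Copy each required file found under source_dir; return the names not found."""
    src, dst = Path(source_dir), Path(toms_dir)
    dst.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in REQUIRED_FILES:
        # Search subdirectories too (e.g. Plugins); take the first match found.
        matches = sorted(src.rglob(name))
        if matches:
            shutil.copy2(matches[0], dst / name)
        else:
            missing.append(name)
    return missing
```

An empty return value means every listed file was copied.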
Using TOMS
Running a run
With Source configured as above, load the project file from the Models subdirectory of the repository.
Click "Configure" and configure your input sets.
Ensure you are using "Single analysis".
Click Run
The scenario will run.
When the Results Manager comes up, click on the Scenario. Note: click on the Scenario itself, not any subfolders.
The panel that appears contains several pieces of information:
- Run Name: a name for the run
- Identifier: this is a unique ID that Source has assigned to this run. Even if the name is changed, this identifier is persistent.
- Repeatability Issues; a list of any issues that would prevent repeatability will be shown here. If there is an issue, the run cannot be repeated. Please fix the error and try again. If there are no issues this will simply say "None"
- External Files; if there are external files the location of those files and their states will be shown. This has many fields, but the most important is Status (which should be "Clean") and Absolute Path (which should point to the file). The other fields are largely there to help TOMS and to clarify errors.
- Status; the mercurial state of the file
- Purpose; an optional field - ignore for now
- Absolute Path; where the file is on the local machine
- Repository Path; where the repository is on the disk (at the moment this should be the same for all files)
- Relative Path; where the file is relative to the repository root
- Expected Path; where it is expected the file should go (this is special because the model file copy that is stored will be treated as if it is the original file)
- Exists; does the file exist on disk - should always be ticked
- Reloaded; is the file "Reload on Run" - should always be ticked
- Version Controlled; is the file version controlled - should always be ticked
- Revision; the revision of the file used in the run
- Message; the last message of the revision used (as revisions are hard for humans to make sense of)
- Metadata table; this is a simple list of properties about the run. There are three fields, and each row represents one combination of key, sub-key, and value.
- Key; what the property is
- Sub-key; some properties are multivariate. For example, "Provenance Plugins Loaded" has a sub-key for each plugin class that is loaded.
- Value; the value of the property, represented as a string.
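The panel's fields suggest simple checks. The sketch below models an external file with a subset of the panel's fields, flags the conditions the panel says should always hold, and groups metadata rows into a nested dictionary. The `ExternalFile` class and both functions are hypothetical, and the real repeatability check in the provenance plugin may test other conditions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExternalFile:
    # A subset of the panel's fields; the Python types are assumptions.
    status: str
    absolute_path: str
    relative_path: str
    revision: str
    exists: bool
    reloaded: bool
    version_controlled: bool


def file_issues(f: ExternalFile) -> list:
    """Flag the conditions the panel says should always hold."""
    issues = []
    if f.status != "Clean":
        issues.append(f"{f.relative_path}: status is {f.status!r}, expected 'Clean'")
    for flag in ("exists", "reloaded", "version_controlled"):
        if not getattr(f, flag):
            issues.append(f"{f.relative_path}: {flag.replace('_', ' ')} is not ticked")
    return issues


def group_metadata(rows):
    """Turn (key, sub_key, value) rows into {key: {sub_key: value}}."""
    grouped = defaultdict(dict)
    for key, sub_key, value in rows:
        grouped[key][sub_key] = str(value)  # values are represented as strings
    return dict(grouped)
```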
Repeatability Issues
Issue | Fix |
---|---|
Could not find model file to commit | The model file has not been committed to the Models subdirectory of the repository. Please commit it using Mercurial. |
Using the TOMS user interface
Go to the directory where you installed TOMS. Click on PushButtonRepeatability.exe
A window like the following will come up
Set SourceDirectory to the directory of the version of Source you want to use. This is the "Output" subdirectory of that installation.
Set the repository root and the config file to their locations on disk.
If any of these are incorrect, there should be a red explanatory remark to the right of the cell, and the "Get Runs" button will vanish. For example:
When all of these are filled in correctly, click "Get Runs".
TOMS will then take a while to list all available runs.
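The settings validation described above can be sketched as a function that returns one message per invalid setting; the "Get Runs" button would be shown only when the result is empty. `SourceDirectory` is the setting named on this page, while the `RepositoryRoot` and `ConfigFile` keys and the specific checks (a `.hg` directory at the repository root, a `Runs` subdirectory) are assumptions.

```python
from pathlib import Path


def validate_settings(source_dir, repo_root, config_file):
    """Return {setting: message} for each problem; empty means 'Get Runs' can show."""
    errors = {}
    src = Path(source_dir)
    if not src.is_dir():
        errors["SourceDirectory"] = "directory not found"
    elif src.name != "Output":
        errors["SourceDirectory"] = 'expected the "Output" subdirectory of a Source installation'
    root = Path(repo_root)
    # A Mercurial repository root contains a .hg directory.
    if not (root / ".hg").is_dir():
        errors["RepositoryRoot"] = "not the root of a Mercurial repository"
    elif not (root / "Runs").is_dir():
        errors["RepositoryRoot"] = 'no "Runs" subdirectory'
    if not Path(config_file).is_file():
        errors["ConfigFile"] = "file not found"
    return errors
```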
Click on the run you are interested in and run controls will appear in the top right.
From here you can
- Open the run in the selected Source Version
- Click on open
- You may be prompted to update the run to the new version of Source. If so, agree to update.
- Source will open with the file loaded in a temporary directory (not the repository).
TOMS will still be open behind this, and TOMS can open multiple runs at the same time.
- Note: the input set is NOT configured automatically.
- Note: plugins are not necessarily loaded by default; you will need to ensure this manually.
- Rerun the run in the selected Source Version
- Click on rerun
- Enter an output file name (.res.csv)
- A window will show up where the run is performed.
- If there is an error, then a window will pop up at the end to tell you so.
- Note: at this stage you may only rerun one run at a time.
- Add a comment to the run in the TOMS Available Runs table.
- The comment dialog will pop up
- The text field will hold any existing comment.
- Enter the comment in the text field and press 'OK'.
- Comments are persistent so long as TOMS is closed normally
- Note: comments are not committed to a repository by default, so they are user specific.
- Remove this run from the TOMS Available Runs table.
- A removal dialog will pop up asking if you wish to remove the run.
- Click 'OK'.
- Note: removed runs are removed from the TOMS user interface and not from the repository.