Replicate Analysis

The Replicate Analysis run configuration in Source allows users to run the same model scenario multiple times using input time series replication. It uses a cycled input data concept: the input data assigned to the first year of the simulation is shifted progressively, from one replicate to the next, by one year or by a selected number of years. In this way, multiple instances of any Source model can be run on input data derived from the same record, and results can be extracted for each replicate. This is useful when the user wants to assess the risk associated with the decision variables. For instance, 'Replicate Analysis' can be used to assess the risk of spill in reservoirs under varying input climate conditions.

Once the user has defined the required number of replicates and increment and selected the data source to be cycled, the Source model runs for each replicate and provides an output for each replicate in the Results Manager. For example, if the data source selected for cycling is a time series of rainfall data, it is used for multiple simulations using the same model scenario to produce the required outputs, such as downstream flows. In this way, the user can analyse the effect of changes in rainfall patterns on the downstream flows. 
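As a rough illustration of how per-replicate outputs might be turned into a risk estimate, the sketch below counts the fraction of replicates in which a result exceeds a threshold at least once. It is not part of Source: the function name `exceedance_risk`, the storage values and the threshold are all hypothetical, and the results are assumed to have been exported and loaded into plain Python lists already.

```python
from typing import Dict, List


def exceedance_risk(results: Dict[str, List[float]], threshold: float) -> float:
    """Return the fraction of replicates whose series exceeds `threshold` at least once."""
    if not results:
        raise ValueError("no replicate results supplied")
    hits = sum(1 for series in results.values() if any(v > threshold for v in series))
    return hits / len(results)


# Hypothetical storage volumes for three replicates of the same scenario.
storage = {
    "1:1891": [80.0, 95.0, 101.0],   # exceeds the threshold once
    "2:1892": [70.0, 85.0, 90.0],    # never exceeds it
    "3:1893": [99.0, 100.0, 100.0],  # reaches but does not exceed it
}
risk = exceedance_risk(storage, threshold=100.0)
print(f"Replicates with at least one exceedance: {risk:.0%}")
```

Counting replicates rather than time steps treats each replicate as one equally likely climate sequence, which is one simple way to read the results; other summaries may suit a given study better.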

Replicate analysis run configuration

To use this functionality, firstly the user has to select the 'Replicate analysis' option from the run configuration drop-down list on the Simulation toolbar (Figure 1). 

Figure 1. Simulation toolbar- Replicate analysis

If a new version of Source is installed and used for the first time, the 'Replicate analysis' option has to be added using the 'Add new configuration' tab of 'Edit>Scenario Options>Running Configurations'.

Run configuration

Once the 'Replicate analysis' is selected, the user can configure the replicate run. There are two configuration tabs ('Run Configuration' and 'Data Sources Configuration') associated with this functionality, replacing the standard Run Configuration tab for a ‘Single Analysis’ run.  The 'Run Configuration' tab of the configuration window appears in Figure 2.

  Figure 2. Replicate Analysis Configuration Window- Run Configuration

Using this tab, users can configure a multi-replicate simulation with multiple data sources. They can specify the start and end dates of the model run period, the number of replicates, the interval between consecutive replicates (Replicate Increment), the start cycle year for the replicates, whether to remove excess data, and the replicate name format. The 'Start Date' and 'End Date' are the model run dates against which the cycled data from the data sources are reported; they remain the same for all replicates. The 'Start Cycle Date' is the first data date used to generate replicate data, and its month and day are the same as those of the Start Date.

The description for each run configuration parameter is given in Table 1 below.

Table 1. Run Configuration parameters and their description. 

| Option | Description | Example |
|---|---|---|
| Start Date | The start date of the model run period. | 01/11/1891 |
| End Date | The end date of the model run period. | 30/06/1892 |
| Time Step | The time step of the run, which is the same as the time step of the source data. | Daily/Monthly |
| Run Separate Networks in Parallel | Option to run separate networks in parallel. | |
| Max Replicates | The maximum number of replicates that can be entered. It is calculated from the other input parameters. | 128 |
| Number of Replicates | The number of times (replicates) the model will be run. It cannot be more than Max Replicates. | 5 |
| Replicate Increment | The interval between consecutive replicates. | 1 |
| Start Cycle Date | The first data date that will be used to generate the replicate data. The user only needs to select a year; the month and day are the same as those of the Start Date. | 1/11/1891 |
| Remove excess data | Option to truncate/ignore the partial data from the source data. | |
| Replicate Name Format | The format of the name shown in the results. Three options: Replicate Start Year, Replicate Number, and Replicate Number and Start Year. | Replicate Number and Start Year (1:1891). (If the replicate number is more than 9, it could be 01:1891, 001:1891, etc.) |
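The three Replicate Name Format options can be illustrated with a small sketch. The function `replicate_name` is hypothetical and not part of Source; in particular, padding the replicate number to the number of digits in the total replicate count is an assumption made here to reproduce names such as 01:1891 and 001:1891.

```python
def replicate_name(number: int, start_year: int, total: int, fmt: str) -> str:
    """Build a replicate name for one of the three format options in Table 1."""
    # Assumption: zero-pad to the number of digits in the total replicate count.
    padded = str(number).zfill(len(str(total)))
    if fmt == "Replicate Start Year":
        return str(start_year)
    if fmt == "Replicate Number":
        return padded
    if fmt == "Replicate Number and Start Year":
        return f"{padded}:{start_year}"
    raise ValueError(f"unknown replicate name format: {fmt!r}")


for total in (5, 12, 128):
    print(total, replicate_name(1, 1891, total, "Replicate Number and Start Year"))
```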


The above run configuration as in Figure 2 can be explained using the figure (Figure 3) below. 

Figure 3. Explanation of the above replicate run configuration

The original data used in a replicate run depend on input parameters such as Start Cycle Date and Remove excess data. For example, suppose the source data run from 01/01/1891 to 30/06/2020 and the other entries in Figure 2 are unchanged:

  • If 2018 is selected for the Start Cycle Date, the source data used for Replicates 1 to 5 are those of 01/11/2018 - 30/06/2019, 01/11/1891 - 30/06/1892, 01/11/1892 - 30/06/1893, 01/11/1893 - 30/06/1894 and 01/11/1894 - 30/06/1895. The partial data from 2020 are trimmed by the "Remove excess data" option before the replicate data are built. The remaining 2019 data cannot make a whole replicate, so the replicate after 1:2018 starts from 01/11/1891.
  • If 2019 is selected for the Start Cycle Date, the source data used for Replicates 1 to 5 are those of 01/11/1891 - 30/06/1892, 01/11/1892 - 30/06/1893, 01/11/1893 - 30/06/1894, 01/11/1894 - 30/06/1895 and 01/11/1895 - 30/06/1896. The partial data from 2020 are trimmed by the "Remove excess data" option before the replicate data are built. The 2019 data cannot make a whole replicate, so the first replicate starts from 01/11/1891.
  • If 2020 is selected for the Start Cycle Date, the user interface gives an error message, because the partial data in 2020 are trimmed and cannot be used for a replicate.
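The start years in the bullets above can be reproduced with a minimal sketch. It is an interpretation of the described behaviour, not Source's implementation: the function names are hypothetical, "Remove excess data" is assumed to drop a final partial calendar year, and a replicate whose data window does not fit in the remaining data is assumed to wrap back to the first data year.

```python
from datetime import date
from typing import List


def trim_partial_final_year(data_end: date) -> date:
    """Assumed effect of 'Remove excess data': keep data up to the last 31 December."""
    if (data_end.month, data_end.day) == (12, 31):
        return data_end
    return date(data_end.year - 1, 12, 31)


def replicate_start_years(data_start: date, data_end: date, run_start: date,
                          run_end: date, start_cycle_year: int,
                          n_replicates: int, increment: int = 1) -> List[int]:
    """Return the data year in which each replicate starts."""
    usable_end = trim_partial_final_year(data_end)
    if not data_start.year <= start_cycle_year <= usable_end.year:
        raise ValueError("Start Cycle year is outside the usable data")
    span = run_end.year - run_start.year

    def fits(year: int) -> bool:
        # Note: a run starting or ending on 29 February is not handled here.
        first = date(year, run_start.month, run_start.day)
        last = date(year + span, run_end.month, run_end.day)
        return data_start <= first and last <= usable_end

    years: List[int] = []
    year = start_cycle_year
    while len(years) < n_replicates:
        if not fits(year):
            year = data_start.year  # assumed wrap-around to the first data year
            if not fits(year):
                raise ValueError("run period is longer than the usable data")
        years.append(year)
        year += increment
    return years


args = (date(1891, 1, 1), date(2020, 6, 30), date(1891, 11, 1), date(1892, 6, 30))
print(replicate_start_years(*args, start_cycle_year=2018, n_replicates=5))
print(replicate_start_years(*args, start_cycle_year=2019, n_replicates=5))
```

In this sketch, a Start Cycle year that lies in the trimmed partial year raises a `ValueError`, mirroring the error message described for the user interface.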

A leap day in the model run period (from the Start Date to the End Date) may not have a matching date in the replicate data, and vice versa. Replicate Analysis handles these conflicts as follows; the examples are based on the data in Figure 2 and Figure 3:

  • If the model run period contains a leap day (e.g. 29/02/1892) and the replicate data do not (e.g. there is no 29/02/1893 in replicate 2:1892), the mean of the original values on 28/02/1893 and 01/03/1893 is assigned to 29/02/1892 for the model run.
  • If the model run period contains a leap day (e.g. 29/02/1892) and the replicate data also contain one (e.g. 29/02/1896 in replicate 5:1895), the original value on 29/02/1896 is assigned to 29/02/1892 for the model run.
  • If the model run period contains no leap day and the replicate data do, the original value on 29 February in the replicate data is ignored in the model run.

The details about handling conflicts of the leap year in Replicate Analysis are given in Table 2.

Table 2. Data used in the leap year in Replicate Analysis

| Model Run Period | Replicate Name and the original data used | | | | |
|---|---|---|---|---|---|
| | 1:1891 | 2:1892 | 3:1893 | 4:1894 | 5:1895 |
| 1/11/1891 | 1/11/1891 | 1/11/1892 | 1/11/1893 | 1/11/1894 | 1/11/1895 |
| …… | …… | …… | …… | …… | …… |
| 28/02/1892 | 28/02/1892 | 28/02/1893 | 28/02/1894 | 28/02/1895 | 28/02/1896 |
| 29/02/1892 | 29/02/1892 | (28/02/1893+1/03/1893)/2 | (28/02/1894+1/03/1894)/2 | (28/02/1895+1/03/1895)/2 | 29/02/1896 |
| 1/03/1892 | 1/03/1892 | 1/03/1893 | 1/03/1894 | 1/03/1895 | 1/03/1896 |
| …… | …… | …… | …… | …… | …… |
| 30/06/1892 | 30/06/1892 | 30/06/1893 | 30/06/1894 | 30/06/1895 | 30/06/1896 |
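The leap-day rules in Table 2 can be sketched as a lookup from a model run date to the original data. This is an illustrative reading of the rules rather than Source's code; the function `replicate_value`, the `year_shift` parameter and the sample values are hypothetical.

```python
import calendar
from datetime import date
from typing import Dict


def replicate_value(original: Dict[date, float], model_date: date, year_shift: int) -> float:
    """Return the original value reported against `model_date` for one replicate.

    `year_shift` is the number of years between the model run date and the
    original data date, e.g. 1 for replicate 2:1892 when the run starts in 1891.
    """
    source_year = model_date.year + year_shift
    is_leap_day = (model_date.month, model_date.day) == (2, 29)
    if is_leap_day and not calendar.isleap(source_year):
        # No 29 February in the replicate data: average the neighbouring days.
        before = original[date(source_year, 2, 28)]
        after = original[date(source_year, 3, 1)]
        return (before + after) / 2
    return original[date(source_year, model_date.month, model_date.day)]


# Hypothetical original values around the end of February.
original = {
    date(1893, 2, 28): 2.0,
    date(1893, 3, 1): 4.0,
    date(1896, 2, 29): 7.0,
}
print(replicate_value(original, date(1892, 2, 29), year_shift=1))  # mean of 2.0 and 4.0
print(replicate_value(original, date(1892, 2, 29), year_shift=4))  # value on 29/02/1896
```

Because the lookup is driven by model run dates, a 29 February that exists only in the replicate data is never requested, which corresponds to the third rule.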


Figure 4 shows another example of the replicate run configuration, in which the Start Date and End Date cover multiple years at a monthly time step. The input data starting from 1/01/1957 are cycled and reported against a start date of 1/01/1990 and an end date of 1/12/2000 (monthly data are dated on the first day of each month).

Figure 4. Another example of Replicate Analysis Run Configuration


The above run configuration can be explained using Figure 5 below. 

Figure 5. Explanation of the above run configuration example

Data sources configuration

When running a replicate analysis, the user can decide whether to cycle all the data sources or only selected data sources. For this, the 'Data Sources Configuration' tab is used as shown in Figure 6. All data sources are cycled in the left example configuration, whereas in the right example configuration, only one data source (Inflow_Crab_Creek.csv) is cycled.

Figure 6. Replicate Analysis Configuration Window- Data Sources Configuration


Figures 7, 8 and 9 below further illustrate different replicate analysis configurations and the resulting replicate outputs. The symbols in the figures represent the yearly or sub-yearly values/data corresponding to the original data dates. The start and end dates are the same for all replicates.

In Figure 7, the replicate increment is 1 year and original data from 2002 to 2008 are cycled and reported against dates between 2002 (start date) and 2008 (end date). The Source model is run three times (Number of replicates = 3) from the start date (01/01/2002) till the end date (31/12/2008), and the input time series selected for replication is cycled, incrementing by 1 year each time. Replicate 1 starts with the original data of 2002 reported against the start date of 2002; Replicate 2 starts with the original data corresponding to 2003 and Replicate 3 starts with the original data from 2004.

Figure 7. Replicate analysis configuration and output with replicates =3, increment =1

For the replicate run configuration as in Figure 7, the data is reported as given in Table 3. 

Table 3. Model run year and replicated input data year for the three replicates

| Replicate 1: Model run year | Replicate 1: Replicated input data year | Replicate 2: Model run year | Replicate 2: Replicated input data year | Replicate 3: Model run year | Replicate 3: Replicated input data year |
|---|---|---|---|---|---|
| 2002 | 2002 | 2002 | 2003 | 2002 | 2004 |
| 2003 | 2003 | 2003 | 2004 | 2003 | 2005 |
| 2004 | 2004 | 2004 | 2005 | 2004 | 2006 |
| 2005 | 2005 | 2005 | 2006 | 2005 | 2007 |
| 2006 | 2006 | 2006 | 2007 | 2006 | 2008 |
| 2007 | 2007 | 2007 | 2008 | 2007 | 2002 |
| 2008 | 2008 | 2008 | 2002 | 2008 | 2003 |
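The wrap-around visible in Table 3 can be expressed with modular arithmetic. The sketch below is one way to write it; the function `input_year` and its parameter names are hypothetical, and the cycled data are assumed to consist of whole years.

```python
def input_year(model_year: int, run_start_year: int, cycle_first_year: int,
               cycle_last_year: int, start_cycle_year: int,
               replicate: int, increment: int = 1) -> int:
    """Return the original data year reported against `model_year`.

    `replicate` is 1-based. Data years beyond `cycle_last_year` wrap around
    to `cycle_first_year`.
    """
    n_years = cycle_last_year - cycle_first_year + 1
    offset = ((start_cycle_year - cycle_first_year)   # where replicate 1 starts
              + (replicate - 1) * increment           # shift per replicate
              + (model_year - run_start_year))        # position within the run
    return cycle_first_year + offset % n_years


# Configuration of Figure 7: data 2002-2008, Start Cycle year 2002, increment 1.
for model_year in range(2002, 2009):
    row = [input_year(model_year, 2002, 2002, 2008, 2002, r) for r in (1, 2, 3)]
    print(model_year, row)
```

Changing `start_cycle_year` and `increment` gives the mapping for other configurations without any change to the function.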


Figure 8 uses the same original data as Figure 7 (2002 to 2008, cycled and reported against dates between the 2002 start date and the 2008 end date), but with a replicate increment of 2 and a Start Cycle Date of 2004. The model is run twice (number of replicates is 2) from the start date (01/01/2002) to the end date (31/12/2008), and the selected input time series is replicated with an increment of 2 years starting in 2004. Replicate 1 starts with original data from 2004, as this is the 'Start Cycle Date'. The second replicate starts with data from 2006, as the increment is two.

Figure 8. Replicate analysis configuration and output with replicates =2, increment =2

For the above configuration (Figure 8), the data is reported as given in Table 4. 

Table 4. Model run year and replicated input data year for the two replicates

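| Replicate 1: Model year | Replicate 1: Replicated input data year | Replicate 2: Model year | Replicate 2: Replicated input data year |
|---|---|---|---|
| 2002 | 2004 | 2002 | 2006 |
| 2003 | 2005 | 2003 | 2007 |
| 2004 | 2006 | 2004 | 2008 |
| 2005 | 2007 | 2005 | 2002 |
| 2006 | 2008 | 2006 | 2003 |
| 2007 | 2002 | 2007 | 2004 |
| 2008 | 2003 | 2008 | 2005 |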


Figure 9 illustrates that the model can be run and cycled on a subset of the available input data. In this example, it is assumed original input data starts in 2000 and ends in 2010, but the modeller only requires a replicate analysis for 2002 – 2008. Selected input data are cycled and reported against dates from 2002 (start date) to 2008 (end date). In this example the number of replicates is still 2, but the Start Cycle date is 01/01/2005. The model is run twice (2 replicates) from 01/01/2002 to 31/12/2008, and the selected input time series is replicated by an increment of 2 years starting in 2005. Replicate 1 starts with original data from 2005, as it is the 'Start Cycle Date'. The second replicate starts with data from 2007 as the increment is two.

Figure 9. Replicate analysis configuration and output with replicates = 2, increment = 2

For the above configuration (Figure 9), the data is reported as given in Table 5. 

Table 5. Model run year and replicated input data year for the two replicates

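| Replicate 1: Model year | Replicate 1: Replicated input data year | Replicate 2: Model year | Replicate 2: Replicated input data year |
|---|---|---|---|
| 2002 | 2005 | 2002 | 2007 |
| 2003 | 2006 | 2003 | 2008 |
| 2004 | 2007 | 2004 | 2002 |
| 2005 | 2008 | 2005 | 2003 |
| 2006 | 2002 | 2006 | 2004 |
| 2007 | 2003 | 2007 | 2005 |
| 2008 | 2004 | 2008 | 2006 |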

Results

The replicate analysis produces a single run in the Results Manager, with a sub-run for each replicate. Results of the run and all sub-runs can be exported to res.csv or Source Db format. Individual sub-run results can be exported to res.csv, Source Db or other formats.


The replicate analysis is further explained by using some Source model examples.

Example 1

Consider an example Source model as shown in Figure 10. The input data sources are time series monthly rainfall and two monthly demands and the data period spans from 1/01/1957 to 1/12/2003 for all of the data sources. 

Figure 10. Example model and the replicate analysis configuration

As shown in Figure 10, for each replicate the model is run for an 11-year period starting on 1/01/1990 and ending on 1/12/2000 (monthly data are dated on the first day of each month). Although any time step can be used, a monthly time step is used in this example because the input data are monthly. The number of replicates is 5, with an increment of 1 year. The input data are cycled starting from 1/01/1957. Only the rainfall data are cycled.

Once the configuration is set up, the model run produces the results shown in Figure 11 in the Results Manager 'Table' format. The left side of the Results Manager shows that the scenario results are provided as five replicates, each as a sub-run. The sub-run names correspond to the replicate number and the start cycle year of each replicate, which increases by one year from one replicate to the next.

The 'Date' column in the table (right side of the figure) corresponds to the 'Start Date' and 'End Date' and dates in between. These dates would be the same for all replicates. Each column to the right of the 'Date' column represents each replicate (sub-run) of the data source 'Rainfall'. It can be seen from the table that for the first replicate, the values (data) for 1990 correspond to the original values of 1957, whereas in the second replicate, the values for 1990 correspond to those of 1958 and so on. For the fifth replicate, the values for 1990 are replaced with values corresponding to 1961.

Figure 11. Replicate run results as in Results Manager


The below table (Table 6) shows how the original data dates are cycled in the above example. 

Table 6. Reported date and the corresponding original data dates

| Replicate run name | 1957 (Replicate 1) | 1958 (Replicate 2) | 1959 (Replicate 3) | 1960 (Replicate 4) | 1961 (Replicate 5) |
|---|---|---|---|---|---|
| Reported starting date | Original data starting date | Original data starting date | Original data starting date | Original data starting date | Original data starting date |
| 1990 | 1957 | 1958 | 1959 | 1960 | 1961 |
| 1991 | 1958 | 1959 | 1960 | 1961 | 1962 |
| 1992 | 1959 | 1960 | 1961 | 1962 | 1963 |
| 1993 | 1960 | 1961 | 1962 | 1963 | 1964 |
| . | . | . | . | . | . |
| . | . | . | . | . | . |
| 2000 | 1967 | 1968 | 1969 | 1970 | 1971 |
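For a monthly series held in memory, the cycling in Table 6 amounts to taking the same number of time steps from a start position that moves forward by 12 months per replicate. The sketch below shows this under stated assumptions: `replicate_slice` is a hypothetical helper, the stand-in series simply encodes each month as YYYYMM instead of holding real rainfall, and wrap-around past the end of the data is not handled.

```python
from typing import List


def replicate_slice(values: List[float], data_start_year: int, start_cycle_year: int,
                    replicate: int, increment_years: int, n_steps: int) -> List[float]:
    """Return the monthly values used by one replicate (1-based `replicate`)."""
    first = ((start_cycle_year - data_start_year)
             + (replicate - 1) * increment_years) * 12
    last = first + n_steps
    if last > len(values):
        raise ValueError("replicate runs past the end of the data; "
                         "wrap-around is not handled in this sketch")
    return values[first:last]


# Stand-in monthly series for 1/1957 to 12/2003; each value encodes its date as YYYYMM.
rainfall = [year * 100 + month for year in range(1957, 2004) for month in range(1, 13)]
n_steps = (2000 - 1990 + 1) * 12  # monthly steps from 1/01/1990 to 1/12/2000 inclusive

for k in range(1, 6):
    rep = replicate_slice(rainfall, 1957, 1957, k, 1, n_steps)
    print(f"Replicate {k}: original data {rep[0]} to {rep[-1]}")
```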

Example 2

Consider another example that illustrates the ability of the Replicate Analysis functionality to cycle part of each year's data and report it against the same start date. In the Replicate Analysis configuration shown in Figure 12, the Start Cycle Date is 1/05/1998, with 15 replicates and an increment of one year. The 'Start Date' is the same as the 'Start Cycle Date', while the 'End Date' is 30/11/1998, so the reporting period is only seven months. The data source (Creek_Inflows) spans 11/01/1998 to 31/12/2013.

Figure 12. Replicate Analysis Configuration with partial year data cycling 


The model run results in 15 replicates, each having seven months of data (from 1st of May to 30th November) from each year between 1998 and 2012 reported against 1/05/1998 to 30/11/1998. 

In Figure 13, the sub-run names indicate each replicate with seven months of data. The right side of the figure shows three replicates, with data from 1st May to 30th November of the start cycle years 1998, 1999 and 2000, reported against the period from 01 May 1998 to 30 Nov 1998.

Figure 13. Replicate run results showing 15 replicates with partial year data cycled between 1/05/1998 and 30/11/1998



It should be noted that the number of replicates should be less than or equal to the number of years of data in the data source. 

Table 7 below shows how the replicate analysis works in the above example.

Table 7. Reported date and the corresponding original data dates

| Reported date | 1998 (Replicate 1) | 1999 (Replicate 2) | 2000 (Replicate 3) | ...... | 2012 (Replicate 15) |
|---|---|---|---|---|---|
| Reported starting date | Original data starting date | Original data starting date | Original data starting date | ...... | Original data starting date |
| 1/05/1998 - 30/11/1998 | 1/05/1998 - 30/11/1998 | 1/05/1999 - 30/11/1999 | 1/05/2000 - 30/11/2000 | ...... | 1/05/2012 - 30/11/2012 |
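The partial-year windows of Example 2 can be listed with a short sketch. The helper `partial_year_windows` is hypothetical; it only shifts the reporting window by whole years and applies the rule that the number of replicates should not exceed the number of years of data. The value of 16 data years is taken from the 1998 to 2013 span of the data source.

```python
from datetime import date
from typing import List, Tuple


def partial_year_windows(start: date, end: date, start_cycle_year: int,
                         n_replicates: int, data_years: int,
                         increment: int = 1) -> List[Tuple[date, date]]:
    """Return the original-data window (first day, last day) for each replicate."""
    if n_replicates > data_years:
        raise ValueError("number of replicates exceeds the number of years of data")
    base_shift = start_cycle_year - start.year
    windows = []
    for k in range(n_replicates):
        shift = base_shift + k * increment
        # Note: a window starting or ending on 29 February is not handled here.
        windows.append((date(start.year + shift, start.month, start.day),
                        date(end.year + shift, end.month, end.day)))
    return windows


windows = partial_year_windows(date(1998, 5, 1), date(1998, 11, 30),
                               start_cycle_year=1998, n_replicates=15, data_years=16)
for number, (first, last) in enumerate(windows, start=1):
    print(number, first.isoformat(), last.isoformat())
```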