Statistics are generated and may be analysed in many parts of Source. This page provides information on the statistical functions available in Source, including what they measure and how they are interpreted. For information on how to use statistics in Source, see the relevant section of the user guide.
In Results Manager, statistics are categorised as either user-generated or auto-generated.
Univariate and Bivariate Statistics
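Univariate statistics provide information on a single variable, summarising and revealing patterns in that variable.
Bivariate statistics compare two variables to determine empirical relationships between them, for example the fit of modelled to observed streamflow.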
It is common for hydrological time series to contain missing values and to have differing start and end dates. Generally, Source calculates bivariate statistics using only those time steps for which both series have values (complete data pairs).
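The Python sketch below roughly illustrates the complete-pairs rule. It aligns two series by date and keeps only the dates on which both series have a value. It is not Source's implementation. The dictionary-of-dates representation, the use of `None` or NaN to mark a missing value, and the function name `complete_pairs` are all assumptions made for this example.

```python
import math
from datetime import date
from typing import Dict, List, Optional, Tuple


def complete_pairs(
    observed: Dict[date, Optional[float]],
    modelled: Dict[date, Optional[float]],
) -> Tuple[List[float], List[float]]:
    """Return aligned (observed, modelled) values for complete data pairs only.

    Hypothetical helper: drops dates present in only one series, and dates
    where either value is None or NaN.
    """
    obs_values: List[float] = []
    mod_values: List[float] = []
    # Only dates covered by both series can form a pair, which handles
    # differing start and end dates.
    for day in sorted(observed.keys() & modelled.keys()):
        o, m = observed[day], modelled[day]
        if o is None or m is None or math.isnan(o) or math.isnan(m):
            continue  # incomplete pair: skip this time step
        obs_values.append(o)
        mod_values.append(m)
    return obs_values, mod_values


obs = {date(2000, 1, 1): 1.0, date(2000, 1, 2): None, date(2000, 1, 3): 3.0}
mod = {date(2000, 1, 2): 2.5, date(2000, 1, 3): 2.8, date(2000, 1, 4): 4.0}
print(complete_pairs(obs, mod))  # ([3.0], [2.8])
```

In this example, only 3 January contributes a pair. On 1 and 4 January, only one series has data. On 2 January, the observed value is missing.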
See Calibration analysis - SRG and /wiki/spaces/TIME/pages/56721988.
Nash-Sutcliffe Efficiency (NSE)
Definition
The NSE is a normalised statistic that measures the relative magnitude of the model error variance compared to the measured data variance (Nash and Sutcliffe, 1970). It is commonly used to evaluate the fit of modelled to observed streamflow data, and the definition and discussion below assume that it is being applied in this context. However, the NSE can be used to evaluate the fit between time series of any type.
The NSE is defined as:
...
where
Qobs,i is the observed flow for time step i
Qmod,i is the modelled flow for time step i
N is the number of time steps
The time step size is arbitrary.
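The Python sketch below computes the NSE under one assumption: that Source uses the standard Nash and Sutcliffe (1970) form, NSE = 1 − Σ(Qobs,i − Qmod,i)² / Σ(Qobs,i − mean(Qobs))². Check this against the equation Source uses before relying on it. The function name `nse` is illustrative, and the inputs are assumed to be complete, aligned data pairs.

```python
from statistics import fmean
from typing import Sequence


def nse(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency (sketch, assuming the standard form).

    NSE = 1 - sum((Qobs_i - Qmod_i)^2) / sum((Qobs_i - mean(Qobs))^2)
    Both series must be complete, aligned data pairs of equal length.
    """
    if len(observed) != len(modelled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = fmean(observed)
    # Sum of squared model errors.
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    # Sum of squared deviations of the observations from their mean.
    spread_ss = sum((o - mean_obs) ** 2 for o in observed)
    if spread_ss == 0:
        raise ValueError("NSE is undefined when all observed values are equal")
    return 1.0 - error_ss / spread_ss


print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0: perfect match
print(nse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: no better than the observed mean
```

A model that predicts the observed mean at every time step scores exactly 0. This is the benchmark used when interpreting the NSE.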
Interpretation
The NSE can range between -∞ and 1.
- NSE = 1 corresponds to a perfect match between modelled and observed data
- NSE = 0 indicates that the model predictions are as accurate as the mean of the observed data
- NSE < 0 indicates that the mean of the observed data is a better predictor than the model
The NSE is sensitive to the timing of flow events. It is often applied on a daily time step. Applying it on a longer time step, such as monthly, can be used to evaluate the fit to the monthly pattern of flows without being influenced by the timing of individual runoff events.
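The sketch below illustrates the effect of the time step. It uses a hypothetical 90-day daily record (January to March 2001) containing three runoff events. The modelled series reproduces every event exactly, but one day late. The daily NSE is penalised heavily for this timing error, while the NSE of calendar-month totals is unaffected. The data, the helper `monthly_totals`, and the standard NSE form used in `nse` are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def nse(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """Standard Nash-Sutcliffe Efficiency (assumed form)."""
    mean_obs = fmean(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    spread_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / spread_ss


def monthly_totals(start: date, daily: Sequence[float]) -> List[float]:
    """Sum a daily series (beginning on `start`) into calendar-month totals."""
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for offset, value in enumerate(daily):
        day = start + timedelta(days=offset)
        totals[(day.year, day.month)] += value
    return [totals[key] for key in sorted(totals)]


start = date(2001, 1, 1)
observed = [1.0] * 90  # constant baseflow, 1 Jan to 31 Mar 2001
modelled = [1.0] * 90
# Events on 10 Jan, 14 Feb and 20 Mar; the model simulates each one a day late.
for day_index, peak in [(9, 50.0), (44, 20.0), (78, 80.0)]:
    observed[day_index] = peak
    modelled[day_index + 1] = peak

daily_nse = nse(observed, modelled)
monthly_nse = nse(monthly_totals(start, observed), monthly_totals(start, modelled))
print(f"daily NSE:   {daily_nse:.2f}")    # about -1.05
print(f"monthly NSE: {monthly_nse:.2f}")  # 1.00
```

Each shifted event stays within its calendar month, so the monthly totals match exactly even though every daily peak is mistimed.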
The NSE is sensitive to extreme values and relatively insensitive to small values, because the errors are squared. For example, the NSE is generally not suitable for evaluating the fit to low flows, as its value will be dominated by the fit to high flows.
Links
NSE of Log Data (NSE Log)
Definition
NSE Log is the standard NSE statistic (equation (1)) applied to the logarithm of flow data:
...
where
c is a positive constant equal to the maximum of 1 ML and the 10th percentile of the observed flow
other terms are as defined in equation (1)
As with the standard NSE, the time step size is arbitrary. The NSE Log cannot be applied to time series with negative values, as the logarithm of a number less than or equal to zero is undefined.
Interpretation
Using the logarithm of flows has the effect of reducing the sensitivity of the statistic to high flows and increasing the sensitivity to low and mid-range flows. For this reason, NSE Log is often used where low-flow performance is important. The use of the constant c de-emphasises very small flows, which tend to be unreliable, and avoids numerical problems arising from attempting to calculate the logarithm of zero flows.
The NSE Log can range between -∞ and 1 and the interpretation is the same as for the NSE, but applied to log data.
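The Python sketch below makes several assumptions:

- NSE Log applies the standard NSE to ln(Qobs,i + c) and ln(Qmod,i + c).
- Flows are in ML per time step, so the 1 ML floor on c becomes 1.0.
- The 10th percentile is taken with `statistics.quantiles` using the inclusive method, which may not match the percentile definition Source uses.

The function names are illustrative. NSE is unchanged when both series are scaled by the same factor, so the choice of logarithm base does not affect the result.

```python
import math
from statistics import fmean, quantiles
from typing import Sequence


def nse(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """Standard Nash-Sutcliffe Efficiency (assumed form)."""
    mean_obs = fmean(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    spread_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / spread_ss


def log_offset(observed: Sequence[float]) -> float:
    """Constant c: the larger of 1 ML and the 10th percentile of observed flow.

    Assumes flows in ML per time step; 'inclusive' is an assumed percentile method.
    """
    p10 = quantiles(observed, n=10, method="inclusive")[0]
    return max(1.0, p10)


def nse_log(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """NSE of log-transformed flows, assuming the transform ln(Q + c)."""
    if min(observed) < 0 or min(modelled) < 0:
        raise ValueError("NSE Log cannot be applied to negative values")
    c = log_offset(observed)
    return nse([math.log(q + c) for q in observed],
               [math.log(q + c) for q in modelled])


# Hypothetical data: high flows are simulated exactly, low flows are tripled.
observed = [2.0, 3.0, 2.0, 4.0, 3.0, 200.0, 400.0]
modelled = [6.0, 9.0, 6.0, 12.0, 9.0, 200.0, 400.0]
print(f"NSE:     {nse(observed, modelled):.3f}")      # about 0.999
print(f"NSE Log: {nse_log(observed, modelled):.3f}")  # about 0.880
```

In this example, the NSE is close to 1 because the high flows dominate the squared errors. NSE Log exposes the poor low-flow fit.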
Links
Absolute Bias
Definition
Interpretation
Links
Bias Penalty
Definition
Interpretation
Links
Pearson's Correlation
Definition
Interpretation
Links
Flow Duration
Definition
The Flow Duration statistic is calculated by sorting the observed and modelled data values in increasing order and then calculating the NSE (equation (1)) of the sorted data.
It can be applied for any time step size.
Interpretation
The Flow Duration is insensitive to the timing of flows and instead measures the fit to the overall distribution of flow magnitudes. It is sensitive to high flows and less sensitive to low flows.
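The Python sketch below computes the Flow Duration statistic. It assumes the standard NSE form, and sorts the observed and modelled values independently before taking the NSE. The example uses a hypothetical runoff event that the model simulates one day late. The daily NSE is negative, but sorting discards timing, so the Flow Duration is a perfect 1.0. The function names are illustrative.

```python
from statistics import fmean
from typing import Sequence


def nse(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """Standard Nash-Sutcliffe Efficiency (assumed form)."""
    mean_obs = fmean(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    spread_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / spread_ss


def flow_duration(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """NSE of the observed and modelled values, each sorted in increasing order."""
    return nse(sorted(observed), sorted(modelled))


observed = [1.0, 1.0, 30.0, 5.0, 2.0, 1.0]
modelled = [1.0, 1.0, 1.0, 30.0, 5.0, 2.0]  # same flows, one day late
print(f"NSE:           {nse(observed, modelled):.2f}")            # negative
print(f"Flow Duration: {flow_duration(observed, modelled):.2f}")  # 1.00
```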
Links
Flow Duration of Log Data (Log Flow Duration)
Definition
The Log Flow Duration statistic is calculated by applying the Flow Duration to log-transformed data:
...
Interpretation
The Log Flow Duration measures the fit to the overall distribution of flow magnitudes and it is sensitive to low and mid-range flows.
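The Log Flow Duration can be sketched by combining the log transform used by NSE Log with the Flow Duration. This sketch makes several assumptions that this page does not confirm:

- It uses the same transform ln(Q + c) and the same constant c as NSE Log.
- The percentile method is an assumption.
- The function names are illustrative.

The logarithm is monotonic, so sorting before or after the transform gives the same order.

```python
import math
from statistics import fmean, quantiles
from typing import Sequence


def nse(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """Standard Nash-Sutcliffe Efficiency (assumed form)."""
    mean_obs = fmean(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    spread_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / spread_ss


def log_offset(observed: Sequence[float]) -> float:
    """Assumed constant c: max of 1 ML and the 10th percentile of observed flow."""
    return max(1.0, quantiles(observed, n=10, method="inclusive")[0])


def log_flow_duration(observed: Sequence[float], modelled: Sequence[float]) -> float:
    """Flow Duration (NSE of independently sorted values) of ln(Q + c)."""
    if min(observed) < 0 or min(modelled) < 0:
        raise ValueError("log transform requires non-negative flows")
    c = log_offset(observed)
    return nse(sorted(math.log(q + c) for q in observed),
               sorted(math.log(q + c) for q in modelled))


observed = [3.0, 200.0, 2.0, 400.0, 4.0, 2.0, 3.0]
modelled = [200.0, 9.0, 400.0, 6.0, 6.0, 12.0, 9.0]  # low flows tripled, timing scrambled
flow_duration = nse(sorted(observed), sorted(modelled))
print(f"Flow Duration:     {flow_duration:.3f}")                         # about 0.999
print(f"Log Flow Duration: {log_flow_duration(observed, modelled):.3f}")  # about 0.880
```

The scrambled timing has no effect on either statistic. Only the log-transformed version is sensitive to the tripled low flows.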
Links
Sum of Daily Flows and Daily Exceedance (Flow Duration) Curve and Bias (SDEB)
Definition
The SDEB statistic combines three terms:
- the sum of errors on power transformed flow,
- the same sum on sorted flow values and
- the relative simulation bias.
This objective function is based on the function introduced by Coron et al. (2012) and has been successfully applied in a number of projects (e.g. Lerat et al., 2013). It has the following equation:
...
where:
α is a weighting factor set to 0.1
λ is an exponent set to 0.5
RQobs,k is the k’th ranked observed flow of a total of N ranked flows
RQsim,k is the k’th ranked modelled flow of a total of N ranked flows
Other terms are as defined previously.
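The Python sketch below is for illustration only. It assumes the three terms are combined as (α Σ(Qobs,i^λ − Qmod,i^λ)² + (1 − α) Σ(RQobs,k^λ − RQsim,k^λ)²) × (1 + |B|), where B = (ΣQmod − ΣQobs) / ΣQobs is the relative simulation bias. This combined form is an assumption and should be checked against the equation Source uses. Under this form, lower values indicate a better fit and 0 is a perfect match. The function name `sdeb` and the parameter name `lam` (used because `lambda` is a Python keyword) are illustrative.

```python
from typing import Sequence


def sdeb(observed: Sequence[float], modelled: Sequence[float],
         alpha: float = 0.1, lam: float = 0.5) -> float:
    """Sketch of SDEB under the assumed form described above."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    if min(observed) < 0 or min(modelled) < 0:
        raise ValueError("the power transform assumes non-negative flows")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("relative bias is undefined for zero total observed flow")
    # Term 1: errors on power-transformed flows, time step by time step.
    daily_term = sum((o ** lam - m ** lam) ** 2
                     for o, m in zip(observed, modelled))
    # Term 2: the same errors on independently ranked (sorted) flows.
    ranked_term = sum((o ** lam - m ** lam) ** 2
                      for o, m in zip(sorted(observed), sorted(modelled)))
    # Term 3: relative simulation bias, applied as a multiplier.
    bias = (sum(modelled) - total_obs) / total_obs
    return (alpha * daily_term + (1 - alpha) * ranked_term) * (1 + abs(bias))


print(sdeb([1.0, 4.0, 9.0], [1.0, 4.0, 9.0]))  # 0.0: perfect match
print(sdeb([1.0, 4.0, 9.0], [4.0, 1.0, 9.0]))  # about 0.2: timing error only
```

In the second example, swapping two values changes only the α-weighted term, because the sorted flows and the totals are unchanged. This shows how α sets the balance between timing errors and errors in the flow distribution.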
Interpretation
The coefficient α and the power transform λ are used to balance the three terms within the objective function.
...
Links
References
Coron, L., V. Andréassian, C. Perrin, J. Lerat, J. Vaze, M. Bourqui and F. Hendrickx (2012) Crash testing hydrological models in contrasted climate conditions: an experiment on 216 Australian catchments. Water Resources Research, 48, W05552, doi:10.1029/2011WR011721.
Lerat, J., C.A. Egan, S. Kim, M. Gooda, A. Loy, Q. Shao and C. Petheram (2013) Calibration of river models for the Flinders and Gilbert catchments. A technical report to the Australian Government from the CSIRO Flinders and Gilbert Agricultural Resource Assessment, part of the North Queensland Irrigated Agriculture Strategy. CSIRO Water for a Healthy Country and Sustainable Agriculture flagships, Australia.
Nash, J.E. and J.V. Sutcliffe (1970) River flow forecasting through conceptual models part I: A discussion of principles. Journal of Hydrology, 10 (3), 282–290.
Categories of Statistics
There are two main categories of statistics in Source:
- Univariate statistics provide information on a single variable and are intended to summarise and reveal patterns in that variable, see Univariate Statistics SRG.
- Bivariate statistics compare two variables for the purpose of determining empirical relationships between them, see Bivariate Statistics SRG.
Locations
Statistics are generated and may be analysed in many parts of Source:
- In Results Manager:
  - Chart statistics, found on the Statistics tab, provide univariate and bivariate statistics for selected result(s), see Chart Statistics.
  - User-generated statistics provide a broader range of statistics for either an entire scenario or a single result from different scenarios, see Results Manager Statistics.
- In Calibration Wizard:
  - During configuration, you select the objective function that the optimisation algorithm will either minimise or maximise during the calibration (depending on the statistic chosen), see Calibration Wizard for Catchments - Select elements to record.
  - Once a calibration run has finished, you can view a summary of the objective function statistics from the calibration run in the Simulation Runner, or view univariate and bivariate statistics, see Calibration Wizard for Catchments - Inspecting calibration results.
- In Data Source Explorer, you can right-click a Data Source and select View Data to open a chart, table and statistics view of the data source. See Specifying data inputs - Data Sources Explorer.