General statistics

Statistics are generated and may be analysed in many parts of Source. This page describes the statistical functions available in Source, including what they measure and how they are interpreted. For information on how to use statistics in Source, see the relevant section of the user guide.

Types in general, how they're used,

link to optimisation

Locations

Where statistics are put in, created, may be analysed

In Results Manager, statistics are categorised as either user or auto generated.

Univariate and Bivariate Statistics

Univariate statistics provide information on single ......

Bivariate statistics .....

It is common for hydrological time series to contain missing values and to have differing start and end dates. Generally, Source calculates bivariate statistics using only data from those time steps for which there are complete data pairs. *TODO: where is this not true? Bivariate statis in charting??
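
The pairing rule described above can be sketched as follows. This is a minimal illustration, not Source's implementation: the function name `complete_pairs` is hypothetical, the two series are assumed to be already aligned on the same time steps, and missing values are assumed to be represented as `None` or NaN.

```python
import math


def complete_pairs(observed, modelled):
    """Return the (observed, modelled) pairs in which neither value is missing.

    Assumptions for this sketch: both sequences cover the same time steps in
    the same order, and a missing value is either None or NaN.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [
        (obs, mod)
        for obs, mod in zip(observed, modelled)
        if not is_missing(obs) and not is_missing(mod)
    ]


obs = [1.0, None, 3.0, 4.0, float("nan")]
mod = [1.1, 2.0, float("nan"), 3.9, 5.0]
pairs = complete_pairs(obs, mod)
print(pairs)  # [(1.0, 1.1), (4.0, 3.9)]
```

Only the first and fourth time steps have both values present, so only those two pairs would contribute to a bivariate statistic under this rule.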

See Calibration analysis - SRG, /wiki/spaces/TIME/pages/56721988,

Nash-Sutcliffe Efficiency (NSE)

Definition

The NSE is a normalised statistic that measures the relative magnitude of the model error variance compared to the measured data variance (Nash and Sutcliffe, 1970). It is commonly used to evaluate the fit of modelled to observed streamflow data, and the definition and discussion below assume that it is being applied in this context. However, the NSE can be used to evaluate the fit between time series of any type.

The NSE is defined as:

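$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} \left(Q_{obs,i} - Q_{mod,i}\right)^2}{\sum_{i=1}^{N} \left(Q_{obs,i} - \overline{Q}_{obs}\right)^2} \tag{1}$$

where

Q_obs,i is the observed flow for time step i

Q_mod,i is the modelled flow for time step i

Q̄_obs is the mean of the observed flows over the N time steps

N is the number of time steps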

The time step size is arbitrary.
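
As an illustration, the standard NSE (one minus the ratio of the sum of squared model errors to the sum of squared deviations of the observations from their mean) can be sketched in a few lines. The function name `nse` is hypothetical and the sketch assumes the two series are already paired with no missing values; it is not Source's implementation.

```python
def nse(observed, modelled):
    """Nash-Sutcliffe Efficiency of modelled against observed values.

    Assumes the inputs are equal-length sequences of paired values with no
    missing data.
    """
    if len(observed) != len(modelled) or len(observed) == 0:
        raise ValueError("observed and modelled must be non-empty and equal in length")

    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors.
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    # Sum of squared deviations of the observations from their mean.
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    if obs_ss == 0:
        raise ValueError("NSE is undefined when the observed series is constant")
    return 1.0 - error_ss / obs_ss


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(nse(obs, obs))                        # 1.0  (perfect match)
print(nse(obs, [3.0] * 5))                  # 0.0  (model equals the observed mean)
print(nse(obs, [5.0, 4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the observed mean)
```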

Interpretation

The NSE can range between -∞ and 1.

NSE = 1 corresponds to a perfect match between modelled and observed data
NSE = 0 indicates that the model predictions are as accurate as the mean of the observed data
NSE < 0 indicates that the mean of the observed data is a better predictor than the model

The NSE is sensitive to the timing of flow events. It is often applied on a daily time step. Applying it on a longer time step, such as monthly, can be used to evaluate the fit to the monthly pattern of flows without being influenced by the timing of individual runoff events.

The NSE is sensitive to extreme values and insensitive to small values. For example, the NSE is generally not suitable for evaluating the fit to low flows, as its value will be dominated by the fit to high flows.

Links

NSE of Log Data (NSE Log)

Definition

NSE Log is the standard NSE statistic (equation (1)) applied to the logarithm of flow data:

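$$\mathrm{NSE}_{\log} = 1 - \frac{\sum_{i=1}^{N} \left(\ln(Q_{obs,i} + c) - \ln(Q_{mod,i} + c)\right)^2}{\sum_{i=1}^{N} \left(\ln(Q_{obs,i} + c) - \overline{\ln(Q_{obs} + c)}\right)^2} \tag{2}$$

where

c is a positive constant equal to the maximum of 1 ML and the 10th percentile of the observed flow

the overlined term is the mean of ln(Q_obs,i + c) over the N time steps

other terms are as defined in equation (1)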

As with the standard NSE, the time step size is arbitrary. The NSE Log cannot be applied to time series with negative values, as the logarithm of a number less than or equal to zero is undefined.
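
A minimal sketch of the NSE Log calculation is given below. Several details are assumptions made for illustration rather than confirmed behaviour of Source: the function name `nse_log` and its `min_offset` parameter are hypothetical, flows are assumed to be in ML per time step, the constant c is assumed to be added to each flow before the logarithm is taken, and the 10th percentile is computed with the inclusive (linear interpolation) method of Python's `statistics.quantiles`. The base of the logarithm does not affect the result, because a change of base scales the numerator and denominator by the same factor.

```python
import math
import statistics


def nse_log(observed, modelled, min_offset=1.0):
    """NSE of log-transformed flows, with a constant c added before the log.

    c = max(min_offset, 10th percentile of observed), where min_offset is
    assumed to be 1 ML. The percentile method is an assumption of this sketch.
    """
    if len(observed) != len(modelled) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two values")
    if any(v < 0 for v in observed) or any(v < 0 for v in modelled):
        raise ValueError("NSE Log cannot be applied to negative values")

    # First of the nine decile cut points is the 10th percentile.
    p10 = statistics.quantiles(observed, n=10, method="inclusive")[0]
    c = max(min_offset, p10)

    log_obs = [math.log(o + c) for o in observed]
    log_mod = [math.log(m + c) for m in modelled]
    mean_log_obs = sum(log_obs) / len(log_obs)

    error_ss = sum((o - m) ** 2 for o, m in zip(log_obs, log_mod))
    obs_ss = sum((o - mean_log_obs) ** 2 for o in log_obs)
    if obs_ss == 0:
        raise ValueError("NSE Log is undefined when the observed series is constant")
    return 1.0 - error_ss / obs_ss


obs = [0.0, 5.0, 20.0, 100.0, 400.0]   # zero flows are acceptable because c > 0
print(nse_log(obs, obs))                # 1.0
print(nse_log(obs, [0.0, 5.0, 20.0, 100.0, 800.0]) < 1.0)  # True
```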

Interpretation

Using the logarithm of flows has the effect of reducing the sensitivity of the statistic to high flows and increasing the sensitivity to low and mid-range flows. For this reason, NSE Log is often used where low-flow performance is important. The use of the constant c de-emphasises very small flows, which tend to be unreliable, and avoids numerical problems arising from attempting to calculate the logarithm of zero flows.

The NSE Log can range between -∞ and 1 and the interpretation is the same as for the NSE, but applied to log data.

Links

Absolute Bias

Definition

Interpretation

Links

Bias Penalty

Definition

Interpretation

Links

Pearson's Correlation

Definition

Interpretation

Links

Flow Duration

Definition

The Flow Duration statistic is calculated by sorting the observed and modelled data values in increasing order and then calculating the NSE (equation (1)) of the sorted data.

It can be applied for any time step size.
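
The calculation described above can be sketched as follows: each series is sorted independently and the NSE is then computed on the sorted values. The function names `nse` and `flow_duration` are hypothetical, and the inputs are assumed to be equal-length series with no missing values.

```python
def nse(observed, modelled):
    """Nash-Sutcliffe Efficiency for equal-length, gap-free series."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / obs_ss


def flow_duration(observed, modelled):
    """NSE of the two series after each is sorted in increasing order."""
    return nse(sorted(observed), sorted(modelled))


obs = [1.0, 5.0, 2.0, 4.0, 3.0]
mod = [5.0, 1.0, 4.0, 2.0, 3.0]   # same flow magnitudes, different timing

print(nse(obs, mod))            # -3.0 (penalised for timing errors)
print(flow_duration(obs, mod))  # 1.0  (distribution of magnitudes matches)
```

In this example the modelled series contains exactly the same values as the observed series but at different time steps, so the standard NSE is poor while the Flow Duration statistic indicates a perfect fit to the distribution of flows.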

Interpretation

The Flow Duration is insensitive to the timing of flows and instead measures the fit to the overall distribution of flow magnitudes. It is sensitive to high flows and less sensitive to low flows.

Links

Flow Duration of Log Data (Log Flow Duration)

Definition

The Log Flow Duration statistic is calculated by applying the Flow Duration statistic to log-transformed data:

Equation X

Interpretation

The Log Flow Duration measures the fit to the overall distribution of flow magnitudes and it is sensitive to low and mid-range flows.

Links

Sum of Daily Flows and Daily Exceedance (Flow Duration) Curve and Bias (SDEB)

Definition

The SDEB statistic combines three terms:

the sum of errors on power transformed flow,
the same sum on sorted flow values and
the relative simulation bias.

This objective function is based on the function introduced by Coron et al. (2012) and has been successfully applied in a number of projects (e.g. Lerat et al., 2013). It has the following equation:

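$$\mathrm{SDEB} = \left( \alpha \sum_{i=1}^{N} \left( Q_{obs,i}^{\lambda} - Q_{mod,i}^{\lambda} \right)^2 + (1 - \alpha) \sum_{k=1}^{N} \left( R_{Qobs,k}^{\lambda} - R_{Qsim,k}^{\lambda} \right)^2 \right) \left( 1 + \frac{\left| \sum_{i=1}^{N} Q_{mod,i} - \sum_{i=1}^{N} Q_{obs,i} \right|}{\sum_{i=1}^{N} Q_{obs,i}} \right) \tag{X}$$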

where:

α is a weighting factor set to 0.1
λ is an exponent set to 0.5
R_Qobs,k is the k’th ranked observed flow of a total of N ranked flows
R_Qsim,k is the k’th ranked modelled flow of a total of N ranked flows
Other terms are as defined previously.
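
The three terms can be combined as in the sketch below. It assumes a particular form of the function, which should be checked against Lerat et al. (2013) and the Source implementation: the sum of squared errors on power-transformed flows is weighted by α, the same sum on ranked flows is weighted by (1 − α), and the weighted total is multiplied by one plus the relative bias. The function name `sdeb` and its parameter names are hypothetical, and flows are assumed to be non-negative with a non-zero observed total. Under this form, smaller values indicate a better fit and a perfect match gives zero.

```python
def sdeb(observed, modelled, alpha=0.1, lam=0.5):
    """Sketch of the SDEB objective function under the assumed form above.

    alpha weights the time-step (unsorted) term; (1 - alpha) weights the
    ranked term. lam is the exponent of the power transform.
    """
    if len(observed) != len(modelled) or len(observed) == 0:
        raise ValueError("observed and modelled must be non-empty and equal in length")

    # Term 1: squared errors on power-transformed flows, time step by time step.
    timing_term = sum((o ** lam - m ** lam) ** 2 for o, m in zip(observed, modelled))

    # Term 2: the same sum on flows ranked (sorted) independently.
    ranked_term = sum(
        (o ** lam - m ** lam) ** 2
        for o, m in zip(sorted(observed), sorted(modelled))
    )

    # Term 3: relative bias in total simulated volume.
    total_obs = sum(observed)
    relative_bias = abs(sum(modelled) - total_obs) / total_obs

    return (alpha * timing_term + (1.0 - alpha) * ranked_term) * (1.0 + relative_bias)


obs = [1.0, 4.0, 9.0]
print(sdeb(obs, obs))              # 0.0 (perfect match)
print(sdeb(obs, [9.0, 4.0, 1.0]))  # approximately 0.8 (timing error only)
```

In the second call the modelled values match the observed distribution and total volume exactly, so only the first term contributes and it is scaled down by α = 0.1.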

Interpretation

The coefficient α and the power transform λ are used to balance the three terms within the objective function.

The weighting factor α is used to reduce the impact of the timing errors on the objective function. This type of error can have a significant effect on the first term in equation (X), where a slight misalignment of observed and simulated peak flow timing can result in large amplitude errors. The second term is based on sorted flow values, which remain unaffected by timing errors. By way of example, in their study of the Flinders and Gilbert Rivers in Northern Australia, Lerat et al. (2013) used values of α of 0.1 for the Flinders calibration and 1.0 for the Gilbert calibration.
Using values of power transform λ of less than 1 has the effect of reducing the weight of the errors in high flows, where the flow data are known to be less accurate. Lerat et al. (2013) found that a power transform of ½ led to the best compromise between high and low flow performance in their project. This value has been adopted in Source.

Links

References

Coron, L., V. Andréassian, C. Perrin, J. Lerat, J. Vaze, M. Bourqui and F. Hendrickx (2012) Crash testing hydrological models in contrasted climate conditions: an experiment on 216 Australian catchments. Water Resources Research, 48, W05552, doi:10.1029/2011WR011721.

Lerat, J., C.A. Egan, S. Kim, M. Gooda, A. Loy, Q. Shao and C. Petheram (2013) Calibration of river models for the Flinders and Gilbert catchments. A technical report to the Australian Government from the CSIRO Flinders and Gilbert Agricultural Resource Assessment, part of the North Queensland Irrigated Agriculture Strategy. CSIRO Water for a Healthy Country and Sustainable Agriculture flagships, Australia.

Nash, J.E. and J.V. Sutcliffe (1970) River flow forecasting through conceptual models part I — A discussion of principles. Journal of Hydrology, 10 (3), 282–290.