General statistics

Statistics are generated and may be analysed in many parts of Source. This page provides information on the statistical functions available in Source including what they measure and how they're interpreted. For information on how to use statistics in Source, see the relevant section of the user guide.

Types in general, how they're used,

link to optimisation

Locations

Where statistics are put in, created, may be analysed

In Results Manager, statistics are categorised as either user or auto generated.

Univariate and Bivariate Statistics

Univariate statistics provide information on a single time series.

Bivariate statistics measure the relationship between two time series, such as modelled and observed flow.

It is common for hydrological time series to contain missing values and to have differing start and end dates. Generally, Source calculates bivariate statistics using only data from those time steps for which there are complete data pairs.
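The pairing rule can be illustrated with a minimal Python sketch. It is not Source's implementation: the function name `complete_pairs` is hypothetical, missing values are assumed to be represented as `None` or NaN, and the two series are assumed to be already aligned on the same time steps.

```python
import math

def complete_pairs(observed, modelled):
    """Keep only the time steps where both series have a value.

    Assumptions (not taken from Source): missing values are None or NaN,
    and both lists are aligned on the same time steps.
    """
    def is_missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    return [(o, m) for o, m in zip(observed, modelled)
            if not is_missing(o) and not is_missing(m)]

obs = [1.0, None, 3.0, 4.0, float("nan")]
mod = [1.2, 2.1, None, 3.9, 5.0]

# Only the first and fourth time steps have a value in both series.
pairs = complete_pairs(obs, mod)
```

A bivariate statistic would then be calculated from `pairs` alone, so gaps in either series reduce the number of time steps used.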

See Calibration analysis - SRG, /wiki/spaces/TIME/pages/56721988,

Nash-Sutcliffe Efficiency (NSE)

Definition

The NSE is a normalised statistic that measures the relative magnitude of the model error variance compared to the measured data variance (Nash and Sutcliffe, 1970). It is commonly used to evaluate the fit of modelled to observed streamflow data, and the definition and discussion below assume that it is being applied in this context. However, the NSE can be used to evaluate the fit between time series of any type.

The NSE is defined as:

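$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} \left(Q_{obs,i} - Q_{mod,i}\right)^2}{\sum_{i=1}^{N} \left(Q_{obs,i} - \bar{Q}_{obs}\right)^2} \tag{1}$$

where

Q_obs,i is the observed flow for time step i

Q_mod,i is the modelled flow for time step i

Q̄_obs is the mean of the observed flows over the N time steps

N is the number of time steps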

The time step size is arbitrary.
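As an illustration, the NSE can be sketched in a few lines of Python. This assumes the standard Nash and Sutcliffe (1970) form, one minus the ratio of the sum of squared model errors to the sum of squared deviations of the observed data from their mean, and it assumes gap-free series of equal length. The function name `nse` is chosen for this example only.

```python
def nse(observed, modelled):
    """Nash-Sutcliffe Efficiency for two equal-length, gap-free series."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    # Sum of squared deviations of the observed data from their mean
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    if obs_ss == 0:
        raise ValueError("observed series has zero variance; NSE is undefined")
    return 1.0 - error_ss / obs_ss

obs = [1.0, 2.0, 3.0, 4.0]
perfect = nse(obs, obs)                     # modelled equals observed
mean_only = nse(obs, [2.5] * 4)             # modelled is the observed mean
reversed_fit = nse(obs, [4.0, 3.0, 2.0, 1.0])  # worse than the observed mean
```

The three example calls correspond to a perfect fit, a model that is only as good as the observed mean, and a model that is worse than the observed mean.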

Interpretation

The NSE can range between -∞ and 1.

NSE = 1 corresponds to a perfect match between modelled and observed data
NSE = 0 indicates that the model predictions are as accurate as the mean of the observed data
NSE < 0 indicates that the mean of the observed data is a better predictor than the model

The NSE is sensitive to the timing of flow events. It is often applied on a daily time step. Applying it on a longer time step, such as monthly, can be used to evaluate the fit to the monthly pattern of flows without being influenced by the timing of individual runoff events.

The NSE is sensitive to extreme values and insensitive to small values. For example, the NSE is generally not suitable for evaluating the fit to low flows, as the value will be dominated by the fit to high flows.

Links

TBA

NSE of Log Data (NSE Log)

Definition

NSE Log is the standard NSE statistic (equation (1)) applied to the logarithm of flow data:

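$$\mathrm{NSE\ Log} = 1 - \frac{\sum_{i=1}^{N} \left(\log(Q_{obs,i} + c) - \log(Q_{mod,i} + c)\right)^2}{\sum_{i=1}^{N} \left(\log(Q_{obs,i} + c) - \overline{\log(Q_{obs} + c)}\right)^2} \tag{2}$$

where

c is a positive constant equal to the maximum of 1 ML and the 10th percentile of the observed flow

the overbar denotes the mean over the N time steps

other terms are as defined in equation (1)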

As with the standard NSE, the time step size is arbitrary. The NSE Log cannot be applied to time series with negative values, as the logarithm of a number less than or equal to zero is undefined.
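A minimal Python sketch of the NSE Log follows. It makes several assumptions that are not confirmed by this page: the constant c is added to each flow before the logarithm is taken, flows are expressed in ML per time step, and the 10th percentile is found by linear interpolation between ranked values (Source may use a different percentile rule). The helper names are hypothetical. The natural logarithm is used, but the base does not affect the result because it cancels in the ratio.

```python
import math

def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between ranks.

    The interpolation rule is an assumption made for this sketch.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def nse_log(observed, modelled):
    """NSE of log-transformed flows, assuming the transform log(Q + c)."""
    # c is the larger of 1 ML and the 10th percentile of the observed flow
    c = max(1.0, percentile(observed, 10))
    log_obs = [math.log(o + c) for o in observed]
    log_mod = [math.log(m + c) for m in modelled]
    mean_log_obs = sum(log_obs) / len(log_obs)
    error_ss = sum((o - m) ** 2 for o, m in zip(log_obs, log_mod))
    obs_ss = sum((o - mean_log_obs) ** 2 for o in log_obs)
    return 1.0 - error_ss / obs_ss

obs = [0.0, 5.0, 20.0, 100.0, 400.0]   # includes a zero-flow time step
perfect = nse_log(obs, obs)
all_zero = nse_log(obs, [0.0] * 5)     # a model that always predicts zero flow
```

Because c is positive, the zero-flow time step in `obs` does not cause a logarithm-of-zero error.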

Interpretation

Using the logarithm of flows has the effect of reducing the sensitivity of the statistic to high flows and increasing the sensitivity to low and mid-range flows. For this reason, NSE Log is often used where low-flow performance is important. The use of the constant c de-emphasises very small flows, which tend to be unreliable, and avoids numerical problems arising from attempting to calculate the logarithm of zero flows.

The NSE Log can range between -∞ and 1 and the interpretation is the same as for the NSE, but applied to log data.

Links

TBA

Absolute Value of the Relative Bias

Definition

This objective function will produce a match on the overall volume of flow generated but often will produce a poor fit to the timing of flows (Vaze et al., 2011). It has the following form:

Equation X
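As a hedged illustration, the sketch below assumes the common definition of the statistic: the absolute difference between the total modelled and total observed flow, divided by the total observed flow. The function name `abs_relative_bias` is hypothetical, and the exact form used in Source should be checked against the equation above.

```python
def abs_relative_bias(observed, modelled):
    """Absolute value of the relative bias in total flow volume.

    Assumed form: |sum(modelled) - sum(observed)| / sum(observed).
    """
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("total observed flow is zero; bias is undefined")
    return abs((sum(modelled) - total_obs) / total_obs)

obs = [10.0, 20.0, 30.0]

# Same total volume but reversed timing: the bias is zero despite the poor fit.
same_volume = abs_relative_bias(obs, [30.0, 20.0, 10.0])

# Every flow overestimated by 10 percent: the bias is about 0.1.
over_by_ten_percent = abs_relative_bias(obs, [11.0, 22.0, 33.0])
```

The first example shows why the statistic can report a perfect volume match while saying nothing about the timing of flows.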

Interpretation

Links

TBA

Bias Penalty

Definition

The bias penalty objective function is described in Viney et al. (2009). The equation is given by:

Equation X

where B is the absolute value of the relative bias, as defined in equation (4).

In Source, the Bias Penalty is always used in combination with other objective functions and is not available on its own.
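The sketch below shows how a bias penalty can be combined with another objective function. It assumes the penalty takes the form 5|ln(1 + B)|^2.5 and is subtracted from the NSE, which is how the Viney et al. (2009) function is commonly quoted. Both the constants and the function names are assumptions of this example and should be checked against the cited paper.

```python
import math

def bias_penalty(abs_rel_bias):
    """Penalty term, assuming the form 5 * |ln(1 + B)| ** 2.5."""
    return 5.0 * abs(math.log(1.0 + abs_rel_bias)) ** 2.5

def nse_with_bias_penalty(nse_value, abs_rel_bias):
    """Combine an NSE value with the penalty (assumed: NSE minus penalty)."""
    return nse_value - bias_penalty(abs_rel_bias)

no_bias = nse_with_bias_penalty(0.8, 0.0)      # penalty is zero when B = 0
small_bias = nse_with_bias_penalty(0.8, 0.1)   # mild reduction
large_bias = nse_with_bias_penalty(0.8, 0.5)   # much stronger reduction
```

Under this assumed form the penalty is negligible for small biases and grows rapidly as the bias increases, so the combined function behaves like the NSE when the overall volume is well matched.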

Interpretation

Links

Pearson's Correlation

Definition

Interpretation

Links

TBA

Flow Duration

Definition

The Flow Duration statistic is calculated by sorting the observed and modelled data values in increasing order and then calculating the NSE (equation (1)) using the sorted data.

It can be applied for any time step size.
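The calculation can be sketched in Python as follows. The `nse` helper assumes the standard Nash and Sutcliffe (1970) form and gap-free series of equal length. Both function names are chosen for this example only.

```python
def nse(observed, modelled):
    """Nash-Sutcliffe Efficiency for two equal-length, gap-free series."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / obs_ss

def flow_duration(observed, modelled):
    """NSE calculated on the two series after sorting each in increasing order."""
    return nse(sorted(observed), sorted(modelled))

obs = [1.0, 5.0, 2.0, 8.0]
mod = [5.0, 1.0, 8.0, 2.0]   # same flow magnitudes, different timing

timing_sensitive = nse(obs, mod)            # penalised by the timing errors
timing_free = flow_duration(obs, mod)       # sorted series are identical
```

In this example the modelled series contains the same flow magnitudes as the observed series but at different time steps, so the Flow Duration is 1 while the NSE is negative.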

Interpretation

The Flow Duration is insensitive to the timing of flows and instead measures the fit to the overall distribution of flow magnitudes. It is sensitive to high flows and less sensitive to low flows.

Links

TBA

Flow Duration of Log Data (Log Flow Duration)

Definition

The Log Flow Duration statistic is calculated by applying the Flow Duration statistic to log-transformed data:

Equation X

Interpretation

The Log Flow Duration measures the fit to the overall distribution of flow magnitudes and it is sensitive to low and mid-range flows.

Links

TBA

Sum of Daily Flows and Daily Exceedance (Flow Duration) Curve and Bias (SDEB)

Definition

The SDEB statistic was proposed by Lerat et al. (2013), based on a function introduced by Coron et al. (2012). It combines three terms:

the sum of errors on power transformed flow,
the same sum on sorted flow values and
the relative simulation bias.

The SDEB equation is:

Equation X

where:

α is a weighting factor set to 0.1
λ is an exponent set to 0.5
N is the number of time steps
Q_obs,i is the observed flow for time step i
Q_mod,i is the modelled flow for time step i
R_Qobs,k is the k’th ranked observed flow of a total of N ranked flows
R_Qsim,k is the k’th ranked modelled flow of a total of N ranked flows

The SDEB statistic is designed to be applied to daily data.
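The following Python sketch shows one way the three terms can be combined. It assumes that the first two terms are weighted by α and (1 − α) respectively, and that their sum is multiplied by one plus the absolute relative bias. This combination, and the function name `sdeb`, are assumptions of this example and should be checked against the equation above and Lerat et al. (2013). Flows are assumed to be non-negative.

```python
def sdeb(observed, modelled, alpha=0.1, lam=0.5):
    """Sketch of the SDEB statistic under the assumed combination of terms."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    # Term 1: squared errors on power-transformed flows, time step by time step
    timing_term = sum((o ** lam - m ** lam) ** 2
                      for o, m in zip(observed, modelled))
    # Term 2: the same sum on ranked (sorted) flows, unaffected by timing
    ranked_term = sum((ro ** lam - rm ** lam) ** 2
                      for ro, rm in zip(sorted(observed), sorted(modelled)))
    # Term 3: relative simulation bias in total volume
    bias = abs(sum(modelled) - sum(observed)) / sum(observed)
    return (alpha * timing_term + (1.0 - alpha) * ranked_term) * (1.0 + bias)

obs = [1.0, 4.0, 9.0, 16.0]
mod = [4.0, 1.0, 16.0, 9.0]   # same magnitudes and volume, shifted timing

perfect = sdeb(obs, obs)
timing_only = sdeb(obs, mod)  # only the alpha-weighted first term is non-zero
```

Under this assumed form, a value of 0 indicates a perfect match and larger values indicate a poorer fit, so the statistic would be minimised during calibration.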

Interpretation

The coefficient α and the power transform λ are used to balance the three terms within the objective function.

The weighting factor α is used to reduce the impact of the timing errors on the objective function. This type of error can have a significant effect on the first term in equation (X), where a slight misalignment of observed and simulated peak flow timing can result in large amplitude errors. The second term is based on sorted flow values, which remain unaffected by timing errors. By way of example, in their study of the Flinders and Gilbert Rivers in Northern Australia, Lerat et al. (2013) used values of α of 0.1 for the Flinders calibration and 1.0 for the Gilbert calibration.
Using values of power transform λ of less than 1 has the effect of reducing the weight of the errors in high flows, where the flow data are known to be less accurate. Lerat et al. (2013) found that a power transform of ½ led to the best compromise between high and low flow performance in their project. This value has been adopted in Source.

Links

TBA

References

Coron, L., V. Andréassian, C. Perrin, J. Lerat, J. Vaze, M. Bourqui and F. Hendrickx (2012) Crash testing hydrological models in contrasted climate conditions: an experiment on 216 Australian catchments. Water Resources Research, 48, W05552, doi:10.1029/2011WR011721.

Lerat, J., C.A. Egan, S. Kim, M. Gooda, A. Loy, Q. Shao and C. Petheram (2013) Calibration of river models for the Flinders and Gilbert catchments. A technical report to the Australian Government from the CSIRO Flinders and Gilbert Agricultural Resource Assessment, part of the North Queensland Irrigated Agriculture Strategy. CSIRO Water for a Healthy Country and Sustainable Agriculture flagships, Australia.

Nash, J.E. and J.V. Sutcliffe (1970) River flow forecasting through conceptual models part I — A discussion of principles. Journal of Hydrology, 10 (3), 282–290.