Statistics are generated and may be analysed in many parts of Source. This page provides information on the statistical functions available in Source including what they measure and how they're interpreted. For information on how to use statistics in Source, see the relevant section of the user guide.

Types in general, how they're used, 

link to optimisation

In Results Manager, statistics are User generated and auto generated

Locations

Where statistics are put in, created, may be analysed 

In Results Manager, statistics are categorised as either user or auto generated. 

Univariate and Bivariate Statistics

Univariate statistics provide information on single ......

Bivariate statistics .....

  • It is common for hydrological time series to contain missing values and to have differing start and end dates. Generally, Source calculates bivariate statistics using only data from those time steps for which there are complete data pairs. TODO: confirm where this is not true (possibly bivariate statistics in charting?).
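The complete-pairs rule above can be sketched as follows. This is a minimal illustration, not Source's implementation: the function name `complete_pairs`, the use of dictionaries keyed by ISO date strings, and the encoding of missing values as `None` or NaN are all assumptions made for the example.

```python
import math


def complete_pairs(observed, modelled):
    """Return (time_step, observed, modelled) for time steps present in both series.

    Both inputs are dicts mapping a time step key to a value. Missing values
    are assumed to be None or NaN (an assumption for this sketch only).
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    # Only time steps that appear in both series can form a pair, which
    # handles differing start and end dates.
    common_steps = sorted(set(observed) & set(modelled))
    return [
        (step, observed[step], modelled[step])
        for step in common_steps
        if not is_missing(observed[step]) and not is_missing(modelled[step])
    ]


observed = {"2020-01-01": 1.0, "2020-01-02": None, "2020-01-03": 3.0, "2020-01-04": 4.0}
modelled = {"2020-01-02": 2.0, "2020-01-03": float("nan"), "2020-01-04": 3.5, "2020-01-05": 5.0}
pairs = complete_pairs(observed, modelled)
print(pairs)
```

Here only 2020-01-04 survives: the other dates are either absent from one series or missing in one of them.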

See Calibration analysis - SRG (/wiki/spaces/TIME/pages/56721988).

Nash-Sutcliffe Efficiency (NSE) 

Definition

The NSE is a normalised statistic that measures the relative magnitude of the model error variance compared to the measured data variance (Nash and Sutcliffe, 1970). It is commonly used to evaluate the fit of modelled to observed streamflow data, and the definition and discussion below assume that it is being applied in this context. However, the NSE can be used to evaluate the fit between time series of any type.

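The NSE is defined as:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} \left(Q_{obs,i} - Q_{mod,i}\right)^2}{\sum_{i=1}^{N} \left(Q_{obs,i} - \bar{Q}_{obs}\right)^2} \qquad (1)$$

where 

$Q_{obs,i}$    is the observed flow for time step i

$Q_{mod,i}$    is the modelled flow for time step i

$\bar{Q}_{obs}$    is the mean of the observed flows over all time steps

N           is the number of time steps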

The time step size is arbitrary.

Interpretation

The NSE can range between -∞ and 1. 

  • NSE = 1 corresponds to a perfect match between modelled and observed data
  • NSE = 0 indicates that the model predictions are as accurate as the mean of the observed data
  • NSE < 0 indicates that the mean of the observed data is a better predictor than the model

The NSE is sensitive to the timing of flow events. It is often applied on a daily time step. Applying it on a longer time step, such as monthly, can be used to evaluate the fit to the monthly pattern of flows without being influenced by the timing of individual runoff events.

The NSE is sensitive to extreme values and insensitive to small values. For example, the NSE is generally not suitable for evaluating the fit to low flows, as the value will be dominated by the fit to high flows.
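As an illustration of the interpretation above, the sketch below computes the NSE in the standard form of Nash and Sutcliffe (1970): one minus the ratio of the sum of squared model errors to the sum of squared deviations of the observations from their mean. The function name `nse` and the input checks are assumptions for this example, and the two lists are assumed to hold complete data pairs.

```python
def nse(observed, modelled):
    """Nash-Sutcliffe Efficiency of two equal-length sequences of paired values."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors.
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    # Sum of squared deviations of the observations from their mean.
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    if obs_ss == 0:
        raise ValueError("NSE is undefined when the observed data are constant")
    return 1.0 - error_ss / obs_ss


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
print(nse(observed, observed))                    # perfect match
print(nse(observed, [3.0] * 5))                   # model equal to the observed mean
print(nse(observed, [5.0, 4.0, 3.0, 2.0, 1.0]))   # worse than the observed mean
```

The three calls correspond to the three cases listed above: NSE = 1, NSE = 0 and NSE < 0.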

NSE of Log Data (NSE Log)

Definition

NSE Log is the standard NSE statistic (equation (1)) applied to the logarithm of flow data:

[Equation not available: image removed]

where

[Term definition not available: image removed]

c is a positive constant equal to the maximum of 1 ML and the 10th percentile of the observed flow

other terms are as defined in equation (1)

As with the standard NSE, the time step size is arbitrary. The NSE Log cannot be applied to time series with negative values, as the logarithm of a number less than or equal to zero is undefined.

Interpretation

Using the logarithm of flows has the effect of reducing the sensitivity of the statistic to high flows and increasing the sensitivity to low and mid-range flows. For this reason, NSE Log is often used where low-flow performance is important. The use of the constant c de-emphasises very small flows, which tend to be unreliable, and avoids numerical problems arising from attempting to calculate the logarithm of zero flows.

The NSE Log can range between -∞ and 1 and the interpretation is the same as for the NSE, but applied to log data.
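The following is a minimal sketch of the NSE Log, written under stated assumptions rather than taken from Source: the transform is assumed to be log(Q + c), flows are assumed to be in ML so that the 1 ML floor is the number 1.0, and the 10th percentile uses linear interpolation, which is only one of several percentile conventions. The function names are hypothetical. The natural logarithm is used; the choice of base does not affect the result because a change of base scales both sums in the NSE by the same factor.

```python
import math


def percentile(values, p):
    """Percentile (p in 0..100) by linear interpolation; an assumed convention."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def nse(observed, modelled):
    """Standard NSE of two equal-length sequences."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / obs_ss


def nse_log(observed, modelled):
    """NSE of log(Q + c), with c = max(1 ML, 10th percentile of observed flow)."""
    c = max(1.0, percentile(observed, 10))
    log_obs = [math.log(o + c) for o in observed]
    log_mod = [math.log(m + c) for m in modelled]
    return nse(log_obs, log_mod)


observed = [0.0, 5.0, 20.0, 100.0]   # includes a zero flow
modelled = [1.0, 4.0, 25.0, 90.0]
print(nse_log(observed, modelled))
```

Because c is strictly positive, the zero flow in the example does not cause a logarithm error.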

Links

Absolute Bias

Definition

Interpretation

Links

Bias Penalty

Definition

Interpretation

Links

Pearson's Correlation

Definition

Interpretation

Links

Flow Duration

Definition

The Flow Duration statistic is calculated by sorting the observed and modelled data values in increasing order and then calculating the NSE (equation (1)) of the sorted data.

It can be applied for any time step size.

Interpretation

The Flow Duration is insensitive to the timing of flows and instead measures the fit to the overall distribution of flow magnitudes. It is sensitive to high flows and less sensitive to low flows.
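The sketch below illustrates this behaviour: both series are sorted in increasing order and the standard NSE is applied to the sorted values. The function names are assumptions for this example. A modelled series that contains the right flow magnitudes at the wrong times scores perfectly on Flow Duration but poorly on the ordinary NSE.

```python
def nse(observed, modelled):
    """Standard NSE of two equal-length sequences."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / obs_ss


def flow_duration(observed, modelled):
    """NSE of the two series after each is sorted in increasing order."""
    return nse(sorted(observed), sorted(modelled))


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
mistimed = [5.0, 4.0, 3.0, 2.0, 1.0]   # same magnitudes, reversed timing
print(flow_duration(observed, mistimed))   # distribution matches exactly
print(nse(observed, mistimed))             # timing is penalised
```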

Links

Flow Duration of Log Data (Log Flow Duration)

Definition

The Log Flow Duration statistic is calculated by applying the Flow Duration to log-transformed data:

[Equation not available: image removed]

Interpretation

The Log Flow Duration measures the fit to the overall distribution of flow magnitudes and it is sensitive to low and mid-range flows.
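A minimal sketch of this statistic is given below. The exact transform used by Source is not reproduced on this page, so the sketch assumes log(Q + c) with a caller-supplied positive offset `c`; both the offset and the function names are assumptions for illustration.

```python
import math


def nse(observed, modelled):
    """Standard NSE of two equal-length sequences."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / obs_ss


def log_flow_duration(observed, modelled, c):
    """NSE of the sorted, log-transformed series; c is an assumed positive offset."""
    if c <= 0:
        raise ValueError("c must be positive so that zero flows can be transformed")
    log_obs = [math.log(o + c) for o in sorted(observed)]
    log_mod = [math.log(m + c) for m in sorted(modelled)]
    return nse(log_obs, log_mod)


observed = [0.0, 2.0, 10.0, 50.0]
modelled = [45.0, 0.5, 3.0, 9.0]   # mistimed, roughly similar distribution
print(log_flow_duration(observed, modelled, c=1.0))
```

Because the logarithm is monotonic, sorting before or after the transform gives the same result.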

Links

Sum of Daily Flows and Daily Exceedance (Flow Duration) Curve and Bias (SDEB)

Definition

The SDEB statistic combines three terms:

  1. the sum of errors on power transformed flow, 
  2. the same sum on sorted flow values and 
  3. the relative simulation bias.

This objective function is based on the function introduced by Coron et al. (2012) and has been successfully applied in a number of projects (e.g. Lerat et al., 2013). It has the following equation:

[Equation not available: image removed]

where:

α is a weighting factor set to 0.1
λ is an exponent set to 0.5
RQobs,k is the k’th ranked observed flow of a total of N ranked flows
RQsim,k is the k’th ranked modelled flow of a total of N ranked flows
Other terms are as defined previously.

Interpretation

The coefficient α and the power transform λ are used to balance the three terms within the objective function.
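The sketch below shows one way the three terms could be combined. The combination used here, a weighted sum of the two squared-error terms multiplied by one plus the relative bias, is an assumption based on the description above and on the form commonly attributed to Lerat et al. (2013); it may differ from the equation used in Source. The function name `sdeb` and the parameter names `alpha` and `lam` are hypothetical, and flows are assumed to be non-negative. Under this form, lower values indicate a better fit and 0 is a perfect match.

```python
def sdeb(observed, modelled, alpha=0.1, lam=0.5):
    """Assumed SDEB form: weighted timing and ranked errors, scaled by relative bias."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("relative bias is undefined when observed flows sum to zero")

    def squared_error_sum(obs, mod):
        # Sum of squared differences of the power-transformed flows.
        return sum((o ** lam - m ** lam) ** 2 for o, m in zip(obs, mod))

    timing_term = squared_error_sum(observed, modelled)                   # term 1
    ranked_term = squared_error_sum(sorted(observed), sorted(modelled))   # term 2
    relative_bias = abs(sum(modelled) - total_obs) / total_obs            # term 3
    return (alpha * timing_term + (1 - alpha) * ranked_term) * (1 + relative_bias)


observed = [1.0, 4.0, 9.0, 16.0]
mistimed = [4.0, 1.0, 16.0, 9.0]   # same magnitudes and volume, wrong timing
print(sdeb(observed, observed))
print(sdeb(observed, mistimed))
```

In the second call the ranked and bias terms vanish, so only the timing term, weighted by α, contributes.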

Links

References

Coron, L., V. Andréassian, C. Perrin, J. Lerat, J. Vaze, M. Bourqui and F. Hendrickx (2012) Crash testing hydrological models in contrasted climate conditions: an experiment on 216 Australian catchments. Water Resources Research, 48, W05552, doi:10.1029/2011WR011721.

Lerat, J., C.A. Egan, S. Kim, M. Gooda, A. Loy, Q. Shao and C. Petheram (2013) Calibration of river models for the Flinders and Gilbert catchments. A technical report to the Australian Government from the CSIRO Flinders and Gilbert Agricultural Resource Assessment, part of the North Queensland Irrigated Agriculture Strategy. CSIRO Water for a Healthy Country and Sustainable Agriculture flagships, Australia.

Nash, J.E. and J.V. Sutcliffe (1970) River flow forecasting through conceptual models part I - A discussion of principles. Journal of Hydrology, 10 (3), 282–290.

There are two main categories of statistics in Source:

  • Univariate statistics provide information on a single variable and are intended to summarise and reveal patterns in that variable, see Univariate Statistics SRG.
  • Bivariate statistics compare two variables for the purpose of determining empirical relationships between them, see Bivariate Statistics SRG.

Locations

Statistics are generated and may be analysed in many parts of Source: