Bivariate Statistics SRG
Introduction
The bivariate statistics available in Source were originally intended for calibrating hydrological models, particularly for evaluating the fit between observed and modelled streamflow, and all statistics originally accepted only flow data types as input. Because some of the statistics are generic, user-requested changes have since removed this limitation: the relationship between any time series or ordered variable types can now be evaluated, and Source allows a user to select any two time series from the available recorders. However, the user should still consider very carefully the relevance of a chosen bivariate statistic to the data type selected. The Interpretation and Discussion sections of each statistic provide background on its relevant use.
The main types of bivariate statistics available in Source are:
- Nash-Sutcliffe Efficiency (NSE)
- NSE of Log Data (NSE Log)
- Relative Bias
- Bias Penalty
- Pearson's Correlation Coefficient
- NSE of Flow Duration (Flow Duration)
- NSE of Flow Duration of Log Data (Log Flow Duration)
- Sum of Daily Flows, Daily Exceedance (Flow Duration) Curve and Bias (SDEB)
- Kling-Gupta Efficiency (KGE)
Source also offers a range of composite statistics that combine the NSE with other metrics. These are discussed in the section on Composite Bivariate Statistics Involving the NSE:
- NSE Daily & Bias Penalty
- NSE Log Daily & Bias Penalty
- NSE Monthly & Bias Penalty
- NSE Daily & Flow Duration
- NSE Daily & Log Flow Duration
There are three variants of the composite statistic that combine the KGE with Bias Penalty and are discussed in the section on Composite Bivariate Statistics Involving the KGE:
- Trotter
- Split Trotter
- Split Trotter Weighted
Each of the bivariate statistics is described below. For further information, the reader is referred to Vaze et al. (2011, Section 6), who discuss the use and interpretation of most of the bivariate statistics available in Source. Users can also gain useful insight into hydrological performance measures and evaluation criteria from Moriasi et al. (2015).
Treatment of Missing Data
By default, Source calculates bivariate statistics using only data from those time steps for which there are complete data pairs (overlapping data). The Statistics tab in the Results Manager allows the user to calculate bivariate statistics using all data or to use only overlapping data (default). Results of bivariate statistics calculated using all values in both time series should be interpreted with caution and this option is more valid for comparing time series using the available univariate statistics.
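The default "overlapping data" behaviour can be illustrated with a minimal Python sketch. It assumes missing values are marked as NaN; the function name overlapping_pairs is hypothetical and is not part of Source.

```python
import math


def overlapping_pairs(observed, modelled):
    """Keep only time steps where both series have a value (NaN marks missing)."""
    return [
        (o, m)
        for o, m in zip(observed, modelled)
        if not (math.isnan(o) or math.isnan(m))
    ]


obs = [1.0, float("nan"), 3.0, 4.0]
mod = [1.5, 2.0, float("nan"), 3.5]
pairs = overlapping_pairs(obs, mod)
print(pairs)  # [(1.0, 1.5), (4.0, 3.5)]
```

Only the first and last time steps survive, because each of the other two is missing a value in one of the series.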
Nash-Sutcliffe Efficiency (NSE)
Definition
The NSE is a normalised metric that measures the relative magnitude of the model error variance compared to the measured data variance (Nash and Sutcliffe, 1970). It is defined as:
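Equation 1 | $NSE = 1 - \dfrac{\sum_{i=1}^{N} (Q_{obs,i} - Q_{mod,i})^2}{\sum_{i=1}^{N} (Q_{obs,i} - \overline{Q}_{obs})^2}$ |
where:
Qobs,i is the observed flow for time step i
Qmod,i is the modelled flow for time step i
Q̄obs is the mean of the observed flows
N is the number of time steps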
The time step size is arbitrary and the statistic uses the time step and unit of the input data.
Interpretation
The NSE can range between -∞ and 1, where:
- NSE = 1 corresponds to a perfect match between modelled and observed data
- NSE = 0 indicates that the model predictions are as accurate as the mean of the observed data
- NSE < 0 indicates that the mean of the observed data is a better predictor than the model
The NSE is sensitive to the timing of flow events and to extreme values. It is commonly applied on a daily time step (or shorter) to evaluate the model's ability to represent the timing of flow peaks and recession rates. Applying it on a longer time step, such as monthly, can be used to evaluate the fit to the pattern of flows without considering individual runoff events. The NSE is not suitable for evaluating a model's fit to low flows as the statistic will tend to be dominated by errors in the high flows.
The NSE is not very sensitive to systematic model over- or under-prediction, especially during low flow periods. Moriasi et al (2015) discuss some of the advantages and disadvantages of the NSE statistic as well as presenting an indication of ‘acceptable’ performance range. According to Moriasi et al (2015) the NSE statistic can be applied to various data types including flow and water quality.
Discussion
An alternative, but equivalent, formulation of the NSE is:
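Equation 2 | $NSE = 1 - \dfrac{\sum_{i=1}^{N} (Q_{obs,i} - Q_{mod,i})^2}{\sum_{i=1}^{N} Q_{obs,i}^2 - \frac{1}{N} \left( \sum_{i=1}^{N} Q_{obs,i} \right)^2}$ |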
This formulation obviates the necessity to calculate the average of the observed flows before evaluating the denominator in the traditional version.
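The equivalence of the two formulations can be checked numerically. The sketch below is illustrative only: it assumes the alternative formulation is the standard expansion of the sum of squared deviations (sum of squares minus the squared sum divided by N), and the function names are hypothetical.

```python
def nse(obs, mod):
    """NSE using the mean of the observed series (two passes over the data)."""
    mean_obs = sum(obs) / len(obs)
    error_ss = sum((o - m) ** 2 for o, m in zip(obs, mod))
    obs_ss = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - error_ss / obs_ss


def nse_single_pass(obs, mod):
    """Same statistic from running sums, so no mean is needed beforehand."""
    n = 0
    error_ss = sum_obs = sum_obs_sq = 0.0
    for o, m in zip(obs, mod):
        n += 1
        error_ss += (o - m) ** 2
        sum_obs += o
        sum_obs_sq += o * o
    return 1.0 - error_ss / (sum_obs_sq - sum_obs ** 2 / n)


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
mod = [1.1, 1.9, 3.2, 3.8, 5.3]
print(round(nse(obs, mod), 3))              # 0.981
print(round(nse_single_pass(obs, mod), 3))  # 0.981
```

The single-pass form is convenient for streaming data, but subtracting two large sums can lose floating-point precision when flows are large relative to their spread.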
NSE of Log Data (NSE Log)
Definition
NSE Log is the standard NSE metric (Equation 1) applied to the logarithm of flow data, based on a form proposed by Croke et al (2005) and Croke et al (2006):
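Equation 3 | $NSE_{log} = 1 - \dfrac{\sum_{i=1}^{N} \left[ \log(Q_{obs,i} + c) - \log(Q_{mod,i} + c) \right]^2}{\sum_{i=1}^{N} \left[ \log(Q_{obs,i} + c) - \overline{\log(Q_{obs} + c)} \right]^2}$ |
where:
c is a positive constant equal to the maximum of 1 ML/d (megalitre/day) (after Lerat et al, 2013) and the 10th percentile (90% flow exceedance) of the observed non-zero flows (after Croke et al, 2006). The overbar denotes the mean over all N time steps. Other terms are as defined in Equation 1.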
As with the standard NSE, the time step size is arbitrary. The NSE Log cannot be applied to time series with negative values, as the logarithm of a number less than or equal to zero is undefined.
Interpretation
Using the logarithm of flows has the effect of reducing the sensitivity of the metric to high flows and increasing the sensitivity to low and mid-range flows. For this reason, NSE Log is often used for model calibration when low-flow performance is important. The use of the constant c de-emphasises very small flows, which tend to be unreliable, and avoids numerical problems with attempting to calculate the logarithm of zero flows.
The NSE Log can range between -∞ and 1 and the interpretation is the same as for the NSE, but applied to log data.
Discussion
The 10th percentile is interpreted as the value below which 10% of the non-zero observed flows occur. Lerat et al (2013) proposed 1 ML/d as an option for c, being the threshold below which low-flow measurements at sand- and gravel-controlled gauging stations can be considered unreliable. This form of the NSE is particularly useful in catchments where intermittent or very low flows are common.
The constant value applied in this statistic has been derived with the intent of improving representation of low flows. In keeping with making Source statistics available for all data types, Source will allow the use of this log version of the NSE for non-flow data. However, it should be noted that the statistic was conceptually derived for flow-based values.
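The calculation of c and of NSE Log can be sketched as follows. This is a minimal illustration, not Source's implementation: flows are assumed to be in ML/d, the percentile uses linear interpolation between ranked values (the interpolation method used by Source is not stated here), and all function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Percentile by linear interpolation between ranked values (assumed method)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def log_offset(obs):
    """c = max(1 ML/d, 10th percentile of the non-zero observed flows)."""
    nonzero = [q for q in obs if q > 0]
    if not nonzero:
        return 1.0
    return max(1.0, percentile(nonzero, 10))


def nse_log(obs, mod):
    """NSE applied to log(Q + c); flows must be non-negative."""
    c = log_offset(obs)
    log_obs = [math.log(q + c) for q in obs]
    log_mod = [math.log(q + c) for q in mod]
    mean_log_obs = sum(log_obs) / len(log_obs)
    error_ss = sum((o - m) ** 2 for o, m in zip(log_obs, log_mod))
    obs_ss = sum((o - mean_log_obs) ** 2 for o in log_obs)
    return 1.0 - error_ss / obs_ss


obs = [0.0, 5.0, 20.0, 80.0, 400.0]
mod = [0.5, 4.0, 25.0, 70.0, 450.0]
print(round(log_offset(obs), 2))  # 9.5
print(nse_log(obs, mod))
```

Because the NSE is a ratio of two sums of squared log differences, the result does not depend on the base of the logarithm.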
Relative Bias
Definition
The relative bias measures the magnitude of the model errors relative to the magnitude of the observations. Use of this statistic is described in Croke et al (2005) and Moriasi et al (2015). It has the form:
Equation 4 | $B = \dfrac{\sum_{i=1}^{N} (Q_{mod,i} - Q_{obs,i})}{\sum_{i=1}^{N} Q_{obs,i}}$ |
where:
Qobs,i is the observed flow for time step i
Qmod,i is the modelled flow for time step i
N is the number of time steps
Interpretation
The relative bias, B, ranges from -∞ to +∞, where:
- B < 0 indicates that the modelled data underestimates the observed data
- B = 0 indicates that the model is unbiased
- B > 0 indicates that the modelled data overestimates the observed data
The relative bias measures the overall error in the volume of modelled flow; it does not measure the model's fit to the timing of flows (Vaze et al., 2011).
Discussion
Common variations of the relative bias are to express it as a percent and/or as an absolute value:
- as a percent - called Volume Bias % in Source. Volume Bias % can be found in the Statistics tab in Results Manager. The bias in the modelled values expressed as a percent of the observed flow volume is defined as:
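Equation 5 | $\text{Volume Bias \%} = 100 \times \dfrac{\sum_{i=1}^{N} (Q_{mod,i} - Q_{obs,i})}{\sum_{i=1}^{N} Q_{obs,i}}$ |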
- an absolute value - called Minimise Absolute Bias in Source. Minimise Absolute Bias is available as an objective function for calibration and as one of the statistics in the Statistics tab in Results Manager. The absolute value of the relative bias is defined as:
Equation 6 | $|B| = \left| \dfrac{\sum_{i=1}^{N} (Q_{mod,i} - Q_{obs,i})}{\sum_{i=1}^{N} Q_{obs,i}} \right|$ |
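The relative bias and its two variations can be sketched in a few lines of Python. The sign convention follows the interpretation above (negative when the model underestimates); the function names are hypothetical and the percent form is assumed to be signed.

```python
def relative_bias(obs, mod):
    """B = (sum of modelled - sum of observed) / sum of observed."""
    return (sum(mod) - sum(obs)) / sum(obs)


def volume_bias_percent(obs, mod):
    """Relative bias expressed as a percent of the observed volume."""
    return 100.0 * relative_bias(obs, mod)


def absolute_bias(obs, mod):
    """Absolute value of the relative bias."""
    return abs(relative_bias(obs, mod))


obs = [10.0, 20.0, 30.0]
mod = [12.0, 18.0, 36.0]
print(round(relative_bias(obs, mod), 3))        # 0.1
print(round(volume_bias_percent(obs, mod), 1))  # 10.0
```

Note that the errors on the first two time steps partly cancel, which is why the bias says nothing about the fit to timing.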
Bias Penalty
Definition
The bias penalty is a log transformation of the absolute value of the relative bias. It was proposed by Viney et al. (2009) and is defined as:
Equation 7 |
where B is the relative bias, as defined in Equation 4.
Interpretation
The bias penalty ranges from 0 to +∞, where a value of 0 indicates that the model is unbiased.
In Source, the bias penalty is always used in combination with the NSE and is not available on its own. It is designed to be used in model calibration to penalise biased solutions. Refer to Viney et al. (2009) for a discussion of the advantages of the bias penalty compared to the absolute value of the relative bias.
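As an illustration only, the sketch below uses the penalty form published by Viney et al. (2009), 5|ln(1 + B)|^2.5. Whether Source implements exactly this expression is an assumption here, and the function name is hypothetical.

```python
import math


def bias_penalty(b):
    """Penalty form from Viney et al. (2009): 5 * |ln(1 + B)| ** 2.5 (assumed)."""
    if b <= -1.0:
        raise ValueError("relative bias must be greater than -1")
    return 5.0 * abs(math.log(1.0 + b)) ** 2.5


for b in (0.0, 0.05, 0.2, 0.5):
    print(b, bias_penalty(b))
```

Under this form the penalty stays very small for small biases and grows rapidly as the bias increases, so it has little influence on nearly unbiased solutions during calibration.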
Pearson's Correlation Coefficient
Definition
Pearson's correlation coefficient measures the linear correlation between two variables and is available in Source in the Results Manager statistics tab for Bivariate Statistics. The Pearson's correlation coefficient is given by:
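Equation 8 | $r = \dfrac{\sum_{i=1}^{N} (x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \overline{x})^2 \sum_{i=1}^{N} (y_i - \overline{y})^2}}$ |
where:
xi is the value of time series x at time step i
yi is the value of time series y at time step i
x̄ and ȳ are the means of time series x and y respectively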
The time step size is arbitrary. Although typically in hydrology x references the Observed flows and y references the Modelled flows, Pearson's correlation coefficient is symmetric, meaning that the value will be the same regardless of which time series is defined as x and which as y.
Interpretation
Pearson's correlation ranges from -1 to 1 where:
- r = -1 indicates perfect negative correlation
- r = 0 indicates that there is no correlation between the two variables
- r = +1 indicates perfect positive correlation
Pearson's correlation is sensitive to the relative magnitude of data points in a time series, but not the absolute magnitude. Two time series can have a perfect correlation if they have the same "shape", even if the values are different.
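A minimal Python sketch of the standard Pearson formula shows both properties: two series with the same "shape" but different magnitudes correlate perfectly, and swapping x and y does not change the result. The function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


x = [1.0, 2.0, 3.0, 4.0]
y_scaled = [10.0, 20.0, 30.0, 40.0]   # same shape, ten times larger
y_reversed = [40.0, 30.0, 20.0, 10.0]
print(pearson_r(x, y_scaled))    # 1.0
print(pearson_r(x, y_reversed))  # -1.0
```

The statistic is undefined when either series is constant, because the denominator is then zero.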
NSE of Flow Duration (Flow Duration)
Definition
The NSE of Flow Duration (Equation 9) is calculated by removing overlapping missing data values, sorting the observed and modelled data values in increasing order and then calculating the NSE (as in Equation 1) of the sorted data.
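Equation 9 | $NSE_{FD} = 1 - \dfrac{\sum_{k=1}^{N} (RQ_{obs,k} - RQ_{sim,k})^2}{\sum_{k=1}^{N} (RQ_{obs,k} - \overline{RQ}_{obs})^2}$ |
where:
RQobs,k is the k'th ranked observed flow of a total of N ranked flows
RQsim,k is the k'th ranked modelled flow of a total of N ranked flows
the overbar denotes the mean of the ranked observed flows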
It can be applied for any time step size.
Interpretation
The Flow Duration measures the fit to the distribution of flow magnitudes and does not consider the timing of flows. It is sensitive to high flows and less sensitive to low flows.
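The insensitivity to timing can be demonstrated with a short Python sketch, assuming both series have already been reduced to complete data pairs. A modelled peak that arrives one time step late gives a poor NSE but a perfect Flow Duration score. Function names are hypothetical.

```python
def nse(obs, mod):
    """Standard NSE of paired values."""
    mean_obs = sum(obs) / len(obs)
    error_ss = sum((o - m) ** 2 for o, m in zip(obs, mod))
    obs_ss = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - error_ss / obs_ss


def flow_duration_nse(obs, mod):
    """NSE of the two series after each is sorted independently."""
    return nse(sorted(obs), sorted(mod))


obs = [1.0, 5.0, 2.0, 1.0]
mod = [1.0, 2.0, 5.0, 1.0]  # same flows, peak one time step late
print(round(nse(obs, mod), 3))      # -0.674
print(flow_duration_nse(obs, mod))  # 1.0
```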
NSE of Flow Duration of Log Data (Log Flow Duration)
Definition
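The NSE of the Flow Duration of log data is based on the same equation as the NSE of log data (Equation 3), but is calculated from the ranked (sorted) flows used for the Flow Duration:
Equation 10 | $NSE_{LFD} = 1 - \dfrac{\sum_{k=1}^{N} \left[ \log(RQ_{obs,k} + c) - \log(RQ_{sim,k} + c) \right]^2}{\sum_{k=1}^{N} \left[ \log(RQ_{obs,k} + c) - \overline{\log(RQ_{obs} + c)} \right]^2}$ |
where c is calculated as in the NSE of Log Data, and RQobs,k and RQsim,k are the k'th ranked observed and modelled flows.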
Interpretation
The Log Flow Duration measures the fit to the distribution of flow magnitudes and it is sensitive to low and mid-range flows.
Sum of Daily Flows, Daily Exceedance (Flow Duration) Curve and Bias (SDEB) (now called Square-root Daily, Exceedance, and Bias)
Definition
The SDEB metric was proposed by Lerat et al. (2013), based on a function introduced by Coron et al. (2012). It combines three terms:
- the sum of errors on power transformed flow,
- the same sum on sorted flow values and
- the relative simulation bias.
The SDEB equation is:
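Equation 11 | $SDEB = \left[ \alpha \sum_{i=1}^{N} \left( Q_{obs,i}^{\lambda} - Q_{mod,i}^{\lambda} \right)^2 + (1 - \alpha) \sum_{k=1}^{N} \left( RQ_{obs,k}^{\lambda} - RQ_{sim,k}^{\lambda} \right)^2 \right] \left[ 1 + \dfrac{\left| \sum_{i=1}^{N} Q_{mod,i} - \sum_{i=1}^{N} Q_{obs,i} \right|}{\sum_{i=1}^{N} Q_{obs,i}} \right]^{\mu}$ |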
where:
α is a weighting factor set to 0.1
λ is an exponent set to 0.5
μ is applied as a power to the bias term (the last brackets above).
N is the number of time steps
Qobs,i is the observed flow for time step i
Qmod,i is the modelled flow for time step i
RQobs,k is the k'th ranked observed flow of a total of N ranked flows
RQsim,k is the k'th ranked modelled flow of a total of N ranked flows
The SDEB metric is designed to be applied to daily data.
Interpretation
The SDEB ranges from 0 to +∞, where a value of 0 indicates a perfect fit between modelled and observed data.
The coefficient α and the power transform λ are used to balance the three terms within the objective function.
- The weighting factor α is used to reduce the impact of timing errors on the objective function. This type of error can have a significant effect on the first term in Equation 11, where a slight misalignment of observed and simulated peak flow timing can result in large amplitude errors. The second term is based on sorted flow values, which remain unaffected by timing errors. By way of example, in their study of the Flinders and Gilbert Rivers in Northern Australia, Lerat et al. (2013) used values of α of 0.1 for the Flinders calibration and 1.0 for the Gilbert calibration.
- Using values of power transform λ of less than 1 has the effect of reducing the weight of the errors in high flows, where the flow data are known to be less accurate. Lerat et al. (2013) found that a power transform of ½ led to the best compromise between high and low flow performance in their project. This value has been adopted in Source.
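The roles of α and λ can be illustrated with a minimal Python sketch. It assumes the three terms combine as a weighted sum of the two squared-error terms multiplied by one plus the absolute relative bias, raised to the power μ, following the description above. The default mu=1.0 is a placeholder chosen for the example, not a documented Source value, and the function name is hypothetical.

```python
def sdeb(obs, mod, alpha=0.1, lam=0.5, mu=1.0):
    """Sketch of SDEB for non-negative flows; mu=1.0 is an assumed placeholder."""
    # Errors on power-transformed flows, paired in time (sensitive to timing).
    timing_term = sum((o ** lam - m ** lam) ** 2 for o, m in zip(obs, mod))
    # The same errors on independently sorted flows (insensitive to timing).
    ranked_term = sum(
        (o ** lam - m ** lam) ** 2 for o, m in zip(sorted(obs), sorted(mod))
    )
    # One plus the absolute relative bias in total volume.
    bias_term = 1.0 + abs(sum(mod) - sum(obs)) / sum(obs)
    return (alpha * timing_term + (1.0 - alpha) * ranked_term) * bias_term ** mu


obs = [1.0, 4.0, 9.0, 1.0]
mod = [1.0, 9.0, 4.0, 1.0]  # same flows, peak one time step early
print(round(sdeb(obs, mod), 3))             # 0.2
print(round(sdeb(obs, mod, alpha=1.0), 3))  # 2.0
```

With α = 0.1 the timing error contributes only a tenth of what it does with α = 1.0, because the sorted-flow term is zero for this example.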
Kling-Gupta Efficiency (KGE)
The Kling-Gupta Efficiency (KGE, after Gupta et al., 2009) is increasingly being used for model calibration and evaluation. The KGE is given by:
Equation 12 | $KGE = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}$ |
where 𝑟 is the linear correlation between observations and simulations (Pearson’s Correlation Coefficient), 𝛼 is the measure of the flow variability error and 𝛽 represents the bias.
𝛼 is given by:
Equation 13 | $\alpha = \dfrac{\sigma_{sim}}{\sigma_{obs}}$ |
where σ is the standard deviation and Qsim and Qobs are the simulated flows and observed flows respectively.
𝛽 is given by:
Equation 14 | $\beta = \dfrac{\mu_{sim}}{\mu_{obs}}$ |
where μ represents the mean.
Interpretation
Similar to the NSE, the KGE varies between -∞ and 1, where a value of 1 indicates perfect agreement between simulations and observations. Positive KGE values are generally considered to indicate better agreement between observed and modelled flows. However, KGE < 0 does not necessarily indicate worse performance than the mean flow benchmark and, unlike the NSE, the KGE has no inherent benchmark value that distinguishes ‘good’ from ‘bad’ models (Knoben et al., 2019). Therefore, when using the KGE as the objective function, it is recommended to define a benchmark value against which to compare model performance.
The KGE assesses several aspects of model performance: correlation, bias and variability. It is a more balanced objective function that measures both the accuracy of model predictions and how well they reproduce the variability and timing of observed data. It is less sensitive to extreme values. While the NSE may penalise models that have bias even if they have good correlation and variance, the KGE separates out the effects of bias, correlation and variance, which provides a clearer picture of model performance.
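The three components can be seen separately in a minimal Python sketch of the Gupta et al. (2009) form. The population standard deviation is used here; the choice does not affect α because it is a ratio. The function name is hypothetical.

```python
import math


def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio and bias ratio."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability error
    beta = mean_sim / mean_obs    # bias
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]
print(round(kge(obs, obs), 4))      # 1.0
print(round(kge(obs, doubled), 4))  # -0.4142
```

In the second case the correlation is perfect (r = 1) but both the variability and the mean are doubled (α = β = 2), so the KGE is 1 - √2.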
Composite Bivariate Statistics Involving the NSE
Introduction
Often, when comparing two streamflow hydrographs or other time series data, no single metric measures all the characteristics of interest. Model calibration is usually performed using a composite objective function that combines two or more individual metrics. The SDEB equation is an example. Source also offers a number of other composite statistics that involve various combinations of the NSE. These are:
- NSE Daily & Bias Penalty
- NSE Log Daily & Bias Penalty
- NSE Monthly & Bias Penalty
- NSE Daily & Flow Duration
- NSE Daily & Log Flow Duration
Some of the composite statistics allow the user to choose a weighting that determines the relative importance of each metric in the overall function.
NSE Daily & Bias Penalty
Definition
Equation 15 | NSE Daily & Bias Penalty = NSE Daily – Bias Penalty |
where:
NSE Daily is the NSE of daily flows as defined in Equation 1
Bias Penalty is defined in Equation 7
Interpretation
The NSE Daily & Bias Penalty function is designed to ensure that a model is calibrated primarily to optimise NSE while ensuring a low bias in the total streamflow. However, the function will be strongly influenced by moderate and high flows and by the timing of runoff events, and can result in poor fits to low flows.
NSE Log Daily & Bias Penalty
Definition
Equation 16 | NSE Log Daily & Bias Penalty = NSE Log Daily – Bias Penalty |
where:
NSE Log Daily is the NSE of the logarithm of daily flows, as defined in Equation 3
Bias Penalty is defined in Equation 7
Interpretation
The NSE Log Daily & Bias Penalty function is similar to the NSE Daily & Bias Penalty, but the use of the logarithm of flows puts an increased emphasis on low-flow performance.
NSE Monthly & Bias Penalty
Definition
Equation 17 | NSE Monthly & Bias Penalty = NSE Monthly – Bias Penalty |
where:
NSE Monthly is the NSE of monthly flows, as defined in Equation 1
Bias Penalty is defined in Equation 7
Note that the aggregation of daily flows to monthly may be performed differently in different areas of Source. The Calibration Wizard, for example, uses the sum of the daily modelled and observed flows for each month.
Interpretation
The NSE Monthly & Bias Penalty function is similar to the NSE Daily & Bias Penalty, but the use of monthly flows means that it is not sensitive to the timing of individual runoff events.
NSE Daily & Flow Duration
Definition
Equation 18 | NSE Daily & Flow Duration = a * NSE Daily + (1 - a) * Flow Duration |
where:
a is a user-defined weighting factor (0 ≤ a ≤ 1)
NSE Daily is the NSE of daily flows as defined in Equation 1
Flow Duration is defined in Equation 9
Interpretation
The NSE Daily & Flow Duration function is designed to balance a model's fit to the timing (and magnitude) of flow events and the overall distribution of flow volumes. It is sensitive to high flows and less sensitive to low flows. The user can choose the relative importance of the two objective function components.
NSE Daily & Log Flow Duration
Definition
Equation 19 | NSE Daily & Log Flow Duration = a * NSE Daily + (1 - a) * Log Flow Duration |
where:
a is a user-defined weighting factor (0 ≤ a ≤ 1)
NSE Daily is the NSE of daily flows as defined in Equation 1
Log Flow Duration is defined in Equation 9 and Equation 10
Interpretation
The NSE Daily & Log Flow Duration function is similar to the NSE Daily & Flow Duration. It puts greater emphasis on the distribution of low flow volumes.
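The effect of the weighting factor a in the weighted composites can be sketched as follows. The example uses the plain (non-log) Flow Duration for brevity, and the function names are hypothetical.

```python
def nse(obs, mod):
    """Standard NSE of paired values."""
    mean_obs = sum(obs) / len(obs)
    error_ss = sum((o - m) ** 2 for o, m in zip(obs, mod))
    obs_ss = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - error_ss / obs_ss


def nse_daily_and_flow_duration(obs, mod, a):
    """a * NSE Daily + (1 - a) * Flow Duration, with 0 <= a <= 1."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie between 0 and 1")
    flow_duration = nse(sorted(obs), sorted(mod))
    return a * nse(obs, mod) + (1.0 - a) * flow_duration


obs = [1.0, 5.0, 2.0, 1.0]
mod = [1.0, 2.0, 5.0, 1.0]  # correct distribution, peak one step late
for a in (0.0, 0.5, 1.0):
    print(a, round(nse_daily_and_flow_duration(obs, mod, a), 3))
```

With a = 0 only the distribution of flows matters and the score is 1; with a = 1 the function reduces to NSE Daily and the late peak is fully penalised.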
Composite Bivariate Statistics Involving the KGE
As with the composite bivariate statistics involving the NSE, Source also provides newer composite objective functions that involve the KGE. These functions are described below.
Trotter
The objective function Trotter (after Trotter et al., 2023) addresses the tendency of the NSE and KGE to preferentially match high flows by including a low-flow component. Trotter is therefore used to capture both the high-flow and low-flow aspects of the hydrograph while keeping volumetric bias to a minimum (Trotter et al., 2023). The objective function T is given by:
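Equation 20 | $T = E - \text{Bias Penalty}$, where $E = \dfrac{KGE(Q) + KGE(Q^{1/5})}{2}$ and Bias Penalty is as defined in Equation 7 |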
Interpretation
The first part of the equation, the model efficiency E, is the mean of the KGE of the untransformed flows and the KGE of the fifth root of flows. The fifth root gives greater emphasis to small flows and, unlike the more common inverse or log transformations, remains defined for zero-flow conditions. The KGE of the fifth root of flows is calculated by applying the KGE to the fifth root of the observed and simulated data. The second part of the equation is a bias penalty (after Viney et al., 2009) that reduces the efficiency value when the volumetric bias (B) between the observed and simulated flows deviates from zero. The objective function thus combines the advantages of both the KGE and the Bias Penalty.
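The structure described above can be sketched in Python. This is an illustration under stated assumptions: the bias penalty uses the form published by Viney et al. (2009), 5|ln(1 + B)|^2.5, which may differ in detail from Source's implementation; flows are assumed non-negative; and all function names are hypothetical.

```python
import math


def kge(obs, sim):
    """Kling-Gupta Efficiency (Gupta et al., 2009)."""
    n = len(obs)
    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (sd_obs * sd_sim)
    alpha = sd_sim / sd_obs
    beta = mean_sim / mean_obs
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def bias_penalty(obs, sim):
    """Assumed penalty form (Viney et al., 2009) applied to the relative bias."""
    b = (sum(sim) - sum(obs)) / sum(obs)
    return 5.0 * abs(math.log(1.0 + b)) ** 2.5


def trotter(obs, sim):
    """Mean of KGE(Q) and KGE(Q^(1/5)) minus the bias penalty."""
    root_obs = [q ** 0.2 for q in obs]  # fifth root is defined at zero flow
    root_sim = [q ** 0.2 for q in sim]
    efficiency = 0.5 * (kge(obs, sim) + kge(root_obs, root_sim))
    return efficiency - bias_penalty(obs, sim)


obs = [0.0, 2.0, 10.0, 50.0, 5.0]
sim = [0.1, 2.5, 8.0, 55.0, 4.0]
print(round(trotter(obs, obs), 4))  # 1.0
print(trotter(obs, sim))
```

A constant observed or simulated series makes the standard deviation zero, so the KGE (and therefore this sketch) fails with a division by zero in that case.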
Split Trotter
The objective function Split Trotter (after Fowler et al., 2018) is a time-based meta-objective function (an objective function that considers different aspects of a flow regime and combines them into a single-objective function) that explicitly considers different subperiods of the calibration period. This function aims to rectify the tendency of objective functions with least squares to ignore dry years in the calibration process. This is achieved by calculating the objective function (Trotter) for each subperiod and averaging them over all subperiods. The Split Trotter function ST is defined as:
Equation 21 | $ST = \dfrac{\sum_{i} T_i \, ts_i}{\sum_{i} ts_i}$ |
where Ti is the Trotter value for the ith subperiod and tsi is the number of time steps in the ith subperiod.
Interpretation
The numerator weights the Trotter value for each subperiod by the number of time steps in that subperiod, and the denominator is the total number of time steps over all subperiods.
In Split Trotter, the start date of the water year must be defined, because the objective function calculates Trotter for each subperiod and each subperiod is one year. ‘Offcuts’ can occur where a water year is truncated by the calibration start or end date (a partial year), or where a water year lies entirely within the calibration period but is missing observed data. The Trotter value calculated for an offcut is weighted by the number of days from that year that are present in the period, so full years receive more weight than partial years when calculating ST. Because ST explicitly considers model performance in separate subperiods, it avoids undue focus on some subperiods in the calibration data.
The conditions for calculating the Split Trotter are:
- The calculation is performed for a split period only if it contains more than three data points.
- If either time series (simulated or observed) is constant for a split period, that part of the data is ignored in the calculation, as Trotter returns NaN in such cases.
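The aggregation step and the two conditions can be sketched as follows. The sketch assumes the Trotter value and the number of time steps for each water year have already been calculated; the function name, the input layout and the NaN result for an empty selection are all hypothetical choices made for the example.

```python
import math


def split_trotter(subperiods, min_points=4):
    """Time-step-weighted mean of per-subperiod Trotter values.

    subperiods: iterable of (trotter_value, n_time_steps) pairs.
    Subperiods with three or fewer points, or a NaN Trotter value, are skipped.
    """
    weighted_sum = 0.0
    total_steps = 0
    for trotter_value, n_steps in subperiods:
        if n_steps < min_points or math.isnan(trotter_value):
            continue
        weighted_sum += trotter_value * n_steps
        total_steps += n_steps
    if total_steps == 0:
        return float("nan")
    return weighted_sum / total_steps


years = [
    (0.8, 365),           # full water year
    (0.6, 365),           # full water year
    (0.2, 100),           # offcut: partial year, weighted by its 100 days
    (float("nan"), 365),  # constant series: Trotter undefined, ignored
    (0.9, 3),             # three or fewer points: ignored
]
print(round(split_trotter(years), 4))  # 0.6398
```

The partial year pulls the result down less than a full year with the same Trotter value would, because it contributes only 100 of the 830 time steps used.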
Split Trotter Weighted
The objective function Split Trotter Weighted aims to place additional emphasis on model fitting to drier years by assigning a weight to every year. The objective function STW is defined as:
Equation 22 |
where the weight for the ith subperiod is calculated as:
Equation 23 |
where Qobsi is the aggregate flow for the subperiod i for the observed time series and the other variables are as explained above.
As with Split Trotter, a water year needs to be defined for Split Trotter Weighted. Offcuts in the observed time series are handled in the same way as for Split Trotter.
The conditions for calculating the Split Trotter Weighted are the same as those for Split Trotter.
References
Coron, L., V. Andréassian, C. Perrin, J. Lerat, J. Vaze, M. Bourqui and F. Hendrickx (2012) Crash testing hydrological models in contrasted climate conditions: an experiment on 216 Australian catchments. Water Resources Research, 48, W05552, doi:10.1029/2011WR011721.
Croke, B.F.W, F. Andrews, J. Spate and S.M. Cuddy (2005) IHACRES User Guide. Technical Report 2005/19. Second Edition. iCAM, School of Resources, Environment and Society, The Australian National University, Canberra. https://toolkit.ewater.org.au/Tools/IHACRES
Croke, B.F.W., F. Andrews, A.J. Jakeman, S. Cuddy and A. Luddy (2006) Redesign of the IHACRES rainfall-runoff model. Engineers Australia 29th Hydrology and Water Resources Symposium, 21–23 February 2005, Canberra.
Fowler, K., G. Coxon, J. Freer, M. Peel, T. Wagener, A. Western, R. Woods and L. Zhang (2018) Simulating Runoff Under Changing Climatic Conditions: A Framework for Model Improvement. Water Resources Research, 54, 9812–9832. https://doi.org/10.1029/2018WR023989
Gupta, H.V., H. Kling, K.K. Yilmaz and G.F. Martinez (2009) Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377, 80-91, doi: http://dx.doi.org/10.1016/j.jhydrol.2009.08.003
Knoben, W.J.M., J.E. Freer and R.A. Woods (2019) Technical note: Inherent benchmark or not? Comparing Nash-Sutcliffe and Kling-Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323-4331, doi: https://doi.org/10.5194/hess-23-4323-2019
Lerat, J., C.A. Egan, S. Kim, M. Gooda, A. Loy, Q. Shao and C. Petheram (2013) Calibration of river models for the Flinders and Gilbert catchments. A technical report to the Australian Government from the CSIRO Flinders and Gilbert Agricultural Resource Assessment, part of the North Queensland Irrigated Agriculture Strategy. CSIRO Water for a Healthy Country and Sustainable Agriculture flagships, Australia.
Moriasi, D.N., M. W. Gitau, N. Pai and P. Daggupati (2015) Hydrologic and water quality models: performance measures and evaluation criteria. Transactions of the ASABE, 58 (6), 1763-1785. doi: 10.13031/trans.58.10715
Nash, J.E. and J.V. Sutcliffe (1970) River flow forecasting through conceptual models part I — A discussion of principles. Journal of Hydrology, 10 (3), 282–290.
Trotter, L., M. Saft, M.C. Peel and K.J.A Fowler (2023) Symptoms of Performance Degradation During Multi-Annual Drought: A Large-Sample, Multi-Model Study. Water Resources Research, 59, e2021WR031845. https://doi.org/10.1029/2021WR031845
Vaze, J., P. Jordan, R. Beecham, A. Frost, G. Summerell (2011) Guidelines for rainfall-runoff modelling: Towards best practice model application. eWater Cooperative Research Centre, Canberra, ACT. ISBN 978-1-921543-51-7. Available via www.ewater.org.au.
Viney, N.R., J-M. Perraud, J. Vaze, F.H.S Chiew, D.A. Post and A. Yang (2009) The usefulness of bias constraints in model calibration for regionalisation to ungauged catchments. In: 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation, July 2009, Cairns: Modelling and Simulation Society of Australian and New Zealand and International Association for Mathematics and Computers in Simulation: 3421-3427.