By Eric Mazzi *
In Issue #10 of the M&V Focus, an article published by John Avina presents an analysis concluding that Cv(RMSE) requirements should be relaxed for Option C regression models. Avina’s analysis of Cv(RMSE) is introduced and framed using several egregious claims and significant omissions. The effect of accepting these claims, omissions, and conclusions is nothing short of a rejection of basic, well-proven scientific methods and principles. In this article I describe these egregious claims and omissions. My purpose is to encourage M&V stakeholders, such as facility owners, M&V practitioners, and financing organizations, to maintain the integrity of the practice of M&V by continuing to accept and utilize the science-based practices embedded in all M&V protocols.
Before listing the egregious claims and omissions in Avina’s M&V Focus article, it’s important to understand this is only his latest article promoting a thesis to discard the use of proven statistical tools and scientific uncertainty in the practice of M&V. In 2021, Avina published an article¹ “Why do we calculate uncertainty?” where he concludes about uncertainty that “It is an arbitrary concept, which really should not be applied at this time.” Then in 2022, he published another article² dismissing R², Cv(RMSE) and fractional savings uncertainty (FSU) as being unscientific. Avina wrote that these three parameters “…are not based in reality … inconsistent, unscientific, arbitrary …” He goes on to say “The entire concept of using these indicators to prove the soundness of a regression model is unscientific…” and “They are invented unscientific concepts that can never be tested against reality.” More than ten times in his article he describes these three parameters as unscientific or having no basis in reality. He did not stop there, and went further to state “They are agreed upon figments of our collective imagination composed for us by authorities such as ASHRAE and EVO.” He stated that EVO and ASHRAE, by incorporating these three parameters in their protocols are acting “…akin to the early Church fathers (politicians in practice), who, in the 4th century Council of Nicaea selected the Christian canon and the gospels that everyone in the Roman Empire had to believe in.”
It is important to highlight that Cv(RMSE), as it is used in M&V, is the standard error of an OLS regression model divided by the mean of the measured energy values. Avina accepts that metered numbers are scientific, but he argues forcefully that Cv(RMSE) is not scientific. Because Cv(RMSE) is simply a standard error scaled by the mean of metered values, this means he argues that standard errors are unscientific. Moreover, if standard errors are unscientific, then p-values must be as well, because p-values and standard errors are interrelated in statistics. However, the use of standard errors (as well as R² and p-values) for statistical models is ubiquitous in both research and applied sciences such as econometrics, hydrology, epidemiology, biostatistics, modern data science, finance, and countless other scientific disciplines. Avina’s unsubstantiated dismissal of standard errors as being unscientific is wrong. Similarly, the concept of uncertainty is central to the practice of all sciences, irrespective of whether statistical tools are used or not. I summarized the scientific basis of uncertainty and cited several sources in my reply to Avina’s 2022 article³. Declaring the use of uncertainty as unscientific is an astonishing claim that is greatly misguided.
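To make the definition concrete, the following is a minimal sketch of how Cv(RMSE) and R² can be computed for a one-variable OLS model of the form kWh/day = α + β * (CDD/day). The function names and the twelve data points are hypothetical and chosen only for illustration. The standard error here uses n − p degrees of freedom, which is a common convention; practitioners should confirm the exact formula in the protocol they are following.

```python
from math import sqrt


def fit_ols(x, y):
    """Fit y = alpha + beta * x by ordinary least squares (one predictor)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    return alpha, beta


def cv_rmse(x, y, alpha, beta, n_params=2):
    """Standard error of the regression divided by the mean of measured y."""
    n = len(y)
    sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    # Assumption: n - p degrees of freedom (p = 2 for intercept and slope).
    std_error = sqrt(sse / (n - n_params))
    return std_error / (sum(y) / n)


def r_squared(x, y, alpha, beta):
    """Share of the variance in y explained by the fitted line."""
    y_mean = sum(y) / len(y)
    sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical monthly averages, for illustration only (not real meter data).
cdd_per_day = [0.0, 0.0, 0.5, 1.8, 4.2, 7.9, 10.4, 9.6, 6.1, 2.3, 0.3, 0.0]
kwh_per_day = [1010, 985, 1040, 1105, 1260, 1450, 1610, 1540, 1370, 1150, 1000, 1020]

alpha, beta = fit_ols(cdd_per_day, kwh_per_day)
print(f"alpha = {alpha:.1f} kWh/day, beta = {beta:.1f} kWh per CDD")
print(f"Cv(RMSE) = {cv_rmse(cdd_per_day, kwh_per_day, alpha, beta):.3f}")
print(f"R2 = {r_squared(cdd_per_day, kwh_per_day, alpha, beta):.3f}")
```

Every quantity in this calculation comes from metered values and the fitted line. The numerator is the model's standard error and the denominator is the mean of the measurements.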
In the context of declaring R², Cv(RMSE), and uncertainty as being unscientific in his previous articles, Avina makes the following egregious claims and omissions in his M&V Focus article:
1. In his Background material, Avina begins by stating “In the past I have questioned using Cv(RMSE) as a means of deciding whether a linear regression model is acceptable to use or not.” To state that he only “questioned” the use of Cv(RMSE) is a reframing that grossly misrepresents what he actually wrote, as cited and quoted above. The reality is that he dismissed Cv(RMSE), R² and FSU as unscientific figments of imagination with no basis in real world systems.
2. Next, Avina claimed that I am in agreement with his thesis. He stated that “Recently, Professor Eric Mazzi wrote that many in the statistics community are starting to question the value of R² and Cv(RMSE) as measures to determine whether to use a regression model or not. ‘Statistically significant’ is becoming an outmoded term. So, perhaps I am not alone on this point after all. It appears others are realizing this as well.” This is not true. Readers are encouraged to actually read my article³. The article carefully describes the scientific foundations of R², Cv(RMSE), and uncertainty. What I concluded was “Parameters such as R² and Cv(RMSE) are essential elements of the process to relate the statistical model to a real-world system. Put another way, using a statistical model [e.g., Electricity kWh/day = α + β * (CDD/day)] without considering any parameters to assess the model acceptability would be scientifically unsound. As such, I disagree with Avina’s assertion …” I stated that these parameters are “essential,” based on “sound science,” and I explicitly stated that “I disagree” with Avina’s thesis. It is a false statement to say that we are in agreement.
3. In the same quote as point #2 above, Avina implied that statisticians were also in agreement with him because some leading statisticians have argued to discontinue the use of the phrase “statistically significant.” Avina referred to an article published in The American Statistician⁴, which I had cited in my article. What Wasserstein and colleagues actually argued is that the practice of using rigid thresholds for statistical parameters to accept or reject models, such as p < 0.05, should be avoided. For example, they stated that p = 0.051 should not automatically invalidate a statistical model. They did not argue that p-values are unscientific with no basis in reality. Wasserstein highlighted that using the phrase “statistically significant” has some drawbacks, and recommended alternative ways of describing the use of statistical models to draw scientific conclusions. The Wasserstein article explicitly described the use of standard errors and uncertainty as sound science, while Avina has unambiguously declared both of these parameters to be unscientific. In fact, there is an entire section in the Wasserstein article titled “Accept Uncertainty” which readers are encouraged to read. Avina’s implied claim of agreement with statisticians is not based on reality.
4. In his M&V Focus article, Avina claimed that “The general consensus of the experts is that the R² value should be ignored…” This is yet another egregious claim. Avina did not identify these experts or provide any citations. M&V stakeholders should be aware that all commonly used M&V protocols, including ASHRAE⁵, IPMVP⁶, ISO⁷, FEMP⁸, and DOE⁹, specify the use of R². Avina omits mentioning that all of these protocols specify the use of R² and that none says to ignore it. Avina may be referring to one past article in M&V Focus¹⁰. However, it should be noted that this article argued for the use of Cv(RMSE) and uncertainty in lieu of R² (e.g. “.. the importance of CvRMSE in assessing savings uncertainty…”), while Avina claimed that Cv(RMSE) and uncertainty are unscientific. There is no evidence of a consensus to ignore R². In fact, the opposite is true.
5. The stated purpose of Avina’s article is to examine the case for relaxing the value of Cv(RMSE) used to accept or reject a regression model. However, Avina omitted mentioning that there are widely used and proven practices for addressing situations where regression model criteria are not met, such as low values for R² or high values of Cv(RMSE)¹¹. These practices include seeking additional independent variables, eliminating poor variables, shifting or extending measurement periods, considering different model forms, and collecting additional data. Omitting any mention of these practices indicates that Avina’s article is biased.
Avina’s erroneous claims and omissions in the introduction of the analysis of Cv(RMSE) represent an inaccurate and biased framing of the utility of Cv(RMSE), as well as an unsupported dismissal of the validity of statistical parameters and uncertainty in general. It is not surprising that Avina’s analysis concludes there is evidence to relax Cv(RMSE) criteria. He frames Cv(RMSE) as being an unscientific figment of imagination that is not based on reality, and implies that it is used because it is imposed by EVO and ASHRAE as authorities behaving akin to 4th century religious politicians. With this framing, what other conclusion could be drawn? His article is a clear case of an analysis conducted to support a predetermined conclusion.
M&V of energy projects is a critical function with real-world importance, such as climate change mitigation and responsible financing of green projects. Stakeholders such as facility owners, M&V practitioners, and financing organizations are encouraged to maintain the technical integrity of M&V. Integrity will be preserved by utilizing proven, science-based practices which are used in countless, practical scientific applications as well as essentially all M&V protocols.
REFERENCES
1. Avina J. (2021) “Why Do We Calculate Uncertainty” International Journal of Energy Management, Vol 3, Issue #3.
2. Avina J. (2022) “Statistics and Reality— Addressing the Inherent Flaws of Statistical Methods Used in Measurement and Verification” IJEM Vol 4, Issue #1.
3. Mazzi E. (2022) “Commentary on Article ‘Statistics and Reality—Addressing the Inherent Flaws of Statistical Methods Used in Measurement and Verification’” IJEM Vol 4, Issue #2.
4. Wasserstein et al (2019). “Moving to a World Beyond ‘p < 0.05’ ” The American Statistician, 73:sup1, 1-19.
5. ASHRAE 14-2014 “Measurement of Energy, Demand, and Water Savings” explicitly describes the use of R².
6. “International Performance Measurement & Verification Protocol Core Concepts” EVO 10000 – 1:2022.
7. ISO 17741 “General technical rules for measurement, calculation and verification of energy savings of projects” 2016-05-01. This protocol explicitly cites the use of EVO’s IPMVP Uncertainty Guide, which includes R².
8. FEMP: “M&V Guidelines: Measurement and Verification for Performance-Based Contracts Version 4.0” prepared for the U.S. Department of Energy Federal Energy Management Program, November 2015. This protocol explicitly cites the use of EVO’s IPMVP Uncertainty Guide, which includes R².
9. U.S. Department of Energy and Cadmus (2018) “The Uniform Methods Project: Methods for Determining Energy Efficiency Savings for Specific Measures” explicitly includes R². Also see: DOE’s “50001 Ready Measurement & Verification Protocol” (19 April, 2017).
10. Stetz, M. (2019) “Why r2 Doesn’t Matter” M&V Focus Issue #5.
11. See these three sources: 1) “Uncertainty Assessment for IPMVP” EVO 10100 – 1:2019, section 1.8; 2) Bonneville Power Administration (2018) “Regression for M&V: Reference Guide” (Contract Number 00077045); and 3) DOE (2018) “The Uniform Methods Project: Methods for Determining Energy Efficiency Savings for Specific Measures” Chapter 14, p.16.
(*) Eric Mazzi is a member of EVO's IPMVP Committee.