EVO

Q: I am seeking clarification regarding the evaluation of R² and CV(RMSE) in the M&V guidelines. Specifically, is the evaluation of these metrics intended to be performed on the full training dataset, or should it incorporate the testing dataset or cross-validation techniques?

In machine learning practice, there are well-established reasons to avoid relying solely on the full training dataset for model evaluation, as doing so can mask overfitting and does not necessarily reflect the model's performance on unseen data. This contrasts with my current understanding of the IPMVP guidelines, which seem to imply an evaluation on the training set alone.

Could you please provide guidance on this matter to align our model evaluation process with the best practices in the field? Thank you for your time and expertise.

A: The evaluation of R² and CV(RMSE) is based on ordinary least squares regression models, and their values should be calculated and checked against the training-period data set from which the model is developed. These constitute a minimum set of criteria for evaluating models.
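
As a minimal sketch of that training-period check, the example below fits an ordinary least squares model and computes R² and CV(RMSE) on the same baseline data used to develop it. The column names, sample values, and the use of n − p degrees of freedom in the RMSE are illustrative assumptions, not IPMVP requirements.

```python
# Illustrative only: hypothetical baseline (training-period) data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

baseline = pd.DataFrame({
    "ambient_temp_F": [55, 60, 65, 70, 75, 80, 85, 90],   # independent variable
    "daily_kwh":      [410, 430, 455, 490, 530, 575, 625, 680],  # energy use
})

X = sm.add_constant(baseline[["ambient_temp_F"]])
y = baseline["daily_kwh"]

model = sm.OLS(y, X).fit()
pred = model.predict(X)

residuals = y - pred
n = len(y)
p = X.shape[1]  # number of model parameters, including the intercept

# R² of the training-period fit (reported by statsmodels as model.rsquared).
r_squared = model.rsquared

# CV(RMSE): root-mean-square error of the residuals divided by the mean of the
# observed values, using n - p degrees of freedom (a common M&V convention).
rmse = np.sqrt(np.sum(residuals**2) / (n - p))
cv_rmse = rmse / y.mean()

print(f"R² = {r_squared:.3f}, CV(RMSE) = {cv_rmse:.1%}")
```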

IPMVP recognizes that there are additional ways to check model validity, including out-of-sample testing and cross-validation. If done properly, these methods provide additional information about a model’s prediction capabilities and validity. IPMVP currently has no recommendations for how these methods should be performed or what the criteria would be. However, IPMVP does not restrict M&V practitioners from using more advanced methods to demonstrate the validity of models used in the M&V analysis.
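
For practitioners who choose to go beyond the minimum criteria, the sketch below shows one possible cross-validation approach: k-fold splitting of the same hypothetical baseline data, with CV(RMSE) computed on the held-out fold each time. The fold count and the decision to average across folds are assumptions made for illustration; IPMVP does not currently prescribe any such procedure or acceptance thresholds.

```python
# Illustrative k-fold cross-validation of an OLS model, reusing the
# hypothetical "baseline" DataFrame from the previous example.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

X = baseline[["ambient_temp_F"]].to_numpy()
y = baseline["daily_kwh"].to_numpy()

kf = KFold(n_splits=4, shuffle=True, random_state=0)
fold_cv_rmse = []

for train_idx, test_idx in kf.split(X):
    # Fit on the training folds, evaluate on the held-out fold.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rmse = np.sqrt(np.mean((y[test_idx] - pred) ** 2))
    fold_cv_rmse.append(rmse / y[test_idx].mean())

# Average out-of-sample CV(RMSE) across folds: an indication of how the model
# may perform on data it was not fitted to.
print(f"Mean out-of-sample CV(RMSE) = {np.mean(fold_cv_rmse):.1%}")
```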

IPMVP agrees with your point about machine learning algorithms and encourages you to address this concern by using more advanced validation techniques to demonstrate model validity when applying such modeling approaches. IPMVP is working to assess these advanced model validation methods and include them in future guidance documents.