The machinery for performing statistical or machine-learning-based analyses can be expensive to run, both in the time taken and in the computing resources required. In any analysis, we have to make assumptions about distributions and set parameters. As these can be difficult to justify, we should worry about the robustness of our results: what if we changed this assumption? What if we doubled this parameter? If the analysis is expensive, we will struggle to explore these possibilities.
Instead of ignoring this issue, we could find a cheaper alternative to the full analysis and explore the robustness of our results through that alternative.
An illustration of using a cheap alternative is using a mode as a proxy for a mean. Generally speaking, modes are easy to compute because they involve differentiation, whereas means involve trickier integrals. In this illustrative analysis, we have done some complex modelling and found that, if we set a parameter \(\beta\) to 10, then our posterior mean for \(\theta\) is 1 and the mode is 0.9. We want to know how our results would change as we vary \(\beta\) over the range (1, 20). We can afford to run the full analysis to get the posterior mean at \(\beta\in\{2,10,15\}\). We can get the mode at any value of \(\beta\) instantly:
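As a concrete sketch, suppose (purely for illustration, since the original modelling is not specified here) that the posterior for \(\theta\) is a Gamma distribution with shape 10 and rate \(\beta\); this choice reproduces the numbers above, with mean 1 and mode 0.9 at \(\beta=10\). The cheap sweep of the mode and the three affordable runs of the full analysis might then look like:

```python
import numpy as np

# Purely illustrative stand-in for the expensive analysis (an assumption, not the
# original model): a Gamma(shape=10, rate=beta) posterior for theta, which gives
# mean 1 and mode 0.9 at beta = 10, matching the numbers quoted above.
SHAPE = 10.0

def posterior_mean(beta):
    # The "expensive" full analysis: only affordable at a few values of beta.
    return SHAPE / beta

def posterior_mode(beta):
    # The cheap approximation: available instantly at any beta.
    return (SHAPE - 1.0) / beta

# Cheap sweep of the mode over the whole range of interest.
betas = np.linspace(1.0, 20.0, 200)
cheap_modes = posterior_mode(betas)

# Full analysis only at the three affordable design points.
expensive_betas = np.array([2.0, 10.0, 15.0])
full_means = posterior_mean(expensive_betas)
print(full_means)  # [5.         1.         0.66666667]
```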
The cheap approximation does a good job of capturing the effect of changing \(\beta\) at a fraction of the cost. Further, we could model the discrepancy between the approximation and the mean to get a corrected version of the posterior mode that is (almost) identical to the posterior mean (in this case, the discrepancy is simply \(1/\beta\)).
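A minimal sketch of that correction, continuing the same hypothetical Gamma stand-in: estimate the discrepancy between the three expensive means and the corresponding cheap modes as \(c/\beta\), then add the fitted discrepancy back onto the mode everywhere.

```python
import numpy as np

# Continuing the hypothetical Gamma(10, beta) stand-in from above.
SHAPE = 10.0
expensive_betas = np.array([2.0, 10.0, 15.0])
full_means = SHAPE / expensive_betas            # the three affordable full analyses
cheap_modes = (SHAPE - 1.0) / expensive_betas   # cheap approximation at the same points

# Model the discrepancy as c / beta, estimating c by averaging beta * discrepancy
# over the three points (here this recovers c = 1, i.e. discrepancy = 1/beta).
c = np.mean((full_means - cheap_modes) * expensive_betas)

# Corrected cheap estimate across the whole range: (almost) identical to the mean.
betas = np.linspace(1.0, 20.0, 200)
corrected = (SHAPE - 1.0) / betas + c / betas
```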
Other examples include
The following could be useful but are not necessary: