
Probability Distribution Forecasts: Learning with Random Forests and Graphical Assessment

Thu Jul 8, 2021 9:15 - 9:35 (Time zone: UTC)

Regular talks > Track B

Who Lang, Moritz N.

Where Slack #talk_stats

Forecasts in terms of entire probability distributions (often called "probabilistic forecasts" for short), as opposed to predictions of only the mean of these distributions, are of prime importance in many different disciplines from natural sciences to social sciences and beyond. Hence, distributional regression models have been receiving increasing interest over the last decade. Here, we make contributions to two common challenges in distributional regression modeling:

1. Obtaining sufficiently flexible regression models that can capture complex patterns in a data-driven way.
2. Assessing the goodness-of-fit of distributional models both in-sample and out-of-sample using visualizations that bring out potential deficits of these models.

Regarding challenge 1, we present the R package "disttree" (Schlosser et al. 2021), which implements distributional trees and forests (Schlosser et al. 2019). These blend the recursive partitioning strategy of classical regression trees and random forests with distributional modeling. The resulting tree-based models can capture nonlinear effects and interactions, and they automatically select the relevant covariates that determine differences in the underlying distributional parameters.
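The idea of partitioning on distributional parameters rather than on the mean alone can be illustrated with a single split. The following is a minimal sketch in Python (standard library only), not the "disttree" API and not necessarily its split-selection procedure: it assumes a Gaussian response and substitutes an exhaustive search for the threshold that most improves the maximized log-likelihood. The names `gaussian_loglik`, `best_split`, and `min_node_size` are hypothetical and used for illustration only.

```python
import math
import random


def gaussian_loglik(values):
    """Maximized Gaussian log-likelihood of a sample (ML estimates of mean and variance)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    var = max(var, 1e-12)  # guard against a degenerate node with zero spread
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_split(x, y, min_node_size=10):
    """Return (threshold, gain) for the split on x with the largest log-likelihood gain.

    Assumption: exhaustive search over candidate thresholds, used here as a
    simple stand-in for a proper split-selection procedure.
    """
    pairs = sorted(zip(x, y))
    ys = [p[1] for p in pairs]
    base = gaussian_loglik(ys)
    best_threshold, best_gain = None, 0.0
    for i in range(min_node_size, len(pairs) - min_node_size + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied covariate values
        gain = gaussian_loglik(ys[:i]) + gaussian_loglik(ys[i:]) - base
        if gain > best_gain:
            best_threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
            best_gain = gain
    return best_threshold, best_gain


# Simulated data: the mean stays at 0; only the spread changes at x = 0.5.
rng = random.Random(1)
x = [rng.random() for _ in range(400)]
y = [rng.gauss(0.0, 0.5 if xi < 0.5 else 3.0) for xi in x]

threshold, gain = best_split(x, y)
print(f"split at x = {threshold:.3f}, log-likelihood gain = {gain:.1f}")
```

A tree that splits on the mean alone would have little to find in this simulated data, because the two regimes differ only in their variance. Fitting a full distribution in each node is what makes the change visible.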

For graphically evaluating the goodness-of-fit of the resulting probabilistic forecasts (challenge 2), the R package "topmodels" (Zeileis et al. 2021) is introduced. It provides extensible probabilistic forecasting infrastructure and corresponding diagnostic graphics such as Q-Q plots of randomized residuals, PIT (probability integral transform) histograms, reliability diagrams, and rootograms. In addition to distributional trees and forests, other models can be plugged into these displays, which can be rendered both in base R graphics and in "ggplot2" (Wickham 2016).
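To make the PIT histogram concrete: each observation is evaluated under the cumulative distribution function of its own forecast, and for a calibrated forecast the resulting values are approximately uniform on [0, 1]. The sketch below uses Python's standard library only and is not the "topmodels" API. The simulated Gaussian data and the helper names `pit_values` and `histogram_counts` are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def pit_values(observations, forecasts):
    """PIT: evaluate each forecast CDF at the matching observation."""
    return [dist.cdf(obs) for obs, dist in zip(observations, forecasts)]


def histogram_counts(values, bins=10):
    """Count values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for v in values:
        # min() keeps a value of exactly 1.0 in the last bin
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


rng = random.Random(42)
n = 2000

# Simulated truth: each observation is drawn from N(mu_i, 2^2).
mus = [rng.uniform(-5, 5) for _ in range(n)]
obs = [rng.gauss(mu, 2.0) for mu in mus]

# Two competing forecasts: the correct spread, and a spread that is too small.
calibrated = [NormalDist(mu, 2.0) for mu in mus]
overconfident = [NormalDist(mu, 1.0) for mu in mus]

for label, forecasts in [("calibrated", calibrated), ("too narrow", overconfident)]:
    counts = histogram_counts(pit_values(obs, forecasts))
    print(f"{label:>10}: {counts}")
```

The calibrated forecast should give roughly equal counts in every bin, while the too-narrow forecast piles observations into the outermost bins. That U shape is the typical signature of underdispersed forecasts in a PIT histogram.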

Tags Statistical models

Session name 7B: Statistical modeling in R


useR! 2021
