Oreum Industries Reference Project, 2024Q3


000_Intro.ipynb¶

Oreum Reference - Copula Regression oreum_copula¶

Demonstrate Bayesian Copula Regression Modelling using Bayesian inference and a Bayesian workflow, specifically using the pymc & arviz ecosystem.

This Intro can also be used for verbal presentation and discussion purposes, ideally followed by a deeper, long-form technical walkthrough of the project. Because this project is a reference, it contains a large amount of detail that we deliberately do not over-summarise here.

The interested reader should refer to the project notebooks where we evaluate the behaviour and performance of the models throughout the workflows, including several state-of-the-art methods unavailable to conventional max-likelihood / machine-learning models.

PDF version

Oreum Industries: Technical Resources



What is Copula Regression?¶

We seek to create principled models that provide explanatory inference and predictions of Marginal distributions $M$ that are jointly coupled by a Latent Copula $C$, using quantified uncertainty to support real-world decision-making.

[Figure: schematic of marginal distributions $M$ jointly coupled by a latent copula $C$]

Motivation:

  • A classic use-case for this model architecture (in the 2-dimensional setting) is insurance claims aka incurred loss
  • We decompose the dollar value of claims into two marginal distributions: the frequency and the severity of expected loss cost, because these measures are intuitive and can behave differently, with a (highly important) degree of covariance $\Sigma$
  • If we use a naive model that doesn't account for the covariance between frequency and severity, then the model predictions for expected loss cost can be hugely wrong!

Quick Aside on decomposition of claims frequency and severity¶

We can create different decompositions for different purposes, and according to the data available. A useful one, shown here, is the ratio of losses per unit TIV, which generalises across policies of different TIV (see the code sketch after the definitions below).

$$ \begin{aligned} frq_{i} &= \frac{claim\_ct_{i}}{TIV_{i}} \ , & sev_{i} &= \frac{incurred\_total_{i}}{claim\_ct_{i}} \\ \\ \mathbb{E}_{\text{loss} \ i} &= frq_{i} * sev_{i} = \frac{incurred\_total_{i}}{TIV_{i}} \\ \end{aligned} $$

where:

  • Each policy $i \in n$ (the dataset of all policies) can have its own (policy-level) frequency ($frq_{i} \geq 0$) and severity ($sev_{i} \geq 0$) of claim (and thus policy-level $\mathbb{E}_{\text{loss} \ i} \geq 0$)
  • Note $frq$ and $sev$ tend to be zero-augmented distributions (where no loss is experienced): this is a very important aspect to include in real-world model architectures
  • $claim\_ct_{i} \geq 0$ is the count of claims incurred for policy $i$
  • $TIV_{i} \gt 0$ is the Total Insured Value (TIV) for policy $i$
  • $incurred\_total_{i} \geq 0$ is the total incurred losses for policy $i$

Plug:

  • Oreum Industries is releasing a forecasting product Vulcan for DUA pricing & portfolio management that takes 3 simple tables (policy, premium_transactions, claims_transactions) and outputs probabilistic forecasts of Expected Loss Cost at policy-level

  • At the heart of the solution is a holistic, principled, open-box Bayesian copula model (for covarying claims frq-sev) that runs quickly via API to integrate within modern data processing pipelines and dashboards

  • We incorporate underwriter expertise; create hierarchical generalised linear regression on business-relevant parameters with high cardinality; allow for zero-claims inflation; auto-impute missing data; and more generally allow for small noisy datasets typical of DUA business.

Back to this presentation's focus on the copula function¶

[Figure: copula function schematic]

Demonstration:

  • In this notebook:
    • We create a small synthetic dataset of observations of two marginals $M_{0}, M_{1}$ which have covariance $\Sigma$, and also (because we can) a version of the marginals $M_{0x}, M_{1x}$ without covariance
    • We compare the resulting values of the joint product $y = M_{0} * M_{1}$ vs $y = M_{0x} * M_{1x}$ and see that the impact of ignoring the covariance is substantial.
  • In the rest of the reference guide:
    • We create a series of principled copula models using advanced architectures and Bayesian inference to fit to the data and estimate the covariance on $M_{0}, M_{1}$
    • The first model is naive and ignores the covariance, the final model is very sophisticated and estimates the covariance
    • We demonstrate a substantial 33 percentage-point improvement in model accuracy when using a copula-based model
    • This correct estimation would likely make the difference between profitable pricing / accurate reserving, or greatly loss-making business over a portfolio.

General project approach

The emphasis in this project is to build a variety of models of increasing sophistication and demonstrate their usage. We strike a balance between building up concepts & methods vs practical application & worked examples in a pymc-based Bayesian workflow.

We don't focus on specific analysis of the dataset, nor try to infer too much. The dataset is simply a good substrate on which to learn and demonstrate the variety of model architectures used herein.

We evaluate the behaviour and performance of the models throughout the workflows, including several state-of-the-art methods unavailable to conventional max-likelihood / machine-learning models

This series of Notebooks covers

  • 000_Intro.ipynb: Orientation and fundamental concepts
  • 100_ModelA0.ipynb: Core (naive) architecture: Create priors, marginal likelihoods, but no copula
  • 101_ModelA1.ipynb: Partial architecture (extends ModelA0): Include Gaussian copula (w/ Jacobian adjustment), and several technical innovations to let pymc work with the transformations
  • 102_ModelA2.ipynb: Full architecture (extends ModelA1): Include Jacobian Adjustment on transformed observations

In this Notebook

We dive straight into Orientation and fundamental concepts using a small synthetic dataset, and then go on to demonstrate the theory and usage of models of increasing sophistication.

Contents¶

  • Setup

  • Preamble: Why Bayes?

  • 1. Orientation: Copula Functions and Their Behaviour

  • 2. Brief Technical Summary: The Copula Model designed in this Project



Preamble: Why Bayes?¶

We gain massive advantage by using a Bayesian Framework¶

We specifically use Bayesian Inference rather than Frequentist Max-Likelihood methods for many reasons, including:

| Desirable Trait $\downarrow$ / General formulation $\rightarrow$ | Bayesian Inference: Bayes' Rule $\underbrace{P(\hat{\mathcal{H}} \mid D)}_{\text{posterior}} = \frac{\overbrace{P(D \mid \mathcal{H})}^{\text{likelihood}} \cdot \overbrace{P(\mathcal{H})}^{\text{prior}}}{\underbrace{P(D)}_{\text{evidence}}}$ | Frequentist Max-Likelihood: MLE $\hat{\mathcal{H}}^{\text{MLE}} \propto \arg\max_{\mathcal{H}} P(D \mid \mathcal{H})$ |
|---|---|---|
| Principled model structure represents hypothesis about the data-generating process | Very strong: Can build bespoke arbitrary and hierarchical structures of parameters to map to the real-world data-generating process. | Weak: Can only state structure under strict limited assumptions of model statistical validity. |
| Model parameters and their initial values represent domain expert knowledge | Very strong: Marginal prior distributions represent the real-world probability of parameter values before any data is seen. | Very weak: No concept of priors. Lack of a joint probability distribution can lead to discontinuities in parameter values. |
| Robust parameter fitting process | Strong: Estimate the full joint posterior probability mass distribution for parameters - more stable and representative of the expectation for the parameter values. Sampling can be a computationally expensive process. | Weak: Estimate a single-point maximum-likelihood (density) value for parameters - this can be far outside the probability mass and so is prone to overfitting, and is only correct in the limit of infinite data. But the optimization method can be computationally cheap. |
| Fitted parameters have meaningful summary statistics for inference | Very strong: Full marginal probability distributions can be interpreted exactly as probabilities. | Weak: Point estimates only have meaningful summary statistics under strict limited assumptions of model statistical validity. |

continues ...

... continued

| Desirable Trait | Bayesian Inference | Frequentist Max-Likelihood |
|---|---|---|
| Robust model evaluation process | Strong: Use the entire dataset, evaluate via Leave-One-Out Cross-Validation (best theoretically possible). | Weak: Cross-validation rarely seen in practice; even if used, rarely better than 5-fold CV. Simplistic method can be computationally cheap. |
| Predictions made with quantified variance | Very strong: Predictions made using full posterior probability distributions, so predictions have full empirical probability distributions. | Weak: Predictions using point estimates can be bootstrapped, but predictions only have interpretation under strict limited assumptions of model validity. |
| Handle imbalanced, high cardinality & hierarchical factor features | Very strong: Can introduce partial-pooling to automatically balance factors through hierarchical priors. | Weak: Difficult to introduce partial-pooling (aka mixed random effects) without affecting strict limited assumptions of model validity. |
| Handle skewed / multimodal / extreme value target variable | Very strong: Represent the model likelihood as any arbitrary probability distribution, including mixture (compound) functions e.g. a zero-inflated Weibull. | Weak: Represent model likelihood with a usually very limited set of distributions. Very difficult to create mixture compound functions. |
| Handle small datasets | Very strong: Bayesian concept assumes that there is a probable range of values for each parameter, and that we evidence our prior on any amount of data (even very small counts). | Very weak: Frequentist concept assumes that there is a single true value for each parameter and that we only discover that value in the limit (of infinite observations). |
| Automatically impute missing data | Very strong: Establish a prior for each datapoint, evidence on the available data within the context of the model, to automatically impute missing values. | Very weak: No inherent method. Usually impute as a pre-processing step with weak non-modelled methods. |

Practical Implementations of Bayesian Inference¶

We briefly referenced Bayes' Rule above, which is a useful mnemonic when discussing Bayesian Inference, but the crux of putting these advanced statistical techniques into practice is estimating the evidence $P(D)$, i.e. the probability of observing the data that we use to evidence the model:

$$ \begin{aligned} \underbrace{P(\hat{\mathcal{H}}|D)}_{\text{posterior}} &= \frac{\overbrace{P(D|\mathcal{H})}^{\text{likelihood}} \cdot \overbrace{P(\mathcal{H})}^{\text{prior}}}{\underbrace{P(D)}_{\text{evidence}}} \\ \\ \text{...where:} \\ \\ P(D) &\sim \int_{\Theta} P(D, \theta)\ d\theta \\ \end{aligned} $$

This joint probability $P(D, \theta)$ of data $D$ and parameters $\theta$ requires an almost impossible-to-solve integral over parameter-space $\Theta$. Rather than attempt to calculate that integral, we do something that sounds far more difficult, but given modern computing capabilities is actually practical.

We use a Bleeding-edge MCMC Toolkit for Bayesian Inference: pymc & arviz¶

We use Markov Chain Monte-Carlo (MCMC) sampling to take a series of ergodic, partly-reversible, partly-randomised samples of model parameters $\theta$. At each step we compute the (log) ratio of likelihood $\times$ prior between the current position $\theta_{p0}$ and a proposed "sampled" position $\theta_{p}$ in parameter space, and accept or reject the proposal according to that ratio, so that the chain explores parameter space in proportion to the posterior density.

This results in a posterior estimate $P(\hat{\theta}|D)$:

$$ \begin{aligned} P(\hat{\theta}|D) &\sim \frac{\overbrace{P(D|\theta_{p})}^{\text{likelihood @ proposal}} \cdot \overbrace{P(\theta_{p})}^{\text{prior @ proposal}}} {\underbrace{P(D|\theta_{p0})}_{\text{likelihood @ current}} \cdot \underbrace{P(\theta_{p0})}_{\text{prior @ current}}} \end{aligned} $$

This is the heart of MCMC sampling: for detailed practical explanations see Betancourt, 2021, Carroll, 2019, and Wiecki, 2015
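As a toy illustration of that acceptance-ratio idea, here is a minimal random-walk Metropolis sketch. Note this is not what pymc runs internally (pymc uses the gradient-based NUTS/HMC sampler), and the Normal model, priors, and tuning constants are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_posterior(theta: float, data: np.ndarray) -> float:
    """Unnormalised log posterior: Normal(0, 10) prior x Normal(theta, 1) likelihood."""
    log_prior = -0.5 * (theta / 10.0) ** 2
    log_like = -0.5 * np.sum((data - theta) ** 2)
    return log_prior + log_like

data = rng.normal(loc=2.0, scale=1.0, size=50)  # toy observed data
theta, draws = 0.0, []
for _ in range(5000):
    proposal = theta + rng.normal(scale=0.5)  # random-walk proposal theta_p
    # log of the ratio: (likelihood x prior at proposal) / (likelihood x prior at current)
    log_ratio = log_posterior(proposal, data) - log_posterior(theta, data)
    if np.log(rng.uniform()) < log_ratio:  # accept with probability min(1, ratio)
        theta = proposal
    draws.append(theta)

posterior = np.array(draws[1000:])  # drop burn-in; samples approximate P(theta | data)
print(posterior.mean(), posterior.std())
```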

We use the bleeding-edge pymc and arviz Python packages to provide the full Bayesian toolkit that we require, including advanced sampling, probabilistic programming, statistical inferences, model evaluation and comparison, and more.




1. Orientation: Copula Functions and Their Behaviour¶

1.1 Create Synthetic Copula Dataset¶

We can learn a lot by creating a synthetic copula dataset using a "forward-pass":

  1. Start with random data $C$ sampled from the PDF of a Latent Copula function $\square_{\mathfrak{C}}$ with 2-dimensions $$(C_{0}, C_{1}) \sim \square_{\mathfrak{C}}$$

  2. Transform each dimension of the coupled data through the CDF of the copula function $\Phi_{\mathfrak{C}}$ to yield data distributed according to a Latent Uniform distribution $U$ $$(U_{0}, U_{1}) = \Phi_{\mathfrak{C}}(C_{0}, C_{1})$$

  3. Transform each dimension of now-uniform data through the Inverse CDF of our chosen marginal distribution $\Phi^{-1}_{\mathfrak{M}}$ to yield data distributed according to "observed" Marginal distribution(s) $M$ $$(M_{0}, M_{1}) = \Phi^{-1}_{\mathfrak{M}}(U_{0}, U_{1})$$

In the following slides we'll plot the distributions and describe the transformations. Also see project class synthetic.create_copula.CopulaBuilder for details

Note we create $60$ observations split into 2 sets: $50$ for train (in-sample) and $10$ for holdout (out-of-sample)
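A minimal numpy/scipy sketch of this forward pass follows. The covariance matrix (with $\rho \approx -0.7$), the LogNormal parameters, and the seed are illustrative stand-ins, not the exact values used by synthetic.create_copula.CopulaBuilder.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 1. latent copula C: 60 draws from a 2-d MultivariateNormal with covariance Sigma
c_cov = np.array([[1.0, -0.7],
                  [-0.7, 1.0]])
c = rng.multivariate_normal(mean=[0.0, 0.0], cov=c_cov, size=60)

# 2. latent uniform U: push each dimension through the Normal CDF
u = stats.norm.cdf(c)

# 3. observed marginals M: push each dimension through a LogNormal inverse CDF (ppf)
m = stats.lognorm.ppf(u, s=0.5, scale=np.exp(1.0))  # s = sigma, scale = exp(mu)

# split into train (in-sample) and holdout (out-of-sample) sets
m_train, m_holdout = m[:50], m[50:]

# the rank correlation induced by the copula survives the marginal transform
rho, _ = stats.spearmanr(m[:, 0], m[:, 1])
print(round(rho, 2))  # roughly -0.7
```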

1.2 Visualise the Synthetic Observations¶

1.2.1 View the Latent Copula (an MvN)¶

In this forward-pass to create the synthetic data, we firstly create $50$ observations of a 2-dimensional Multivariate Normal distribution with covariance $\Sigma$

$$ \begin{aligned} (C_{0}, C_{1}) &\sim \square_{\mathfrak{C}} \\ &\sim \text{MultivariateNormal}(\mu, \Sigma, \text{shape}=2) \end{aligned} $$

This forms our Latent Copula (a Gaussian), and this is where we could get creative and use any number of alternative copula functions from the literature (e.g. Clayton, Frank, Gumbel, etc.) or even create our own: the copula marginals don't have to be the same distribution

[Figure: joint plot of the latent copula $(C_{0}, C_{1})$, a 2-d MvNormal]

Observe:

  • Note the standard Normal(0,1) scaling on the marginals
  • Note the empirically-observed correlation $\rho \approx -0.7$ as defined in c_cov

1.2.2 View the Uniform-Transformed Marginals¶

In this forward-pass to create the synthetic data, next we pass each dimension of the Latent Copula $C$ through the CDF of its own function $\Phi_{\mathfrak{C}}$ to get a Latent Uniform distribution $U$

$$ \begin{aligned} (U_{0}, U_{1}) &= \Phi_{\mathfrak{C}}(C_{0}, C_{1}) \\ &= \text{NormalCDF}(C_{0}, C_{1}) \end{aligned} $$

Regardless of the latent copula, this intermediate step will result in 2 Uniform marginals (which still have covariance)

[Figure: joint plot of the uniform-transformed marginals $(U_{0}, U_{1})$]

Observe:

  • Now the marginals are uniform, but the correlation remains

1.2.3 View the Observed Marginals m0, m1 (post transformation)¶

In this forward-pass to create the synthetic data, next we pass each dimension of the Latent Uniform $U$ through the Inverse CDF of the marginal distribution $\Phi^{-1}_{\mathfrak{M}}$ to get the Marginal distribution(s) in $M$

$$ \begin{aligned} (M_{0}, M_{1}) &= \Phi^{-1}_{\mathfrak{M}}(U_{0}, U_{1}) \\ &= \text{LogNormalInvCDF}(U_{0}, U_{1}) \end{aligned} $$

The marginal distribution(s) $M$ can be anything. In practice we tend to use right-tailed distributions in the Exponential family, here a LogNormal. We can, of course, use different distributions on each marginal - there's no constraint to be the same - but we use the same ones here. This is the data that we would observe in the real-world dataset:

[Figure: joint plot of the observed marginals $(M_{0}, M_{1})$]

Observe

  • Marginals now have unique long-tail distributions, and the correlation remains

1.2.4 View the Marginals $Mx$ if they were synthesized without a Copula¶

In project class synthetic.create_copula.CopulaBuilder we also synthesize uncorrelated observations using the same transformation and final marginals $M$, so that we can visually compare the different effects.

$$ \begin{aligned} (C^{\chi}_{0}, C^{\chi}_{1}) &\sim \text{Normal}(\mu, \sigma, \text{shape}=2) \\ (U^{\chi}_{0}, U^{\chi}_{1}) &= \text{NormalCDF}(C^{\chi}_{0}, C^{\chi}_{1}) \\ (M^{\chi}_{0}, M^{\chi}_{1}) &= \text{LogNormalInvCDF}(U^{\chi}_{0}, U^{\chi}_{1}) \\ \end{aligned} $$

Uncorrelated marginals $M^{\chi}$ individually look the same as $M$. We have to look at the joint distribution to see the difference

[Figure: joint plot of the uncorrelated marginals $(M^{\chi}_{0}, M^{\chi}_{1})$]

Observe

  • Spherical joint distribution, no correlation between our marginals here

1.2.5 Overplot Marginals Correlated ($M$) vs Uncorrelated ($M^{\chi}$) to Highlight the Differences¶

[Figure: overplot of correlated marginals $M$ vs uncorrelated marginals $M^{\chi}$]

Observe

  • The marginals look almost identical, but the joint distribution is very different
  • We might say "so what?" because we can always jointplot our marginals $M$ and see that there is correlation
  • The huge impact is that these lead to a very different joint product $y$...

1.3 Compare the Impact on Joint Product $y$¶

If we build a model $\mathcal{H}$ that fits to marginals $M$, but does not account for the correlation, $\mathcal{H}$ will behave as if we fit it on uncorrelated marginals $M^{\chi}$. The predicted differences in $M$ and $M^{\chi}$ won't look too different on the marginals, but the joint products $y = M_{0} \cdot M_{1}$ vs $y^{\chi} = M^{\chi}_{0} \cdot M^{\chi}_{1}$ can become very different:
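A minimal sketch of that comparison follows, reusing m_train from the forward-pass sketch in §1.1 and building an uncorrelated counterpart m_x. The bootstrap, seed, and parameters are illustrative assumptions, not a reproduction of the project's custom diagnostic plot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# uncorrelated counterpart M^chi: same LogNormal marginals, identity covariance
c_x = rng.multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2), size=50)
m_x = stats.lognorm.ppf(stats.norm.cdf(c_x), s=0.5, scale=np.exp(1.0))

# joint products y = M0 * M1 and y_chi = M0x * M1x
y_corr = m_train[:, 0] * m_train[:, 1]   # m_train from the forward-pass sketch
y_uncorr = m_x[:, 0] * m_x[:, 1]

def boot_sum(y: np.ndarray, n_boot: int = 1000) -> np.ndarray:
    """Bootstrapped portfolio sums: resample the 50 policies with replacement."""
    idx = rng.integers(0, len(y), size=(n_boot, len(y)))
    return y[idx].sum(axis=1)

# with negative covariance, sum(y_uncorr) typically sits well above sum(y_corr)
print(boot_sum(y_corr).mean(), boot_sum(y_uncorr).mean())
```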

[Figure: diagnostic combination plot (pointplot, boxplot, countplot) of $y$ vs $y^{\chi}$]

Observe:

  • This custom diagnostic combination plot shows:
    • Pointplot (left): The bootstrapped sums $\sum_{i}y_{i}$ vs $\sum_{i}y^{\chi}_{i}$
    • Boxplot (center): The individual values $y_{i}$ vs $y^{\chi}_{i}$
    • Countplot (right): The counts of observations $i$
  • The (bootstrapped) sum of y_uncorr ($\mu \approx 700$) is almost always much higher than for y_corr ($\mu \approx 500$)
  • This shows that if our model $\mathcal{H}$ were to estimate marginals correctly but ignore the covariance, it would erroneously mis-estimate the joint distribution total value $y$. Here that mistake is to overestimate.

View the overestimate $\delta = \sum_{i}y^{\chi}_{i} - \sum_{i}y_{i}$¶

Let's view the bootstrapped overestimate delta = y_uncorr - y_corr

[Figure: bootstrapped overestimate $\delta$]

Observe:

If we imagine this to be a portfolio of $50$ policies, and the value of interest is an Expected Loss Cost $y = \mathbb{E}_{\text{loss}}$, and the units are dollars, then:

  • If we were to use a model that ignores covariance, we might get a portfolio estimate of $\mathbb{E}_{\text{loss}} \approx 200$ dollars higher than if we were to use a better model that handles covariance with a copula function
  • This overestimate is substantial ($\frac{700}{500} \approx 1.4$, i.e. roughly $+40\%$) and would likely make the difference between profitable pricing / accurate reserving, or greatly loss-making business over the portfolio.


2. Brief Technical Summary: The Copula Model designed in this Project¶

Again, this Intro is for verbal presentation and discussion purposes only, ideally followed by a deeper, long-form technical walkthrough of the project. Because this project is a reference, it contains a large amount of detail that we deliberately do not over-summarise here.

The interested reader should refer to the project notebooks, where we state the architecture in full and evaluate the behaviour and performance of the models in a consistent Bayesian workflow, including several state-of-the-art methods unavailable to conventional max-likelihood / machine-learning models.

Here we can highlight a very tangible impact in our results of using a Copula model (ModelA2) vs a Naive model (ModelA0)

2.1 Brief Orientation on Model Workflow and Architecture¶

General Approach¶

  • We create a synthetic dataset with 60 observations: these have exogenous values on 2 marginals $M_{0}$, $M_{1}$
  • We create 3 models of increasing sophistication to estimate $\hat{M_{0}}$, $\hat{M_{1}}$ and thus the joint product $\hat{y} = \hat{M_{0}} \cdot \hat{M_{1}}$
  • The simplest naive model (ModelA0) does not include a copula function, and the most sophisticated model ModelA2 does

...

  • We define a training set of 50 random observations, fit the models, and view the forecasted predictions on a holdout set of 10 observations
  • We fully evaluate the models in the project notebooks using a variety of sophisticated techniques including In-sample Prior & Posterior Retrodictive ECDF plots, LOO-PIT calculations & plots, and more conventional coverage, RMSE and R2 calculations. This forecast on the holdout is not a formal model evaluation
  • However for discussion and elucidation we can plot the bootstrapped sum of the actual values $\sum{y}_{\text{holdout}}$ and compare to the posterior predictions $\sum{\hat{y}}_{\text{holdout}}$ of the two models

General Architecture¶

In contrast to the "forward-pass" that we use to synthesize the data, for the model we must of course start with the only data that we have (the observed marginals $M$) and work in a "backwards-pass" toward the copula.

  1. Define 2 marginal distributions $(\mathfrak{M}_{0}, \mathfrak{M}_{1})$ (here for convenience we use the same family (Lognormals) for each, so we will represent as simply $\mathfrak{M}$). Each marginal is parameterised by a linear submodel $\beta^{T}\mathbf{x}_{ij}$ to allow linear regression onto selected features $j$ $$\mathfrak{M} = \text{LogNormal}(\mu=\beta^{T}\mathbf{x}_{ij}, \sigma)$$

  2. Transform each dimension of the observed marginal data $M$ through the CDF of the marginal distribution(s) to yield data distributed according to a Latent Uniform distribution $U$ $$(U_{0}, U_{1}) = \Phi_{\mathfrak{M}}(M_{0}, M_{1})$$

  3. Transform each dimension of now-uniform data $U$ through the Inverse CDF of the copula distribution(s) $(\mathfrak{C}_{0}, \mathfrak{C}_{1})$ (here for convenience we use the same family (Normal aka a Gaussian Copula) for each, so we will represent as simply $\mathfrak{C}$) $$(C_{0}, C_{1}) = \Phi^{-1}_{\mathfrak{C}}(U_{0}, U_{1})$$

  4. Evidence the transformed data $C$ against the copula function $\log \mathcal{L}\ \mathfrak{C}$

  5. For stability and correctness, we also evidence at the marginals and minimise a Jacobian adjustment on the double-transformed data.

  6. Importantly, and unlike other model specifications in the Bayesian literature, we preserve the full posterior distribution(s) all the way through the model specification, without ever having to collapse to point estimates

Math Specification¶

Marginals $M_{k}, k \in \{0, 1\}$ both $\text{LogNormal}$, evidence likelihood:

$$ \begin{aligned} \beta_{M_{k}}^{j0} &\sim \text{Normal}(\mu, \sigma) \\ \sigma_{M_{k}} &\sim \text{InverseGamma}(\alpha, \beta) \\ \mathcal{L}\ \hat{M_{k}} &\sim \text{LogNormal}(\beta_{M_{k}}^{T}\mathbf{x}_{i}^{jk}, \sigma_{M_{k}}) \end{aligned} $$

Transform observed $M_{k}$ to Latent $\text{Uniform}$ $U_{k}$ via their assumed CDFs:

$$ \begin{aligned} U_{k} &= \Phi_{\mathfrak{M}_{k}}(M_{k}) \end{aligned} $$

Transform Latent $\text{Uniform}$ $U_{k}$ to Latent $\text{Normal}$ $C_{k}$ via the Normal Inverse CDF:

$$ \begin{aligned} C_{k} &= \Phi^{-1}_{\mathfrak{C}}(U_{k}) \ , \quad \mathfrak{C} = \text{Normal}(\mu=0, \sigma=1) \end{aligned} $$

Create a latent covariance structure $\Sigma$

$$ \begin{aligned} L &\sim \text{LKJCholesky}(2) \ , \quad R \sim \text{LKJCorr}(2) \\ \sigma &\sim \text{InverseGamma}(\alpha, \beta) \\ \Sigma &= LL^{T} = \text{diag}(\sigma) \cdot R \cdot \text{diag}(\sigma) \end{aligned} $$

Evaluate likelihood $\hat{C}$ of transformed marginals using MVNormal:

$$ \begin{aligned} \mathcal{L}\ \hat{C} &\sim \text{MvNormal}(\mu=0, \Sigma, \text{observed}=C_{k}) \end{aligned} $$
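The following is a minimal pymc sketch of this construction, not the project's models.copula.ModelA2: it uses intercept-only LogNormal marginals rather than the linear submodels $\beta^{T}\mathbf{x}$, illustrative priors, and the standard copula-density form of the change-of-variables (Jacobian) term. Refer to 102_ModelA2.ipynb for the real architecture.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

def gaussian_copula_sketch(m_obs: np.ndarray) -> pm.Model:
    """m_obs: (n, 2) array of observed marginals (M_0, M_1), both assumed LogNormal."""
    with pm.Model() as model:
        # marginal likelihoods L(M_k): intercept-only here; the project instead
        # regresses mu onto features via a linear submodel beta^T x
        mu = pm.Normal("mu", mu=0.0, sigma=1.0, shape=2)
        sigma = pm.InverseGamma("sigma", alpha=5.0, beta=4.0, shape=2)
        pm.LogNormal("m_like", mu=mu, sigma=sigma, observed=m_obs)

        # M -> U via the LogNormal CDF, then U -> C via the Normal(0,1) inverse CDF
        # (for LogNormal marginals + Gaussian copula this collapses to c = z; the two
        # steps are kept explicit to mirror the generic M -> U -> C architecture)
        z = (pt.log(m_obs) - mu) / sigma             # standardised log-space values
        u = 0.5 * (1.0 + pt.erf(z / pt.sqrt(2.0)))   # LogNormal CDF at m_obs
        c = pt.sqrt(2.0) * pt.erfinv(2.0 * u - 1.0)  # Normal(0,1) inverse CDF

        # latent covariance structure Sigma via an LKJ Cholesky prior
        chol, corr, stds = pm.LKJCholeskyCov(
            "lkjcc", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
        )

        # copula term: MvNormal log-density of C minus the independent standard-Normal
        # log-densities, i.e. the density adjustment for the double transform
        mvn_logp = pm.logp(pm.MvNormal.dist(mu=pt.zeros(2), chol=chol), c)
        std_logp = pm.logp(pm.Normal.dist(0.0, 1.0), c).sum(axis=-1)
        pm.Potential("copula_logp", (mvn_logp - std_logp).sum())
    return model

# usage (illustrative):
# with gaussian_copula_sketch(m_train):
#     idata = pm.sample()
```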

Plate Notation¶

[Figure: plate notation for the copula model]

Refer to Notebook 102_ModelA2.ipynb and project class models.copula.ModelA2 for full details.

This advanced, fully Bayesian architecture allows for:

  • Regression via linear submodels $\beta^{T}\mathbf{x}$ on the marginals of mhat
  • Efficient covariance via an $\text{LKJCholeskyCovariance}$ structure
  • A natural transformation from M -> U -> C including Jacobian Adjustment

2.2 Compare Estimated $\hat{y}$ ModelA0 vs ModelA2¶

ModelA0¶

This model sets a baseline for performance: it uses the same marginals but does not have a copula. This is "the best that one could do" with a naive non-copula architecture, and the performance / results are analogous to $M^{\chi}$ that we discussed in $\S1$

[Figure: ModelA0 posterior-predicted $\sum\hat{y}$ vs bootstrapped actuals on the holdout set]

Observe:

  • Now we can clearly see the impact: although the in-sample model fit was acceptable, the combined value $y$ is way off, because this model ignores copula correlation between the marginals
  • The mean of the posterior-predicted $\overline{\hat{y}}$ ($\mu \approx 13.4$) is very different from (and sits outside of) the bootstrapped actual data $\overline{y}$ ($\mu \approx 9.6$)
  • Comparing means ($\frac{13.4}{9.6}$), this is an overestimate of roughly $+40\%$!
  • We do see that the PPC distribution envelops the bootstrapped actual data, which is promising, and means the model wouldn't necessarily be wrong to use, but there is clearly room to improve!

ModelA2¶

This is our most advanced model with a copula architecture: the performance / results are analogous to $M$ that we discussed in $\S1$

[Figure: ModelA2 posterior-predicted $\sum\hat{y}$ vs bootstrapped actuals on the holdout set]

Observe:

  • Now we can clearly see the impact: the Jacobian adjustment has allowed mdla2 to estimate a much more precise and accurate value for $\hat{y}$
  • The mean of the posterior-predicted $\overline{\hat{y}}$ ($\mu \approx 10.4$) maps nicely onto the bootstrapped actual data $\overline{y}$ ($\mu \approx 9.6$)
  • Comparing means ($\frac{10.4}{9.6}$), we get an overestimate of only about $+7\%$
  • This is substantially better than mdla0, and also meaningfully improves on mdla1

ModelA2 vs ModelA0¶

In the above, we see a reduction in the mean overestimate of $y$ from $40\%$ down to $7\%$: a 33 percentage point drop and $83\%$ relative improvement!

This is a huge difference on this very small and simple dataset, and found only by correctly modelling the covariance using a copula and a sophisticated model architecture.

2.3 Supercharged Predictions with Quantified Uncertainty: Exceedance Curve¶

Now we pull back to demonstrate the power and utility of using a Bayesian model in the first place: because the predicted output values individually & jointly have quantified uncertainty aka empirical probability. We can sum these and create an Exceedance Curve (1 - ECDF)
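A minimal numpy sketch of building such a curve follows; y_hat_sums is a hypothetical array standing in for posterior-predictive (or bootstrapped) draws of $\sum \hat{y}$ over the holdout.

```python
import numpy as np

def exceedance_curve(y_hat_sums: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (sorted values, P(Y >= value)), i.e. 1 - ECDF."""
    values = np.sort(y_hat_sums)
    p_exceed = 1.0 - np.arange(1, len(values) + 1) / len(values)
    return values, p_exceed

def value_at_probability(y_hat_sums: np.ndarray, p: float) -> float:
    """Read horizontally: the value exceeded with probability p (p=0.05 is 1-in-20)."""
    return float(np.quantile(y_hat_sums, 1.0 - p))

def probability_of_exceeding(y_hat_sums: np.ndarray, threshold: float) -> float:
    """Read vertically: empirical P(sum(y_hat) > threshold)."""
    return float(np.mean(y_hat_sums > threshold))

# usage (illustrative):
# value_at_probability(y_hat_sums, 0.50)    # the 1-in-2 level
# probability_of_exceeding(y_hat_sums, 15)  # e.g. P(sum > 15)
```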

[Figure: exceedance curve (1 - ECDF) of $\sum\hat{y}$ on the holdout set]

We can read this curve horizontally to determine the value $y$ at a particular probability $P$:

  • $P_{@0.50}\ \overline{\hat{y}} \geq 10.0$, (aka $\overline{\hat{y}} \approx 10.0$ @ 1-in-2), and much closer to the actual data
  • $P_{@0.05}\ \overline{\hat{y}} \geq 14.8$, (aka $\overline{\hat{y}} \approx 14.8$ @ 1-in-20)

Or we can read this curve vertically to determine the empirical probability $P$ of achieving a particular value $y$:

  • $P(\overline{\hat{y}} > 15) \approx 0.05$: i.e. if we're worried about $\overline{\hat{y}} > 15$, we might be less concerned because the probability is approx 1-in-20
  • $P(\overline{\hat{y}} > 17) \approx 0.01$: i.e. if we're worried about $\overline{\hat{y}} > 17$, we might be sanguine because the probability is approx 1-in-100

This appears substantially tighter and more robust than mdla0 and mdla1



Next Steps¶

Now the interested reader should dig into the Notebooks in project reference oreum_copula

There we demonstrate the full E2E workflow for models of increasing sophistication, including several state-of-the-art methods unavailable to conventional max-likelihood / machine-learning models.

  • 100_ModelA0.ipynb: Core (naive) architecture: Create priors, marginal likelihoods, but no copula
  • 101_ModelA1.ipynb: Partial architecture (extends ModelA0): Include Gaussian copula (w/ Jacobian adjustment), and several technical innovations to let pymc work with the transformations
  • 102_ModelA2.ipynb: Full architecture (extends ModelA1): Include Jacobian Adjustment on transformed observations



Oreum Industries © 2024