Oreum Industries Internal Project, 2024Q3


000_Overview.ipynb¶

Oreum Case Studies - Lung Cancer Survival Regression oreum_cs_lung¶

Demonstrate Survival Regression Modelling using Bayesian inference and a Bayesian workflow, specifically using the pymc & arviz ecosystem.

Here we report a brief overview of a full case study oreum_cs_lung in which we demonstrate an E2E workflow for survival regression models of increasing sophistication.

This overview is for quick discussion purposes only, and ideally should accompany a deeper technical walkthrough of the case study in a long-form style. There we evaluate the behaviour and performance of the models throughout the workflows, including several state-of-the-art methods unavailable to conventional max-likelihood / machine-learning models.

PDF version

Oreum Industries: Technical Resources



We use a complicated, real-world dataset: the NCCTG Lung Cancer Dataset lung from the survival R package via statsmodels. This dataset requires multiple advanced modelling methods to handle and mitigate bad / messy data.

This case study also inspired us to create two original contributions to the pymc project:

  1. A detailed worked example on handling ordinals in pymc-examples in GLM-ordinal-features
  2. A detailed worked example on handling missing data in pymc-examples in GLM-missing-values-in-covariates

See comprehensive explanations and deep technical demonstrations of various survival models in our public project Oreum Survival, including Accelerated Failure Time and Piecewise Regression Models.

Contents¶

  • Setup

  • Preamble

  • 1. The Dataset and Problem-Space

  • 2. The Modelling Work

  • 3. Using the Model Outputs



Preamble¶

What is Survival Regression?¶

See comprehensive explanations and deep technical demonstrations in our public project Oreum Survival

Essentially, we seek to create principled models that provide explanatory inference and predictions of the Survival Function $\hat{S}(t)$ and Expected Time-to-Event $\hat{\mathbb{E}}_{t}$ with quantified uncertainty to support real-world decision-making.
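For orientation, these quantities are linked by the standard survival-analysis identities (not specific to this case study): the survival function is the complement of the cumulative event distribution implied by the event density $\pi$, and the expected time-to-event is the area under the survival curve:

$$ \begin{aligned} S(t) &= P(T > t) = 1 - F(t) \\ \pi(t) &= f(t) = -\frac{d}{dt} S(t) \\ \mathbb{E}_{t} &= \mathbb{E}[T] = \int_{0}^{\infty} S(t)\ dt \\ \end{aligned} $$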


Illustrative Survival Function ${S}(t)$ and Expected Time-to-Event $\mathbb{E}_{t}$¶

Taken from notebook oreum_cs_lung 300_ModelA_E2E.ipynb and shown here just to give the reader a feel for the outputs and the problem-space. All estimates are made with quantified uncertainty, which reveals (rather than hides) the inherent noise in real-world data and processes.

[Two figures: illustrative Survival Function $\hat{S}(t)$ and Expected Time-to-Event $\hat{\mathbb{E}}_{t}$ forecasts with quantified uncertainty]

We gain massive advantage by using a Bayesian Framework¶

We specifically use Bayesian Inference rather than Frequentist Max-Likelihood methods for many reasons, including:

| Desirable Trait $\downarrow$ | Bayesian Inference | Frequentist Max-Likelihood |
| :--- | :--- | :--- |
| General formulation $\rightarrow$ | Bayes' Rule $$\underbrace{P(\hat{\mathcal{H}}\|D)}_{\text{posterior}} = \frac{\overbrace{P(D\|\mathcal{H})}^{\text{likelihood}} \cdot \overbrace{P(\mathcal{H})}^{\text{prior}}}{\underbrace{P(D)}_{\text{evidence}}}$$ | MLE $$\hat{\mathcal{H}}^{\text{MLE}} \propto \arg\max_{\mathcal{H}} P(D\|\mathcal{H})$$ |
| Principled model structure represents hypothesis about the data-generating process | **Very strong:** Can build bespoke, arbitrary and hierarchical structures of parameters to map to the real-world data-generating process. | **Weak:** Can only state structure under strict, limited assumptions of model statistical validity. |
| Model parameters and their initial values represent domain expert knowledge | **Very strong:** Marginal prior distributions represent the real-world probability of parameter values before any data is seen. | **Very weak:** No concept of priors. Lack of a joint probability distribution can lead to discontinuities in parameter values. |
| Robust parameter fitting process | **Strong:** Estimate the full joint posterior probability mass distribution for parameters: more stable and representative of the expectation for the parameter values. Sampling can be a computationally expensive process. | **Weak:** Estimate a single point of maximum likelihood (density) for parameters: this can be far outside the probability mass and so is prone to overfitting and only correct in the limit of infinite data. But the optimization method can be computationally cheap. |
| Fitted parameters have meaningful summary statistics for inference | **Very strong:** Full marginal probability distributions can be interpreted exactly as probabilities. | **Weak:** Point estimates only have meaningful summary statistics under strict, limited assumptions of model statistical validity. |
| Robust model evaluation process | **Strong:** Use the entire dataset, evaluate via Leave-One-Out Cross-Validation (the best theoretically possible). | **Weak:** Cross-validation is rarely seen in practice and, even when used, is rarely better than 5-fold CV. The simplistic method can be computationally cheap. |
| Predictions made with quantified variance | **Very strong:** Predictions are made using the full posterior probability distributions, so predictions have full empirical probability distributions. | **Weak:** Predictions using point estimates can be bootstrapped, but predictions only have interpretation under strict, limited assumptions of model validity. |
| Handle imbalanced, high-cardinality & hierarchical factor features | **Very strong:** Can introduce partial-pooling to automatically balance factors through hierarchical priors. | **Weak:** Difficult to introduce partial-pooling (aka mixed random effects) without affecting strict, limited assumptions of model validity. |
| Handle skewed / multimodal / extreme-value target variable | **Very strong:** Represent the model likelihood as any arbitrary probability distribution, including mixture (compound) functions, e.g. a zero-inflated Weibull. | **Weak:** Represent the model likelihood with a usually very limited set of distributions. Very difficult to create mixture (compound) functions. |
| Handle small datasets | **Very strong:** The Bayesian concept assumes that there is a probable range of values for each parameter, and that we evidence our prior on any amount of data (even very small counts). | **Very weak:** The Frequentist concept assumes that there is a single true value for each parameter and that we only discover that value in the limit (of infinite observations). |
| Automatically impute missing data | **Very strong:** Establish a prior for each datapoint, evidence on the available data within the context of the model, to automatically impute missing values. | **Very weak:** No inherent method. Usually impute as a pre-processing step with weak non-modelled methods. |

Practical Implementations of Bayesian Inference¶

We briefly referenced Bayes' Rule above, which is a useful mnemonic when discussing Bayesian Inference, but the crux of putting these advanced statistical techniques into practice is estimating the evidence $P(D)$, i.e. the probability of observing the data that we use to evidence the model:

$$ \begin{aligned} \underbrace{P(\hat{\mathcal{H}}|D)}_{\text{posterior}} &= \frac{\overbrace{P(D|\mathcal{H})}^{\text{likelihood}} \cdot \overbrace{P(\mathcal{H})}^{\text{prior}}}{\underbrace{P(D)}_{\text{evidence}}} \\ \\ \text{...where:} \\ \\ P(D) &\sim \int_{\Theta} P(D, \theta)\ d\theta \\ \end{aligned} $$

This joint probability $P(D, \theta)$ of data $D$ and parameters $\theta$ requires an almost impossible-to-solve integral over parameter-space $\Theta$. Rather than attempt to calculate that integral, we do something that sounds far more difficult, but given modern computing capabilities is actually practical.

We use a Bleeding-edge MCMC Toolkit for Bayesian Inference: pymc & arviz¶

We use Markov Chain Monte-Carlo (MCMC) sampling to take a series of ergodic, partly-reversible, partly-randomised samples of model parameters $\theta$. At each step we compute the ratio of unnormalised posterior densities (likelihood $\times$ prior) between the current position $\theta_{p0}$ and a proposed "sampled" position $\theta_{p}$ in parameter space, and accept or reject the proposal so as to preferentially move towards regions of higher probability mass (whilst still exploring the parameter space).

Accumulating these accepted samples yields a posterior estimate $P(\hat{\theta}|D)$, driven at each step by the acceptance ratio:

$$ \begin{aligned} P(\hat{\theta}|D) &\sim \frac{\overbrace{P(D|\theta_{p})}^{\text{likelihood @ proposal}} \cdot \overbrace{P(\theta_{p})}^{\text{prior @ proposal}}} {\underbrace{P(D|\theta_{p0})}_{\text{likelihood @ current}} \cdot \underbrace{P(\theta_{p0})}_{\text{prior @ current}}} \end{aligned} $$

This is the heart of MCMC sampling: for detailed practical explanations see Betancourt, 2021 and Wiecki, 2015
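To make the acceptance-ratio idea concrete, here is a toy random-walk Metropolis step in Python. This is purely illustrative: the log_likelihood and log_prior callables are hypothetical stand-ins, and in practice pymc uses the far more sophisticated NUTS / HMC sampler rather than anything this naive.

```python
import numpy as np

rng = np.random.default_rng(42)

def metropolis_step(theta_current, log_likelihood, log_prior, step_size=0.1):
    """One toy random-walk Metropolis step (illustration only, not pymc's NUTS)."""
    # propose a new position in parameter space near the current one
    theta_proposed = theta_current + rng.normal(0.0, step_size, size=theta_current.shape)

    # log of the acceptance ratio: (likelihood * prior) at proposal vs at current position
    log_ratio = (log_likelihood(theta_proposed) + log_prior(theta_proposed)) - (
        log_likelihood(theta_current) + log_prior(theta_current)
    )

    # accept with probability min(1, ratio), otherwise stay at the current position
    if np.log(rng.uniform()) < log_ratio:
        return theta_proposed
    return theta_current
```

Repeating such a step many times yields a chain of samples whose empirical distribution approximates the posterior $P(\hat{\theta}|D)$.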

We use the bleeding-edge pymc and arviz Python packages to provide the full Bayesian toolkit that we require, including advanced sampling, probabilistic programming, statistical inferences, model evaluation and comparison, and more.



1. The Dataset and Problem-Space¶

1.1 Dataset¶

As noted above, we use a complicated, real-world dataset that requires multiple advanced modelling methods to handle and mitigate bad / messy data.

We use the NCCTG Lung Cancer Dataset lung from the survival R package via statsmodels.
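As a hedged sketch of pulling the raw data (before our cleaning and renaming steps), assuming the Rdatasets mirror used by statsmodels exposes the table as lung under the survival package:

```python
import statsmodels.api as sm

# NCCTG lung cancer data from the R `survival` package via the Rdatasets mirror;
# columns here carry the raw R names, prior to our cleaning / renaming steps
lung_raw = sm.datasets.get_rdataset("lung", package="survival").data
print(lung_raw.shape)
print(lung_raw.head())
```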

For illustration: table of the dataset (post-cleaning)¶

See oreum_cs_lung 100_CurateExtractClean.ipynb for details

| pid | duration | death | age_at_start | calories_consumed_amt | ecog_physician_cat | inst_id | karno_patient_amt | karno_physician_amt | male | weight_loss_prior_amt |
| :--- | ---: | :--- | ---: | ---: | :--- | :--- | ---: | ---: | :--- | ---: |
| p000 | 306 | True | 74 | 1175.0 | c1 | i3 | 100.0 | 90.0 | True | NaN |
| p001 | 455 | True | 68 | 1225.0 | c0 | i3 | 90.0 | 90.0 | True | 15.0 |
| p002 | 1010 | False | 56 | NaN | c0 | i3 | 90.0 | 90.0 | True | 15.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| p225 | 105 | False | 75 | 1025.0 | c2 | i32 | 70.0 | 60.0 | False | 5.0 |
| p226 | 174 | False | 66 | 1075.0 | c1 | i6 | 100.0 | 90.0 | True | 1.0 |
| p227 | 177 | False | 58 | 1060.0 | c1 | i22 | 90.0 | 80.0 | False | 0.0 |
'Shape: (228, 10), Memsize 0.0 MB'

Explanation of the features for each of the 228 observations, post-cleaning (which includes cleaning dirty values, forcing correct datatypes, renaming for clarity, etc.):

Endogenous (target) features captured at the end of the study period

duration: Time-to-event observed during study period (days) (can be censored)
death:  Death observed during study period

Exogenous (predictor) features captured at the start of the study period

age_at_start:           Age in years
calories_consumed_amt:  Calories consumed at meals
ecog_physician_cat:     ECOG score as rated by the physician, c0 to c4
inst_id:                Institution identifier code
karno_patient_amt:      Karnofsky score as rated by the patient {0, ..., 100}
karno_physician_amt:    Karnofsky score as rated by the physician {0, ..., 100}
male:                   Sex of patient is male (True/False)
weight_loss_prior_amt:  Weight loss in the last six months (pounds)

The medical details behind this data are explained and linked further on the dataset page.

1.2. Time to Event $t$ and Event Density $\pi$¶

The 228 individuals have an observed "lifetime" aka time-to-event during the study, measured in days.

For each individual this starts when they enter the study (not necessarily all on the same day), and ends either at death (death=True) or at censoring when the study period ends (death=False).

Lifetimes (Censored Time-to-Event)¶

[Figure: observed lifetimes (censored time-to-event) for each individual]

Empirical event density $\pi$¶

[Figure: empirical event density $\pi$]

1.3 Modelled Survival Function $\hat{S}(t)$ and Expected Time-to-Event $\hat{\mathbb{E}}_{t}$¶

We seek to estimate $\pi$ and thus the Expected Time-to-Event $\hat{\mathbb{E}}_{t}$

As a side-effect of the mathematical relationships, we also get a Survival Function $\hat{S}(t)$ which can be a more familiar representation for newcomers to understand.

The model produces these estimates with quantified uncertainty per individual, but we can substitute a simplified dataset to effectively roll these up and look at the differences between groups according to an exogenous feature.

For example, forecasted Expected Time-to-Event $\hat{\mathbb{E}}_{t}$ and Survival Function $\hat{S}(t)$ for male vs female:¶

[Two figures: forecasted Expected Time-to-Event $\hat{\mathbb{E}}_{t}$ and Survival Function $\hat{S}(t)$ for male vs female]


2. The Modelling Work¶

2.1 Architectures¶

We use an Accelerated Failure Time (AFT) model, specifically a censored Weibull-distribution on the Event Density $\pi$.

The AFT base model is explained in great detail with examples in our public reference project oreum_survival 001_BayesianSurvival_AFT.ipynb

As noted above, the purpose of this case study is to demonstrate an E2E workflow for models of increasing sophistication. We evaluate the behaviour and performance of the models throughout the workflows, including several state-of-the-art methods unavailable to conventional max-likelihood / machine learning models.

For illustration: math and plate notation diagram of the most basic model (Model A0)¶

See oreum_cs_lung 300_ModelA_E2E.ipynb for details and the full workflow. See oreum_cs_lung src/model/lung.py for the code

Snippet of the general math for the censored Weibull likelihood and the linear-submodel for regression onto features:

$$ \begin{aligned} \sigma_{\beta} &\sim \text{Gamma}(\alpha=1, \beta=2) \\ \beta &\sim \text{Normal}(\mu=0, \sigma=\sigma_{\beta}) \\ \alpha &\sim \exp (\mathbf{\beta}^{T}\mathbf{x}) \\ \gamma &\sim \text{Gamma}(\alpha=1, \beta=200) \\ \\ \hat{\pi}(t) &\sim \left( \alpha \gamma \left(\gamma t_{i} \right)^{\alpha - 1} \right)^{d_{i}} \cdot \exp \left( - (\gamma t_{i})^{\alpha}\right) \\ &\sim \text{Weibull}(t\ |\ \alpha, \gamma) \\ \end{aligned} $$
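A minimal, hypothetical pymc sketch of that math follows, with the censored Weibull expressed via pm.CustomDist: $d_{i}$ is the death indicator (1 = death observed, 0 = censored) and X is a stand-in design matrix. This is a simplified illustration, not the project's actual implementation in src/model/lung.py.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# hypothetical stand-ins: design matrix X (n, k), durations t (days), death indicator d
rng = np.random.default_rng(0)
X = rng.normal(size=(228, 3))
t = rng.gamma(2.0, 150.0, size=228)
d = rng.binomial(1, 0.7, size=228)

def censored_weibull_logp(value, alpha, gamma, d):
    # log f(t) contributes only for observed deaths (d=1); log S(t) applies to all observations
    log_f_uncensored = pt.log(alpha) + pt.log(gamma) + (alpha - 1.0) * pt.log(gamma * value)
    log_s = -((gamma * value) ** alpha)
    return d * log_f_uncensored + log_s

with pm.Model() as model_a0:
    sigma_beta = pm.Gamma("sigma_beta", alpha=1.0, beta=2.0)
    beta = pm.Normal("beta", mu=0.0, sigma=sigma_beta, shape=X.shape[1])
    alpha = pm.Deterministic("alpha", pt.exp(pt.dot(X, beta)))  # linear submodel on alpha
    gamma = pm.Gamma("gamma", alpha=1.0, beta=200.0)
    pm.CustomDist("pi_hat", alpha, gamma, d, logp=censored_weibull_logp, observed=t)
    idata = pm.sample()
```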

Plate notation of simplest model ModelA0¶

[Figure: plate notation of Model A0]

2.2 Bayesian Workflow¶

Throughout the case study we employ a reasonably complete Bayesian Workflow (see Gelman et al, 2020 and Betancourt, 2020) using a cyclical process of model evaluation and improvement.

[Figure: Bayesian Workflow diagram]

2.3 The Model Variants and Technical Evaluation¶

Generally, we want to use as many observations and features from the dataset as possible, although obviously we have to mitigate overfitting (we use L2 regularization) and resource constraints.

In general we aim to create models with increasing sophistication and ability to handle the nature of the dataset. In particular in this dataset we encounter numerics, booleans, categoricals, ordinals and missing values!

In this particular case study we ended up with 5 different model variants:

Model A¶

  • Establish the core model architecture for Accelerated-Failure-Time (AFT) including:
    • Weibull likelihood on event density
    • Event censoring via CustomDist
    • Linear submodel on $\alpha$ param
  • In particular create 3 sub-variants using the (complete) numeric and boolean values available:
    • male_t_true
    • age_at_start
  • Linear component: 1 + male_t_true + age_at_start + age_at_start^2


Model B¶

  • Extend Model A to include (missing-value / incomplete) numeric values
  • We use a sophisticated technique of hierarchical auto-imputation to fill in missing values of calories_consumed_amt and weight_loss_prior_amt within the model itself (not a pre-processing step!), as sketched after this list
  • Linear component: 1 + male_t_true + age_at_start + age_at_start^2 + calories_consumed_amt + weight_loss_prior_amt
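A minimal, hypothetical sketch of this auto-imputation idea (heavily simplified from the actual model): give the partially-observed covariate its own hierarchical prior and pass the data, including its missing entries, as observed; pymc then treats each missing entry as a latent parameter to be sampled alongside everything else.

```python
import numpy as np
import pymc as pm

# hypothetical z-scored covariate with missing entries
calories_z = np.array([0.3, -1.2, np.nan, 0.8, np.nan, -0.1])

with pm.Model():
    # hierarchical (population-level) location and scale for the covariate
    x_mv_mu = pm.Normal("x_mv_mu", mu=0.0, sigma=1.0)
    x_mv_sigma = pm.HalfNormal("x_mv_sigma", sigma=1.0)

    # observed entries evidence the prior; masked (missing) entries are auto-imputed as latent RVs
    x_mv = pm.Normal("x_mv", mu=x_mv_mu, sigma=x_mv_sigma,
                     observed=np.ma.masked_invalid(calories_z))

    # ...x_mv would then feed the linear submodel on alpha, as in Model A
```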

Model C¶

  • Extend Model B to include (complete) categorical values
  • We use a sophisticated technique with a hierarchical prior (aka mixed random effects) to include the categorical feature inst_id, as sketched after this list
  • Interestingly, we see model performance decrease, likely because the cardinality of inst_id is too high given the small dataset. In the real world this would encourage us to seek more information from the data source, learn latent groupings within inst_id, and apply further hierarchies in the model
  • Linear component: 0 + inst_id + male_t_true + age_at_start + age_at_start^2 + calories_consumed_amt + weight_loss_prior_amt
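A minimal, hypothetical sketch of such a partially-pooled hierarchical prior (names and values are illustrative, not the project's actual code):

```python
import numpy as np
import pymc as pm

# hypothetical integer codes for inst_id per observation, e.g. from pandas factorize
inst_idx = np.array([0, 2, 1, 0, 3, 2, 1])
n_inst = int(inst_idx.max()) + 1

with pm.Model():
    # shared (pooled) scale: controls how strongly institution effects shrink towards zero
    sigma_inst = pm.Gamma("sigma_inst", alpha=2.0, beta=4.0)

    # one partially-pooled coefficient per institution
    beta_inst = pm.Normal("beta_inst", mu=0.0, sigma=sigma_inst, shape=n_inst)

    # per-observation contribution to the linear submodel on alpha
    contrib_inst = beta_inst[inst_idx]
```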


Model D¶

  • Extend Model C to include (complete) ordinal values
  • We use a sophisticated technique with a Dirichlet allocator (sketched after this list) to include the ordinal categorical features ecog_physician_cat, karno_physician_cat, karno_patient_cat
  • These features arise from a subjective score that has been mapped onto a metric scale (a common misuse of ordinals); see pymc-examples GLM-ordinal-features.ipynb for detailed discussion and worked examples
  • Linear component: 0 + inst_id + ecog_physician_cat + karno_physician_cat + karno_patient_cat + male_t_true + age_at_start + np.power(age_at_start,2) + calories_consumed_amt + weight_loss_prior_amt + death
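A minimal, hypothetical sketch of the Dirichlet-based ordinal treatment, in the spirit of the monotonic-effects approach discussed in GLM-ordinal-features (names and shapes are illustrative):

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# hypothetical ordinal level codes 0..4 for ecog_physician_cat, one per observation
ecog_idx = np.array([1, 0, 0, 2, 1, 1])
n_levels = 5

with pm.Model():
    # the Dirichlet allocates the total effect across the ordered levels
    nu = pm.Dirichlet("nu_ecog_physician_cat", a=np.ones(n_levels))
    beta_ecog = pm.Normal("beta_ecog_physician_cat", mu=0.0, sigma=1.0)

    # cumulative sum of the simplex gives a monotonic, flexibly-spaced effect per level
    contrib_ecog = beta_ecog * pt.cumsum(nu)[ecog_idx]
```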


Model E¶

  • Simplify by removing inst_id: keep the innovations of Model D while avoiding the performance drop introduced in Model C
  • We compare Model A0 vs Model E on an in-sample dataset to eyeball the results

For illustration: plate notation diagram of the most complicated model in this case study (Model D)¶

See oreum_cs_lung 303_ModelD_E2E.ipynb for details and the full workflow, and note we actually pull this back to a simpler Model E which achieves the best results

[Figure: plate notation of Model D]

In-sample Model Evaluation¶

We create a variety of models according to the nature of the data and compare their performance using advanced statistical reasoning.

At the end we can make a quantified evaluation and see that Model E performs the best for this dataset.

See oreum_cs_lung 304_ModelE_E2E.ipynb for details of this in-sample LOO-PIT evaluation method

[Figure: in-sample model comparison via LOO-PIT]
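As a hedged sketch of how such a comparison could be run with arviz, assuming each fitted model's InferenceData (hypothetically idata_a0, idata_e) contains a log-likelihood group and an observed variable named pi_hat:

```python
import arviz as az

# rank models by expected log pointwise predictive density via PSIS-LOO
cmp = az.compare({"mdla0": idata_a0, "mdle": idata_e}, ic="loo")
print(cmp)

# calibration check: LOO-PIT values should be approximately uniform for a well-calibrated model
az.plot_loo_pit(idata_e, y="pi_hat")
```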


3. Using the Model Outputs¶

3.1 Inference via the linear coefficients¶

We used a 2-parameter form of the Weibull with a linear regression onto $\alpha \sim \exp (\mathbf{\beta}^{T}\mathbf{x})$.

When:

  • $\alpha < 1$, hazard decreases over time e.g. early mortality
  • $\alpha = 1$, hazard is constant $\lambda \sim \gamma$ and the model reduces to the Exponential model
  • $\alpha > 1$, hazard increases over time e.g. later mortality

So we can view the relative coefficient values to make inferences about the correlation (not causation) of features with changes in $\alpha$
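For reference, under this parameterisation the hazard function and the expected time-to-event are:

$$ \begin{aligned} \lambda(t) &= \frac{\hat{\pi}(t)}{\hat{S}(t)} = \alpha \gamma (\gamma t)^{\alpha - 1} \\ \hat{\mathbb{E}}_{t} &= \frac{1}{\gamma}\ \Gamma\left(1 + \frac{1}{\alpha}\right) \\ \end{aligned} $$

Since $\alpha \sim \exp (\mathbf{\beta}^{T}\mathbf{x})$, a positive coefficient increases $\alpha$, steepening the hazard over duration (the "later mortality" direction noted above), and a negative coefficient does the opposite.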

3.1.1 Interpret effect of Simple Linear Coefficients¶

First let's view the posterior values of beta_* to infer how the numeric values affect $\alpha$

[Figure: posterior distributions of the beta_* coefficients]

Observe:

  • beta: intercept: $\mathbb{E} \sim 0.053,\ 0 \in HDI_{94}$, mild effect $\alpha > 1$ i.e. overall increasing hazard fn $\lambda$ over duration (later mortality)
  • beta: male_t_true: $\mathbb{E} \sim -0.15,\ 0 \in HDI_{94},\ 0 \notin HDI_{80}$, appreciable effect, earlier mortality for males
  • beta: age_at_start: $\mathbb{E} \sim -0.053,\ 0 \in HDI_{94}$, minimal effect, slightly earlier mortality for higher ages
  • beta: age_at_start_2_power: $\mathbb{E} \sim -0.12,\ 0 \in HDI_{94}$, mild effect, earlier mortality for higher ages
  • beta_mv: calories_consumed_amt: $\mathbb{E} \sim 0.21,\ 0 \in HDI_{94},\ 0 \notin HDI_{80}$, stronger effect, later mortality with higher calories
  • beta_mv: weight_loss_prior_amt: $\mathbb{E} \sim 0.015,\ 0 \in HDI_{94}$, minimal effect, slightly later mortality with higher prior weight loss
  • beta_ecog_physician_cat: $\mathbb{E} \sim -0.012,\ 0 \in HDI_{94}$ minimal effect, slightly earlier mortality with higher ecog physician score
  • beta_karno_physician_cat: $\mathbb{E} \sim 0.23,\ 0 \in HDI_{94}$, appreciable effect, later mortality with higher karno physician score
  • beta_karno_patient_cat: $\mathbb{E} \sim 0.17,\ 0 \in HDI_{94}$, appreciable effect, later mortality with higher karno patient score

We won't attempt to interpret the coefficients much further because we're not physicians and this isn't our data / medical study.

However, we can note that beta: male_t_true, beta: age_at_start_2_power, beta_mv: calories_consumed_amt, beta_karno_physician_cat, beta_karno_patient_cat all have appreciable correlating effects with mortality in this model - and would be good candidate features for further exploration, model finessing, and further medical study

Let's view those beta coefficients in a forestplot to gain a better understanding of the relative effects we just noted

[Figure: forestplot of the beta_* coefficients]

Observe:

This is the same info we saw above, but let's highlight the biggest movers:

  • beta: male_t_true: $\mathbb{E} \sim -0.15,\ 0 \in HDI_{94},\ 0 \notin HDI_{80}$, appreciable effect, earlier mortality for males
  • beta: age_at_start_2_power: $\mathbb{E} \sim -0.12,\ 0 \in HDI_{94}$, mild effect, earlier mortality for higher ages
  • beta_karno_physician_cat: $\mathbb{E} \sim 0.23,\ 0 \in HDI_{94}$, appreciable effect, later mortality with higher karno physician score
  • beta_karno_patient_cat: $\mathbb{E} \sim 0.17,\ 0 \in HDI_{94}$, appreciable effect, later mortality with higher karno patient score

3.1.2 Interpret effects of ordinal values¶

We can also view the posterior values of nu_* to infer how the ordinal categorical values in beta_ecog_physician_cat, beta_karno_physician_cat, beta_karno_patient_cat affect $\alpha$

[Figure: posterior distributions of the nu_* ordinal effects]

Observe:

  • nu_ecog_physician_cat: as c0 -> c4 there's barely any impact, except for widening uncertainty. As we see in 100_Curate_ExtractClean.ipynb $\S3.2$, the upper values c3 & c4 are almost never observed, so this huge uncertainty is expected. However, there's no appreciable difference within c0, c1, c2, suggesting it's not a worthwhile metric for our model usage
  • nu_karno_physician_cat: as c0 -> c100 there's a fairly linear increase in $\alpha$ (later mortality). The near-linear response suggests this ordinal metric is being used quite reliably
  • nu_karno_patient_cat: as c0 -> c100 there's a fairly linear increase in $\alpha$ (later mortality). The near-linear response suggests this ordinal metric is being used quite reliably, but the effect is weaker than nu_karno_physician_cat, which makes sense if we assume that the physician is a better judge of health

For more detail and discussion on ordinals, see our original novel contribution to pymc-examples in GLM-ordinal-features

3.1.3 Interpret auto-imputed missing values¶

We can also view the posterior values of x_mv_mu to infer how the model has auto-imputed hierarchical expected values for the missing data in features calories_consumed_amt and weight_loss_prior_amt.

For brevity we won't look into the auto-imputed posterior values x_mv for individual observations, and refer the reader to 304_ModelE_E2E.ipynb for full detail

[Figure: posterior distributions of the auto-imputed hierarchical means x_mv_mu]

Observe:

  • calories_consumed_amt: $\mathbb{E} \sim 0.0034,\ 0 \in HDI_{94}$
  • weight_loss_prior_amt: $\mathbb{E} \sim -0.0004,\ 0 \in HDI_{94}$
  • In both cases we z-scored the data prior to modelling, so these hierarchical values being effectively 0 is exactly what we would expect to see if the missing values are Missing At Random (MAR). This supports our choice to include these features without fear that we are introducing systematic issues into the data

For more detail and discussion on handling missing data, see our original contribution to pymc-examples in GLM-missing-values-in-covariates

3.2 Prediction of Expected Time-to-Event $\hat{\mathbb{E}}_{t}$ for individual observations¶

We also engineered our model to make predictions of Expected Time-to-Event $\hat{\mathbb{E}}_{t}$ (formally represented in the model as $\hat{\pi}(t)$) on out-of-sample (aka previously unseen) data.

This is not a technical evaluation method (the proper technical model evaluation is discussed in $\S2.3$ above), but here we can eyeball the predictions on a subset of data vs the true values, and get a feel for what we can do with the predictive outputs.

Each prediction is of course a distribution over a range of values, quantifying the uncertainty in the prediction. We can use the mean as the expected value, and the distribution as a measure of the uncertainty. We can test this distribution against the true data and comment on the model calibration.

Assuming a well-calibrated model, we can use the distribution as an exceedance curve wherein we choose to use the predicted value at e.g. the 90th percentile, so that e.g. 90% of the time the true value is below our predicted value. This is critical in risk evaluation etc.
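A hedged sketch of reading an expected value and a 90th-percentile exceedance value off posterior-predictive samples (the array here is a random placeholder; in practice it would be stacked from the fitted model's posterior-predictive group):

```python
import numpy as np

# placeholder posterior-predictive samples of time-to-event, shape (n_draws, n_individuals)
t_pred = np.random.default_rng(1).weibull(1.2, size=(4000, 5)) * 400

expected_t = t_pred.mean(axis=0)           # expected time-to-event per individual
t_p90 = np.percentile(t_pred, 90, axis=0)  # exceedance level: the true value should fall below this ~90% of the time

print(np.round(expected_t, 1))
print(np.round(t_p90, 1))
```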

[Figure: out-of-sample predicted Expected Time-to-Event $\hat{\mathbb{E}}_{t}$ vs true values for a subset of individuals]

Observe:

  • This is a random subset of the full dataset, which itself is very small and noisy, so we shouldn't expect perfection
  • Nonetheless, we see useful predictions where the $HDI_{94}$ has always captured the true value, and in most cases the much smaller $HDI_{50}$ has captured the correct value
  • The ability of our principled model to make useful predictions on out-of-sample data is incredibly powerful


Next Steps¶

Now the interested reader should dig into the full case study Notebooks in project oreum_cs_lung

There we demonstrate the full E2E workflow for models of increasing sophistication, including several state-of-the-art methods unavailable to conventional max-likelihood / machine-learning models.

  • 100_Curate_ExtractClean.ipynb
  • 200_EDA_Survival.ipynb
  • 300_ModelA_E2E.ipynb
  • 301_ModelB_E2E.ipynb
  • 302_ModelC_E2E.ipynb
  • 303_ModelD_E2E.ipynb
  • 304_ModelE_E2E.ipynb



Oreum Industries © 2024