Bayesian Inference and Graphical ModelsProbabilistic Programming
Using MCMC to do inference can be hard work, as we saw in the previous section. However, if we take the Bayesian approach where and are random variables just like , then the only thing the user really needs to specify are the priors, the structure of the model, and the sampler (e.g., in the example above, we used a Gibbs sampler with a bit-switching proposal distribution). The rest is calculation, which could in principle be handled automatically by a probability-aware programming framework.
Probabilistic programming systems seek to automate Bayesian inference by allowing the user to specify model structure in the form of a program, choose samplers, and let the computer handle the rest.
Example
Suppose we have a 6-sided die. Letting represent the outcome of a roll, suppose and for . If we assume that has a beta prior, then after observing many rolls we can manually determine that the posterior distribution of given the data is also beta. While finding a closed form solution of this posterior is simple for this problem, this may not be the case for more complicated problems. Use probabilistic programming and MCMC to obtain samples from the posterior.
Solution. We begin by specifying the model from which the data was drawn:
using Turing, MCMCChains, Distributions, StatsPlots, CSV
@model die_model(observations) = begin
# Set prior distribution for p (probability of rolling 1)
p ~ Beta(2,5)
# Probability of rolling one of 2 - 6
q = (1-p)/5
# Define how samples are obtained
for i in 1:length(observations)
observations[i] ~ Categorical([p, q, q, q, q, q])
end
end
Some points about how this works:
The model definition works essentially like a function definition, with the
@model
macro in place of thefunction
keyword. The argument (in this case,observations
) should be the observed data that will be provided for purposes of performing inference.Hidden random variables appear before a tilde (
~
), which is a common probability syntax for "is distributed according to". These random variables will be tracked by the framework, and you'll get information about their posterior distributions once you provide the observed data.Other than these distinctions, the program is normal Julia code. It describes the procedure we would use to sample from the model's prior distribution.
Note that above we have imposed a prior of for , encoding a belief that the die is biased toward 1.
After defining the model, we need to describe a method for proposing steps in the Markv chain we'll be using to sample from the posterior distribution. In practice, using a straightforward proposal distribution for the Metropolis-Hastings updates can be very inefficient because the proposal it suggests often go in directions of much smaller probability density. Hamiltonian Monte Carlo differentiates (usually Automatic differentiation is a technique for efficiently and accurately computing numerical derivatives of functions expressed in the form of code. The idea is to substitute into the function a two-field object which uses the chain rule to simultaneously track the value and the derivative at that value as each intermediate calculation is performed. An autodiff system needs to know how to compute derivatives for any basic function that might be used inside the body of the function being differentiated.
Let's use a Hamiltonian Monte Carlo sampler for this problem:
# Use HMC sampler to obtain posterior samples
n_posterior_observations = 1000
ϵ = 0.05 # Leapfrog step size
τ = 10 # Number of leapfrog iterations
data = [4,6,4,5,1,4,1,5,5,5,6,5,3,5,2,1,1,1,1,4,
2,1,1,1,3,6,4,2,1,5,1,3,4,2,1,3,3,3,2,3,
1,1,1,1,2,1,2,3,6,4,2,1,3,6,2,2,5,6,1,4,
6,4,4,2,4,4,6,1,2,1,5,3,1,3,6,3,1,3,6,1,
3,5,6,6,4,4,4,2,2,3,2,5,1,3,1,5,1,5,6,5]
# Obtain posterior samples
chain = sample(die_model(data), HMC(ϵ, τ), n_posterior_observations, progress=false)
The variable data
above is a vector containing the observed data.
We can now obtain summary statistics of the parameter , such as mean and standard deviation, and plot a histogram of the posterior samples with the following code:
# Extract summary of p parameter and plot histogram
p_summary = chain[:p]
plot(p_summary, seriestype = :histogram)
which produces the following histogram:
The variable p_summary
above is an MCMCChains
object and contains the samples of sampled from the posterior. To obtain the samples, we can run
Array(p_summary)
We can now use this to construct confidence intervals for . For example, a 95% confidence interval is given by
quantile(Array(p_summary), [0.025, 0.975])
Below we consider a slightly more complex HMM problem.
Example
Consider an HMM with hidden variables and:
where is defined by:
Suppose we have observed the variables available here. Estimate and using probabilistic programming.
Solution. We will assume a uniform prior on and . We can define the model as follows:
using Turing, MCMCChains, Distributions
@model HMM(x) = begin
n = length(x)
z = zeros(Int64, length(x)) # hidden states
# Define priors
p₁ ~ Uniform(0,1) # Transition probability 1→1
p₂ ~ Uniform(0,1) # Transition probability 2→1
# Define transition matrix
P = [p₁ 1-p₁; p₂ 1-p₂]
# Initialize samples
z[1] ~ Categorical([0.5, 0.5]) # Start z at either 1 or 2
x[1] ~ Normal(z[1], 0.1)
for i in 2:n
# Get next hidden state
z[i] ~ Categorical(P[z[i-1],:])
x[i] ~ Normal(z[i], 0.1)
end
return (p₁, p₂)
end
We will now use a Gibbs sampler to obtain samples from the posterior of p₁
and p₂
. HMC is only appropriate for continuous random variables; other samplers are needed for discrete random variables, like the 's in this case. We'll use one called particle Gibbs, which keeps track of several values for each random variable at the same time. (Each of these values is conceived as a particle; hence the name.) We will use a Gibbs sampler to combine HMC and Particle Gibbs.
# Load observed data
data = [0.97, 0.94, 2.02, 1.07, 1.93, 0.93, 2.0, 2.0, 2.07, 0.92, 2.18, 1.8, 1.94, 1.9, 0.93]
# Use Gibbs with HMC and PG to obtain posterior samples
n_posterior_observations = 1000
ϵ = 0.1 # Leapfrog step size
τ = 10 # Number of leapfrog iterations
hmc = HMC(ϵ, τ, :p₁, :p₂)
particle_gibbs = Turing.PG(20, :z)
G = Gibbs(hmc, particle_gibbs)
# Obtain posterior samples
chain = sample(HMM(data), G, n_posterior_observations)
Histograms of the and marginal posteriors are given below.
The actual values of and used to generate the data were 0.25 and 0.55, respectively. So these figures look pretty plausible, at least in the sense that their means are near the correct values.
Finally, let's look at the hidden Markov model from the previous section:
using Turing, OffsetArrays, StatsPlots
@model HMM(x) = begin
n = length(x)
z = tzeros(Int64, n)
q ~ Uniform(0, 1)
σ² ~ InverseGamma(1e-3, 1e-3)
P = OffsetArray([ q 1-q
1-q q ], 0:1, 0:1)
z[1] ~ DiscreteNonParametric(0:1, [0.5, 0.5])
x[1] ~ Normal(z[1], √(σ²))
for i in 2:n
z[i] ~ DiscreteNonParametric(0:1, collect(P[z[i-1],:]))
x[i] ~ Normal(z[i], √(σ²))
end
end
As in the previous example, we'll use a Gibbs sampler to combine HMC and particle Gibbs sampling:
# choose parameters for samplers
# 0.05 is a step size ϵ, while 10 is n_leapfrog, which we won't get into
hmc = HMC(0.05, 10, :q, :σ²)
# 20 particles
pg = PG(20, :z)
G = Gibbs(hmc, pg)
X = [-0.12, 0.11, -0.58, -1.21, 1.06, -0.09, -0.32, -0.09, -0.39, 2.01, 0.72, 1.61, -0.08, 0.99, 0.55, 1.28, 0.75, 1.09, 0.91, 0.45, 0.13, -0.61, 0.18, -0.03, 1.36, 0.12, 0.06, -0.96, 1.38, -1.86, 0.11, -1.17, -1.94, -0.3, -0.01, 0.82, -0.34, 0.55, 0.53, -1.15, -0.67, 0.47, 0.68, 1.84, 2.39, -0.05, -0.14, -0.03, 2.27, 0.36, 0.31, 0.46, 0.72, 1.34, 1.45, -0.28, -0.06, 0.71, -0.5, -1.01, 0.31, -0.07, 0.67, 2.55, 1.41, 0.35, 0.89, 1.04, 0.81, 1.08, 1.45, 0.23, 0.06, -0.05, 1.88, 1.11, 1.25, 0.35, 0.84, 1.96, 1.52, 1.34, 1.43, 1.9, 0.01, 1.08, 1.44, 1.22, 1.7, 1.36, 1.62, 1.49, 2.42, 1.11, -0.56, 0.71, -0.35, 0.26, 0.48, 0.42]
chains = sample(HMM(X), G, 50)
plot(chains[:q])
Congraulations! You've finished the Data Gymnasia Bayesian Inference and Graphical Models course.