
Bayesian Inference and Graphical Models: Probabilistic Programming

Reading time: ~20 min

Using MCMC to do inference can be hard work, as we saw in the previous section. However, if we take the Bayesian approach where q and \sigma^2 are random variables just like Z_1, \ldots, Z_{100}, then the only things the user really needs to specify are the priors, the structure of the model, and the sampler (e.g., in the example above, we used a Gibbs sampler with a bit-switching proposal distribution). The rest is calculation, which could in principle be handled automatically by a probability-aware programming framework.

Probabilistic programming systems seek to automate Bayesian inference by allowing the user to specify model structure in the form of a program, choose samplers, and let the computer handle the rest.

Example

Suppose we have a 6-sided die. Letting X represent the outcome of a roll, suppose \mathbb{P}(X = 1) = p and \mathbb{P}(X = k) = \frac{1-p}{5} for k = 2,3,4,5,6. If we assume that p has a beta prior, then after observing many rolls we can manually determine that the posterior distribution of p given the data is also beta. While finding a closed-form solution for this posterior is simple in this problem, that may not be the case for more complicated problems. Use probabilistic programming and MCMC to obtain samples from the posterior.

Solution. We begin by specifying the model from which the data was drawn:

Some points about how this works:

  1. The model definition works essentially like a function definition, with the @model macro in place of the function keyword. The argument (in this case, observations) should be the observed data that will be provided for purposes of performing inference.

  2. Hidden random variables appear before a tilde (~), which is a common probability syntax for "is distributed according to". These random variables will be tracked by the framework, and you'll get information about their posterior distributions once you provide the observed data.

  3. Other than these distinctions, the program is normal Julia code. It describes the procedure we would use to sample from the model's prior distribution.
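Point 3 can be made concrete with a small sketch. The Python snippet below uses only the standard library, not the probabilistic programming framework discussed in the text. It samples from the die model the way the model program reads: first draw p from the \operatorname{Beta}(2,5) prior, then roll a die for which face 1 has probability p and the other five faces share the rest equally. The name `simulate_rolls` and its parameters are made up for this illustration.

```python
import random

def simulate_rolls(n_rolls, alpha=2.0, beta=5.0, seed=None):
    """Draw p from a Beta(alpha, beta) prior, then simulate n_rolls die rolls."""
    rng = random.Random(seed)
    p = rng.betavariate(alpha, beta)
    # Face 1 has probability p; faces 2..6 share the remaining mass equally.
    weights = [p] + [(1.0 - p) / 5.0] * 5
    rolls = rng.choices([1, 2, 3, 4, 5, 6], weights=weights, k=n_rolls)
    return p, rolls

p, rolls = simulate_rolls(100, seed=0)
print(f"sampled p = {p:.3f}, number of ones = {rolls.count(1)}")
```

Inference runs this story in reverse: the rolls are known and p is the unknown we want a posterior for.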

Note that above we have imposed a prior of \operatorname{Beta}(2,5) for p, encoding a belief that the die is biased toward 1.
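For reference, the closed-form posterior mentioned in the problem statement can be worked out directly for this prior. Write n_1 for the number of rolls equal to 1 among n observed rolls (notation introduced here for convenience), and assume the rolls are independent given p. Then

\begin{align*} f(p \mid x_1, \ldots, x_n) &\propto \underbrace{p^{2-1}(1-p)^{5-1}}_{\text{prior}} \cdot \underbrace{p^{n_1} \left(\frac{1-p}{5}\right)^{n - n_1}}_{\text{likelihood}} \\ &\propto p^{(2 + n_1) - 1}(1-p)^{(5 + n - n_1) - 1}, \end{align*}

so the posterior is \operatorname{Beta}(2 + n_1, 5 + n - n_1), with mean \frac{2 + n_1}{7 + n}. This gives an exact answer against which MCMC samples can be checked.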

After defining the model, we need to describe a method for proposing steps in the Markov chain we'll be using to sample from the posterior distribution. In practice, using a straightforward proposal distribution for the Metropolis-Hastings updates can be very inefficient, because the proposals it suggests often move toward regions of much smaller probability density. Hamiltonian Monte Carlo differentiates the density f (usually via automatic differentiation) and uses some fairly advanced mathematical ideas to suggest moves that are much more likely to land where the density is not much smaller.

Hamiltonian Monte Carlo

Let's use a Hamiltonian Monte Carlo sampler for this problem:

The variable data above is a vector containing the observed data.
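To show what an HMC update does under the hood, here is a minimal Python sketch for this one-parameter problem. It is an illustration under stated assumptions, not the framework's implementation. It samples \theta = \operatorname{logit}(p) so that the variable is unconstrained, and it codes the gradient by hand where a framework would normally use automatic differentiation. It takes `a` and `b` as the exponents of the unnormalized posterior p^{a-1}(1-p)^{b-1}. The step size, number of leapfrog steps, and the data summary (30 ones in 100 rolls) are arbitrary illustrative choices.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_density(theta, a, b):
    """Log density of theta = logit(p), up to a constant, when p has
    density proportional to p^(a-1) (1-p)^(b-1). Includes the Jacobian p(1-p)."""
    softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return a * theta - (a + b) * softplus

def grad_log_density(theta, a, b):
    """Derivative of log_density with respect to theta (hand-coded)."""
    return a - (a + b) * sigmoid(theta)

def hmc_sample(a, b, n_samples, step=0.1, n_leapfrog=10, seed=0):
    """Return n_samples values of p from a basic HMC chain on theta."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_samples):
        r = rng.gauss(0.0, 1.0)  # fresh momentum for each proposal
        theta_new, r_new = theta, r
        # Leapfrog integration: half momentum step, alternating full steps,
        # then a final half momentum step.
        r_new += 0.5 * step * grad_log_density(theta_new, a, b)
        for i in range(n_leapfrog):
            theta_new += step * r_new
            if i < n_leapfrog - 1:
                r_new += step * grad_log_density(theta_new, a, b)
        r_new += 0.5 * step * grad_log_density(theta_new, a, b)
        # Metropolis correction for the integrator's discretization error.
        h_old = log_density(theta, a, b) - 0.5 * r * r
        h_new = log_density(theta_new, a, b) - 0.5 * r_new * r_new
        if rng.random() < math.exp(min(0.0, h_new - h_old)):
            theta = theta_new
        samples.append(sigmoid(theta))
    return samples

# Hypothetical data summary: 30 ones in 100 rolls, with the Beta(2, 5) prior.
n_ones, n_rolls = 30, 100
samples = hmc_sample(2 + n_ones, 5 + n_rolls - n_ones, n_samples=2000)
kept = samples[200:]  # discard early draws as burn-in
print(f"posterior mean estimate: {sum(kept) / len(kept):.3f}")
```

Because each proposal follows the gradient for several steps before the accept/reject decision, it can travel far from the current point while still being accepted most of the time.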

We can now obtain summary statistics of the parameter p, such as mean and standard deviation, and plot a histogram of the posterior samples with the following code:

which produces the following histogram:

The variable p_summary above is an MCMCChains object and contains the samples of p sampled from the posterior. To obtain the samples, we can run

We can now use these samples to construct credible intervals for p. For example, a 95% credible interval runs from the 2.5% quantile to the 97.5% quantile of the posterior samples.
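As a framework-independent sketch, an equal-tailed interval can be read off the empirical quantiles of any list of posterior samples. The helper name `credible_interval` is made up for this example, and the stand-in samples are drawn from a Beta distribution purely so that the snippet runs on its own.

```python
import random

def credible_interval(samples, level=0.95):
    """Equal-tailed interval from the empirical quantiles of posterior samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(ordered) - 1)
        lower = int(pos)
        upper = min(lower + 1, len(ordered) - 1)
        frac = pos - lower
        return ordered[lower] * (1.0 - frac) + ordered[upper] * frac

    return quantile(tail), quantile(1.0 - tail)

# Stand-in posterior samples (hypothetical, not output from the text's sampler).
rng = random.Random(0)
stand_in = [rng.betavariate(32, 75) for _ in range(5000)]
low, high = credible_interval(stand_in)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```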

Below we consider a slightly more complex HMM problem.

Example

Consider an HMM with hidden variables Z_i \in \{1,2\} and:

\begin{align*}p(z_1) &= \frac{1}{2} \textrm{ for } z_1 \in \{1,2\} \\ f(x_j \mid z_j) &\sim N(z_j,0.1) \textrm{ for } j \in \{1,2,\ldots,n\} \\ \mathbb{P}(Z_{k+1} = z_{k+1} \mid Z_k = z_k) &= p(z_{k},z_{k+1}),\end{align*}

where p(z_{k},z_{k+1}) is defined by:

Suppose we have observed the variables X_1, X_2, \ldots, X_{15} available here. Estimate p_1 and p_2 using probabilistic programming.
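To get a feel for what data from such a model looks like, the following Python sketch simulates a two-state HMM with Gaussian emissions centered at the hidden state. Two things here are assumptions rather than facts from the problem: the transition probabilities in `example_transition` are placeholders (they are not the matrix defined above), and 0.1 is passed as a standard deviation, since the notation N(z_j, 0.1) does not say whether 0.1 is a variance or a standard deviation.

```python
import random

def simulate_hmm(n, transition, emission_sd, seed=None):
    """Simulate hidden states in {1, 2} and observations X_j ~ N(z_j, emission_sd)."""
    rng = random.Random(seed)
    z = 1 if rng.random() < 0.5 else 2  # uniform initial state
    states, observations = [], []
    for _ in range(n):
        states.append(z)
        observations.append(rng.gauss(z, emission_sd))
        # transition[z][w] is P(Z_{k+1} = w | Z_k = z).
        z = 1 if rng.random() < transition[z][1] else 2
    return states, observations

# Placeholder transition probabilities, chosen only for illustration.
example_transition = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.4, 2: 0.6}}
states, xs = simulate_hmm(15, example_transition, emission_sd=0.1, seed=0)
print(states)
print([round(x, 2) for x in xs])
```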

Solution. We will assume a uniform prior on p_1 and p_2. We can define the model as follows:

We will now obtain samples from the posterior of p_1 and p_2. HMC is only appropriate for continuous random variables; other samplers are needed for discrete random variables, like the Z's in this case. We'll use one called particle Gibbs, which keeps track of several values for each random variable at the same time. (Each of these values is conceived as a particle; hence the name.) A Gibbs sampler then combines the two, using HMC for p_1 and p_2 and particle Gibbs for the Z's.
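The alternating structure of such a Gibbs sampler can be sketched in plain Python. Note that this sketch swaps in simpler block updates than the ones named above: the hidden path is drawn exactly by forward filtering and backward sampling rather than by particle Gibbs, and the transition probabilities are drawn from their conjugate Beta conditionals rather than by HMC. The parametrization is also hypothetical: `q[i]` stands for the probability of moving to state 2 from state i + 1, with a uniform prior, which need not match how p_1 and p_2 enter the transition matrix in the problem. The observations are made up.

```python
import math
import random

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def sample_states(xs, T, sd, rng):
    """Draw the whole hidden path given T (forward filtering, backward sampling).

    States are coded 0 and 1, standing for Z = 1 and Z = 2, so state i has mean i + 1.
    """
    filtered = []
    for k, x in enumerate(xs):
        if k == 0:
            predicted = [0.5, 0.5]  # uniform initial distribution
        else:
            prev = filtered[-1]
            predicted = [prev[0] * T[0][j] + prev[1] * T[1][j] for j in range(2)]
        weights = [predicted[j] * normal_pdf(x, j + 1, sd) for j in range(2)]
        total = weights[0] + weights[1]
        filtered.append([weights[0] / total, weights[1] / total])
    z = [0] * len(xs)
    z[-1] = 0 if rng.random() < filtered[-1][0] else 1
    for k in range(len(xs) - 2, -1, -1):
        w0 = filtered[k][0] * T[0][z[k + 1]]
        w1 = filtered[k][1] * T[1][z[k + 1]]
        z[k] = 0 if rng.random() < w0 / (w0 + w1) else 1
    return z

def gibbs_hmm(xs, n_iter, sd, seed=0):
    """Alternate between the discrete block (path) and the continuous block (q)."""
    rng = random.Random(seed)
    T = [[0.5, 0.5], [0.5, 0.5]]
    draws = []
    for _ in range(n_iter):
        # Block 1: hidden states given the current transition probabilities.
        z = sample_states(xs, T, sd, rng)
        # Block 2: transition probabilities given the path (Beta(1, 1) prior).
        counts = [[0, 0], [0, 0]]
        for a, b in zip(z, z[1:]):
            counts[a][b] += 1
        for i in range(2):
            q = rng.betavariate(1 + counts[i][1], 1 + counts[i][0])
            T[i] = [1.0 - q, q]
        draws.append((T[0][1], T[1][1]))
    return draws

# Hypothetical observations (not the data set referred to in the text).
xs = [1.1, 0.9, 2.0, 2.1, 1.0, 1.2, 0.8, 1.9, 2.2, 1.0]
draws = gibbs_hmm(xs, n_iter=500, sd=0.3, seed=0)
kept = draws[100:]
print(sum(d[0] for d in kept) / len(kept), sum(d[1] for d in kept) / len(kept))
```

The point of the sketch is the loop body: each sweep updates one block of variables conditional on the other, and each block can use whatever update suits its type.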

Histograms of the p_1 and p_2 marginal posteriors are given below.

The actual values of p_1 and p_2 used to generate the data were 0.25 and 0.55, respectively. So these figures look pretty plausible, at least in the sense that their means are near the correct values.

Finally, let's look at the hidden Markov model from the previous section:

As in the previous example, we'll use a Gibbs sampler to combine HMC and particle Gibbs sampling:

Congratulations! You've finished the Data Gymnasia Bayesian Inference and Graphical Models course.
