Introduction to Bayesian statistics — part 2
Scope:
This article aims to clarify the differences in approach between frequentist and Bayesian methods. It introduces the basics of Bayesian statistics needed to follow the first article (Introduction to Bayesian statistics — part 1).
Frequentist approach:
Frequentist inference assumes that the underlying parameters are fixed but unknown. In other words, the sample we observe is drawn from a distribution with constant parameters, and the sample can be used to estimate those parameters.
For example, if we know that the observations x₁, x₂, …, xₙ ~ N(μ, σ²) with known σ², we can estimate μ using the sample mean x̄ = ∑xᵢ/n, which is an unbiased estimator.
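As a concrete illustration, here is a minimal Python sketch of this estimate; the true mean, variance, and sample size are made-up values for demonstration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: fixed unknown mean mu, known sigma (values are made up).
mu_true, sigma, n = 2.0, 1.5, 100
x = rng.normal(mu_true, sigma, size=n)

# Unbiased frequentist estimator of mu: the sample mean.
mu_hat = x.mean()
print(f"estimate of mu: {mu_hat:.3f}")
```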
Inference:
Let us assume that the point estimate is x̄ and the standard error of the estimate is s = σ/√n. We can build the 95% confidence interval [x̄ − z_{α/2}·s, x̄ + z_{α/2}·s], where z_{α/2} ≈ 1.96 for α = 0.05.
However, this does not mean that there is a 95% probability that μ lies in this particular interval.
The interpretation: if we keep collecting samples and constructing confidence intervals in this way, we expect 95% of those intervals to contain μ.
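This long-run interpretation is easy to check by simulation. In the sketch below (sample size, true mean, and number of replications are arbitrary choices), the empirical coverage should land close to 95%:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu_true, sigma, n, reps = 2.0, 1.5, 100, 10_000
z = norm.ppf(0.975)       # z_{alpha/2} ~ 1.96 for a 95% interval
s = sigma / np.sqrt(n)    # standard error of the sample mean

covered = 0
for _ in range(reps):
    x_bar = rng.normal(mu_true, sigma, size=n).mean()
    if x_bar - z * s <= mu_true <= x_bar + z * s:
        covered += 1

print(f"coverage: {covered / reps:.3f}")  # expected to be close to 0.95
```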
Bayesian approach:
The idea of an underlying constant, unknown parameter limits the type of inference that can be performed using the frequentist approach. In the example of a normal distribution with constant mean μ and constant, known variance σ², a frequentist test of H₀: μ = 0 vs H₁: μ ≠ 0 can only tell us how likely the observed data would be if H₀ were true; it cannot assign a probability to H₀ itself. By allowing the underlying parameter to be a random variable, the Bayesian approach allows us to state a belief about the parameter (the prior distribution) and to update that belief based on the observed data (posterior ∝ likelihood × prior).
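For the normal model with known σ², a conjugate normal prior μ ~ N(μ₀, τ₀²) yields a normal posterior in closed form: the posterior precision is 1/τ₀² + n/σ², and the posterior mean is a precision-weighted average of the prior mean and the sample mean. The sketch below applies this standard result; all numeric values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: known sigma, made-up prior hyperparameters.
sigma, n = 1.5, 100
mu0, tau0 = 0.0, 2.0                     # prior: mu ~ N(mu0, tau0^2)
x = rng.normal(2.0, sigma, size=n)       # observed data

# Conjugate normal-normal update (posterior is also normal).
post_prec = 1 / tau0**2 + n / sigma**2   # posterior precision
post_var = 1 / post_prec
post_mean = post_var * (mu0 / tau0**2 + n * x.mean() / sigma**2)

print(f"posterior: N({post_mean:.3f}, {post_var:.4f})")
```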
Inference:
We can start by assuming that μ follows a distribution with a certain pdf/pmf. This is our prior distribution. After observing the data, we may ask the following questions:
- Can we update our prior knowledge of μ, based on the observed data, so that it accurately represents the data-generating process?
- Can we infer something about μ? For example: H₀: μ = c vs H₁: μ ≠ c
Bayesian statistics allows us to answer both questions in a reasonable way. However, the quality of inference can be greatly affected by the choice of prior for the parameters. It is safer to use ‘non-informative’ priors, but this is not always the best strategy.
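To make the second question concrete, one common device (though not the only one) is a 95% credible interval: check whether the candidate value c falls inside it. The sketch below continues the posterior from the conjugate example above; the numbers are again illustrative:

```python
from scipy.stats import norm

# Posterior from the previous sketch (illustrative numbers).
post_mean, post_sd = 1.96, 0.15
c = 0.0                                   # H0: mu = c

lo, hi = norm.ppf([0.025, 0.975], loc=post_mean, scale=post_sd)
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
print("c inside interval:", lo <= c <= hi)

# Unlike a confidence interval, this interval supports a direct
# probability statement: P(lo <= mu <= hi | data) = 0.95.
```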