Statistical inference is the process of drawing conclusions from data. In other words, it is about approximating the unknown distribution by looking at the data generated by that distribution. There are several different approaches to statistical inference.
Two main reasons for these differences are
- the different definitions of probability
- the different assumptions on the generation of samples
The two main interpretations of probability lead to two different inference approaches:
Frequentist inference
Frequentist inference uses the frequency interpretation of probability (i.e. the sampling distribution of a statistic), where any given experiment is considered as one of an infinite sequence of possible repetitions of the same experiment [@Everitt2002]. In other words, by looking at repeated samples, the frequentist properties of any statistical inference procedure can be described.
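To make the repeated-sampling idea concrete, here is a minimal simulation sketch (the population, sample size, and interval recipe are all assumed for illustration): it repeats the same experiment many times and checks how often a standard 95% confidence interval for the mean covers the true value.

```python
import numpy as np

# Assumed setup: the "experiment" draws n samples from a Normal population
# with true mean 5.0; we repeat it many times and check how often the
# standard 95% confidence interval for the mean covers the truth.
rng = np.random.default_rng(0)
true_mean, n, repetitions = 5.0, 30, 10_000

covered = 0
for _ in range(repetitions):
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    covered += (lo <= true_mean <= hi)

print(f"Empirical coverage: {covered / repetitions:.3f}")  # close to 0.95
```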
Bayesian inference
Bayesian inference is based on the probability that a particular hypothesis is true given some observed evidence. The key idea is that the probability of an event A given an event B depends not only on the relationship between events A and B but also on the marginal probability of occurrence of each event:
$$p(A|B) = \frac{p(B|A) p(A)}{p(B)}$$
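As a worked illustration of the formula (the numbers are invented), let A be "has a condition" and B be "test is positive"; the posterior p(A|B) combines the likelihood p(B|A) with the marginal probabilities of both events:

```python
# Invented numbers for illustration: A = "has condition", B = "test positive".
p_A = 0.01             # prior p(A): prevalence of the condition
p_B_given_A = 0.95     # likelihood p(B|A): test sensitivity
p_B_given_notA = 0.05  # false-positive rate p(B|not A)

# Marginal p(B) by the law of total probability.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior p(A|B) = p(B|A) p(A) / p(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(f"p(A|B) = {p_A_given_B:.3f}")  # about 0.161
```

Note how the low marginal probability p(A) keeps the posterior modest even though the test is accurate; this is exactly the role the marginal probabilities play in the formula.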
Frequentist vs Bayesian
| | Frequentist | Bayesian |
|---|---|---|
| Randomness | objective indefiniteness | subjective ignorance |
| Variables | random and deterministic | everything is random |
| Inference | maximum likelihood | Bayes' theorem |
| Estimates | maximum likelihood estimation (MLE) | maximum a posteriori probability (MAP) |
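The sketch below contrasts the two estimation rules from the table on a coin-flipping example (the data and the Beta prior are assumed for illustration): the MLE maximizes the likelihood alone, while the MAP estimate also folds in a prior via Bayes' theorem.

```python
# Coin with unknown heads probability theta; observed data is assumed.
heads, n = 7, 10

# MLE: maximize the likelihood alone -> the sample proportion.
theta_mle = heads / n

# MAP: maximize the posterior under a Beta(alpha, beta) prior; for a
# Bernoulli likelihood the posterior mode is
# (heads + alpha - 1) / (n + alpha + beta - 2).
alpha, beta = 2.0, 2.0  # mild prior pull toward theta = 0.5 (assumed)
theta_map = (heads + alpha - 1) / (n + alpha + beta - 2)

print(f"MLE: {theta_mle:.3f}, MAP: {theta_map:.3f}")  # 0.700 vs 0.667
```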
Statistical inference of a hypothesis
Statistical inference of a hypothesis most often includes the following components:
Statistical Model
it models the random process that is supposed to have generated the data, and describes how one or more random variables are related to one or more other random variables.
Data Set
it is a particular realization of the random process. It comes from actual observations obtained by sampling a statistical population. Each observation measures one or more attributes (i.e. features) of independent objects or individuals.
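The following sketch ties these two components together (all values are assumed): the statistical model specifies the random process relating a feature x to a response y, and the data set is one concrete realization drawn from that process.

```python
import numpy as np

# Assumed statistical model: each response y_i = 2.0 * x_i + eps_i with
# eps_i ~ Normal(0, 1). The data set is one realization of this process.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=50)          # one feature per independent object
y = 2.0 * x + rng.normal(0.0, 1.0, size=50)  # random process generating y

dataset = np.column_stack([x, y])  # the observed realization
print(dataset[:3])                 # first three observations
```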
Data Generation Modeling Assumptions
A statistical model includes a set of assumptions that describes how the observed data were generated. There are three levels of modeling assumptions, as follows (a code sketch contrasting the three appears after the list):
1. Fully-parametric
the data-generation process is described by a probability distribution involving only a finite number of unknown parameters. For example, a distribution of population values is described as truly Normal, with unknown mean and variance, and the data is generated by simple random sampling.
2. Non-parametric
there are fewer assumptions about the data-generation process (i.e. they are minimal) in comparison with fully-parametric models. For example, the median of a continuous probability distribution can be estimated using the sample median when the data is generated by simple random sampling.
3. Semi-parametric
the data-generation process assumptions lie between the fully-parametric and non-parametric approaches. For example, one may assume that the population distribution has a finite mean and, additionally, make the parametric assumption that the mean response level in the population is a linear function of some covariate.
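As a rough illustration of the three levels (population values and sample sizes are invented), the sketch below computes a fully-parametric estimate of a Normal's mean and variance, a non-parametric sample median, and a semi-parametric least-squares fit of a linear mean function:

```python
import numpy as np

rng = np.random.default_rng(2)
sample = rng.normal(loc=10.0, scale=3.0, size=200)  # simple random sample

# 1. Fully-parametric: assume the population is Normal; estimate its two
#    unknown parameters (mean and variance) from the sample.
mu_hat, var_hat = sample.mean(), sample.var(ddof=1)

# 2. Non-parametric: no distributional form assumed; the sample median
#    estimates the population median.
median_hat = np.median(sample)

# 3. Semi-parametric: assume only that the mean response is linear in a
#    covariate x, with no parametric form assumed for the errors.
x = rng.uniform(0.0, 5.0, size=200)
y = 1.5 * x + rng.standard_normal(200)      # assumed underlying process
slope, intercept = np.polyfit(x, y, deg=1)  # least-squares fit of the mean

print(mu_hat, var_hat, median_hat, slope, intercept)
```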