Naive Bayes Classifier

A Naive Bayes classifier is based on Bayes' theorem, combined with the naive assumption that the features are independent of one another given the class. A more descriptive term for the underlying probability model is the "independent feature model": a naive Bayes classifier assumes that, once the class is known, a particular feature carries no information about any other feature.

Given a set of features \(F_1, \dots, F_n\), the probability model for a classifier is a conditional model of the form

$$p(C \vert F_1,\dots,F_n),$$

where the class variable \(C\) depends on the features \(F_1, \dots, F_n\). Using Bayes' theorem, we write

$$p(C \vert F_1,\dots,F_n) = \frac{p(C) \ p(F_1,\dots,F_n\vert C)}{p(F_1,\dots,F_n)}$$
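As a concrete illustration of Bayes' theorem with a single feature, consider the following Python sketch. The scenario (spam filtering) and all the probabilities in it are invented for illustration, not taken from the text:

```python
# Hypothetical numbers: one binary feature F (e.g. "contains the word 'free'")
# and a class C with values "spam" / "ham".
p_C = {"spam": 0.4, "ham": 0.6}          # prior p(C)
p_F_given_C = {"spam": 0.8, "ham": 0.1}  # likelihood p(F=1 | C)

# Evidence p(F=1), computed by the law of total probability.
p_F = sum(p_C[c] * p_F_given_C[c] for c in p_C)

# Posterior p(C | F=1) via Bayes' theorem: prior * likelihood / evidence.
posterior = {c: p_C[c] * p_F_given_C[c] / p_F for c in p_C}
print(posterior)  # posterior probabilities sum to 1
```

Note that dividing by the evidence `p_F` is exactly what normalizes the posterior; the next step in the derivation drops this constant because it does not affect which class maximizes the posterior.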

Since the denominator does not depend on \(C\) and the feature values are given, the denominator is effectively constant and can be dropped from the model. We are then interested only in the numerator, which by the chain rule of probability can be expanded as follows:

$$p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ p(F_3\vert C, F_1, F_2) \ \dots p(F_n\vert C, F_1, F_2, F_3,\dots,F_{n-1})$$
By applying the *naive* conditional independence assumption (i.e. assuming that each feature \(F_i\) is conditionally independent of every other feature \(F_j\) for \(j \neq i\), given the class \(C\)), this is equal to:
$$p(C) \ p(F_1\vert C) \ p(F_2\vert C) \ p(F_3\vert C) \ \cdots p(F_n\vert C)$$
This means that the desired conditional distribution can be expressed as:

$$p(C \vert F_1,\dots,F_n) \propto p(C) \prod_{i=1}^n p(F_i \vert C) \label{a:eq:NaiveBayes}$$

This is a naive Bayes model. Models of this kind are much more manageable, because they factor into a class prior \(p(C)\) and independent one-dimensional distributions \(p(F_i\vert C)\), each of which can be estimated separately from the data.
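The factored model above can be sketched in a few lines of Python: estimate \(p(C)\) and each \(p(F_i \vert C)\) by counting, then classify by picking the class that maximizes \(p(C)\prod_i p(F_i \vert C)\). This is a minimal illustration of the technique for categorical features; the function names and the toy weather data are invented, and it omits smoothing, so unseen feature values get probability zero:

```python
from collections import Counter, defaultdict

def fit(X, y):
    """Estimate p(C) and p(F_i | C) from categorical data by counting.
    No smoothing is applied, so unseen (feature, value) pairs get probability 0."""
    class_counts = Counter(y)
    n = len(y)
    prior = {c: cnt / n for c, cnt in class_counts.items()}
    # cond[c][(i, v)] = p(F_i = v | C = c)
    feat_counts = defaultdict(Counter)
    for xi, c in zip(X, y):
        for i, v in enumerate(xi):
            feat_counts[c][(i, v)] += 1
    cond = {c: {key: cnt / class_counts[c] for key, cnt in feat_counts[c].items()}
            for c in class_counts}
    return prior, cond

def predict(x, prior, cond):
    """Score each class by p(C) * prod_i p(F_i = x_i | C) and return the argmax."""
    scores = {}
    for c in prior:
        score = prior[c]
        for i, v in enumerate(x):
            score *= cond[c].get((i, v), 0.0)
        scores[c] = score
    return max(scores, key=scores.get)

# Toy (made-up) data: features are (outlook, temperature), class is "play".
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cold")]
y = ["no", "no", "yes", "yes"]
prior, cond = fit(X, y)
print(predict(("rain", "mild"), prior, cond))
```

In practice the product of many small probabilities underflows, so implementations usually sum log-probabilities instead of multiplying; the argmax is unchanged because the logarithm is monotonic.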