Understanding the naïve Bayes algorithm with its implementation in C# - Part 2

In this article, we aim to provide a concise presentation of probability theory and introduce key concepts such as independence and conditional independence.

Probability theory, admittedly, is not a straightforward topic. While we readily accept that probability enables us to model chance, the challenge lies in expressing this intuitive concept mathematically. In this post, we will strive to be both illustrative and thorough, delving into the intricacies without neglecting the essential details.

What is probability theory?

Probability theory is a branch of mathematics that provides a framework for modeling and quantifying uncertainty or randomness. It deals with the analysis of random phenomena and the likelihood of different outcomes occurring in a given event or experiment. The primary goal of probability theory is to formalize and mathematically describe the principles governing randomness and uncertainty.

In this article, we will presume that the reader is already familiar with the concept of probability. While our blog typically adopts a holistic approach, providing comprehensive details, the complexity of the probability field discourages a thorough exploration of the underlying theory in this instance. Instead, our focus will be on emphasizing the crucial concepts of independence and conditional independence.

What is independence?

In probability theory, independence refers to the concept that the occurrence or non-occurrence of one event does not influence the occurrence or non-occurrence of another event. In simpler terms, two events are considered independent if the outcome of one event has no impact on the likelihood or outcome of the other event.

Mathematically, two events $A$ and $B$ are independent if the probability of both events occurring together (denoted as $P(A ∩ B)$) is equal to the product of their individual probabilities.

$$P(A∩B)=P(A)⋅P(B)$$

Example 1

Consider, for instance, a scenario where we are required to toss a fair coin twice. After the first toss, which yields heads, we document the result on a small piece of paper. Subsequently, an unexpected phone call interrupts our experiment, leading to a 25-minute conversation with our mother-in-law. Upon resuming our experiment after the call, we toss the coin for the second time. Is the outcome of this toss influenced by the one conducted 25 minutes earlier? Clearly not; hence, the two tosses are independent of each other.
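To make this concrete, here is a minimal C# sketch (the class and variable names are ours, chosen purely for illustration) that simulates a large number of double tosses and checks empirically that $P(A∩B)=P(A)⋅P(B)$, with $A$ being "the first toss yields heads" and $B$ being "the second toss yields heads".

```csharp
using System;

// A minimal sketch: simulate two successive coin tosses many times and
// verify empirically that P(A ∩ B) ≈ P(A) · P(B).
class IndependenceDemo
{
    static void Main()
    {
        var random = new Random();
        const int trials = 1_000_000;
        int countA = 0, countB = 0, countAandB = 0;

        for (int i = 0; i < trials; i++)
        {
            bool firstIsHeads = random.Next(2) == 0;  // event A
            bool secondIsHeads = random.Next(2) == 0; // event B

            if (firstIsHeads) countA++;
            if (secondIsHeads) countB++;
            if (firstIsHeads && secondIsHeads) countAandB++;
        }

        double pA = (double)countA / trials;
        double pB = (double)countB / trials;
        double pAandB = (double)countAandB / trials;

        Console.WriteLine($"P(A) = {pA:F4}, P(B) = {pB:F4}");
        Console.WriteLine($"P(A ∩ B) = {pAandB:F4}, P(A) · P(B) = {pA * pB:F4}");
        // Both printed values should be close to 0.25, reflecting independence.
    }
}
```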

Example 2

In a situation where we select a card from a standard deck of 52 cards, the probability of drawing an ace is evident: $P(\text{ace})=\dfrac{4}{52}$, considering there are four aces in the deck. Now, if we refrain from replacing the initially drawn card in the deck and proceed to choose another card, what becomes the probability of drawing an ace on this second draw?

If an ace was drawn in the initial draw, there are now three aces among the 51 cards remaining in the deck, and so $P(\text{ace in second draw})=\dfrac{3}{51}$. Conversely, if an ace was not drawn initially, there are still four aces among the remaining 51 cards, and so $P(\text{ace in second draw})=\dfrac{4}{51}$. In either scenario, the outcome of the second draw is contingent on the result of the first draw, indicating that the two events are not independent.
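Once again, a short simulation makes the dependence visible. In the sketch below (names and structure are ours), we estimate the probability of an ace on the second draw separately for each outcome of the first draw.

```csharp
using System;

// A small sketch: simulate two draws without replacement and estimate
// the probability of an ace on the second draw, conditioned on the first.
class DependenceDemo
{
    static void Main()
    {
        var random = new Random();
        const int trials = 1_000_000;
        int firstAce = 0, firstAceThenAce = 0;
        int firstOther = 0, firstOtherThenAce = 0;

        for (int i = 0; i < trials; i++)
        {
            // Represent the deck as 52 cards, of which the first 4 are aces.
            bool firstIsAce = random.Next(52) < 4;

            // Second draw: 51 cards remain, with 3 or 4 aces among them.
            int acesLeft = firstIsAce ? 3 : 4;
            bool secondIsAce = random.Next(51) < acesLeft;

            if (firstIsAce) { firstAce++; if (secondIsAce) firstAceThenAce++; }
            else { firstOther++; if (secondIsAce) firstOtherThenAce++; }
        }

        Console.WriteLine($"P(ace 2nd | ace 1st)    ≈ {(double)firstAceThenAce / firstAce:F4}  (exact: {3.0 / 51:F4})");
        Console.WriteLine($"P(ace 2nd | no ace 1st) ≈ {(double)firstOtherThenAce / firstOther:F4}  (exact: {4.0 / 51:F4})");
    }
}
```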

Independence alone already allows us to formulate foundational theorems, such as the law of large numbers, and this framework therefore often serves as a convenient way to model certain situations.

But if probability theory were confined solely to independent events, it would be of limited utility and would fail to represent reality accurately (as the second example above shows). This is why mathematicians introduced the concept of conditional probability.

What is a conditional probability?

Conditional probability is a concept in probability theory that measures the likelihood of an event occurring given that another event has already occurred. It is denoted by $P(A|B)$ and read as "the probability of $A$ given $B$". Here, $A$ and $B$ are events, and $P(A|B)$ represents the probability of event $A$ occurring given that event $B$ has occurred.

The fundamental formula for conditional probability is the following.

$$P(A|B)=\dfrac{P(A∩B)}{P(B)}$$

  • $P(A∩B)$ is the probability of both events $A$ and $B$ occurring (joint probability).
  • $P(B)$ is the probability of event $B$ occurring.

The concept of conditional probability allows us to adjust probabilities based on new information or the occurrence of a related event. It plays a crucial role in understanding and modeling situations where the outcome of one event may depend on the occurrence of another event.
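As a small illustration, the sketch below wraps this definition in a hypothetical helper method (Given, our own naming) and applies it to a fair die, where $A$ is "the result is 2" and $B$ is "the result is even".

```csharp
using System;

// A minimal sketch of the definition above. The helper name (Given) and
// the example values are ours, chosen purely for illustration.
class ConditionalProbabilityDemo
{
    // P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0.
    static double Given(double pAandB, double pB)
    {
        if (pB <= 0) throw new ArgumentOutOfRangeException(nameof(pB), "P(B) must be positive.");
        return pAandB / pB;
    }

    static void Main()
    {
        // Rolling a fair die: A = "the result is 2", B = "the result is even".
        double pAandB = 1.0 / 6; // A ∩ B = {2}
        double pB = 3.0 / 6;     // B = {2, 4, 6}

        Console.WriteLine(Given(pAandB, pB)); // 1/3: knowing B, only {2, 4, 6} remain
    }
}
```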

From this definition, we can infer the following two fundamental properties.

Theorem 1 (chain rule of probability)

The chain rule of probability allows us to express the joint probability of multiple events in terms of conditional probabilities. It provides a way to decompose a joint probability into a series of conditional probabilities.

For two events, $A$ and $B$, the chain rule is expressed as $P(A∩B)=P(A)⋅P(B|A)$

This equation states that the probability of both events $A$ and $B$ occurring is equal to the probability of $A$ occurring multiplied by the probability of $B$ occurring given that $A$ has occurred.

For three events, $A$, $B$, and $C$, the chain rule expands to $P(A∩B∩C)=P(A)⋅P(B|A)⋅P(C|A∩B)$

In general, for $n$ events $A_{1}$, $A_{2}$,..., $A_{n}$, the chain rule can be expressed as follows. $$P(A_{1}∩A_{2}∩...∩A_{n})=P(A_{1})P(A_{2}|A_{1})P(A_{3}|A_{1}∩A_{2})...P(A_{n}|A_{1}∩...∩A_{n-1})$$

The chain rule is a crucial tool for calculating joint probabilities and understanding the relationships between multiple events.

Give me an example, please!

When drawing 4 cards without replacement from a standard deck of 52 cards, what is the probability of drawing all four aces?

Let $A_{i}$ (for $i=1,\dots,4$) represent the event "The $i$th drawn card is an ace". We need to calculate $P(A_{1}∩A_{2}∩A_{3}∩A_{4})$. We know that $P(A_{1})=\dfrac{4}{52}$, and the remaining probabilities readily available to us are conditional ones.

$$P(A_{2}|A_{1})=\dfrac{3}{51}$$ $$P(A_{3}|A_{1}∩A_{2})=\dfrac{2}{50}$$ $$P(A_{4}|A_{1}∩A_{2}∩A_{3})=\dfrac{1}{49}$$

$$P(A_{1}∩A_{2}∩A_{3}∩A_{4})=P(A_{1})P(A_{2}|A_{1})P(A_{3}|A_{1}∩A_{2})P(A_{4}|A_{1}∩A_{2}∩A_{3})$$

So, $P(A_{1}∩A_{2}∩A_{3}∩A_{4})=\dfrac{4\cdot 3\cdot 2\cdot 1}{52\cdot 51\cdot 50\cdot 49}\approx 0.00000369$.
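The same computation can be written in a few lines of C# (a sketch, with our own naming), multiplying the conditional probabilities step by step exactly as the chain rule prescribes.

```csharp
using System;

// A quick numeric check of the chain-rule computation above.
class ChainRuleDemo
{
    static void Main()
    {
        double probability = 1.0;
        // After i aces have been drawn, i cards are gone and 4 - i aces remain.
        for (int i = 0; i < 4; i++)
        {
            probability *= (4.0 - i) / (52.0 - i);
        }
        Console.WriteLine(probability); // ≈ 3.69E-06
    }
}
```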

Theorem 2 (Bayes' theorem)

Bayes' theorem is a fundamental principle in probability theory that describes the probability of an event based on prior knowledge of conditions that might be related to the event. Named after the Reverend Thomas Bayes, who formulated the theorem, it is particularly useful in updating probabilities when new evidence becomes available.

The theorem is expressed mathematically as $P(A|B)= \dfrac{P(B|A)P(A)}{P(B)}$

  • $P(A|B)$ is the probability of event $A$ occurring given that event $B$ has occurred.
  • $P(B|A)$ is the probability of event $B$ occurring given that event $A$ has occurred.
  • $P(A)$ is the prior probability of event $A$.
  • $P(B)$ is the marginal probability of event $B$ (the evidence).

Bayes' theorem allows us to update our belief in the probability of an event ($A$) based on new evidence ($B$).

Give me an example, please!

The following example is extracted from this website.

We are planning a picnic today, but the morning is cloudy.

  • 50% of all rainy days start off cloudy.
  • But cloudy mornings are common (about 40% of days start cloudy).
  • And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%).

What is the chance of rain during the day?

Let $Rain$ represent the event "It will rain during the day" and $Cloud$ represent "It is cloudy in the morning". We have to calculate $P(Rain|Cloud)$. We apply Bayes' theorem.

$$P(Rain|Cloud)= \dfrac{P(Rain)P(Cloud|Rain)}{P(Cloud)}$$

Based on our assumptions, $P(Rain)=\dfrac{1}{10}$, $P(Cloud|Rain)=\dfrac{1}{2}$, $P(Cloud)=\dfrac{4}{10}$.

$$P(Rain|Cloud)=\dfrac{\dfrac{1}{10}\cdot\dfrac{1}{2}}{\dfrac{4}{10}}=0.125$$
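For completeness, the sketch below (naming is ours) translates this computation directly into C#.

```csharp
using System;

// A direct translation of the rain/cloud computation above.
class BayesDemo
{
    static void Main()
    {
        double pRain = 0.10;          // P(Rain): 3 rainy days out of 30
        double pCloudGivenRain = 0.5; // P(Cloud|Rain): half of rainy days start cloudy
        double pCloud = 0.40;         // P(Cloud): 40% of days start cloudy

        double pRainGivenCloud = pRain * pCloudGivenRain / pCloud;
        Console.WriteLine(pRainGivenCloud); // 0.125
    }
}
```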

What is conditional independence?

Conditional independence refers to a situation in probability theory where the occurrence of one event does not provide any additional information about the occurrence of another event, given the knowledge of a third event. In simpler terms, two events $A$ and $B$ are conditionally independent given a third event $C$ if the probability of $A$ occurring is not influenced by the knowledge of whether $B$ has occurred, and vice versa, given that $C$ has occurred.

Mathematically, conditional independence is represented as $P(A∩B|C)=P(A|C)⋅P(B|C)$

This equation states that the joint probability of $A$ and $B$ given $C$ is equal to the product of the conditional probabilities of $A$ given $C$ and $B$ given $C$.

If this equation holds true, it implies that the occurrence or non-occurrence of one event does not provide any new information about the occurrence or non-occurrence of the other event, given the presence of the third event.

Conditional independence allows for a more nuanced understanding of the relationships between different events in the presence of additional information.
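To see what this definition does and does not say, here is a numeric sketch in which the probabilities are invented purely for illustration: $A$ and $B$ are conditionally independent given $C$, yet dependent when $C$ is unknown.

```csharp
using System;

// A numeric sketch (values chosen for illustration): two events A and B
// that are conditionally independent given C, yet dependent unconditionally.
class ConditionalIndependenceDemo
{
    static void Main()
    {
        double pC = 0.5;                          // P(C)
        double pAgivenC = 0.8, pAgivenNotC = 0.2; // P(A|C), P(A|¬C)
        double pBgivenC = 0.8, pBgivenNotC = 0.2; // P(B|C), P(B|¬C)

        // Conditional independence given C: P(A ∩ B | C) = P(A|C) · P(B|C).
        double pABgivenC = pAgivenC * pBgivenC;          // 0.64
        double pABgivenNotC = pAgivenNotC * pBgivenNotC; // 0.04

        // Unconditional probabilities, obtained by marginalizing over C.
        double pA = pC * pAgivenC + (1 - pC) * pAgivenNotC;     // 0.5
        double pB = pC * pBgivenC + (1 - pC) * pBgivenNotC;     // 0.5
        double pAB = pC * pABgivenC + (1 - pC) * pABgivenNotC;  // 0.34

        Console.WriteLine($"P(A ∩ B) = {pAB}, P(A) · P(B) = {pA * pB}");
        // 0.34 vs 0.25: A and B are dependent, even though they are
        // conditionally independent given C.
    }
}
```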

Caution

Conditional independence can be a challenging concept to grasp. In particular, it neither implies nor is implied by (unconditional) independence: two dependent events may become independent once a third event is known, as the sketch above illustrates, and two independent events may become dependent given a third event.

Equipped with these reminders and notations, we can now apply these principles in the context of our application.

Understanding the naïve Bayes algorithm with its implementation in C# - Part 3