Implementing Google's autocomplete feature with n-gram language models - Part 3
In this post, we will delve into the concept of probabilities, with a particular emphasis on conditional probabilities. This foundational understanding will prepare us for the next post, where we will explore n-gram language models in greater depth.
Probability theory, admittedly, is not a straightforward topic. We can all agree that probability lets us model chance; the challenge lies in expressing this intuitive concept mathematically. Here, we will strive to be both illustrative and thorough, delving into the intricacies without neglecting the essential details.
What is probability theory?
Probability theory is a branch of mathematics that provides a framework for modeling and quantifying uncertainty or randomness. It deals with the analysis of random phenomena and the likelihood of different outcomes occurring in a given event or experiment. The primary goal of probability theory is to formalize and mathematically describe the principles governing randomness and uncertainty.
In this article, we will presume that the reader is already familiar with the concept of probability. While our blog typically adopts a holistic approach, providing comprehensive details, the complexity of the probability field discourages a thorough exploration of the underlying theory in this instance. Instead, our focus will be on emphasizing the crucial concepts of independence and conditional independence.
What is independence?
In probability theory, independence refers to the concept that the occurrence or non-occurrence of one event does not influence the occurrence or non-occurrence of another event. In simpler terms, two events are considered independent if the outcome of one event has no impact on the likelihood or outcome of the other event.
Mathematically, two events $A$ and $B$ are independent if the probability of both events occurring together (denoted as $P(A ∩ B)$) is equal to the product of their individual probabilities.
$$P(A∩B)=P(A)⋅P(B)$$
Consider, for instance, a scenario where we are required to toss a fair coin twice. After the first toss, which yields heads, we document the result on a small piece of paper. Subsequently, an unexpected phone call interrupts our experiment, leading to a 25-minute conversation with our mother-in-law. Upon resuming our experiment after the call, we toss the coin for the second time. Is the outcome of this toss influenced by the one conducted 25 minutes earlier? Clearly not; hence, the two tosses are independent of each other.
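We can check this numerically. The following minimal sketch simulates many pairs of fair coin tosses and compares the estimated joint probability of two heads with the product of the individual probabilities; for independent events, the two quantities should agree up to sampling noise.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def estimate_independence(trials=100_000):
    """Estimate P(H1), P(H2), and P(H1 and H2) for two fair coin tosses."""
    h1 = h2 = both = 0
    for _ in range(trials):
        first = random.random() < 0.5   # first toss lands heads
        second = random.random() < 0.5  # second toss lands heads
        h1 += first
        h2 += second
        both += first and second
    return h1 / trials, h2 / trials, both / trials

p1, p2, p_both = estimate_independence()
# For independent events, p_both should be close to p1 * p2 (both near 0.25)
```

With 100,000 trials, the gap between `p_both` and `p1 * p2` is typically well under one percentage point.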
In a situation where we select a card from a standard deck of 52 cards, the probability of drawing an ace is evident: $P(\text{ace})=\dfrac{4}{52}$, since there are four aces in the deck. Now, if we do not replace the initially drawn card and proceed to draw another card, what is the probability of drawing an ace on this second draw?
If an ace was drawn in the initial draw, there are now three aces among the 51 cards remaining in the deck, and so $P(\text{ace on second draw})=\dfrac{3}{51}$. Conversely, if an ace was not drawn initially, there are still four aces among the remaining 51 cards, and so $P(\text{ace on second draw})=\dfrac{4}{51}$. In either scenario, the outcome of the second draw is contingent on the result of the first draw, indicating that the two events are not independent.
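A quick simulation makes this dependence tangible. The sketch below repeatedly draws two cards without replacement and estimates the probability of an ace on the second draw, split by whether the first draw was an ace; the two estimates should approach $\frac{3}{51}$ and $\frac{4}{51}$ respectively.

```python
import random

random.seed(0)  # fixed seed for reproducibility

def simulate_second_draw(trials=200_000):
    """Estimate P(ace on 2nd draw) conditioned on the 1st draw's outcome."""
    deck = ["ace"] * 4 + ["other"] * 48
    given_ace = [0, 0]    # [aces on 2nd draw, total] when 1st card was an ace
    given_other = [0, 0]  # same counts when 1st card was not an ace
    for _ in range(trials):
        first, second = random.sample(deck, 2)  # two cards, no replacement
        bucket = given_ace if first == "ace" else given_other
        bucket[1] += 1
        bucket[0] += second == "ace"
    return given_ace[0] / given_ace[1], given_other[0] / given_other[1]

p_ace, p_other = simulate_second_draw()
# p_ace should approach 3/51 ≈ 0.0588, p_other should approach 4/51 ≈ 0.0784
```

The fact that the two estimates differ is exactly what "not independent" means in practice.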
By considering only independence, we can formulate intriguing and foundational theorems, such as the law of large numbers. Therefore, this framework often serves as a convenient approach for modeling certain situations, as mentioned earlier.
But if probability theory were confined solely to independent events, it would lack utility and fail to represent reality accurately (as in the card-drawing example above). This is why mathematicians introduced the concept of conditional probability.
What is a conditional probability?
Conditional probability is a concept in probability theory that measures the likelihood of an event occurring given that another event has already occurred. It is denoted by $P(A|B)$ and read as "the probability of $A$ given $B$". Here, $A$ and $B$ are events, and $P(A|B)$ represents the probability of event $A$ occurring given that event $B$ has occurred.
The fundamental formula for conditional probability is the following.
$$P(A|B)=\dfrac{P(A∩B)}{P(B)}$$
- $P(A∩B)$ is the probability of both events $A$ and $B$ occurring (joint probability).
- $P(B)$ is the probability of event $B$ occurring.
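The definition can be applied directly by counting outcomes. As a minimal sketch, consider rolling two fair dice and asking for the probability that the sum is at least 10, given that the first die shows a 6 (the events here are chosen purely for illustration); using exact fractions keeps the arithmetic transparent.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Event B: the first die shows 6.  Event A: the sum is at least 10.
b = [o for o in outcomes if o[0] == 6]
a_and_b = [o for o in b if sum(o) >= 10]

p_b = Fraction(len(b), len(outcomes))              # P(B) = 6/36
p_a_and_b = Fraction(len(a_and_b), len(outcomes))  # P(A ∩ B) = 3/36
p_a_given_b = p_a_and_b / p_b                      # P(A|B) = (3/36)/(6/36) = 1/2
```

Knowing that the first die is a 6 raises the probability of a high sum considerably, which is precisely the "adjustment based on new information" described above.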
The concept of conditional probability allows us to adjust probabilities based on new information or the occurrence of a related event. It plays a crucial role in understanding and modeling situations where the outcome of one event may depend on the occurrence of another event.
From this definition, we can derive the following fundamental property.
The chain rule of probability allows us to express the joint probability of multiple events in terms of conditional probabilities. It provides a way to decompose a joint probability into a series of conditional probabilities.
For two events, $A$ and $B$, the chain rule is expressed as $P(A∩B)=P(A)⋅P(B|A)$
This equation states that the probability of both events $A$ and $B$ occurring is equal to the probability of $A$ occurring multiplied by the probability of $B$ occurring given that $A$ has occurred.
For three events, $A$, $B$, and $C$, the chain rule expands to $P(A∩B∩C)=P(A)⋅P(B|A)⋅P(C|A∩B)$
In general, for $n$ events $A_{1}$, $A_{2}$,..., $A_{n}$, the chain rule can be expressed as the following. $$P(A_{1}∩A_{2}∩...∩A_{n})=P(A_{1})P(A_{2}|A_{1})P(A_{3}|A_{1}∩A_{2})...P(A_{n}|A_{1}∩...∩A_{n-1})$$
The chain rule is a crucial tool for calculating joint probabilities and understanding the relationships between multiple events.
This theorem might seem intimidating at first, but it can actually be proven through induction in just a few lines.
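As a quick sanity check, the three-event form of the chain rule can be verified on a small sample space. The sketch below uses three fair coin tosses (the events chosen are illustrative assumptions) and computes each conditional probability directly from its definition, confirming that the product recovers the joint probability.

```python
from fractions import Fraction
from itertools import product

# Sample space: three fair coin tosses, 8 equally likely outcomes.
outcomes = list(product("HT", repeat=3))

def prob(event):
    """Probability of an event as a fraction of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

# A: first toss is heads; B: at least two heads; C: third toss is heads.
A = [o for o in outcomes if o[0] == "H"]
B = [o for o in outcomes if o.count("H") >= 2]
C = [o for o in outcomes if o[2] == "H"]

AB = [o for o in A if o in B]      # A ∩ B
ABC = [o for o in AB if o in C]    # A ∩ B ∩ C

lhs = prob(ABC)                                    # P(A ∩ B ∩ C)
p_b_given_a = prob(AB) / prob(A)                   # P(B|A)
p_c_given_ab = prob(ABC) / prob(AB)                # P(C|A ∩ B)
rhs = prob(A) * p_b_given_a * p_c_given_ab         # chain rule product
# lhs and rhs are both 1/4, as the chain rule guarantees
```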
Give me an example, please!
Let’s revisit our earlier example, where we analyzed the probability of drawing aces from a standard deck of 52 cards. More precisely, when drawing 4 cards without replacement from a standard deck of 52 cards, what is the probability of drawing all four aces?
Let $A_{i}$ (for $i=1,\dots,4$) represent the event "The $i$th drawn card is an ace" (thus, for example, $A_{1}$ is the event "The first drawn card is an ace" and $A_{3}$ is the event "The third drawn card is an ace"). We need to calculate $P(A_{1}∩A_{2}∩A_{3}∩A_{4})$. We know that $P(A_{1})=\dfrac{4}{52}$, and the probabilities currently available to us are conditional probabilities.
$$P(A_{2}|A_{1})=\dfrac{3}{51}$$ $$P(A_{3}|A_{1}∩A_{2})=\dfrac{2}{50}$$ $$P(A_{4}|A_{1}∩A_{2}∩A_{3})=\dfrac{1}{49}$$
All that remains is to apply the chain rule of probability.
$$P(A_{1}∩A_{2}∩A_{3}∩A_{4})=P(A_{1})P(A_{2}|A_{1})P(A_{3}|A_{1}∩A_{2})P(A_{4}|A_{1}∩A_{2}∩A_{3})$$
So, $P(A_{1}∩A_{2}∩A_{3}∩A_{4})=\dfrac{4\cdot 3\cdot 2\cdot 1}{52\cdot 51\cdot 50\cdot 49}=\dfrac{1}{270725}\approx 0.00000369$.
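The same computation translates directly into code: each factor of the chain rule is the number of aces still in the deck over the number of cards still in the deck. A short sketch using exact fractions:

```python
from fractions import Fraction

# Chain rule for four aces: after k draws that all produced aces,
# 4 - k aces remain among 52 - k cards.
p = Fraction(1)
for k in range(4):
    p *= Fraction(4 - k, 52 - k)

# p is exactly 1/270725, roughly 3.69e-06
```

Using `Fraction` avoids floating-point rounding and makes each factor of the product easy to inspect.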
With a clearer understanding of key probability concepts, we can now turn to our core topic and explore how these concepts can be applied to n-gram language models.
Implementing Google's autocomplete feature with n-gram language models - Part 4