
Conditional Probability and Independence

Let $A$ and $B$ be two events in a sample space $S$, and suppose that the event $B$ is known to occur. What can we say about the probability of the event $A$?

The new probability for $A$ is called the conditional probability of $A$ given the event $B$. It is defined as $$P(A|B)=\frac{P(A\cap B)}{P(B)}.$$ Note that $P(A|B)$ is not defined if $P(B)$ is zero. We sometimes write $$P(A|B)\cdot P(B) = P(A\cap B)$$ to obtain a formula that is valid for all $B$. This second form is also called the multiplication rule for conditional probabilities.

We can think of the conditional probability as a change of sample space from $S$ to $B$. Consequently, every formula that holds for the probability function has a counterpart for conditional probabilities. Note that for a sample space with equally likely outcomes we have $$P(A|B) = \frac{|A\cap B|}{|B|}.$$

Let's consider an example.

We roll two dice and observe that the sum is odd. What is the probability that the sum is less than $8$?

Let $B$ be the event that the sum is odd, $$ \begin{pmatrix} & 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 0 & B & 0 & B & 0 & B \\ 2 & B & 0 & B & 0 & B & 0 \\ 3 & 0 & B & 0 & B & 0 & B \\ 4 & B & 0 & B & 0 & B & 0 \\ 5 & 0 & B & 0 & B & 0 & B \\ 6 & B & 0 & B & 0 & B & 0 \end{pmatrix}$$ and $A$ the event that the sum is less than $8$, $$ \begin{pmatrix} & 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & A & A & A & A & A & A \\ 2 & A & A & A & A & A & 0 \\ 3 & A & A & A & A & 0 & 0 \\ 4 & A & A & A & 0 & 0 & 0 \\ 5 & A & A & 0 & 0 & 0 & 0 \\ 6 & A & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$ Then $A\cap B$ is the event $$ \begin{pmatrix} & 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 0 & X & 0 & X & 0 & X \\ 2 & X & 0 & X & 0 & X & 0 \\ 3 & 0 & X & 0 & X & 0 & 0 \\ 4 & X & 0 & X & 0 & 0 & 0 \\ 5 & 0 & X & 0 & 0 & 0 & 0 \\ 6 & X & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$ so that $P(A\cap B)=\frac{12}{36}$ and $P(B)=\frac{18}{36}$, giving $$P(A|B)=\frac{12}{18}=\frac{2}{3}.$$

We can also take the point of view that we have changed the sample space to the $18$ outcomes in $B$, $$ \begin{pmatrix} & 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & & B & & B & & B \\ 2 & B & & B & & B & \\ 3 & & B & & B & & B \\ 4 & B & & B & & B & \\ 5 & & B & & B & & B \\ 6 & B & & B & & B & \end{pmatrix}$$ within which the event $A$ consists of the $12$ outcomes $$ \begin{pmatrix} & 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & & A & & A & & A \\ 2 & A & & A & & A & \\ 3 & & A & & A & & 0 \\ 4 & A & & A & & 0 & \\ 5 & & A & & 0 & & 0 \\ 6 & A & & 0 & & 0 & \end{pmatrix}$$ so that again $P(A|B)=\frac{12}{18}=\frac{2}{3}$.
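As a sanity check, here is a short Python sketch (not part of the original notes) that enumerates the $36$ equally likely outcomes and computes $P(A|B)=|A\cap B|/|B|$ by counting:

```python
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of rolling two dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

B = [o for o in outcomes if sum(o) % 2 == 1]   # sum is odd
A_and_B = [o for o in B if sum(o) < 8]         # sum is odd and less than 8

# For equally likely outcomes, P(A|B) = |A ∩ B| / |B|.
p = Fraction(len(A_and_B), len(B))
print(len(A_and_B), len(B), p)   # 12 18 2/3
```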


The law of total probability

Assume we have written our sample space $$S=\bigcup_i B_i$$ as a disjoint union of events. Using the rule of sum we know that $P(A)=\sum_i P(A\cap B_i)$. Using the definition of conditional probability we have $P(A\cap B_i)=P(A|B_i)\cdot P(B_i)$. Combining these two pieces of information gives us the law of total probability $$P(A) = \sum_i P(A|B_i)P(B_i).$$ This formula can be very useful for computing $P(A)$ if both $P(A|B_i)$ and $P(B_i)$ are easy to compute. We look at an example.

You have two boxes. In box $1$ there are $40$ large marbles and $30$ small marbles. In box $2$ there are $10$ large marbles and $20$ small marbles. You pick a random box and then a random marble from that box. What is the probability that you pick a large marble?

Let $B_i$ be the event that box $i$ is selected, and let $A$ be the event that a large marble is selected. Then $$P(A)=P(A|B_1)P(B_1)+P(A|B_2)P(B_2)=\frac{40}{70}\cdot \frac{1}{2}+\frac{10}{30}\cdot \frac{1}{2}= \frac{19}{42}$$
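The same computation in a short Python sketch (not part of the original notes), using exact fractions:

```python
from fractions import Fraction

# P(large | box i) and P(box i) for the two boxes.
p_large_given_box = {1: Fraction(40, 70), 2: Fraction(10, 30)}
p_box = {1: Fraction(1, 2), 2: Fraction(1, 2)}

# Law of total probability: P(A) = sum_i P(A|B_i) P(B_i).
p_large = sum(p_large_given_box[i] * p_box[i] for i in (1, 2))
print(p_large)   # 19/42
```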


Independence

Two events $A$ and $B$ are independent if $$P(A\cap B)=P(A)\cdot P(B).$$ Equivalently, if $P(A)>0$ and $P(B)>0$, $$P(A|B)=P(A)\text{ and }P(B|A)=P(B).$$ So $A$ and $B$ are independent if knowing $A$ does not influence the probability of $B$, and vice versa.

Three events $A$, $B$ and $C$ are independent if $$P(A\cap B) = P(A)P(B)$$ $$P(A\cap C)=P(A)P(C)$$ $$P(B\cap C)=P(B)P(C)$$ $$P(A\cap B\cap C)=P(A)P(B)P(C)$$ all hold. If only the first three equations hold, then the events are called pairwise independent.

In this way we can extend the concepts of independence and pairwise independence to any number of events.
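To see that pairwise independence is strictly weaker, here is a short Python sketch using a standard example that is not from the original notes: flip two fair coins, let $A$ be the event that the first flip is heads, $B$ that the second is heads, and $C$ that the two flips agree. Every pair satisfies the product rule, but the triple condition fails:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, all four outcomes equally likely.
S = list(product("HT", repeat=2))

def P(event):
    return Fraction(len([o for o in S if event(o)]), len(S))

A = lambda o: o[0] == "H"    # first flip is heads
B = lambda o: o[1] == "H"    # second flip is heads
C = lambda o: o[0] == o[1]   # the two flips agree

# All three pairwise conditions hold ...
print(P(lambda o: A(o) and B(o)) == P(A) * P(B))   # True
print(P(lambda o: A(o) and C(o)) == P(A) * P(C))   # True
print(P(lambda o: B(o) and C(o)) == P(B) * P(C))   # True
# ... but the triple condition fails: 1/4 != 1/8.
print(P(lambda o: A(o) and B(o) and C(o)) == P(A) * P(B) * P(C))  # False
```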


Bayes' Theorem

Assume we have two events $A$ and $B$ with $P(A)>0$ and $P(B)>0$. Using the equation $P(A\cap B)=P(B|A)P(A)$ and the definition of conditional probability, we get the first form of Bayes' theorem: $$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$ The theorem is usually used together with a partition of the sample space $S=\bigcup_i A_i$, which gives us the formula $P(B)=\sum_i P(A_i)P(B|A_i)$ and the second and more useful form of Bayes' theorem $$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_j P(A_j)P(B|A_j)}$$ illustrated by the following examples:

You are looking at the effectiveness of a new test for acute procrastination. You know that $0.1$ per cent of the population has this condition. The test gives a positive result in $99$ per cent of the cases that have the condition and a negative result in $99$ per cent of the cases that do not have the condition. Suppose the test is positive on a random person. What is the probability that the person has the condition?

Instead of plugging the information into the formula, we will do the calculation step by step. We first introduce some notation. Let $C$ be the event that a person has the condition, let $pos$ be the event that the test is positive and let $min$ be the event that the test is negative. The question asks for the probability $P(C|pos)$, which we will compute using Bayes' theorem.

From the question we have the information $P(pos|C)=0.99$, $P(min|\overline{C})=0.99$ and $P(C)=0.001$. We have $$P(C|pos)=\frac{P(pos|C)\cdot P(C)}{P(pos)}$$ by Bayes' theorem. The only quantity missing is $P(pos)$, which we compute in several steps. First, $$P(pos)=P(pos\cap C)+P(pos\cap \overline{C})$$ by the rule of sum. Using the definition of conditional probability we have $$P(pos\cap C)=P(pos|C)\cdot P(C)=0.99\cdot 0.001=0.00099$$ and $$P(pos\cap \overline{C})=P(pos|\overline{C})\cdot P(\overline{C}).$$ We know that $$P(\overline{C}) = 1 - P(C) = 1 - 0.001=0.999$$ and $$P(pos|\overline{C})= 1-P(min|\overline{C})=1-0.99=0.01,$$ so $P(pos\cap \overline{C})=0.01\cdot 0.999=0.00999$. Putting all this together gives us $P(pos)=0.00099+0.00999=0.01098$. Finally, $$P(C|pos)=\frac{0.99\cdot 0.001}{0.01098}\approx 0.09,$$ or about $9$ per cent.
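The whole step-by-step calculation fits in a few lines of Python (a sketch, not part of the original notes):

```python
# Step-by-step Bayes computation for the test example.
p_C = 0.001                # P(C): prevalence of the condition
p_pos_given_C = 0.99       # P(pos | C): positive rate given the condition
p_min_given_notC = 0.99    # P(min | not C): negative rate without the condition

p_notC = 1 - p_C
p_pos_given_notC = 1 - p_min_given_notC

# Law of total probability: P(pos) = P(pos|C)P(C) + P(pos|not C)P(not C).
p_pos = p_pos_given_C * p_C + p_pos_given_notC * p_notC

# Bayes' theorem: P(C | pos).
p_C_given_pos = p_pos_given_C * p_C / p_pos
print(p_pos, p_C_given_pos)   # ≈ 0.01098, ≈ 0.09
```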


The Monty Hall problem

You have to choose between three closed doors: behind one door there is a car and behind the others there are goats. You would like to pick the door with the car, so you pick a door. That door is not opened; instead another door is opened, revealing a goat. You now have two choices: 1. You can stay with the door you picked, or 2. You can switch to the other closed door. What do you do?

Let $C_i$ be the event that the car is behind door $i$, $Y_i$ the event that you choose door $i$, and $D_i$ the event that door $i$ is opened after your choice of door.

There are nine cases to consider. Here we only consider the case where you choose door $1$; the other cases follow by symmetry. We assume that the opened door always hides a goat, chosen at random when both remaining doors hide goats, so that: $$P(D_3 | C_1 ) = 0.5$$ $$P(D_3 | C_2 ) = 1.0$$ $$P(D_3 | C_3 ) = 0.0$$ We partition $$D_3 = (D_3\cap C_1) \cup (D_3 \cap C_2) \cup (D_3 \cap C_3)$$ and get $$P(D_3) = P(C_1)\cdot P(D_3|C_1)+P(C_2)\cdot P(D_3|C_2)+P(C_3)\cdot P(D_3|C_3)$$ $$= \frac{1}{3}\cdot 0.5 + \frac{1}{3}\cdot 1.0 + \frac{1}{3}\cdot 0 = \frac{1}{2}. $$ The probability of finding the car by switching to door $2$ is equal to $P(C_2 | D_3)$. We calculate using Bayes' theorem: $$P(C_2 | D_3) = \frac{P(D_3 | C_2)\cdot P(C_2)}{P(D_3)}$$ $$=\frac{1.0\cdot \frac{1}{3}}{\frac{1}{2}}=\frac{2}{3}.$$
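As an informal check, here is a quick Monte Carlo sketch in Python (not part of the original notes; it assumes the host always opens a goat door different from yours, choosing at random when both remaining doors hide goats):

```python
import random

def trial(rng):
    # One round: place the car, pick a door, let the host open a goat door,
    # then switch to the remaining closed door.
    car = rng.randrange(3)
    pick = rng.randrange(3)
    opened = rng.choice([d for d in range(3) if d != pick and d != car])
    switched = next(d for d in range(3) if d != pick and d != opened)
    return switched == car

rng = random.Random(0)
n = 100_000
wins = sum(trial(rng) for _ in range(n))
print(wins / n)   # close to 2/3
```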

This is a very formal approach to the Monty Hall Problem. There are better and much more intuitive explanations.
