Conditional Probability

class: center, middle, inverse, title-slide

# Conditional Probability
## <br><br> Introduction to Data Science
### <a href="https://sta198f2021.github.io/website/">Course Website</a>
### <br> Prof. Amy Herring

---

layout: true
  
<div class="my-footer">
<span>
<a href="https://sta198f2021.github.io/website/" target="_blank">Back to website</a>
</span>
</div>

---

## Conditional Probability

Often we wish to know the probabilty an event will occur given that another event has occured.

For example, instead of the **marginal** probability of contracting COVID (regardless of vaccination status), we may wish to know the probability that someone will contract COVID given that they have been vaccinated, or the probability someone will contract COVID given that they have not been vaccinated.

These are examples of **conditional probability**. The conditional probability that someone is a smoker (A) given that they identify as female (B) is denoted `$P(A|B)$`, which we say as "probability of A given B."

---

## Simple Example

Suppose we have a small set containing 3 female non-smokers, 1 female smoker, 4 non-female non-smokers, and 4 non-female smokers.

If we feel this set is a sample representative of a population of interest, we can **estimate** the probability someone is a smoker as `$\frac{1+4}{3+1+4+4}=0.42$`.

We may be interested in the conditional probability someone is a smoker given that they identify as female, which we can estimate as `$\frac{1}{3+1}$` -- we just change the denominator to correspond to our smaller population of interest.

---

## Conditional Probability

More formally, we define **conditional probability** as 
`$P(A|B)=\frac{P(A \cap B)}{P(B)}$` (verify this on your own using the simple example on the prior slide).

Manipulating this formula, we get the **multiplicative rule of probability**: `$P(A \cap B)=P(B)P(A|B)$`.

One more helpful rule is the **law of total probability**:

`$P(A)=P(A \mid B)P(B) + P(A \mid \overline{B})P(\overline{B}) = P(A \cap B)+P(A \cap \overline{B})$`, which translates to the obvious statement that the probability that A occurs is equal to the sum of the probabilities that A occurs with B and that A occurs without B.

---

## Recall Our Vaccine Hesitancy Cohort

| <div style="width:120px;text-align:left">Ethnicity</div> | <div style="width:340px;text-align:center">Vaccine Hesitant</div> | <div style="width:340px;text-align:center">Not Hesitant</div> |
|:------|------:|-------:|
| White British or Irish | 1362 | 7368 |
| Other white background | 71 | 199 |
| Mixed | 55 | 115 |
| Asian or Asian British - Indian | 37 | 143 |
| Asian or Asian British - Pakistani/Bangladeshi | 85 | 115 |
| Asian or Asian British - other | 15 | 95 |
| Black or Black British | 136 | 54 |
| Other Ethnic Group or Not Specified | 31 | 119 |

---

## Three Probabilities

Define events A=vaccine hesitant and B=Asian or Asian British-Indian. Calculate the following probabilities for a randomly-selected person in this cohort.

- **Marginal probability** of vaccine hesitancy, `$P(A)$`

- **Joint probability** of vaccine hesitancy and Indian ethnicity, `$P(A \cap B)$`

- **Conditional probability** of vaccine hesitancy given a person is of Indian ethnicity, `$P(A \mid B)$`

---

## Independence

Events `$A$` and `$B$` are **independent** when `$P(A \mid B)=P(A)$` or `$P(B \mid A)=P(B)$`.

In other words, knowing that one event has occurred does not lead to any change in the probability we assign to another event.

---

## Checking Independence

We can use the multiplicative rule to check if two events are independent.

If events A and B are independent, then `$$P(A \cap B)=P(A)\times P(B).$$`

**Check it out**: Are vaccine hesitancy and Indian ethnicity independent in our cohort?
---

## Independent vs Disjoint Events

For **independent events** `$A$` and `$B$`, `$P(A \mid B)=P(A)$` and `$P(B \mid A)=P(B)$`, so knowing one event occurred tells us *nothing* about the chances the other event will occur.

For two **disjoint** or **mutually exclusive** events, knowing that one event has occurred tells us that the other event definitely has not occurred, e.g. `$P(A \cap B)=0$`.

Disjoint events are therefore *not* independent!

---
## Example: Breast Cancer Screening

Let `$A$` be the event that a woman has breast cancer (e.g., **prevalence** in population for a certain age group).  Say `$P(A)=0.01$` for a 40-year-old woman.

Let `$B$` be the event that a screening mammogram is positive.

Once a person has a positive mammogram, our mental estimate of the probability she has breast cancer, now `$P(A|B)$`, has increased.  How much should it increase?  Are we certain she has cancer, e.g. P(cancer `$\mid$` mammo positive)=1, or is there some chance the test is wrong?

---

## Sensitivity and Specificity

A=has cancer
B=mammogram positive

A *diagnostic test* like a mammogram is often characterized by its quality -- we want a test to have good sensitivity (picking up cancer when a person really has it) and specificity (ruling out cancer when a person is cancer-free).

Sensitivity is `$P(B|A)$`, and specificity is `$P(\overline{B} \mid \overline{A})$`.

A typical screening mammogram has sensitivity of 85% and specificity of 90%.

---

## Bayes' Theorem

**Bayes' theorem** gives us a formal way to update our beliefs based on new information.  It says `$P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}=\frac{P(A \cap B)}{P(B)}$`.

In this example, a 40 year old woman with a positive screening mammogram may wish to know her chances of having cancer. Several papers have shown that even doctors tend to strongly overestimate her chances of having cancer.

We'll consider two ways to solve this problem: one way using Bayes' formula directly, and another based on a "hypothetical 10000" table, which applies known probabilities to a hypothetical population of 10,000 40 year old women.

---

## Hypothetical 10,000

.pull-left[

We really are still using Bayes' theorem here, but it is hidden behind the scenes.

Here's what we know.

1. The prevalance of breast cancer among 40 year old women is 1% or 0.01.
2. The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.
3. The specificity is 90% or 0.90.

]

.pull-right[
Let's construct a 2x2 table comparing true cancer status and mammogram results in our hypothetical population of 10,000 women.

| | Cancer | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + | | | |
| Mammo - | | | |
| Total | | | 10000 |
]

---

## Hypothetical Population

.pull-left[

1. The prevalance of breast cancer among 40 year old women is 1% or 0.01.
2. The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.
3. The specificity is 90% or 0.90.

]

.pull-right[

Item 1 says the prevalence in this group is 1%, so then we expect to have `$10000\times0.01=100$` cases and `$10000\times0.99=9900$` cancer-free women.

| | Cancer | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + | | | |
| Mammo - | | | |
| Total | 100 | 9900 | 10000 |
]

---

## Hypothetical Population

.pull-left[

<ol class="org-ol">
<li value="2">The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.</li>
<li value="3">The specificity is 90% or 0.90.</li>

</ol>

]

.pull-right[

Item 2 gives the sensitivity, so `$P(\text{mammo +} \mid \text{cancer})=0.85$`.

Thus in the group of 100 women with cancer, the mammogram should pick up `$100\times0.85=85$` of them, and miss the remaining `$100-85=15$`.

| | Cancer | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + | 85 | | |
| Mammo - | 15 | | |
| Total | 100 | 9900 | 10000 |
]

---

## Hypothetical Population

.pull-left[

<ol class="org-ol">
<li value="3">The specificity is 90% or 0.90.</li>
</ol>

]

.pull-right[

Item 3 gives the specificity, so `$P(\text{mammo -} \mid \text{no cancer})=0.90$`.

Thus in the group of 9900 women without cancer, the mammogram should correctly identify `$9900*0.90=8910$` of them as being cancer-free, and it will mistakenly identify `$9900-8910=990$` as having cancer (false positives).

| | Cancer | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + | 85 | 990 | |
| Mammo - | 15 | 8910 | |
| Total | 100 | 9900 | 10000 |
]

---

## Hypothetical Population

.pull-left[

Now we complete the table by filling in the row totals.

]

.pull-right[

| | Cancer | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + | 85 | 990 | 1075 |
| Mammo - | 15 | 8910 | 8925 |
| Total | 100 | 9900 | 10000 |

At this point, it's easy to calculate the conditional probability of cancer given a positive mammogram as `$\frac{85}{1075}=0.079$`.

]

---

## Bayes' Theorem in Action

Alternatively, we could just use Bayes' Theorem directly.

- Baseline probability of cancer `$P(A)=0.01$` (prevalence)

- She wants to know `$P(A \mid B)$`, or her chances of having cancer given that her mammogram is positive (also called **positive predictive value**).

- Bayes' Theorem: `$P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}$`.

- Sensitivity is `$P(B \mid A)=0.85$`.

---

## Bayes' Theorem in Action

- How do we get `$P(B)$`?  We can get this using the law of total probability:  `$P(B)=P(B \mid A)P(A) + P(B \mid \overline{A})P(\overline{A})$`.

- We can get `$P(B \mid \overline{A})$` using the specificity `$P(\overline{B} \mid \overline{A})=0.90$` and the fact that `$P(\overline{B} \mid \overline{A})+P(B \mid \overline{A})=1$`.  So `$P(B \mid \overline{A})=1-0.9=0.1$`.
  
  - Then
`$P(B)=P(B \mid A)P(A) + P(B \mid \overline{A})P(\overline{A})=0.85\times0.01+0.1\times0.99=0.1075$`

- Then `$P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}=\frac{0.85*0.01}{0.1075}=0.079$`.

---

## Bayes' Theorem and Baseline Prevalence

Here, we can think of the 1% prevalence of breast cancer among 40 year old women as our **prior probability** a woman has cancer, and 7.9% as the **posterior probability** she has cancer after we see the data that her screening mammogram is positive.

Cancer would be confirmed or ruled out by subsequent testing, such as a diagnostic mammogram, ultrasound, or biopsy. At the time of the subsequent testing, we'd have an updated baseline probability of 7.9% for our 40 year old woman.

---

## You Try It!

Most people who have a negative test result (e.g., mammogram looks good or COVID test negative) don't worry any longer about whether they really do have disease. Are they right not to worry? Suppose our 40 year old woman with a baseline 1% breast cancer risk instead had a negative (all clear) mammogram. What is the updated probability she has breast cancer given this test result?

---

## You Try It!

For more practice, go back to the vaccine hesitancy example.  
1. Calculate the percentage of each ethnic group in the population provided. We can view this as the **marginal probability** of each ethnicity in that population.

2. Given that someone is vaccine hesitant, calculate the **conditional probability** that the person belongs to each ethnic group, in turn. This information would be useful in helping to target policies and outreach campaigns.