Chapter 1 · Basic Probability

This page follows the probability chapter closely. It moves from conditional probability and the multiplication rule to independence, Bayes’ theorem, random variables, expectation, variance, Bernoulli trials, and the binomial distribution.

Conditional probability · Multiplication theorem · Independent events · Bayes’ theorem · Random variables · Binomial distribution
1

What changes when one event is already known?

The key idea of this chapter is that the probability of an event can change once you know another event has already occurred. When that happens, you no longer work with the original sample space. You work with a reduced sample space containing only the outcomes compatible with the known event.

In the chapter’s opening example, three fair coins are tossed. If you are told that the first coin is a tail, every outcome beginning with a head is removed, leaving four of the original eight outcomes. The probability question is the same, but the space you count inside has changed.

P(E|F) = P(E ∩ F) / P(F)
  • The denominator is always the event you are conditioning on.
  • Conditional probability only makes sense when P(F) ≠ 0.
  • The event E ∩ F counts the outcomes that satisfy both conditions together.
  • Many textbook and research questions become easier once you explicitly redraw the reduced sample space.
Useful Example from the PDF

A family has two children. Given that at least one child is a boy, the sample space becomes {(b,b), (b,g), (g,b)}. Only one of these three outcomes has two boys, so the conditional probability is 1/3.

Another Short Example

Suppose one card is drawn from cards numbered 1 to 10 and you are told the number is greater than 3. The reduced sample space is {4,5,6,7,8,9,10}. The even outcomes in it are {4,6,8,10}, so the conditional probability that the number is even is 4/7.
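Both examples above can be checked by counting inside the reduced sample space. The sketch below assumes every outcome is equally likely; the helper name conditional_probability is chosen here for illustration and is not taken from the chapter.

```python
from fractions import Fraction

def conditional_probability(sample_space, event_e, event_f):
    """P(E|F) for equally likely outcomes, counted inside the reduced sample space."""
    reduced = [outcome for outcome in sample_space if event_f(outcome)]
    if not reduced:
        raise ValueError("P(F) = 0, so P(E|F) is undefined")
    favourable = [outcome for outcome in reduced if event_e(outcome)]
    return Fraction(len(favourable), len(reduced))

# Two children, written as (elder, younger); the four outcomes are assumed equally likely.
children = [(a, b) for a in "bg" for b in "bg"]
two_boys_given_a_boy = conditional_probability(
    children,
    event_e=lambda o: o == ("b", "b"),
    event_f=lambda o: "b" in o,
)

# One card drawn from cards numbered 1 to 10.
cards = range(1, 11)
even_given_gt3 = conditional_probability(
    cards,
    event_e=lambda n: n % 2 == 0,
    event_f=lambda n: n > 3,
)

print(two_boys_given_a_boy)  # 1/3
print(even_given_gt3)        # 4/7
```

The function never uses the full sample space in the denominator, which mirrors the rule that the denominator is the event you condition on.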

2

How do we calculate the probability of events happening together?

The multiplication theorem grows directly out of conditional probability. To find the probability that two events occur together, take the probability of one event and multiply it by the conditional probability of the other event given the first.

P(E ∩ F) = P(E)P(F|E) = P(F)P(E|F)

This rule is especially useful in multi-stage experiments such as drawing cards without replacement, drawing balls from an urn, or following a branching process with a tree diagram.

  • For three events, keep extending the chain with the next conditional probability.
  • Without replacement, the later probability often changes because the first draw changes the pool.
  • Tree diagrams make the multiplication rule easier to visualize.
Useful Example from the PDF

An urn contains 10 black and 5 white balls. If two balls are drawn without replacement, then P(first black) = 10/15 and P(second black | first black) = 9/14. So the probability that both are black is (10/15)(9/14) = 3/7.
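The urn calculation can be reproduced with exact fractions and cross-checked by listing every ordered pair of draws. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication theorem: P(both black) = P(first black) * P(second black | first black)
p_first_black = Fraction(10, 15)
p_second_black_given_first = Fraction(9, 14)
p_both_black = p_first_black * p_second_black_given_first

# Cross-check: enumerate every ordered pair of distinct balls in the urn.
urn = ["black"] * 10 + ["white"] * 5
ordered_draws = list(permutations(range(len(urn)), 2))
both_black = sum(
    1 for i, j in ordered_draws if urn[i] == "black" and urn[j] == "black"
)
p_by_counting = Fraction(both_black, len(ordered_draws))

print(p_both_black)   # 3/7
print(p_by_counting)  # 3/7
```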

Sequential card example

For three cards drawn without replacement, the chapter uses the event “first two are kings and the third is an ace” and multiplies one stage at a time.
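As a sketch of that stage-by-stage product, assuming a standard 52-card deck with four kings and four aces (the deck composition is an assumption here, not quoted from the chapter):

```python
from fractions import Fraction

# Assumed: standard 52-card deck, 4 kings, 4 aces, drawn without replacement.
p_king_first = Fraction(4, 52)
p_king_second = Fraction(3, 51)  # given the first card was a king
p_ace_third = Fraction(4, 50)    # given the first two cards were kings

# Chain the three stages with the multiplication theorem.
p_king_king_ace = p_king_first * p_king_second * p_ace_third
print(p_king_king_ace)  # 2/5525
```

Each factor uses a smaller deck than the one before it, which is exactly how the pool changes when drawing without replacement.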

Why this matters

The theorem gives a clean way to move from “given that” language into multi-step probability calculations.

3

Independent Events

Two events are independent when the occurrence of one does not affect the probability of the other. The chapter expresses this in two equivalent ways: either conditioning does not change the probability, or the probability of the intersection equals the product of the probabilities.

P(E|F) = P(E)   or   P(E ∩ F) = P(E)P(F)
  • Independent events are not the same as mutually exclusive events.
  • Mutually exclusive events with nonzero probability cannot be independent.
  • Repeated throws of a fair die or repeated tosses of a fair coin are standard independent experiments.
Useful Example from the PDF

A card is drawn at random from a standard deck of 52 cards. Let E be “the card drawn is a spade” and F be “the card drawn is an ace”. Then P(E)=1/4, P(F)=1/13, and P(E ∩ F)=1/52. Since (1/4)(1/13)=1/52, the two events are independent.

Another Short Example

For one die throw, let E be “multiple of 3” and F be “even”. Then E={3,6}, F={2,4,6}, and E ∩ F={6}. Again, P(E ∩ F)=1/6=(1/3)(1/2), so the events are independent.
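The product test can be run directly on both examples. The sketch below assumes equally likely outcomes; the helper names are illustrative, and the third check (a mutually exclusive pair on the die) is added here only to show the test failing.

```python
from fractions import Fraction
from itertools import product

def probability(sample_space, event):
    """P(event) for equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

def are_independent(sample_space, event_e, event_f):
    """True when P(E ∩ F) equals P(E) * P(F)."""
    p_e = probability(sample_space, event_e)
    p_f = probability(sample_space, event_f)
    p_both = probability(sample_space, lambda o: event_e(o) and event_f(o))
    return p_both == p_e * p_f

# Standard 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))
spade_vs_ace = are_independent(
    deck, lambda c: c[1] == "spades", lambda c: c[0] == "A"
)

# One throw of a die.
die = list(range(1, 7))
mult3_vs_even = are_independent(die, lambda n: n % 3 == 0, lambda n: n % 2 == 0)

# Mutually exclusive pair with nonzero probabilities: never independent.
mult3_vs_one = are_independent(die, lambda n: n % 3 == 0, lambda n: n == 1)

print(spade_vs_ace, mult3_vs_even, mult3_vs_one)  # True True False
```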

4

How do we reverse a probability question?

Once a sample space is split into disjoint cases, the theorem of total probability lets you rebuild the probability of an event by adding the contribution from each case. Bayes’ theorem then turns this around: it tells you how to update the probability of a hidden cause after observing an outcome.

P(A) = Σ_i P(E_i) P(A|E_i)
P(E_i|A) = P(E_i) P(A|E_i) / Σ_j P(E_j) P(A|E_j)
  • Total probability needs a partition of the sample space.
  • Bayes’ theorem updates the probability of a cause after an outcome is observed.
  • This is one of the most important bridges from descriptive counting to probabilistic inference.
Useful Example from the PDF

One of four boxes is selected at random and then a ball is drawn. If the ball is black, Bayes’ theorem can be used to find the probability that it came from box III. The chapter uses exactly this structure to show how evidence changes the probability of the original source.

Machine setup example

The chapter also uses a production example: after two acceptable items are observed, Bayes’ theorem updates the probability that the machine was correctly set up.

What students should notice

Bayes does not create information; it reorganizes prior probabilities and observed evidence into an updated answer.
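The two formulas in this section translate into a few lines of code. The numbers below are hypothetical and chosen only for illustration (two equally likely boxes, with 3 black balls out of 5 in the first and 1 black ball out of 5 in the second); they are not the figures from the chapter’s box or machine examples.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(E_i) * P(A|E_i), for a partition E_1, ..., E_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """P(E_i|A) for every case in the partition."""
    p_a = total_probability(priors, likelihoods)
    if p_a == 0:
        raise ValueError("P(A) = 0, so the posteriors are undefined")
    return [p * l / p_a for p, l in zip(priors, likelihoods)]

# Hypothetical data: P(box 1) = P(box 2) = 1/2,
# P(black | box 1) = 3/5, P(black | box 2) = 1/5.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(3, 5), Fraction(1, 5)]

p_black = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)

print(p_black)                  # 2/5
print(posteriors[0], posteriors[1])  # 3/4 1/4
```

The posteriors always add up to 1, because the denominator is the sum of the same terms that appear in the numerators.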

5

From outcomes to numbers

A random variable is a real-valued function defined on the sample space of a random experiment. Instead of keeping track of raw outcomes like HHT or FFF, we assign a number that represents the feature we want to study.

For example, if three Bernoulli trials are performed, the random variable might count the number of successes. This produces a probability distribution showing which values are possible and how likely each one is.

X : x_1, x_2, ... , x_n   with   P(X = x_i) = p_i
  • A random variable converts a probability problem into a numerical distribution.
  • The probabilities in a distribution are all non-negative and add up to 1.
  • Once the distribution is known, you can calculate the mean and variance of the random variable.
Useful Example from the PDF

In three Bernoulli trials, let X be the number of successes. Then X can take the values 0, 1, 2, or 3. The probability distribution of X comes directly from counting the ways each number of successes can happen.
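The counting can be sketched by listing all eight outcomes and grouping them by the number of successes. The success probability p = 1/2 is an assumption made here for illustration; the example above does not fix a value.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)  # assumed success probability (illustrative)
q = 1 - p

# X maps each outcome (for example S, S, F) to its number of successes.
distribution = defaultdict(Fraction)
for outcome in product("SF", repeat=3):
    successes = outcome.count("S")
    distribution[successes] += p**successes * q ** (3 - successes)

for x in sorted(distribution):
    print(x, distribution[x])
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```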

6

Mean and Variance of a Probability Distribution

Once a probability distribution is known, two of the most important summaries are the mean and the variance. The mean, also called the expectation, gives the long-run average value of the random variable. The variance measures how far values typically spread around that mean.

E(X) = Σ xP(x)
Var(X) = E[(X - μ)^2] = E(X^2) - [E(X)]^2   where   μ = E(X)
  • The mean is a weighted average using probabilities as weights.
  • Variance is always non-negative.
  • The standard deviation is the positive square root of the variance.
  • Mean describes centre; variance describes spread.
Useful Example from the PDF

The chapter finds the mean of a binomial distribution B(4, 1/3). The full table is written out first, and then each value of x is multiplied by its probability and added to obtain the mean.
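The same table-then-sum procedure can be sketched in code for B(4, 1/3). The variance is computed as well, using E(X^2) - [E(X)]^2; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 3)
q = 1 - p

# Full probability table: P(X = x) for x = 0, 1, ..., n.
pmf = {x: comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}

# Mean: each value weighted by its probability.
mean = sum(x * prob for x, prob in pmf.items())

# Variance: E(X^2) - [E(X)]^2.
mean_of_squares = sum(x**2 * prob for x, prob in pmf.items())
variance = mean_of_squares - mean**2

print(mean)      # 4/3
print(variance)  # 8/9
```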

7

Bernoulli Trials and Binomial Distribution

When is a trial Bernoulli?

The chapter defines Bernoulli trials very carefully. A sequence of trials is Bernoulli when the number of trials is finite, each trial has exactly two outcomes, the trials are independent, and the probability of success remains the same each time.

  • Success and failure are labels; they do not mean “good” and “bad”.
  • Drawing with replacement can give Bernoulli trials because the success probability stays constant.
  • Drawing without replacement usually breaks the Bernoulli condition because the probability changes after each draw.

The binomial model

If X counts the number of successes in n Bernoulli trials, then X has a binomial distribution. The probability of exactly x successes is:

P(X=x) = nCx q^(n-x) p^x   for   x = 0, 1, ..., n   where   q = 1 - p
Useful Example from the PDF

If a fair coin is tossed 10 times, the chapter asks for the probability of exactly six heads, at least six heads, and at most six heads. All three are direct applications of the binomial formula.
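A minimal sketch of those three calculations, using exact fractions; the function name binomial_pmf is chosen here for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, x, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * q^(n-x) with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, Fraction(1, 2)  # fair coin, 10 tosses

exactly_six = binomial_pmf(n, 6, p)
at_least_six = sum(binomial_pmf(n, x, p) for x in range(6, n + 1))
at_most_six = sum(binomial_pmf(n, x, p) for x in range(0, 7))

print(exactly_six)   # 105/512
print(at_least_six)  # 193/512
print(at_most_six)   # 53/64
```

“At least six” and “at most six” overlap at x = 6, so the two results add up to 1 plus the probability of exactly six heads.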

Another Useful Example

If 10% of eggs are defective and 10 eggs are drawn with replacement, the probability of at least one defective egg is found by complement: 1 - P(X=0) = 1 - (9/10)^10.
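The complement shortcut can be compared against the direct sum over x = 1, ..., 10. This is a sketch with illustrative variable names.

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 10)  # 10 draws with replacement, 10% defective
q = 1 - p

# Complement: 1 - P(X = 0)
p_at_least_one = 1 - q**n

# Direct route: add P(X = x) for x = 1, ..., n.
p_direct = sum(comb(n, x) * p**x * q ** (n - x) for x in range(1, n + 1))

print(p_at_least_one == p_direct)        # True
print(round(float(p_at_least_one), 4))   # 0.6513
```

The complement needs one term where the direct sum needs ten, which is why it is the usual route for “at least one” questions.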

Use a binomial model when

You are counting how many successes occur across repeated independent trials with the same success probability.

Do not use it when

The probability of success changes between trials, as in most without-replacement settings.