This page follows the probability chapter closely. It moves from conditional probability and the multiplication rule into independence, Bayes’ theorem, random variables, expectation, variance, Bernoulli trials, and the binomial distribution.
The key idea of this chapter is that the probability of an event can change once you know another event has already occurred. When that happens, you no longer work with the original sample space. You work with a reduced sample space containing only the outcomes compatible with the known event.
In the chapter’s opening example, three fair coins are tossed. If you are told that the first coin is a tail, then all outcomes beginning with a head are immediately removed, leaving four of the original eight outcomes. The probability question is the same, but the sample space you count within has changed.
The conditional probability of E given F is P(E|F) = P(E ∩ F) / P(F), provided P(F) ≠ 0. Here E ∩ F counts the outcomes that satisfy both conditions together.
A family has two children. Given that at least one child is a boy, the sample space becomes
{(b,b), (b,g), (g,b)}. Only one of these three outcomes has two boys, so the conditional
probability is 1/3.
If a card numbered 1 to 10 is drawn and you are told the number is greater than 3, the reduced sample
space is {4,5,6,7,8,9,10}. The even outcomes are {4,6,8,10}, so the
conditional probability is 4/7.
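The two examples above can be checked by counting inside the reduced sample space. The following is a minimal sketch, assuming equally likely outcomes; the helper name `conditional_probability` and the tuple encoding of the children are illustrative choices, not something taken from the chapter.

```python
from fractions import Fraction

def conditional_probability(sample_space, event, given):
    """P(event | given) for equally likely outcomes, by counting in the reduced space."""
    reduced = [o for o in sample_space if given(o)]
    if not reduced:
        raise ValueError("the conditioning event has probability zero")
    favourable = [o for o in reduced if event(o)]
    return Fraction(len(favourable), len(reduced))

# Two children: each outcome is (first child, second child), 'b' or 'g'.
children = [(first, second) for first in "bg" for second in "bg"]
p_two_boys = conditional_probability(
    children,
    event=lambda o: o == ("b", "b"),
    given=lambda o: "b" in o,          # at least one boy
)

# Cards numbered 1 to 10: even number, given the number is greater than 3.
cards = range(1, 11)
p_even = conditional_probability(
    cards,
    event=lambda n: n % 2 == 0,
    given=lambda n: n > 3,
)

print(p_two_boys, p_even)  # 1/3 4/7
```

Using `Fraction` keeps the answers exact, so they can be compared directly with the hand calculations.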
The multiplication theorem grows directly out of conditional probability. To find the probability that two events occur together, take the probability of the first event and multiply it by the conditional probability of the second event given the first. In symbols, P(E ∩ F) = P(E) P(F|E), provided P(E) ≠ 0.
This rule is especially useful in multi-stage experiments such as drawing cards without replacement, drawing balls from an urn, or following a branching process with a tree diagram.
An urn contains 10 black and 5 white balls. If two balls are drawn without replacement, then
P(first black) = 10/15 and P(second black | first black) = 9/14. So the
probability that both are black is (10/15)(9/14) = 3/7.
For three cards drawn without replacement, the chapter uses the event “first two are kings and the third is an ace” and multiplies one stage at a time.
The theorem gives a clean way to move from “given that” language into multi-step probability calculations.
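The stage-by-stage multiplication can be sketched in a few lines. This is an illustration only: the helper `chain_probability` is a hypothetical name, and the card example assumes a standard 52-card deck with 4 kings and 4 aces, which the text above does not state explicitly.

```python
from fractions import Fraction

def chain_probability(stages):
    """Multiply conditional probabilities given as (favourable, remaining) pairs."""
    result = Fraction(1)
    for favourable, remaining in stages:
        result *= Fraction(favourable, remaining)
    return result

# Urn with 10 black and 5 white balls, two draws without replacement.
p_both_black = chain_probability([(10, 15), (9, 14)])

# Assumption: standard 52-card deck (4 kings, 4 aces), three draws without replacement.
# Stage 1: a king; stage 2: another king; stage 3: an ace.
p_king_king_ace = chain_probability([(4, 52), (3, 51), (4, 50)])

print(p_both_black)      # 3/7
print(p_king_king_ace)   # 2/5525
```

Each pair records how many favourable items remain and how many items remain in total at that stage, which is exactly what a tree diagram tracks along one branch.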
Two events are independent when the occurrence of one does not affect the probability of the other. The chapter expresses this in two equivalent ways: either conditioning does not change the probability, so P(E|F) = P(E) when P(F) ≠ 0, or the probability of the intersection equals the product of the probabilities, so P(E ∩ F) = P(E) P(F).
Let E be “the card drawn is a spade” and F be “the card drawn is an ace”.
Then P(E)=1/4, P(F)=1/13, and P(E ∩ F)=1/52. Since
(1/4)(1/13)=1/52, the two events are independent.
For one die throw, let E be “multiple of 3” and F be “even”.
Then E={3,6}, F={2,4,6}, and E ∩ F={6}. Again,
P(E ∩ F)=1/6=(1/3)(1/2), so the events are independent.
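Both independence checks can be reproduced with the product test P(E ∩ F) = P(E) P(F). The sketch below assumes equally likely outcomes; the helpers `prob` and `independent` are illustrative names, and the third check (even versus greater than 3) is an added contrast that does not come from the chapter.

```python
from fractions import Fraction
from itertools import product

def prob(space, event):
    """Probability of an event over a list of equally likely outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def independent(space, e, f):
    """Product test: P(E and F) == P(E) * P(F)."""
    both = prob(space, lambda o: e(o) and f(o))
    return both == prob(space, e) * prob(space, f)

# One card from a 52-card deck: spade versus ace.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))
card_independent = independent(
    deck, lambda c: c[1] == "spades", lambda c: c[0] == "A"
)

# One die throw: multiple of 3 versus even.
die = list(range(1, 7))
die_independent = independent(die, lambda n: n % 3 == 0, lambda n: n % 2 == 0)

# Added contrast: even versus greater than 3 fails the test (1/3 != 1/4).
contrast_independent = independent(die, lambda n: n % 2 == 0, lambda n: n > 3)

print(card_independent, die_independent, contrast_independent)  # True True False
```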
Once a sample space is split into disjoint cases, the theorem of total probability lets you rebuild the probability of an event by adding the contribution from each case. Bayes’ theorem then turns this around: it tells you how to update the probability of a hidden cause after observing an outcome.
One of four boxes is selected at random and then a ball is drawn. If the ball is black, Bayes’ theorem can be used to find the probability that it came from box III. The chapter uses exactly this structure to show how evidence changes the probability of the original source.
The chapter also uses a production example: after two acceptable items are observed, Bayes’ theorem updates the probability that the machine was correctly set up.
Bayes does not create information; it reorganizes prior probabilities and observed evidence into an updated answer.
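The update step can be sketched as follows. The box contents are not given in the text above, so the likelihoods below are hypothetical numbers chosen only to make the arithmetic visible; only the equal prior of 1/4 per box follows from "selected at random". The function name `posterior` is also an illustrative choice.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(cause | evidence) for every cause using Bayes' theorem."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(joint.values())  # theorem of total probability: P(evidence)
    if total == 0:
        raise ValueError("the evidence has probability zero under every cause")
    return {c: joint[c] / total for c in joint}

# One of four boxes chosen at random, so each prior is 1/4.
priors = {box: Fraction(1, 4) for box in ("I", "II", "III", "IV")}

# HYPOTHETICAL values of P(black | box); the chapter's actual box contents are not shown here.
likelihoods = {
    "I": Fraction(1, 2),
    "II": Fraction(1, 4),
    "III": Fraction(1, 8),
    "IV": Fraction(1, 8),
}

post = posterior(priors, likelihoods)
print(post["III"])  # 1/8 under the hypothetical numbers above
```

The denominator is the total probability of drawing a black ball, and dividing each joint term by it guarantees that the updated probabilities sum to 1.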
A random variable is a real-valued function defined on the sample space of a random experiment. Instead of
keeping track of raw outcomes like HHT or FFF, we assign a number that represents
the feature we want to study.
For example, if three Bernoulli trials are performed, the random variable might count the number of successes. This produces a probability distribution showing which values are possible and how likely each one is.
In three Bernoulli trials, let X be the number of successes. Then X can take
the values 0, 1, 2, or 3. The probability distribution of X comes directly from counting the
ways each number of successes can happen.
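The distribution of X can be built by listing every outcome of the three trials and grouping by the number of successes. This sketch assumes a success probability of 1/2 purely for illustration, since the text above does not fix a value; `success_count_distribution` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def success_count_distribution(n, p):
    """Distribution of the number of successes in n independent trials."""
    q = 1 - p
    dist = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for outcome in product("SF", repeat=n):      # e.g. ('S', 'F', 'S')
        successes = outcome.count("S")
        dist[successes] += p**successes * q**(n - successes)
    return dict(sorted(dist.items()))

# Assumption: success probability 1/2 (not specified in the text).
dist = success_count_distribution(3, Fraction(1, 2))
for x, px in dist.items():
    print(x, px)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```

The random variable here is simply the function `outcome.count("S")`: it turns each raw outcome into a number, and the probabilities of outcomes sharing that number are added together.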
Once a probability distribution is known, two of the most important summaries are the mean and the variance. The mean, also called the expectation, gives the long-run average value of the random variable. The variance measures how far values typically spread around that mean.
The chapter finds the mean of a binomial distribution B(4, 1/3). The full table is written
out first, and then each value of x is multiplied by its probability and added to obtain the mean.
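The same table-then-sum procedure can be sketched for B(4, 1/3). The mean is computed as the sum of x times P(X = x), and the variance as the sum of (x - mean)^2 times P(X = x); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 3)

# Full probability table: P(X = x) for x = 0, 1, ..., n.
table = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

# Mean: multiply each value by its probability and add.
mean = sum(x * px for x, px in table.items())

# Variance: probability-weighted squared distance from the mean.
variance = sum((x - mean) ** 2 * px for x, px in table.items())

print(mean, variance)  # 4/3 8/9
```

These values agree with the standard shortcuts for a binomial distribution, mean np and variance np(1 - p).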
The chapter defines Bernoulli trials very carefully. A sequence of trials is Bernoulli when the number of trials is finite, each trial has exactly two outcomes, the trials are independent, and the probability of success remains the same each time.
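If X counts the number of successes in n Bernoulli trials, each with success probability p and failure probability q = 1 - p, then
X has a binomial distribution. The probability of exactly x successes is P(X = x) = C(n, x) p^x q^(n-x) for x = 0, 1, ..., n, where C(n, x) is the number of ways to choose which x of the n trials are successes.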
If a fair coin is tossed 10 times, the chapter asks for the probability of exactly six heads, at least six heads, and at most six heads. All three are direct applications of the binomial formula.
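The three coin questions can be worked out with one small function. This is a sketch using exact fractions; `binomial_pmf` is an illustrative name for the binomial probability of exactly x successes.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) for X with a binomial distribution B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, Fraction(1, 2)  # ten tosses of a fair coin, success = head

exactly_six = binomial_pmf(n, p, 6)
at_least_six = sum(binomial_pmf(n, p, x) for x in range(6, n + 1))
at_most_six = sum(binomial_pmf(n, p, x) for x in range(0, 7))

print(exactly_six, at_least_six, at_most_six)  # 105/512 193/512 53/64
```

"At least six" and "at most six" both include the case of exactly six heads, so the two sums overlap and add up to more than 1.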
If 10% of eggs are defective and 10 eggs are drawn with replacement, the probability of at least one
defective egg is found by complement:
1 - P(X=0) = 1 - (9/10)^10.
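The complement calculation can be evaluated directly. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

p_defective = Fraction(1, 10)   # 10% of eggs are defective
n = 10                          # ten draws with replacement

p_none = (1 - p_defective) ** n     # P(X = 0) = (9/10)^10
p_at_least_one = 1 - p_none         # complement rule

print(p_at_least_one)                   # 6513215599/10000000000
print(round(float(p_at_least_one), 3))  # 0.651
```

Working through the complement needs one term, whereas adding P(X=1) through P(X=10) directly would need ten.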
Use the binomial distribution when you are counting how many successes occur across repeated independent trials with the same success probability.
Do not use it when the probability changes between trials, as in most without-replacement settings.