STAT 230 - Probability
MWF 2:30PM - 3:20PM
DC 1351
Cecilia Cotton
15% quizzes (3/5) | 25% midterm | 15% midterm | 45% final
1 | Introduction to Probability
1.1 | Definitions of Probability
This course is a probability course, not a statistics course (STAT 231)
- Probability = A discipline of mathematics concerned with describing and modeling uncertain events
- Statistics = The study of the collection, analysis, and interpretation of data as it relates to underlying probability models
Definitions of Probability (each has flaws)
- Classical Probability:
- Flaw: not all events are equally likely
- Relative Frequency: Repeat the event a lot of times
- Flaw: we can never repeat an experiment indefinitely
- Subjective Probability: Vibes
- Flaw: how do you resolve disagreements?
1.2 | Mathematical Probability Models
Probability Model = Sample Space + Probability Distribution + Event
Definition: Sample Space A sample space is the set of all possible outcomes of a random experiment where only one outcome can occur in a single trial
Definition: Probability Distribution A probability distribution on assigns probabilities (between 0 and 1) to outcomes
Definition: Event An event is a subset of the sample space, and the probability of an event is the sum of the probabilities for each outcomes in .
Sample Space
- Is Discrete if it’s finite and non-Discrete if infinite
- Ex: Rolling a die. Let be the number of dots facing upward
- Ex: Runtime of a computer program.
- Ex: Flipping a coin (there can be multiple sample spaces)
- Are the outcomes the same?
- - not a sample space (overlapping outcomes)
- - not a sample space (not all possibilities)
Event
- Ex: Rolling a die:
- Simple event: Rolling a 1,
- Compound event: Rolling an even,
- Ex: Runtime of a program
- Event that the program takes >30 seconds
Definition: Probability Let denote the set of all events on a sample space . > A probability defined on is a real valued function that satisfies the following axioms
- Scale: Everything is normalized to a size of 1
- Something Happens:
- Infinite Additivity:
Facts about the probability function
-
- Proof: Define an infinite collection of events (all , all disjoint, union is ) and use the Additivity axion
- Finite Additivity:
1. Proof: Extend to an infinite collection of disjoint events by adding infinite $\emptyset$ and the union stays the same
3. 1. Proof: Use the axiom that
1.3 | Counting in Uniform Probability Models
Counting Rules
Definition: Equally Likely Model Let denote the number of outcomes in an event . Then, for an equally likely sample space:
Note: This can only happen in a discrete sample set
Counting Rules
- Addition Rule
- Multiplication Rule
Example: Suppose that 3 of the numbers from 1-9 are chosen without replacement to form a 3 digit number. What is the probability that:
- The number is bigger than 500?
- The number is even? (you could also count EEE, EOE, OEE, OOE)
- The number is larger than 700 and even? Consider 7xx : 8xx: 9xx:
Permutations and Combinations
Definition: Permutation Given distinct objects, a permutation of size is an ordered subset of of the individuals. The number of permutations of size taken from objects is
Definition: Combination Given distinct objects, a combination of size is an unordered subset of of the individuals. The number of permutations of size taken from objects is
Suppose you have 20 distinct books, 7 of which are written by Mark Twain.
- How many ways can you arrange 12 books on a shelf if the order they are on the shelf matters?
- How many ways can you arrange 12 books on a shelf if exactly 3 of them must be Mark Twain books?
- A monkey picks books at random from the 20 books and puts them on the shelf until it contains 12 books. What is the probability that at least 3 of the books on the shelf are written by Mark Twain?
Denote to be {number of Twain books}. Choose from 7 Twain and 13 non-Twain books:
Consider drawing 3 numbers with replacements from 10 digits. What is the probability that there’s a repeated number?
Melissa participates in a lottery in which she selects 7 numbers between 1 and 50, and then a computer randomly picks 7 numbers between 1 and 50. She wins if her selected numbers match 5 or more of the randomly selected numbers, in any order. What is the probability that Melissa wins?
S = {selections of computer numbers}
A = {Melissa wins} - consider 5, 6, 7 matches
Make sure you decide whether or not order matters for everything!
Suppose a room contains n people. What is the probability at least two people in the room share a birthday? Assumption: Suppose that each of the n people is equally likely to have any of the 365 days of the year as their birthday, so that all possible combinations of birthdays are equally likely.
Put a uniform model over :
= {at least 2 match}
Multinomial Coefficient Theorem
There are 4 passengers on an elevator that services 5 floors. Assume that each of the 4 passengers are equally likely to get off the elevator on any of the 5 floors. What is the probability that:
- The passengers all get off on different floors? (# ways to select 5 floors)
- 2 passengers get off on floor two, and two get off on floor three? (# ways to select 2 passengers)
- 2 passengers get off on one floor, and two passengers get off on another, different floor? (# ways to select 2 passengers) x (# ways to select 2 floors)
- Exactly 1 passenger gets off on floor 1 (# ways to select one passenger) x (# ways to select remaining floors)
- Exactly one passenger gets off on floor 1 and exactly one gets off on floor 2 (# ways to select 2 passengers) x (# ways to assign them) x (# ways to select remaining floors)
Definition: Multinomial Coefficient Theorem Consider objects with consist of indistinguishable types. Suppose that there are objects of type . Then, the number of distinguishable arrangements of the objects is
Consider rearranging the letters at random in the name “ZENYATTA” to form a single ‘word’.
- How many ways can this be done? (# total permutations) / (# replacements)
- What is the probability that all of the letters appear in alphabetic order? (# alphabetical permutations) / (# total permutations)
- What is the probability that the word begins and ends with “T”? (# first letter is T) x (# last letter is T)
3 members of the C-S department, 2 members of the math department, and 3 members of the stats department sit down at random in a row of 8 seats.
- What is the probability that each department’s members are sitting in consecutive seats? (# permutations of faculties) x (# permutations in each faculty)
- What is the probability that members of the same department are sitting on both ends of the row? P(CS on both ends) + P(Math both ends) + P(Stats both ends)
Or, excluding order:
5 floor and 5 people on an elevator, what is the probability that 2 people get off on one floor and 3 get off on the other?
= (# of ways to choose 2 floors) (# of ways to choose 3 people) (2 ways to permute people)
2 | Probability Rules and Conditional Probability
2.1 | Use of Sets
Venn Diagrams
De Morgan’s Laws (they generalize to summations)
2.2 | Addition Rules for Unions of Events
Inclusion-Exclusion Principle
Inclusion-Exclusion Rules: For arbitrary events and :
Proof:
Inclusion-Exclusion Principle (Generalized)
Ex: At least one die shows 6
|S| = 36 = P(first die shows 6) = P(second die shows 6)
Ex: 80% Tim Hortons, 63% Canadian Tire, 51% both
- Canadian Tire and Tim Hortons T = {TH}, C = {CT}
- Canadian Tire but not Tim Hortons We want
Ex: Rearrange the letters in “FOOD”
- , A = the event that the two O’s appear together S = {Distinguishable Arrangements assuming Os are indistinguishable}
- , B = the event that the word starts with F
2.3 | Dependent and Independent Events
Independence
Definition: Independence Two events and are said to be independent of
A sequences of events are said to be independent if
for all possible subsets of size k, . Events that are not independent are called dependent.
Ex: Consider rolling two fair die, and let A = {sum is 10}, B = {first die is 6}, C = {sum is 7}
- A and B are dependent
- B and C are independent
- A and C are dependent
Proposition: If and are independent and mutually exclusive (disjoint), then either or
Proof: Suppose are independent and disjoint, then
Remark: Pairwise independence of events doesn’t mean that the events are independent
Experiment: Roll two fair die A = {first die = 1}, B = {second die = 1}, C = {sum = 7} Each pair is independent, but
Proposition: If are independent, then ; ; are all pairwise independent.
Proof: Suppose are independent.
2.4 | Conditional Probability and Product Rules
Conditional Probability
Definition: Conditional Probability
A = {sum is 10}, B = {first die is 6}, C = {sum is 7}
Definition: Independence and Conditionals Two events are independent if
Properties of conditional probability:
- If are disjoint,
Ex: Consider rearranging the letters in the word RACECAR
- What is the probability that the random word ends in “R” given that it starts with “ACE”? A = {word starts ACE} R = {word ends R} S = {distinguishable arrangements}
- Is the event that the word starts with “ACE” independent of the event that it ends with an “R”?
No, they’re dependent
Product Rules
Product Rule:
Ex: Suppose a bag contains 12 red balls and 7 blue balls. Suppose that a ball is drawn at random, then, without replacement, a second ball is drawn at random.
- What is the probability that both balls are red? R1 = {first marble is red} R2 = {second marble is red}
Alternate solution: S = {all selections of first 2 marbles}
- What is the probability that the second ball is red?
Law of Total Probability
Definition: Partition A sequence of sets are said to partition the sample space if they partition the sample space I’m not writing the formal definition
Theorem: Law of total probability: Suppose that partition . Then for any event ,
Usually we use because that partitions any :
Example: A rare disease occurs in 0.3% of a population. A test has been developed for the disease. If the test is performed on an individual who has the disease, the probability of a positive test result (i.e. the test says that they have the disease) is 0.96. If the test is performed on an individual who does not have the disease, the probability of a positive test result is 0.05. Suppose the test is performed on a random individual in the population.
- What is the probability that the test result will be positive? D = {Disease} T+ = {Test is positive}
- Suppose the test comes back positive. What is the probability that the individual actually has the disease?
- If the test comes back negative, what is the probability that the individual actually has the disease?
Bayes Theorem
Bae’s Theorem
Gold Coin question
Two factories, Factory I and Factory II, produce phones for a phone brand “JimSang”. Factory I produces 60% of all JimSang phones, and Factory II produces the other 40 %. Of phones produced in Factory I, 10% are defective, and 20% of phones produced by Factory II are defective.
- What is the probability that a randomly selected JimSang phone is defective?
D = {Defective phone} = {Phone made in Factory i}
- A customer has purchased a phone, and it is not defective. What is the probability that it was produced in Factory II?
3 | Univariate Discrete Probability Distributions
3.1 | Discrete Random Variables
Geometric Series Formula
Partial Geometric Series
Binomial Theorem
Taylor Series for
Example
Definition: Random Variable A random variable is a function that maps the sample space into the set of real numbers . In other words:
The values that a random variable takes is called the range, We say that a random variable is discrete if its range is finite, and continuous otherwise.
Definition: Probability Mass Function The probability function of a discrete random variable is the function
Ex: Suppose that is the sum of the outcomes of a two dice roll. Calculate the probability function of X.
Properties of the probability function
- It’s between 0 and 1
- The sum of all probabilities is 1
3.2 | Functions of Random Variables
Cumulative Distribution Function (CDF)
When is discrete:
Properties of the CDF
- for
- and
Ex: Compute and graph for a fair die roll
CDF is CADLAG (continuous from the right, limited from the left)
Ex:
- Compute the probability function on
- Compute the CDF
- Compute P(at least 1 correct)
3.3 | Expectation of a Random Variable
Expected Values
Definition: Expectation Suppose is a discrete random variable with probability function . Then is called the expected value of , and is defined by
The expected value of is sometimes referred to as the mean / full moment of
Ex: Compute of a dice roll
Ex: Lotto 649
R = Return on R(S)={-1, 499999}$
Properties of Expectation
Property: Bounding If a random variable is bounded by , then
If and is the result of a fair six sided die roll, then compute
Property: Linearity of Expectation If is a linear function , then for a random variable
MISTAKE: It is not true in general that
Variance
- Deviation
- Absolute deviation
- Squared deviation:
Definition: Variance The variance of a random variable is denoted , and is defined by
Shortcut to compute variance:
Definition: Moments The moment of a random variable is defined by
Central Second Moment
Suppose is a random variable such that . Then
Definition: Standard Deviation
Theorem: Lyapounov’s Inequality If ,
Ex: Compute for a die roll
Theorem: Variance of a linear combination:
Variance Properties
- For all random variables :
- Larger values of indicate that the probability function / distribution of is more “spread out” around the mean
- if and only if .
3.4 | Moment Generating Functions
Casino Day
Roulette:
R = return on a Red bet of R(s)={1,-1}$
D = return on first dozen bet,
N = return on “00” bet:
All bets are EV , but variance is different
Craps: P(A before B) =
P(winning in craps): F = first roll
Odds bet
If point is 5 or 9: 3 to 2 (1.5 to 1) If point is 6 or 8: 6 to 5 (1.2 to 1)
Transformations of random variables
What is the probability that a 2 is rolled before an odd number? N = { neither 2 nor odd }
Remark: If is a discrete random variable with PMF and for some , then is a discrete random variable. Sometimes we wish to compute based on .
- Determine , the range of
- For each , compute by determining for which , then computing
Let denote the number of kings in a random 3 card hand.
- Calculate the PMF of :
-
Let . Compute the PMF of
-
Let . Compute the PMF of
Expectation and variance give a simple summary of distribution Skewness
Kurtosis
There exist distributions without expectation: Suppose X is a random variable with probability mass function
Then and is not defined
Moment generating functions
Definition: Moment Generating Function (MGF) The MGF of a random variable is given by
If is discrete with PMF , then
An MGF is well defined if for
Properties of the MGF
- Fubin’s Theorem
- So long as is defined in a neighborhood of the origin
Suppose is the outcome of a fair die roll. Use the MGF to compute its mean and variance.
Suppose is the coin flips required to flip a heads. Compute the MGF of :
well defined if well defined on , thus it’s a good MGF because it contains the origin
Alternatively, you could make it look like a derivative:
Theorem: Uniqueness Theorem for MGFs Two functions with the same well-defined MGFs have the same probability distributions (because the MGF is just a transform of the probability distribution)
Suppose has the following MGF. Compute the distribution.
Theorem: MGFs are closed under linear combinations
Ex:
3.5 | Special Discrete Probability Distributions
Discrete Uniform Distribution
Definition: Equality
Definition: Discrete Uniform Distribution Range is
Supposed . Compute
Mean and Variance of the Uniform Distribution
Calculating the mean:
MGF
Use the partial sum of geometric series:
Hypergeometric Distribution
Definition: Hypergeometric Distribution Yugioh reference :) Imagine drawing cards without replacement from an card deck with successes. The number of successes you draw is:
The name comes from the hypergeometric series / hypergeometric identity
Calculate the PMF of :
Probabilities of drawing a 5 card hand:
Binomial Distribution
Definition: Binomial Distribution A Bernoulli trial is a single success/failure experiment where success has probability A Binomial Distribution is the distribution of successes in Bernoulli trials
Compute the PMF of a binomial distribution
Mean and Variance
Negative Binomial Distribution
Definition: Negative Binomial Preform Bernoulli trials with a probability of success until exactly successes are observed. Then if denotes the number of failures before successes
If is the number of trials required,
Compute the probability function of a NB random variable:
(number of sequences with failures and successes)
Geometric Distribution
Definition: Geometric Distribution Perform Bernoulli trials with probability of success until one success is observed. is written as
MGF of Geometric Distribution:
This is well-defined at the origin, and thus the uniqueness theorem applies
Mean and Variance
What is the probability that a coin takes more than 5 flips to show heads? = # of tails before observing 1 heads
Trivial Pursuit: What’s the probability the number of rolls exceeds the mean?
Question: Suppose for a and :
- For all
Remark: Memory-less property of Geometric Distribution If , then
Poisson
Definition: Poisson Distribution (paramater is the mean) Binomial distribution as and
Remark: If
Example: Poisson approximation of Binomial Suppose . Consider a sequence of Binomial experiments each consisting of trials in which . Then for any fixed
Let . If is large and is close to zero,
Ex: A bit error occurs for a given data transmission method independently in one out of every 1000 bits transferred. Suppose a 64 bit message is sent using the transmission system. What is the probability that there are exactly 2 bit errors? Approximate this using a Poisson approximation.
X = # of errors
If
Shiny versions of Pokemon are possible to encounter and catch starting in Generation 2 (Pokemon Gold/Silver). Normal encounters with Pokemon while running in grass occur according to a Poisson process with rate 1 per minute on average. 1 in every 8192 encounters will be a Shiny Pokemon, on average.
- If you run around in grass for 15 hours, what is the probability you will encounter at least one Shiny pokemon? = # of shiny pokemon encountered in 15 hours
- How long would you have to run around in grass so that you have a better than 50 percent chance of encountering at least one Shiny pokemon? = # of shiny pokemon after t hours
4 | Multivariate Discrete Probability Distributions
4.1 | Basic Terminology and Techniques
Multiple (discrete) random variables
Joint PMF
If
Marginal = add up on the margins to get the PMFs of
Def: Independence of Random Variables Discrete random variables with joint PMF and marginal PMFs are said to be independent variables if
Let , then this is supported on
Properties
Ex: has the following joint PMF.
- Compute the marginal PMFs
- Compute
If are independent, their marginal PMFs satisfy
Conditional Probability
4.2 | Multinomial Distribution
Definition: Multinomial Distribution
- Individual trials have possible outcomes with the i-th being denotes
- Trials are repeated times, so the i-th one occurs times Then, have a Multinomial Distribution with parameters ,
Multinomial Distribution
Roulette
General PMF
Properties
- If , then
- so that are dependent
- Suppose . Then
(Intuition: the “margins” of a multinomial are a binomial)
Ex: Roulette
Ex: 2 hearts, 2 spades, 1 diamond with replacement (w/o would be multi-hypergeometric)
Ex: Given 3 hearts in 5 cards, what is the possibility of no spades?
Ex: What if we add two distributions?
4.3 | Expectations, Covariance, Correlation
Distributions of multivariate transformations: Suppose that
Then, for jointly distributed random variables is a random variables. If have joint p.f. , then the probability function of is given by
Show that
For
Properties
- Convolution formula
- Binomial is closed under addition
- Geometric + Geometric = Negative Binomial
Sums of independent:
- Poissons are Poisson
- Geometrics with same are NB
- Binomials with same are Binomial
Definition: Multivariate Expectation
Properties of EV of Jointly Distributed Random Variables
- Closed under linear combinations
- Decomposing addition
Covariance
If independent
Correlation
Bilinear form property of
If independent, then
4.4 | Linear Combinations of Random Variables
Expectation of random variable combinations
Variance of random variable combinations
Indicator (Bernoulli) Random Variables: Let be an event. We say that is the indicator random variable of event
Properties
4.5 | Markov’s Inequality, Chebyshev’s Inequality, Law of Large Numbers
Weak Law of Large Numbers: Suppose are independent RVs with mean and variance , and
is the sample mean. Then for all ,
Markov’s Inequality If is an RV with , then for any ,
Chebyshev’s Inequality If is a RV with , , then for any
For 10 coin tosses, what’s the probability that the sample mean deviates by more than from
4.6 | Conditional Probability Distributions
Conditional distribution:
Conditional mean:
5 | Univariate Continuous Probability Distributions
5.1 | Continuous Random Variables
How to model continuous RVs?
- Define the analog of the probability function (hard)
- Model relative frequency from the histogram (calc)
5.2 | Functions of Random Variables
Definition: Continuous RV has probability density function (pdf) if
Support of a pdf is
Definition: CDF of a random variable X is
If is continuous with pdf , then
Moreover, by FTC part I:
FTC part II:
Properties of the CDF of a continuous RV:
- is continuous
- is differentiable except at a countable number of exceptional points
5.3 | Expectation of a Random Variable
If is a continuous RV with pdf and , then
It follows that
Cool things
- Expectation still linear
- Variance shortcut still works
Definition: Moment Generating Function
5.4 | Special Continuous Probability Distributions
Theorem: Let be a continuous random variable with pdf . Suppose that is a strictly monotonic differentiable function on the range of . Then, the pdf of in the region where is given by the following, where denotes the inverse of
Let be a continuous RV with pdf . Then, has pdf
Definition: We say that has a (continuous) uniform distribution on , or , if has pdf
Properties
Definition: We say that has an exponential distribution if the density of is
Definition: Quantile / Percentile The 100 x q-th percentile (or quantile) of the distribution of is the value such that
The quantile is the “inverse” if the CDF, and is called the quantile function. The median of a distribution is its 50-th percentile.
Definition: Normal Distribution with mean and variance if the density is
Properties
- Symmetric about the mean:
- Unimodal with a peak at
- (Standardization)
- mean and variance
- MGF looks like the density (that’s why it’s special)
Definition: Standard normal random variable if
and the CDF is
PROBLEM: Normal distribution has no elementary antiderivative ()! Evaluate them numerically with the table
Definition: The 68-95-99.7 Rule: If , then the probability that is within standard deviations of the mean is:
Theorem: Joint Density Function If
Sample Mean Suppose that are independent and that
5.5 | Central Limit Theorem
Theorem (Central Limit Theorem) Suppose are independent RVs, each with a common CDF . Suppose further that . Then
as . In other words, if is large,
Binomial to Normal If , then for large
Poisson to Normal If , then for large
Suppose that you are interested in buying some red potatoes from a local grocery store. The potatoes are priced at 1.60 per kilogram. The average weight of a red potato at the store is 200 grams with a standard deviation of 25 grams. The distribution of the weights of potatoes is unknown.
- Suppose you take a random sample of 49 potatoes. What is the approximate probability that the average weight of the 49 potatoes sampled is between 195 and 205 grams (inclusive)?
Use CLT to approximate sample mean with normal distribution
- Now, suppose you take a random sample of 36 potatoes. What is the approximate probability that the random sample of 36 potatoes will cost more than 12.
- Now, suppose that you only have 10 in cash, and you want to buy as many potatoes as you can with this 10. Determine the largest number of potatoes that you can select (randomly, of course), in order to be at least 85% certain that their overall cost will be at most 10.