STAT 230 - Probability
MWF 2:30PM - 3:20PM
DC 1351
Cecilia Cotton

15% quizzes (best 3 of 5) | 25% midterm 1 | 15% midterm 2 | 45% final

Anthony Zhang

1 | Introduction to Probability

1.1 | Definitions of Probability

This course is a probability course, not a statistics course (STAT 231)

  • Probability = A discipline of mathematics concerned with describing and modeling uncertain events
  • Statistics = The study of the collection, analysis, and interpretation of data as it relates to underlying probability models

Definitions of Probability (each has flaws)

  1. Classical Probability: $P(\text{event}) = \dfrac{\text{number of ways the event can occur}}{\text{total number of possible outcomes}}$, assuming all outcomes are equally likely
    1. Flaw: not all events are equally likely
  2. Relative Frequency: the proportion of times the event occurs in a long series of repetitions of the experiment
    1. Flaw: we can never repeat an experiment indefinitely
  3. Subjective Probability: Vibes (a personal degree of belief)
    1. Flaw: how do you resolve disagreements?
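The relative-frequency idea is easy to see by simulation; a minimal Python sketch (the coin and seed are illustrative, not from the notes):

```python
import random

# Estimate P(heads) for a fair coin by relative frequency:
# repeat the experiment many times and track the proportion of heads.
def relative_frequency(trials, seed=42):
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

estimate = relative_frequency(100_000)
print(estimate)  # close to 0.5, but only an estimate for any finite number of trials
```

The estimate wanders near 0.5 but never settles exactly, which is the flaw: no finite number of repetitions pins down the probability.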

1.2 | Mathematical Probability Models

Probability Model = Sample Space + Probability Distribution + Event

Definition: Sample Space A sample space $S$ is the set of all possible outcomes of a random experiment, where only one outcome can occur in a single trial

Definition: Probability Distribution A probability distribution on a discrete sample space $S$ assigns probabilities $P(s)$ (between 0 and 1) to outcomes $s \in S$, with $\sum_{s \in S} P(s) = 1$

Definition: Event An event $A$ is a subset of the sample space $S$, and the probability of an event is the sum of the probabilities of the outcomes in $A$: $P(A) = \sum_{s \in A} P(s)$
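The three pieces of a probability model fit in a few lines of Python; a sketch for one fair die (my own example, not from the notes):

```python
from fractions import Fraction

# A tiny probability model for one fair die roll: sample space,
# probability distribution, and an event as a subset of S.
S = {1, 2, 3, 4, 5, 6}
P = {s: Fraction(1, 6) for s in S}     # probability distribution on S

A = {2, 4, 6}                          # event: roll an even number
P_A = sum(P[s] for s in A)             # P(A) = sum of outcome probabilities
print(P_A)  # 1/2
```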

Sample Space

  • Is discrete if it is finite or countably infinite, and non-discrete otherwise
  • Ex: Rolling a die. Let $X$ be the number of dots facing upward; $S = \{1, 2, 3, 4, 5, 6\}$
  • Ex: Runtime of a computer program: $S = [0, \infty)$
  • Ex: Flipping a coin (there can be multiple valid sample spaces)
    • Are the outcomes the same?
    • A set whose outcomes overlap (more than one could occur in a single trial) is not a sample space
    • A set that omits possible outcomes is not a sample space

Event

  • Ex: Rolling a die, $S = \{1, \dots, 6\}$:
    • Simple event: Rolling a 1, $A = \{1\}$
    • Compound event: Rolling an even number, $B = \{2, 4, 6\}$
  • Ex: Runtime of a program
    • Event that the program takes more than 30 seconds: $A = (30, \infty)$

Definition: Probability Let $\mathcal{A}$ denote the set of all events on a sample space $S$. A probability defined on $\mathcal{A}$ is a real-valued function $P$ that satisfies the following axioms

  1. Scale: $0 \le P(A) \le 1$ for every event $A$ (everything is normalized to a size of 1)
  2. Something Happens: $P(S) = 1$
  3. Countable Additivity: for pairwise disjoint events $A_1, A_2, \dots$, $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

Facts about the probability function

  1. $P(\emptyset) = 0$
    1. Proof: Define an infinite collection of events $A_1 = A_2 = \cdots = \emptyset$ (all disjoint, union is $\emptyset$) and use the Additivity axiom
  2. Finite Additivity: for pairwise disjoint events $A_1, \dots, A_n$, $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$
    1. Proof: Extend to an infinite collection of disjoint events by adding infinitely many $\emptyset$; the union stays the same
  3. Complement: $P(\bar{A}) = 1 - P(A)$
    1. Proof: Use the axiom that $P(S) = 1$, since $A$ and $\bar{A}$ are disjoint with $A \cup \bar{A} = S$

1.3 | Counting in Uniform Probability Models

Counting Rules

Definition: Equally Likely Model Let $|A|$ denote the number of outcomes in an event $A$. Then, for an equally likely sample space $S$: $P(A) = \dfrac{|A|}{|S|}$

Note: This can only happen in a finite sample space

Counting Rules

  1. Addition Rule: if an operation can be done either in $p$ ways or, alternatively, in $q$ ways (with no overlap), it can be done in $p + q$ ways
  2. Multiplication Rule: if an operation has two steps, with $p$ ways to do the first and $q$ ways to do the second, it can be done in $p \times q$ ways

Example: Suppose that 3 of the numbers from 1-9 are chosen without replacement to form a 3-digit number. What is the probability that:

  1. The number is bigger than 500?
  2. The number is even? (you could also count EEE, EOE, OEE, OOE)
  3. The number is larger than 700 and even? Consider 7xx : 8xx : 9xx:
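The counting answers can be sanity-checked by enumerating all 504 equally likely numbers; a brute-force Python sketch (my own check, not part of the notes):

```python
from itertools import permutations
from fractions import Fraction

# Form every 3-digit number from three distinct digits 1-9,
# then count the favourable outcomes for each question.
outcomes = [100*a + 10*b + c for a, b, c in permutations(range(1, 10), 3)]
n = len(outcomes)                      # 9 * 8 * 7 = 504

p_big = Fraction(sum(x > 500 for x in outcomes), n)
p_even = Fraction(sum(x % 2 == 0 for x in outcomes), n)
p_both = Fraction(sum(x > 700 and x % 2 == 0 for x in outcomes), n)
print(p_big, p_even, p_both)  # 5/9, 4/9, 11/72
```

Brute force agrees with the counting arguments: first digit in {5,...,9} gives 5/9, and the 8xx case has one fewer even digit available, giving (28 + 21 + 28)/504 = 11/72.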

Permutations and Combinations

Definition: Permutation Given $n$ distinct objects, a permutation of size $k$ is an ordered subset of $k$ of the objects. The number of permutations of size $k$ taken from $n$ objects is $n^{(k)} = n(n-1)\cdots(n-k+1) = \dfrac{n!}{(n-k)!}$

Definition: Combination Given $n$ distinct objects, a combination of size $k$ is an unordered subset of $k$ of the objects. The number of combinations of size $k$ taken from $n$ objects is $\binom{n}{k} = \dfrac{n!}{k!(n-k)!}$
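Both formulas are built into Python's standard library; a quick sketch relating them (numbers chosen to match the book-shelf example below):

```python
from math import comb, factorial, perm

# Permutations count ordered subsets; combinations count unordered ones.
n, k = 20, 12
n_perm = perm(n, k)            # n!/(n-k)!
n_comb = comb(n, k)            # n!/(k!(n-k)!)

# Each unordered subset of size k has k! orderings:
assert n_perm == n_comb * factorial(k)
print(n_perm, n_comb)
```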

Suppose you have 20 distinct books, 7 of which are written by Mark Twain.

  1. How many ways can you arrange 12 books on a shelf if the order they are on the shelf matters?
  2. How many ways can you arrange 12 books on a shelf if exactly 3 of them must be Mark Twain books?
  3. A monkey picks books at random from the 20 books and puts them on the shelf until it contains 12 books. What is the probability that at least 3 of the books on the shelf are written by Mark Twain?

Denote $X$ to be the number of Twain books on the shelf. Choose from 7 Twain and 13 non-Twain books: $P(X = x) = \dfrac{\binom{7}{x}\binom{13}{12-x}}{\binom{20}{12}}$

Consider drawing 3 numbers with replacement from 10 digits. What is the probability that there’s a repeated number?
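The complement argument ($1 - \frac{10 \cdot 9 \cdot 8}{10^3}$) can be confirmed by enumerating all 1000 draws; a short Python check (mine, not the notes'):

```python
from itertools import product
from fractions import Fraction

# Draw 3 digits with replacement from 0-9; count draws with a repeat.
draws = list(product(range(10), repeat=3))
p_repeat = Fraction(sum(len(set(d)) < 3 for d in draws), len(draws))
print(p_repeat)  # 1 - (10*9*8)/10^3 = 7/25
```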

Melissa participates in a lottery in which she selects 7 numbers between 1 and 50, and then a computer randomly picks 7 numbers between 1 and 50. She wins if her selected numbers match 5 or more of the randomly selected numbers, in any order. What is the probability that Melissa wins?

S = {selections of computer numbers}

A = {Melissa wins} - consider 5, 6, 7 matches

Make sure you decide whether or not order matters for everything!

Suppose a room contains n people. What is the probability at least two people in the room share a birthday? Assumption: Suppose that each of the n people is equally likely to have any of the 365 days of the year as their birthday, so that all possible combinations of birthdays are equally likely.

Put a uniform model over $S = \{\text{all sequences of } n \text{ birthdays}\}$, $|S| = 365^n$:

$A$ = {at least 2 match}: $P(A) = 1 - P(\text{no matches}) = 1 - \dfrac{365^{(n)}}{365^n}$
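The complement formula is one line of Python; a sketch using `math.perm` for $365^{(n)}$ (the n = 23 check is the classic threshold):

```python
from math import perm

# P(at least two of n people share a birthday)
# = 1 - P(all different) = 1 - 365^(n) / 365^n.
def p_shared_birthday(n):
    return 1 - perm(365, n) / 365**n

print(round(p_shared_birthday(23), 4))  # about 0.5073: past 50% at n = 23
```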

Multinomial Coefficient Theorem

There are 4 passengers on an elevator that services 5 floors. Assume that each of the 4 passengers is equally likely to get off the elevator on any of the 5 floors. What is the probability that:

  1. The passengers all get off on different floors? (# ways to select 4 distinct floors in order)
  2. 2 passengers get off on floor two, and two get off on floor three? (# ways to select 2 passengers)
  3. 2 passengers get off on one floor, and two passengers get off on another, different floor? (# ways to select 2 passengers) x (# ways to select 2 floors)
  4. Exactly 1 passenger gets off on floor 1? (# ways to select one passenger) x (# ways to select remaining floors)
  5. Exactly one passenger gets off on floor 1 and exactly one gets off on floor 2? (# ways to select 2 passengers) x (# ways to assign them) x (# ways to select remaining floors)
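With only $5^4 = 625$ equally likely outcomes, the counting answers can be verified exhaustively; a brute-force Python sketch covering three of the parts:

```python
from itertools import product
from fractions import Fraction

# Each ride is a tuple of 4 floor choices, all 5^4 = 625 equally likely.
rides = list(product(range(1, 6), repeat=4))
N = len(rides)

p_all_diff = Fraction(sum(len(set(r)) == 4 for r in rides), N)      # part 1
p_two_two = Fraction(sum(r.count(2) == 2 and r.count(3) == 2
                         for r in rides), N)                         # part 2
p_one_on_1 = Fraction(sum(r.count(1) == 1 for r in rides), N)        # part 4
print(p_all_diff, p_two_two, p_one_on_1)  # 24/125, 6/625, 256/625
```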

Definition: Multinomial Coefficient Theorem Consider $n$ objects which consist of $k$ types, where objects of the same type are indistinguishable. Suppose that there are $n_i$ objects of type $i$. Then, the number of distinguishable arrangements of the $n$ objects is $\dbinom{n}{n_1\, n_2\, \cdots\, n_k} = \dfrac{n!}{n_1!\, n_2! \cdots n_k!}$

Consider rearranging the letters at random in the name “ZENYATTA” to form a single ‘word’.

  1. How many ways can this be done? (# total permutations) / (# rearrangements of the repeated letters)
  2. What is the probability that all of the letters appear in alphabetic order? (# alphabetical permutations) / (# total permutations)
  3. What is the probability that the word begins and ends with “T”? (# arrangements with T first and T last) / (# total permutations)

3 members of the C-S department, 2 members of the math department, and 3 members of the stats department sit down at random in a row of 8 seats.

  1. What is the probability that each department’s members are sitting in consecutive seats? (# permutations of faculties) x (# permutations in each faculty)
  2. What is the probability that members of the same department are sitting on both ends of the row? P(CS on both ends) + P(Math both ends) + P(Stats both ends)

Or, excluding order:

5 floors and 5 people on an elevator: what is the probability that 2 people get off on one floor and 3 get off on another?

= (# of ways to choose 2 floors) x (# of ways to choose the 3 people) x (2 ways to assign the groups to the floors)

2 | Probability Rules and Conditional Probability

2.1 | Use of Sets

Venn Diagrams

De Morgan’s Laws (they generalize to unions and intersections of arbitrarily many sets): $\overline{A \cup B} = \bar{A} \cap \bar{B}$ and $\overline{A \cap B} = \bar{A} \cup \bar{B}$

2.2 | Addition Rules for Unions of Events

Inclusion-Exclusion Principle

Inclusion-Exclusion Rule: For arbitrary events $A$ and $B$: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Proof: $A \cup B = A \cup (B \cap \bar{A})$ is a disjoint union, and $P(B) = P(B \cap A) + P(B \cap \bar{A})$

Inclusion-Exclusion Principle (Generalized)

Ex: Rolling two dice, at least one die shows a 6

$|S| = 36$; $P(\text{first die shows 6}) = P(\text{second die shows 6}) = \frac{6}{36}$ and $P(\text{both show 6}) = \frac{1}{36}$, so $P(\text{at least one 6}) = \frac{6}{36} + \frac{6}{36} - \frac{1}{36} = \frac{11}{36}$
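Inclusion-exclusion on this example can be checked mechanically; a small Python sketch enumerating the 36 rolls:

```python
from itertools import product
from fractions import Fraction

# Inclusion-exclusion check: P(A or B) = P(A) + P(B) - P(A and B).
rolls = set(product(range(1, 7), repeat=2))   # |S| = 36
A = {r for r in rolls if r[0] == 6}           # first die shows 6
B = {r for r in rolls if r[1] == 6}           # second die shows 6

p = Fraction(len(A) + len(B) - len(A & B), len(rolls))
print(p)  # 6/36 + 6/36 - 1/36 = 11/36
```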

Ex: 80% of people visit Tim Hortons, 63% visit Canadian Tire, and 51% visit both. Let T = {TH}, C = {CT}

  1. Canadian Tire and Tim Hortons: $P(C \cap T) = 0.51$
  2. Canadian Tire but not Tim Hortons: We want $P(C \cap \bar{T}) = P(C) - P(C \cap T) = 0.63 - 0.51 = 0.12$

Ex: Rearrange the letters in “FOOD”

  1. $P(A)$, where A = the event that the two O’s appear together. S = {distinguishable arrangements, treating the O’s as indistinguishable}, $|S| = \frac{4!}{2!} = 12$
  2. $P(B)$, where B = the event that the word starts with F

2.3 | Dependent and Independent Events

Independence

Definition: Independence Two events $A$ and $B$ are said to be independent if $P(A \cap B) = P(A)P(B)$

A sequence of events $A_1, A_2, \dots, A_n$ is said to be independent if

$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})P(A_{i_2})\cdots P(A_{i_k})$

for all possible subsets $\{i_1, \dots, i_k\}$ of size $k$, $k = 2, \dots, n$. Events that are not independent are called dependent.

Ex: Consider rolling two fair dice, and let A = {sum is 10}, B = {first die is 6}, C = {sum is 7}

  1. A and B are dependent
  2. B and C are independent
  3. A and C are dependent

Proposition: If $A$ and $B$ are independent and mutually exclusive (disjoint), then either $P(A) = 0$ or $P(B) = 0$

Proof: Suppose $A, B$ are independent and disjoint; then $0 = P(\emptyset) = P(A \cap B) = P(A)P(B)$, so $P(A) = 0$ or $P(B) = 0$

Remark: Pairwise independence of events doesn’t mean that the events are (mutually) independent

Experiment: Roll two fair dice. A = {first die = 1}, B = {second die = 1}, C = {sum = 7}. Each pair is independent, but $P(A \cap B \cap C) = 0 \ne P(A)P(B)P(C)$
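This counterexample can be verified by enumeration; a Python sketch checking all three pairs and the triple:

```python
from itertools import product
from fractions import Fraction

# A = {first die is 1}, B = {second die is 1}, C = {sum is 7}:
# pairwise independent, but not mutually independent.
S = set(product(range(1, 7), repeat=2))
def P(E): return Fraction(len(E), len(S))

A = {s for s in S if s[0] == 1}
B = {s for s in S if s[1] == 1}
C = {s for s in S if sum(s) == 7}

assert P(A & B) == P(A) * P(B)              # each pair is independent
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
assert P(A & B & C) != P(A) * P(B) * P(C)   # but the triple is not
```

The intersection $A \cap B \cap C$ is empty (first = second = 1 forces sum 2, not 7), so its probability is 0.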

Proposition: If $A$ and $B$ are independent, then $\bar{A}, B$; $A, \bar{B}$; and $\bar{A}, \bar{B}$ are all pairs of independent events.

Proof: Suppose are independent.

2.4 | Conditional Probability and Product Rules

Conditional Probability

Definition: Conditional Probability $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$

A = {sum is 10}, B = {first die is 6}, C = {sum is 7}

Definition: Independence and Conditionals Two events are independent if $P(A \mid B) = P(A)$ (equivalently, $P(B \mid A) = P(B)$)

Properties of conditional probability:

  1. If $A, B$ are disjoint, $P(A \mid B) = 0$

Ex: Consider rearranging the letters in the word RACECAR

  1. What is the probability that the random word ends in “R” given that it starts with “ACE”? A = {word starts ACE}, R = {word ends R}, S = {distinguishable arrangements}
  2. Is the event that the word starts with “ACE” independent of the event that it ends with an “R”?

No, they’re dependent

Product Rules

Product Rule: $P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A)$

Ex: Suppose a bag contains 12 red balls and 7 blue balls. Suppose that a ball is drawn at random, then, without replacement, a second ball is drawn at random.

  1. What is the probability that both balls are red? R1 = {first marble is red} R2 = {second marble is red}

Alternate solution: S = {all selections of first 2 marbles}

  2. What is the probability that the second ball is red?
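The product rule makes both parts a short exact computation; a Python sketch with `Fraction` (my arithmetic, following the product rule above):

```python
from fractions import Fraction

# P(R1 and R2) = P(R1) * P(R2 | R1) for two draws without replacement
# from 12 red and 7 blue balls.
p_r1 = Fraction(12, 19)
p_r2_given_r1 = Fraction(11, 18)        # one red already removed
p_both_red = p_r1 * p_r2_given_r1
print(p_both_red)  # 22/57

# Total probability: the second draw is marginally red with probability 12/19.
p_r2 = p_r1 * p_r2_given_r1 + Fraction(7, 19) * Fraction(12, 18)
print(p_r2)  # 12/19, same as the first draw by symmetry
```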

Law of Total Probability

Definition: Partition A sequence of sets $A_1, A_2, \dots, A_k$ is said to partition the sample space $S$ if the sets are pairwise mutually exclusive and $A_1 \cup A_2 \cup \cdots \cup A_k = S$

Theorem: Law of total probability: Suppose that $A_1, \dots, A_k$ partition $S$. Then for any event $B$, $P(B) = \sum_{i=1}^{k} P(B \mid A_i)P(A_i)$

Usually we use $A$ and $\bar{A}$, because that partitions any $S$: $P(B) = P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})$

Example: A rare disease occurs in 0.3% of a population. A test has been developed for the disease. If the test is performed on an individual who has the disease, the probability of a positive test result (i.e. the test says that they have the disease) is 0.96. If the test is performed on an individual who does not have the disease, the probability of a positive test result is 0.05. Suppose the test is performed on a random individual in the population.

  1. What is the probability that the test result will be positive? D = {Disease} T+ = {Test is positive}
  2. Suppose the test comes back positive. What is the probability that the individual actually has the disease?
  3. If the test comes back negative, what is the probability that the individual actually has the disease?
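All three parts follow from the law of total probability and Bayes' theorem; a Python sketch of the arithmetic:

```python
# Disease-test example via total probability and Bayes' theorem.
p_d = 0.003                 # P(disease)
p_pos_d = 0.96              # P(T+ | D)
p_pos_nd = 0.05             # P(T+ | not D)

p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)    # part 1: total probability
p_d_pos = p_pos_d * p_d / p_pos                 # part 2: P(D | T+)
p_d_neg = (1 - p_pos_d) * p_d / (1 - p_pos)     # part 3: P(D | T-)

print(round(p_pos, 5), round(p_d_pos, 4), round(p_d_neg, 6))
```

Even with a positive test the probability of disease is only about 5%, because the disease is rare and false positives dominate.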

Bayes Theorem

Bae’s Theorem: $P(A \mid B) = \dfrac{P(B \mid A)P(A)}{P(B)} = \dfrac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})}$

Gold Coin question

Two factories, Factory I and Factory II, produce phones for a phone brand “JimSang”. Factory I produces 60% of all JimSang phones, and Factory II produces the other 40 %. Of phones produced in Factory I, 10% are defective, and 20% of phones produced by Factory II are defective.

  1. What is the probability that a randomly selected JimSang phone is defective?

D = {Defective phone}, $F_i$ = {Phone made in Factory $i$}

  2. A customer has purchased a phone, and it is not defective. What is the probability that it was produced in Factory II?

3 | Univariate Discrete Probability Distributions

3.1 | Discrete Random Variables

Geometric Series Formula: $\sum_{k=0}^{\infty} r^k = \dfrac{1}{1-r}$ for $|r| < 1$

Partial Geometric Series: $\sum_{k=0}^{n} r^k = \dfrac{1 - r^{n+1}}{1 - r}$ for $r \ne 1$

Binomial Theorem: $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$

Taylor Series for $e^x$: $e^x = \sum_{k=0}^{\infty} \dfrac{x^k}{k!}$

Example

Definition: Random Variable A random variable is a function that maps the sample space $S$ into the set of real numbers $\mathbb{R}$. In other words: $X : S \to \mathbb{R}$

The set of values that a random variable takes is called its range, $X(S)$. We say that a random variable is discrete if its range is finite or countably infinite, and continuous otherwise.

Definition: Probability Mass Function The probability function of a discrete random variable $X$ is the function $f(x) = P(X = x)$

Ex: Suppose that $X$ is the sum of the outcomes of two fair dice. Calculate the probability function of $X$.

Properties of the probability function

  1. It’s between 0 and 1: $0 \le f(x) \le 1$
  2. The sum of all probabilities is 1: $\sum_{x} f(x) = 1$
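The two-dice example above can be computed by enumeration, and both properties checked; a Python sketch:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# PMF of X = sum of two fair dice, from the 36 equally likely outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
f = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

print(f)  # f(7) = 6/36 is the mode; f(2) = f(12) = 1/36
```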

3.2 | Functions of Random Variables

Cumulative Distribution Function (CDF)

When is discrete:

Properties of the CDF

  1. for
  2. and

Ex: Compute and graph for a fair die roll

The CDF is càdlàg (continuous from the right, with limits from the left)

Ex:

  1. Compute the probability function of $X$
  2. Compute the CDF
  3. Compute P(at least 1 correct)

3.3 | Expectation of a Random Variable

Expected Values

Definition: Expectation Suppose $X$ is a discrete random variable with probability function $f(x)$. Then $E[X]$ is called the expected value of $X$, and is defined by $E[X] = \sum_{x} x f(x)$

The expected value of $X$ is sometimes referred to as the mean / first moment of $X$

Ex: Compute of a dice roll

Ex: Lotto 649

R = return on a ticket: $R(S) = \{-1, 499999\}$

Properties of Expectation

Property: Bounding If a random variable $X$ is bounded by $a \le X \le b$, then $a \le E[X] \le b$

If $X$ is the result of a fair six sided die roll and $g$ is a given function, then compute $E[g(X)]$

Property: Linearity of Expectation If $g$ is a linear function $g(x) = ax + b$, then for a random variable $X$: $E[aX + b] = aE[X] + b$

MISTAKE: It is not true in general that $E[g(X)] = g(E[X])$

Variance

  1. Deviation: $X - \mu$
  2. Absolute deviation: $|X - \mu|$
  3. Squared deviation: $(X - \mu)^2$

Definition: Variance The variance of a random variable $X$ is denoted $\mathrm{Var}(X)$ (or $\sigma^2$), and is defined by $\mathrm{Var}(X) = E[(X - \mu)^2]$

Shortcut to compute variance: $\mathrm{Var}(X) = E[X^2] - (E[X])^2$

Definition: Moments The $k$-th moment of a random variable $X$ is defined by $\mu_k = E[X^k]$

The variance is the second central moment: $\mathrm{Var}(X) = E[(X - \mu)^2]$

Suppose $X$ is a random variable with $E[X] = \mu$. Then $E[(X - \mu)^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2$

Definition: Standard Deviation $\sigma = \mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)}$

Theorem: Lyapounov’s Inequality If $0 < s < t$, then $E[|X|^s]^{1/s} \le E[|X|^t]^{1/t}$

Ex: Compute for a die roll
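The die-roll variance follows directly from the shortcut formula; an exact Python sketch:

```python
from fractions import Fraction

# Mean and variance of a fair die via the shortcut Var(X) = E[X^2] - E[X]^2.
f = {x: Fraction(1, 6) for x in range(1, 7)}
EX = sum(x * p for x, p in f.items())           # 7/2
EX2 = sum(x**2 * p for x, p in f.items())       # 91/6
Var = EX2 - EX**2                               # 91/6 - 49/4 = 35/12
print(EX, Var)
```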

Theorem: Variance of a linear transformation: $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$

Variance Properties

  1. For all random variables $X$: $\mathrm{Var}(X) \ge 0$
  2. Larger values of $\mathrm{Var}(X)$ indicate that the probability function / distribution of $X$ is more “spread out” around the mean
  3. $\mathrm{Var}(X) = 0$ if and only if $P(X = \mu) = 1$.

3.4 | Moment Generating Functions

Casino Day

Roulette:

R = return on a $1 Red bet: $R(s) \in \{1, -1\}$, with $P(R = 1) = \frac{18}{38}$

D = return on a $1 first dozen bet: $D(s) \in \{2, -1\}$, with $P(D = 2) = \frac{12}{38}$

N = return on a $1 “00” bet: $N(s) \in \{35, -1\}$, with $P(N = 35) = \frac{1}{38}$

All bets are EV $-\frac{2}{38} \approx -0.0526$, but the variance is different

Craps: $P(A \text{ before } B) = \dfrac{P(A)}{P(A) + P(B)}$

P(winning in craps): condition on F = the first roll

Odds bet

If point is 5 or 9: 3 to 2 (1.5 to 1) If point is 6 or 8: 6 to 5 (1.2 to 1)

Transformations of random variables

What is the probability that a 2 is rolled before an odd number? N = { neither 2 nor odd }

Remark: If $X$ is a discrete random variable with PMF $f_X$ and $Y = g(X)$ for some function $g$, then $Y$ is a discrete random variable. Sometimes we wish to compute $f_Y$ based on $f_X$.

  1. Determine $Y(S)$, the range of $Y$
  2. For each $y \in Y(S)$, compute $f_Y(y)$ by determining which $x$ satisfy $g(x) = y$, then computing $f_Y(y) = \sum_{x : g(x) = y} f_X(x)$

Let $X$ denote the number of kings in a random 3 card hand.

  1. Calculate the PMF of $X$:
  2. Let $Y$ be a function of $X$. Compute the PMF of $Y$

  3. Let $Z$ be another function of $X$. Compute the PMF of $Z$

Expectation and variance give a simple summary of distribution Skewness

Kurtosis

There exist distributions without expectation: Suppose X is a random variable with probability mass function $f(x) = \dfrac{6}{\pi^2 x^2}$ for $x = 1, 2, 3, \dots$ (a standard example; it is a valid PMF since $\sum_{x=1}^{\infty} \frac{1}{x^2} = \frac{\pi^2}{6}$)

Then $\sum_{x} x f(x) = \dfrac{6}{\pi^2} \sum_{x=1}^{\infty} \dfrac{1}{x}$ diverges, and $E[X]$ is not defined

Moment generating functions

Definition: Moment Generating Function (MGF) The MGF of a random variable $X$ is given by $M_X(t) = E[e^{tX}]$

If $X$ is discrete with PMF $f$, then $M_X(t) = \sum_{x} e^{tx} f(x)$

An MGF is well defined if $M_X(t) < \infty$ for all $t$ in an open interval around the origin

Properties of the MGF

  1. $M_X(0) = 1$
  2. $E[X^k] = M_X^{(k)}(0)$, the $k$-th derivative of $M_X$ at 0 (interchanging the sum and the derivative is justified by Fubini’s Theorem), so long as $M_X$ is defined in a neighborhood of the origin

Suppose is the outcome of a fair die roll. Use the MGF to compute its mean and variance.

Suppose $X$ is the number of coin flips required to flip a heads. Compute the MGF of $X$:

$M_X(t) = \sum_{x=1}^{\infty} e^{tx} \left(\frac{1}{2}\right)^x = \dfrac{e^t/2}{1 - e^t/2}$, well defined when $e^t/2 < 1$, i.e. on $t < \ln 2$; it’s a good MGF because this interval contains the origin

Alternatively, you could make the sum look like the derivative of a geometric series:

Theorem: Uniqueness Theorem for MGFs Two random variables with the same well-defined MGF have the same probability distribution (because the MGF is just a transform of the probability distribution)

Suppose has the following MGF. Compute the distribution.

Theorem: MGFs under linear transformations: $M_{aX + b}(t) = e^{bt} M_X(at)$

Ex:

3.5 | Special Discrete Probability Distributions

Discrete Uniform Distribution

Definition: Equality

Definition: Discrete Uniform Distribution Range is $\{a, a+1, \dots, b\}$, with $f(x) = \dfrac{1}{b - a + 1}$ for each $x$ in the range

Suppose $X$ follows a discrete uniform distribution. Compute its mean, variance, and MGF

Mean and Variance of the Uniform Distribution: $E[X] = \dfrac{a + b}{2}$, $\mathrm{Var}(X) = \dfrac{(b - a + 1)^2 - 1}{12}$

Calculating the mean: $E[X] = \sum_{x=a}^{b} \dfrac{x}{b - a + 1} = \dfrac{a + b}{2}$

MGF

Use the partial sum of the geometric series: $M_X(t) = \dfrac{1}{b - a + 1} \sum_{x=a}^{b} e^{tx} = \dfrac{e^{at}\left(e^{(b - a + 1)t} - 1\right)}{(b - a + 1)(e^t - 1)}$

Hypergeometric Distribution

Definition: Hypergeometric Distribution Yugioh reference :) Imagine drawing $n$ cards without replacement from an $N$ card deck containing $r$ successes. The number of successes $X$ you draw is hypergeometric

The name comes from the hypergeometric series / hypergeometric identity

Calculate the PMF of $X$: $f(x) = \dfrac{\binom{r}{x}\binom{N - r}{n - x}}{\binom{N}{n}}$

Probabilities of drawing a 5 card hand:

Binomial Distribution

Definition: Binomial Distribution A Bernoulli trial is a single success/failure experiment where success has probability $p$. A Binomial random variable $X \sim \mathrm{Bin}(n, p)$ counts the number of successes in $n$ independent Bernoulli trials

Compute the PMF of a binomial distribution: $f(x) = \binom{n}{x} p^x (1 - p)^{n - x}$ for $x = 0, 1, \dots, n$

Mean and Variance: $E[X] = np$, $\mathrm{Var}(X) = np(1 - p)$

Negative Binomial Distribution

Definition: Negative Binomial Perform independent Bernoulli trials with probability of success $p$ until exactly $k$ successes are observed. Then if $X$ denotes the number of failures before the $k$-th success, $X \sim \mathrm{NB}(k, p)$

If $T$ is the number of trials required, $T = X + k$

Compute the probability function of a NB random variable: $f(x) = \binom{x + k - 1}{x} p^k (1 - p)^x$ for $x = 0, 1, 2, \dots$

($\binom{x + k - 1}{x}$ is the number of sequences with $x$ failures and $k$ successes in which the last trial is a success)

Geometric Distribution

Definition: Geometric Distribution Perform independent Bernoulli trials with probability of success $p$ until one success is observed. If $X$ is the number of failures before the first success, we write $X \sim \mathrm{Geo}(p)$; the PMF is $f(x) = p(1 - p)^x$ for $x = 0, 1, 2, \dots$

MGF of the Geometric Distribution: $M_X(t) = \dfrac{p}{1 - (1 - p)e^t}$ for $(1 - p)e^t < 1$

This is well-defined in a neighborhood of the origin, and thus the uniqueness theorem applies

Mean and Variance: $E[X] = \dfrac{1 - p}{p}$, $\mathrm{Var}(X) = \dfrac{1 - p}{p^2}$

What is the probability that a coin takes more than 5 flips to show heads? $X$ = # of tails before observing 1 heads

Trivial Pursuit: What’s the probability the number of rolls exceeds the mean?

Question: Suppose for a and :

  1. For all

Remark: Memory-less property of the Geometric Distribution If $X \sim \mathrm{Geo}(p)$, then $P(X \ge s + t \mid X \ge s) = P(X \ge t)$

Poisson

Definition: Poisson Distribution $X \sim \mathrm{Poi}(\mu)$ (the parameter $\mu$ is the mean) has PMF $f(x) = \dfrac{e^{-\mu} \mu^x}{x!}$ for $x = 0, 1, 2, \dots$; it arises as the limit of the Binomial distribution as $n \to \infty$ and $p \to 0$ with $np = \mu$ fixed

Remark: If $X \sim \mathrm{Poi}(\mu)$, then $E[X] = \mathrm{Var}(X) = \mu$

Example: Poisson approximation of Binomial Suppose $\mu > 0$. Consider a sequence of Binomial experiments each consisting of $n$ trials in which $p_n = \mu / n$. Then for any fixed $x$, $\lim_{n \to \infty} \binom{n}{x} p_n^x (1 - p_n)^{n - x} = \dfrac{e^{-\mu} \mu^x}{x!}$

Let $\mu = np$. If $n$ is large and $p$ is close to zero, $\mathrm{Bin}(n, p) \approx \mathrm{Poi}(np)$

Ex: A bit error occurs for a given data transmission method independently in one out of every 1000 bits transferred. Suppose a 64 bit message is sent using the transmission system. What is the probability that there are exactly 2 bit errors? Approximate this using a Poisson approximation.

X = # of errors, $X \sim \mathrm{Bin}(64, 0.001)$

With $\mu = np = 0.064$: $P(X = 2) \approx \dfrac{e^{-0.064}(0.064)^2}{2!}$
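How good is the approximation here? A Python sketch comparing the exact binomial answer with the Poisson value:

```python
from math import comb, exp, factorial

# Bit-error example: X ~ Bin(64, 0.001); compare exact P(X = 2)
# with the Poisson approximation using mu = np = 0.064.
n, p, k = 64, 0.001, 2
exact = comb(n, k) * p**k * (1 - p)**(n - k)
mu = n * p
approx = exp(-mu) * mu**k / factorial(k)

print(exact, approx)  # both about 0.0019; the approximation is very close
```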

Shiny versions of Pokemon are possible to encounter and catch starting in Generation 2 (Pokemon Gold/Silver). Normal encounters with Pokemon while running in grass occur according to a Poisson process with rate 1 per minute on average. 1 in every 8192 encounters will be a Shiny Pokemon, on average.

  1. If you run around in grass for 15 hours, what is the probability you will encounter at least one Shiny pokemon? $X$ = # of shiny pokemon encountered in 15 hours
  2. How long would you have to run around in grass so that you have a better than 50 percent chance of encountering at least one Shiny pokemon? $X_t$ = # of shiny pokemon after t hours

4 | Multivariate Discrete Probability Distributions

4.1 | Basic Terminology and Techniques

Multiple (discrete) random variables

Joint PMF: $f(x, y) = P(X = x, Y = y)$

Marginal = add up over the other variable (“on the margins”) to get the PMFs of $X$ and $Y$: $f_X(x) = \sum_{y} f(x, y)$ and $f_Y(y) = \sum_{x} f(x, y)$

Def: Independence of Random Variables Discrete random variables with joint PMF $f(x, y)$ and marginal PMFs $f_X, f_Y$ are said to be independent random variables if $f(x, y) = f_X(x) f_Y(y)$ for all $(x, y)$

Let , then this is supported on

Properties

Ex: has the following joint PMF.

  1. Compute the marginal PMFs
  2. Compute

If are independent, their marginal PMFs satisfy

Conditional Probability

4.2 | Multinomial Distribution

Definition: Multinomial Distribution

  1. Individual trials have $k$ possible outcomes, with the $i$-th occurring with probability $p_i$
  2. Trials are independently repeated $n$ times; let $X_i$ be the number of times outcome $i$ occurs. Then $(X_1, \dots, X_k)$ has a Multinomial Distribution with parameters $n$ and $(p_1, \dots, p_k)$

Multinomial Distribution

Roulette

General PMF: $f(x_1, \dots, x_k) = \dfrac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}$, where $x_1 + \cdots + x_k = n$

Properties

  1. If $(X_1, \dots, X_k) \sim \mathrm{Mult}(n; p_1, \dots, p_k)$, then $X_i \sim \mathrm{Bin}(n, p_i)$
  2. $X_1 + \cdots + X_k = n$, so the $X_i$ are dependent
  3. Suppose categories $i$ and $j$ are merged. Then $X_i + X_j \sim \mathrm{Bin}(n, p_i + p_j)$

(Intuition: the “margins” of a multinomial are a binomial)

Ex: Roulette

Ex: 2 hearts, 2 spades, 1 diamond with replacement (w/o would be multi-hypergeometric)

Ex: Given 3 hearts in 5 cards, what is the possibility of no spades?

Ex: What if we add two distributions?

4.3 | Expectations, Covariance, Correlation

Distributions of multivariate transformations: Suppose that $Y = g(X_1, \dots, X_n)$

Then, for jointly distributed random variables $X_1, \dots, X_n$, $Y$ is a random variable. If $X_1, \dots, X_n$ have joint p.f. $f$, then the probability function of $Y$ is given by $f_Y(y) = \sum_{(x_1, \dots, x_n) : g(x_1, \dots, x_n) = y} f(x_1, \dots, x_n)$

Show that

For

Properties

  1. Convolution formula: if $X$ and $Y$ are independent, $P(X + Y = t) = \sum_{x} P(X = x) P(Y = t - x)$
  2. Binomial is closed under addition (with the same $p$)
  3. Geometric + Geometric = Negative Binomial

Sums of independent:

  • Poissons are Poisson
  • Geometrics with same are NB
  • Binomials with same are Binomial

Definition: Multivariate Expectation $E[g(X, Y)] = \sum_{x} \sum_{y} g(x, y) f(x, y)$

Properties of EV of Jointly Distributed Random Variables

  1. Closed under linear combinations: $E[aX + bY] = aE[X] + bE[Y]$
  2. Decomposing addition: $E[X + Y] = E[X] + E[Y]$, even when $X$ and $Y$ are dependent

Covariance: $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]$

If $X, Y$ independent, $\mathrm{Cov}(X, Y) = 0$ (the converse is not true in general)

Correlation: $\rho = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}$, with $-1 \le \rho \le 1$

Bilinear form property of $\mathrm{Cov}$: $\mathrm{Cov}(aX + bY, cU + dV) = ac\,\mathrm{Cov}(X, U) + ad\,\mathrm{Cov}(X, V) + bc\,\mathrm{Cov}(Y, U) + bd\,\mathrm{Cov}(Y, V)$

If $X, Y$ independent, then $E[XY] = E[X]E[Y]$

4.4 | Linear Combinations of Random Variables

Expectation of random variable combinations: $E\left[\sum_i a_i X_i\right] = \sum_i a_i E[X_i]$

Variance of random variable combinations: $\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$

Indicator (Bernoulli) Random Variables: Let $A$ be an event. We say that $X = \mathbb{1}_A$ (1 if $A$ occurs, 0 otherwise) is the indicator random variable of event $A$; note $E[X] = P(A)$

Properties

4.5 | Markov’s Inequality, Chebyshev’s Inequality, Law of Large Numbers

Weak Law of Large Numbers: Suppose $X_1, \dots, X_n$ are independent RVs with mean $\mu$ and variance $\sigma^2$, and

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

is the sample mean. Then for all $\epsilon > 0$, $\lim_{n \to \infty} P(|\bar{X} - \mu| \ge \epsilon) = 0$

Markov’s Inequality If $X$ is an RV with $X \ge 0$, then for any $c > 0$, $P(X \ge c) \le \dfrac{E[X]}{c}$

Chebyshev’s Inequality If $X$ is a RV with mean $\mu$ and variance $\sigma^2$, then for any $\epsilon > 0$, $P(|X - \mu| \ge \epsilon) \le \dfrac{\sigma^2}{\epsilon^2}$

For 10 coin tosses, what’s the probability that the sample mean deviates by more than $\epsilon$ from $\mu = \frac{1}{2}$?
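Chebyshev applied to the sample mean of the 10 tosses gives a bound, since $\mathrm{Var}(\bar{X}) = \frac{p(1-p)}{n} = \frac{1}{40}$; a Python sketch (the choice $\epsilon = \frac{1}{4}$ is my own illustration):

```python
from fractions import Fraction

# Chebyshev bound for the sample mean of 10 fair coin tosses
# (X_i ~ Bernoulli(1/2)): Var(Xbar) = (1/4)/10 = 1/40, and
# P(|Xbar - 1/2| >= eps) <= Var(Xbar)/eps^2 (capped at 1).
n = 10
var_mean = Fraction(1, 4) / n

def chebyshev_bound(eps):          # eps as a Fraction
    return min(Fraction(1), var_mean / eps**2)

print(chebyshev_bound(Fraction(1, 4)))  # (1/40)/(1/16) = 2/5
```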

4.6 | Conditional Probability Distributions

Conditional distribution: $f_{X \mid Y}(x \mid y) = \dfrac{f(x, y)}{f_Y(y)}$

Conditional mean: $E[X \mid Y = y] = \sum_{x} x f_{X \mid Y}(x \mid y)$

5 | Univariate Continuous Probability Distributions

5.1 | Continuous Random Variables

How to model continuous RVs?

  1. Define the analog of the probability function (hard)
  2. Model relative frequency from the histogram (calc)

5.2 | Functions of Random Variables

Definition: Continuous RV $X$ has probability density function (pdf) $f$ if $P(a \le X \le b) = \int_a^b f(x)\,dx$ for all $a \le b$

The support of a pdf is $\{x : f(x) > 0\}$

Definition: The CDF of a random variable X is $F(x) = P(X \le x)$

If $X$ is continuous with pdf $f$, then $F(x) = \int_{-\infty}^{x} f(t)\,dt$

Moreover, by FTC part I: $F'(x) = f(x)$ wherever $f$ is continuous

FTC part II: $P(a \le X \le b) = F(b) - F(a)$

Properties of the CDF of a continuous RV:

  1. is continuous
  2. is differentiable except at a countable number of exceptional points

5.3 | Expectation of a Random Variable

If is a continuous RV with pdf and , then

It follows that

Cool things

  1. Expectation still linear
  1. Variance shortcut still works

Definition: Moment Generating Function

5.4 | Special Continuous Probability Distributions

Theorem: Let $X$ be a continuous random variable with pdf $f_X$. Suppose that $g$ is a strictly monotonic differentiable function on the range of $X$. Then, the pdf of $Y = g(X)$ in the region where $f_Y(y) > 0$ is given by the following, where $g^{-1}$ denotes the inverse of $g$: $f_Y(y) = f_X(g^{-1}(y)) \left|\dfrac{d}{dy} g^{-1}(y)\right|$

Let $X$ be a continuous RV with pdf $f_X$. Then, $Y = aX + b$ (with $a \ne 0$) has pdf $f_Y(y) = \dfrac{1}{|a|} f_X\!\left(\dfrac{y - b}{a}\right)$

Definition: We say that $X$ has a (continuous) uniform distribution on $[a, b]$, or $X \sim U(a, b)$, if $X$ has pdf $f(x) = \dfrac{1}{b - a}$ for $a \le x \le b$ (and 0 otherwise)

Properties

Definition: We say that $X$ has an exponential distribution with mean $\theta$ if the density of $X$ is $f(x) = \dfrac{1}{\theta} e^{-x/\theta}$ for $x \ge 0$

Definition: Quantile / Percentile The 100q-th percentile (or q-th quantile) of the distribution of $X$ is the value $x_q$ such that $F(x_q) = q$

The quantile function $F^{-1}$ is the “inverse” of the CDF. The median of a distribution is its 50th percentile.

Definition: Normal Distribution $X \sim N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ if the density is $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

Properties

  1. Symmetric about the mean: $f(\mu + x) = f(\mu - x)$
  2. Unimodal with a peak at $\mu$
  3. (Standardization) $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$
  4. Mean $\mu$ and variance $\sigma^2$
  5. MGF looks like the density (that’s why it’s special): $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$

Definition: Standard normal random variable $Z \sim N(0, 1)$ if the density is $\phi(z) = \dfrac{1}{\sqrt{2\pi}} e^{-z^2/2}$

and the CDF is $\Phi(z) = \int_{-\infty}^{z} \phi(t)\,dt$

PROBLEM: The normal density has no elementary antiderivative (no closed form for $\Phi$)! Evaluate probabilities numerically with the table

Definition: The 68-95-99.7 Rule: If $X \sim N(\mu, \sigma^2)$, then the probability that $X$ is within 1, 2, or 3 standard deviations of the mean is approximately 0.68, 0.95, and 0.997 respectively

Theorem: If $X_1, \dots, X_n$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$, then $\sum_i a_i X_i \sim N\!\left(\sum_i a_i \mu_i, \sum_i a_i^2 \sigma_i^2\right)$

Sample Mean Suppose that $X_1, \dots, X_n$ are independent and that $X_i \sim N(\mu, \sigma^2)$. Then $\bar{X} \sim N(\mu, \sigma^2 / n)$

5.5 | Central Limit Theorem

Theorem (Central Limit Theorem) Suppose $X_1, \dots, X_n$ are independent RVs, each with a common CDF $F$. Suppose further that $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Then

$P\!\left(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z\right) \to \Phi(z)$

as $n \to \infty$. In other words, if $n$ is large, $\bar{X} \approx N(\mu, \sigma^2/n)$ and $\sum_i X_i \approx N(n\mu, n\sigma^2)$

Binomial to Normal If $X \sim \mathrm{Bin}(n, p)$, then for large $n$: $X \approx N(np, np(1 - p))$

Poisson to Normal If $X \sim \mathrm{Poi}(\mu)$, then for large $\mu$: $X \approx N(\mu, \mu)$

Suppose that you are interested in buying some red potatoes from a local grocery store. The potatoes are priced at $1.60 per kilogram. The average weight of a red potato at the store is 200 grams with a standard deviation of 25 grams. The distribution of the weights of potatoes is unknown.

  1. Suppose you take a random sample of 49 potatoes. What is the approximate probability that the average weight of the 49 potatoes sampled is between 195 and 205 grams (inclusive)?

Use the CLT to approximate the sample mean with a normal distribution

  2. Now, suppose you take a random sample of 36 potatoes. What is the approximate probability that the random sample of 36 potatoes will cost more than $12?
  3. Now, suppose that you only have $10 in cash, and you want to buy as many potatoes as you can with this $10. Determine the largest number of potatoes that you can select (randomly, of course), in order to be at least 85% certain that their overall cost will be at most $10.
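Part 1 is a direct CLT calculation: $\bar{X} \approx N(200, 25^2/49)$, so standardize and evaluate $\Phi$; a Python sketch using `math.erf` in place of the table:

```python
from math import erf, sqrt

# Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2.
def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 200, 25, 49
se = sigma / sqrt(n)                     # standard error = 25/7
z_lo = (195 - mu) / se                   # -1.4
z_hi = (205 - mu) / se                   # +1.4
answer = Phi(z_hi) - Phi(z_lo)
print(round(answer, 4))  # about 0.8385
```

The same `Phi` helper handles parts 2 and 3 after converting costs back to total weight.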