Basic Concept of Probability Distributions 1: Binomial Distribution

PMF

If the random variable $X$ follows the binomial distribution with parameters $n$ and $p$, we write $X \sim B(n, p)$. The probability of getting exactly $x$ successes in $n$ independent trials, each with success probability $p$, is given by the probability mass function: $$f(x; n, p) = \Pr(X=x) = {n\choose x}p^{x}(1-p)^{n-x}$$ for $x=0, 1, 2, \cdots, n$, where ${n\choose x} = {n!\over(n-x)!x!}$.

Proof that the probabilities sum to 1:

$$ \begin{align*} \sum_{x=0}^{n}f(x; n, p) &= \sum_{x=0}^{n}{n\choose x}p^{x}(1-p)^{n-x}\\ &= [p + (1-p)]^{n}\quad\quad \mbox{(binomial theorem)}\\ &= 1 \end{align*} $$
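As a quick numerical check of the mass function and its normalization, here is a minimal R sketch. The helper name `binom_pmf` and the values $n=10$, $p=0.3$ are chosen only for illustration and are not part of the original text.

```r
# Hypothetical helper: binomial PMF written directly from the formula above.
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

n <- 10   # illustrative values, not from the text
p <- 0.3
x <- 0:n

# Agreement with R's built-in PMF, up to floating-point error
all.equal(binom_pmf(x, n, p), dbinom(x, n, p))

# The probabilities over the support 0, ..., n sum to 1
all.equal(sum(binom_pmf(x, n, p)), 1)
```

Both comparisons use `all.equal` rather than `==` because the sums are computed in floating point.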

Mean

The expected value is $$\mu = E[X] = np$$

Proof:

For any integer $k \geq 1$, $$ \begin{align*} E\left[X^k\right] &= \sum_{x=0}^{n}x^{k}{n\choose x}p^{x}(1-p)^{n-x}\\ &= \sum_{x=1}^{n}x^{k}{n\choose x}p^{x}(1-p)^{n-x}\\ &= np\sum_{x=1}^{n}x^{k-1}{n-1\choose x-1}p^{x-1}(1-p)^{n-x}\quad\quad (\mbox{identity } x{n\choose x} = n{n-1\choose x-1})\\ &= np\sum_{y=0}^{n-1}(y+1)^{k-1}{n-1\choose y}p^{y}(1-p)^{n-1-y}\quad(\mbox{substituting } y=x-1)\\ &= npE\left[(Y + 1)^{k-1}\right] \quad\quad (Y\sim B(n-1, p)) \end{align*} $$ The identity used in the third step follows from $$ \begin{align*} x{n\choose x} &= {x\cdot n!\over(n-x)!x!}\\ &= {n!\over(n-x)!(x-1)!}\\ &= n{(n-1)!\over[(n-1)-(x-1)]!(x-1)!}\\ &= n{n-1\choose x-1} \end{align*} $$ Hence, setting $k=1$, we have $$E[X] = np$$
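The moment relation $E[X^k] = np\,E[(Y+1)^{k-1}]$ with $Y \sim B(n-1, p)$ can be checked numerically. The sketch below is one way to do so; the helper `binom_moment` and the values of `n`, `p`, and `k` are illustrative assumptions, not taken from the original.

```r
# Hypothetical helper: k-th raw moment of B(n, p), by direct summation over 0..n.
binom_moment <- function(k, n, p) {
  x <- 0:n
  sum(x^k * dbinom(x, n, p))
}

n <- 12    # illustrative values
p <- 0.35
k <- 3
y <- 0:(n - 1)

lhs <- binom_moment(k, n, p)                               # E[X^k]
rhs <- n * p * sum((y + 1)^(k - 1) * dbinom(y, n - 1, p))  # np * E[(Y + 1)^(k - 1)]
all.equal(lhs, rhs)

# Setting k = 1 recovers the mean np
all.equal(binom_moment(1, n, p), n * p)
```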

Variance

The variance is $$\sigma^2 = \mbox{Var}(X) = np(1-p)$$

Proof:

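Using $E\left[X^k\right] = npE\left[(Y+1)^{k-1}\right]$ with $k=2$, where $Y\sim B(n-1, p)$ and therefore $E[Y] = (n-1)p$: $$ \begin{align*} \mbox{Var}(X) &= E\left[X^2\right] - (E[X])^2\\ &= npE[Y+1] - n^2p^2\\ &= np\left(E[Y] + 1\right) - n^2p^2\\ &= np[(n-1)p + 1] - n^2p^2\\ &= np(1-p) \end{align*} $$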
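For readers who want the final simplification spelled out, the last step expands as follows (plain algebra, no additional assumptions): $$ \begin{align*} np[(n-1)p + 1] - n^2p^2 &= n^2p^2 - np^2 + np - n^2p^2\\ &= np - np^2\\ &= np(1-p) \end{align*} $$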

Examples

1. Let $X$ be binomially distributed with parameters $n=10$ and $p={1\over2}$. Determine the expected value $\mu$, the standard deviation $\sigma$, and the probability $P\left(|X-\mu| \geq 2\sigma\right)$. Compare with Chebyshev's Inequality.

Solution:

The binomial mass function is $$f(x) = {n\choose x} p^x q^{n-x}, \quad x=0, 1, 2, \cdots, n$$ where $q=1-p$. The expected value and the standard deviation are $$E[X] = np = 5, \quad \sigma = \sqrt{npq} = 1.581139$$ The probability that $X$ takes a value at least two standard deviations from $\mu$ is $$ \begin{align*} P\left(|X-\mu| \geq 2\sigma\right) &= P\left(|X-5| \geq 3.162\right)\\ &= P(X\leq 1) + P(X \geq 9)\\ &= \sum_{x=0}^{1}{10\choose x}p^{x}(1-p)^{10-x} + \sum_{x=9}^{10}{10\choose x}p^{x}(1-p)^{10-x}\\ &= 0.02148437 \end{align*} $$ R code:

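# P(X <= 1) + P(X >= 9), summing the PMF directly
sum(dbinom(0:1, 10, 0.5)) + 1 - sum(dbinom(0:8, 10, 0.5))
# [1] 0.02148437
# The same probability via the CDF (the last digit differs only by floating-point rounding)
pbinom(1, 10, 0.5) + 1 - pbinom(8, 10, 0.5)
# [1] 0.02148438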

Chebyshev's Inequality gives the much weaker bound $$P\left(|X - \mu| \geq 2\sigma\right) \leq {1\over2^2} = 0.25$$
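To see how loose the Chebyshev bound $1/k^2$ is for this distribution, the comparison can be repeated for a few multiples $k$ of $\sigma$. This is a small sketch; the helper `exact_tail` and the choice of `k` values are illustrative assumptions.

```r
n <- 10
p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
x <- 0:n

# Hypothetical helper: exact P(|X - mu| >= k * sigma),
# summing the PMF over the values of x that satisfy the condition.
exact_tail <- function(k) {
  sum(dbinom(x[abs(x - mu) >= k * sigma], n, p))
}

k <- c(1.5, 2, 2.5)
data.frame(k = k, exact = sapply(k, exact_tail), chebyshev = 1 / k^2)
```

In every row the exact probability is below the Chebyshev bound, as the inequality guarantees; the row for `k = 2` reproduces the value computed in this example.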

2. What is the probability $P_1$ of having at least six heads when tossing a coin ten times?

Solution:

$$ \begin{align*} P(X \geq 6) &= \sum_{x=6}^{10}{10\choose x}0.5^{x}0.5^{10-x}\\ &= 0.3769531 \end{align*} $$ R code:

1 - pbinom(5, 10, 0.5)
# [1] 0.3769531
sum(dbinom(c(6:10), 10, 0.5))
# [1] 0.3769531
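The same value can be obtained by hand from the symmetry of $B(10, {1\over2})$: since $P(X \geq 6) = P(X \leq 4)$ and the two tails together with $P(X=5)$ account for all the probability, $$ \begin{align*} P(X \geq 6) &= {1 - P(X=5)\over 2}\\ &= {1\over2}\left(1 - {10\choose 5}0.5^{10}\right)\\ &= {1\over2}\left(1 - {252\over1024}\right)\\ &= {386\over1024} = 0.376953125 \end{align*} $$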

3. What is the probability $P_2$ of having at least 60 heads when tossing a coin 100 times?

Solution:

$$ \begin{align*} P(X \geq 60) &= \sum_{x=60}^{100}{100\choose x}0.5^{x}0.5^{100-x}\\ &= 0.02844397 \end{align*} $$ R code:

1 - pbinom(59, 100, 0.5)
# [1] 0.02844397
sum(dbinom(c(60:100), 100, 0.5))
# [1] 0.02844397

Alternatively, we can use the normal approximation with a continuity correction (generally considered adequate when $np > 5$ and $n(1-p) > 5$). Here $\mu = np = 50$ and $\sigma = \sqrt{np(1-p)} = \sqrt{25} = 5$. $$ \begin{align*} P(X \geq 60) &= 1 - P(X \leq 59)\\ &\approx 1 - \Phi\left({59.5-50\over \sqrt{25}}\right)\\ &= 1-\Phi(1.9)\\ &= 0.02871656 \end{align*} $$ R code:

1 - pnorm(1.9)
# [1] 0.02871656
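The value 59.5 rather than 60 in the approximation is a continuity correction. The following sketch compares the exact probability with the normal approximation computed with and without that correction; the variable names are chosen here for illustration.

```r
n <- 100
p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

exact      <- 1 - pbinom(59, n, p)              # exact P(X >= 60)
with_cc    <- 1 - pnorm((59.5 - mu) / sigma)    # normal approximation, cut at 59.5
without_cc <- 1 - pnorm((60 - mu) / sigma)      # normal approximation, cut at 60

c(exact = exact, with_cc = with_cc, without_cc = without_cc)

# Absolute error of each approximation relative to the exact value
c(with_cc = abs(with_cc - exact), without_cc = abs(without_cc - exact))
```

For these parameters the corrected version is noticeably closer to the exact binomial probability.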

4. What is the probability $P_3$ of having at least 600 heads when tossing a coin 1000 times?

Solution: $$ \begin{align*} P(X \geq 600) &= \sum_{x=600}^{1000}{1000\choose x} 0.5^{x} 0.5^{1000-x}\\ &= 1.364232 \times10^{-10} \end{align*} $$ R code:

# Sum the PMF over x = 600, ..., 1000.
# Note: the range must be 600:1000; 600:100 counts down from 600 to 100 and gives a sum of about 1.
sum(dbinom(600:1000, 1000, 0.5))
# [1] 1.364232e-10

Alternatively, we can use the normal approximation with a continuity correction. Here $\mu = np = 500$ and $\sigma = \sqrt{np(1-p)} = \sqrt{250}$. $$ \begin{align*} P(X \geq 600) &= 1 - P(X \leq 599)\\ &\approx 1 - \Phi\left({599.5-500\over \sqrt{250}}\right)\\ &= 1.557618 \times 10^{-10} \end{align*} $$ R code:

1 - pnorm(99.5/sqrt(250))
# [1] 1.557618e-10
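For tail probabilities this small, computing `1 - pnorm(...)` or `1 - pbinom(...)` subtracts two nearly equal numbers and can lose precision. A hedged alternative is to request the upper tail directly with the `lower.tail = FALSE` argument that both functions accept; the variable names below are illustrative.

```r
n <- 1000
p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Exact P(X >= 600) = P(X > 599), taken directly from the upper tail
exact <- pbinom(599, n, p, lower.tail = FALSE)

# Normal approximation with continuity correction, also from the upper tail
approx <- pnorm((599.5 - mu) / sigma, lower.tail = FALSE)

c(exact = exact, approx = approx, ratio = approx / exact)
```

Although both values are tiny, the ratio shows that the normal approximation overstates the exact probability by roughly 14% this far out in the tail.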

References

Ross, S. (2010). A First Course in Probability (8th Edition). Chapter 4. Pearson. ISBN: 978-0-13-603313-4.
Brink, D. (2010). Essentials of Statistics: Exercises. Chapters 5 and 8. ISBN: 978-87-7681-409-0.

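Author: Zhao Yin (赵胤)
Source: http://www.cnblogs.com/zhaoyin/
The copyright of this article is shared by the author and cnblogs (博客园). Reprinting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.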