Basic Concept of Probability Distributions 5: Hypergeometric Distribution


PMF

Suppose that a sample of size $n$ is to be chosen randomly (without replacement) from an urn containing $N$ balls, of which $m$ are white and $N-m$ are black. If we let $X$ denote the number of white balls selected, then $$f(x; N, m, n) = \Pr(X = x) = {{m\choose x}{N-m\choose n-x}\over {N\choose n}}$$ for $x= 0, 1, 2, \cdots, n$.
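As a quick sanity check, the formula can be evaluated directly and compared with R's built-in `dhyper`. This is only a sketch: the helper name `hyper_pmf` and the parameter values are assumptions made for illustration and do not come from the text above.

```r
# Hypothetical helper (not from the original post): PMF of the number of
# white balls when n balls are drawn without replacement from an urn with
# N balls, m of them white.
hyper_pmf <- function(x, N, m, n) {
  choose(m, x) * choose(N - m, n - x) / choose(N, n)
}

# Illustrative parameter values (assumed for this sketch)
N <- 20; m <- 8; n <- 5
x <- 0:n

p_manual  <- hyper_pmf(x, N, m, n)
# dhyper(x, m, n, k) takes: number of white balls, number of black balls,
# and number of balls drawn, in that order
p_builtin <- dhyper(x, m, N - m, n)

all.equal(p_manual, p_builtin)  # the two computations should agree
sum(p_manual)                   # probabilities over x = 0..n should sum to 1
```

Note that `dhyper` is parameterised by the numbers of white and black balls rather than by the urn size $N$, so the second argument pair is `m, N - m`.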

Proof:

That these probabilities sum to 1 is essentially Vandermonde's identity: $${m+n\choose r} = \sum_{k=0}^{r}{m\choose k}{n\choose r-k}$$ where $m$, $n$, $k$, $r\in \mathbb{N}_0$. Because $$ \begin{align*} \sum_{r=0}^{m+n}{m+n\choose r}x^r &= (1+x)^{m+n} \quad\quad\quad\quad\quad\quad\quad\quad \mbox{(binomial theorem)}\\ &= (1+x)^m(1+x)^n\\ &= \left(\sum_{i=0}^{m}{m\choose i}x^{i}\right)\left(\sum_{j=0}^{n}{n\choose j}x^{j}\right)\\ &= \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}{m\choose k}{n\choose r-k}\right)x^r \quad\quad\mbox{(product of two polynomials)} \end{align*} $$

The last step uses the general formula for the product of two polynomials (with the convention $a_i = 0$ for $i > m$ and $b_j = 0$ for $j > n$): $$ \begin{eqnarray*} \left(\sum_{i=0}^{m}a_i x^i\right)\left(\sum_{j=0}^{n}b_j x^j\right) &=& \left(a_0+a_1x+\cdots + a_mx^m\right)\left(b_0+b_1x+\cdots + b_nx^n\right)\\ &=& a_0b_0 + a_0b_1x +a_1b_0x +\cdots +a_0b_2x^2 + a_1b_1x^2 + a_2b_0x^2 +\\ & &\cdots + a_mb_nx^{m+n}\\ &=& \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}a_{k}b_{r-k}\right)x^{r} \end{eqnarray*} $$

Hence, comparing the coefficients of $x^r$, $$ \begin{eqnarray*} & &\sum_{r=0}^{m+n}{m+n\choose r}x^r = \sum_{r=0}^{m+n}\left(\sum_{k=0}^{r}{m\choose k}{n\choose r-k}\right)x^r\\ &\implies& {m+n\choose r} = \sum_{k=0}^{r}{m\choose k}{n\choose r-k}\\ &\implies& \sum_{k=0}^{r}{{m\choose k}{n\choose r-k}\over {m+n\choose r}} = 1 \end{eqnarray*} $$

Applying the last line with $N-m$ in place of $n$, $n$ in place of $r$ and $x$ in place of $k$ gives $\sum_{x=0}^{n}f(x; N, m, n) = 1$.
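A small worked instance may make the identity concrete. The values $m=2$, $n=3$, $r=2$ are chosen here purely for illustration: $$ \begin{align*} \sum_{k=0}^{2}{2\choose k}{3\choose 2-k} &= {2\choose 0}{3\choose 2} + {2\choose 1}{3\choose 1} + {2\choose 2}{3\choose 0}\\ &= 1\cdot 3 + 2\cdot 3 + 1\cdot 1\\ &= 10 = {5\choose 2} \end{align*} $$ Dividing each term by ${5\choose 2} = 10$ gives the probabilities $0.3$, $0.6$ and $0.1$, which sum to 1.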

Mean

The expected value is $$\mu = E[X] = {nm\over N}$$

Proof:

For $k \geq 1$ the $x=0$ term of the sum vanishes, so $$ \begin{eqnarray*} E[X^k] &=& \sum_{x=0}^{n}x^kf(x; N, m, n)\\ &=& \sum_{x=0}^{n}x^k{{m\choose x}{N-m\choose n-x}\over {N\choose n}}\\ &=& {nm\over N}\sum_{x=1}^{n} x^{k-1} {{m-1 \choose x-1}{N-m\choose n-x}\over {N-1 \choose n-1}}\\ & & (\mbox{identities: } x{m\choose x} = m{m-1\choose x-1},\ n{N\choose n} = N{N-1\choose n-1})\\ &=& {nm\over N}\sum_{y=0}^{n-1} (y+1)^{k-1} {{m-1 \choose y}{(N-1) - (m - 1)\choose (n-1)-y}\over {N-1 \choose n-1}}\quad\quad(\mbox{setting } y=x-1)\\ &=& {nm\over N}E\left[(Y+1)^{k-1}\right] \quad\quad (\mbox{since } Y\sim f(y; N-1, m-1, n-1)) \end{eqnarray*} $$ Hence, setting $k=1$ we have $$E[X] = {nm\over N}$$ Note that this has the same form as the mean of the binomial distribution, $\mu = np$, with $p = {m\over N}$.
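The mean and the moment recursion can be checked numerically. The following is a sketch under assumed parameter values ($N=37$, $m=19$, $n=7$); it sums over the PMF with `dhyper` rather than reproducing the algebra.

```r
# Illustrative parameter values (assumed for this sketch)
N <- 37; m <- 19; n <- 7
x <- 0:n

# E[X] computed directly from the PMF; dhyper(x, white, black, drawn)
mean_direct  <- sum(x * dhyper(x, m, N - m, n))
mean_formula <- n * m / N

# Recursion with k = 2: E[X^2] = (nm/N) * E[Y + 1], where Y is
# hypergeometric for an urn with N-1 balls, m-1 white, and n-1 draws
# (the number of black balls stays N - m)
y <- 0:(n - 1)
ex2_direct    <- sum(x^2 * dhyper(x, m, N - m, n))
ex2_recursion <- mean_formula * sum((y + 1) * dhyper(y, m - 1, N - m, n - 1))

all.equal(mean_direct, mean_formula)   # should agree
all.equal(ex2_direct, ex2_recursion)   # should agree
```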

Variance

The variance is $$\sigma^2 = \mbox{Var}(X) = np(1-p)\left(1 - {n-1 \over N-1}\right)$$ where $p = {m\over N}$.

Proof:

$$ \begin{align*} E[X^2] &= {nm\over N}E[Y+1] \quad\quad (\mbox{setting } k=2)\\ &= {nm\over N}\left(E[Y] + 1\right)\\ &= {nm\over N}\left[{(n-1) (m-1) \over N-1}+1\right] \end{align*} $$ Hence the variance is $$ \begin{align*} \mbox{Var}(X) &= E\left[X^2\right] - E[X]^2\\ &= {mn\over N}\left[{(n-1) (m-1) \over N-1}+1 - {nm\over N}\right]\\ &= np \left[ (n-1) \cdot {pN-1\over N-1}+1-np\right] \quad\quad (\mbox{setting } p={m\over N})\\ &= np\left[(n-1)\cdot {p(N-1) + p -1 \over N-1} + 1 -np\right]\\ &= np\left[(n-1)p + (n-1)\cdot{p-1 \over N-1} + 1-np\right]\\ &= np\left[1-p - (1-p)\cdot {n-1\over N-1}\right] \\ &= np(1-p)\left(1 - {n-1 \over N-1}\right) \end{align*} $$ Note that the factor $1 - {n-1\over N-1}$ is approximately equal to 1 when $N$ is sufficiently large relative to $n$ (i.e. ${n-1\over N-1} \rightarrow 0$ when $N \rightarrow +\infty$). The variance is then the same as that of the binomial distribution, $\sigma^2 = np(1-p)$, where $p = {m\over N}$.
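A short numerical sketch of both claims, the closed form and the binomial limit, is given here. The helper name `hyper_var` and the parameter values are assumptions made for illustration.

```r
# Hypothetical helper: variance from the closed-form expression
hyper_var <- function(N, m, n) {
  p <- m / N
  n * p * (1 - p) * (1 - (n - 1) / (N - 1))
}

# Illustrative parameter values (assumed for this sketch)
N <- 37; m <- 19; n <- 7
x  <- 0:n
px <- dhyper(x, m, N - m, n)   # dhyper(x, white, black, drawn)

# Variance computed directly as E[X^2] - E[X]^2
var_direct <- sum(x^2 * px) - sum(x * px)^2
all.equal(var_direct, hyper_var(N, m, n))   # should agree

# Holding p = m/N and n fixed while the urn grows, the ratio to the
# binomial variance np(1-p) equals 1 - (n-1)/(N-1) and moves toward 1
scale <- c(1, 10, 100, 1000)
ratio <- hyper_var(N * scale, m * scale, n) / (n * (m / N) * (1 - m / N))
ratio
```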

Examples

1. At a lotto game, seven balls are drawn randomly from an urn containing 37 balls numbered from 0 to 36. Calculate the probability $P$ of having exactly $k$ balls with an even number for $k=0, 1, \cdots, 7$.

Solution:

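Of the 37 balls, 19 carry an even number ($0, 2, \cdots, 36$) and 18 an odd number, so with $N = 37$, $m = 19$ and $n = 7$: $$P(X = k) = {{19\choose k}{18\choose 7-k}\over {37 \choose 7}}$$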

# Exact hypergeometric probabilities of k = 0, ..., 7 even-numbered balls
k = 0:7
p = numeric(length(k))
for (i in k){
  p[i+1] = round(choose(19, i) * choose(18, 7-i)
                 / choose(37, 7), 3)
}
p
# [1] 0.003 0.034 0.142 0.288 0.307 0.173 0.047 0.005
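The same probabilities can also be obtained without a loop, since R's `dhyper` is vectorised over its first argument. This is offered as an alternative sketch, not as the method used in the solution above.

```r
k <- 0:7
# dhyper(x, m, n, k): x even-numbered balls among k = 7 draws, with
# m = 19 even-numbered and n = 18 odd-numbered balls in the urn
p <- round(dhyper(k, 19, 18, 7), 3)
p
```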

2. Determine the same probabilities as in the previous problem, this time using the normal approximation.

Solution:

The mean is $$\mu = {nm\over N} = {7 \times 19\over 37} = 3.594595$$ and the standard deviation is $$\sigma = \sqrt{{nm\over N}\left(1-{m\over N}\right)\left(1 - {n-1\over N-1}\right)} = \sqrt{{7 \times 19\over 37}\left(1 - {19\over 37}\right) \left(1 - {7-1\over 37-1}\right)} = 1.207174$$ Approximating $P(X = k)$ by the normal density with this mean and standard deviation, evaluated at $k$, gives

# Normal approximation: density of N(mu, s^2) evaluated at k = 0, ..., 7
k = 0:7
p = numeric(length(k))
mu = 7 * 19 / 37
s = sqrt(7 * 19 / 37 * (1 - 19/37) * (1 - 6/36))
for (i in k){
  p[i+1] = round(dnorm(i, mu, s), 3)
}
p
# [1] 0.004 0.033 0.138 0.293 0.312 0.168 0.045 0.006
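To see how close the approximation is, the exact and approximate values can be placed side by side. The sketch below assumes the same urn as in the two examples; the variable names and the `data.frame` layout are illustrative choices.

```r
k  <- 0:7
mu <- 7 * 19 / 37
s  <- sqrt(mu * (1 - 19/37) * (1 - 6/36))

exact  <- dhyper(k, 19, 18, 7)   # exact hypergeometric probabilities
normal <- dnorm(k, mu, s)        # normal density evaluated at each integer k

# Side-by-side comparison with the absolute error of the approximation
comparison <- data.frame(k = k,
                         exact = round(exact, 3),
                         normal = round(normal, 3),
                         abs_error = round(abs(exact - normal), 3))
comparison
```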



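Author: 赵胤
Source: http://www.cnblogs.com/zhaoyin/
Copyright of this article is shared by the author and 博客园 (cnblogs). Reposting is welcome, but unless the author agrees otherwise this notice must be retained and a link to the original article must be given in a prominent place on the page; otherwise the right to pursue legal liability is reserved.

Original article: https://www.cnblogs.com/zhaoyin/p/4206519.html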