Applied Nonparametric Statistics-lec9

Ref: https://onlinecourses.science.psu.edu/stat464/print/book/export/html/12

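So far we have considered the case where the response is continuous and the explanatory variable is categorical. For example, if we want to check whether median GPA is related to where students sit in the classroom, then GPA is continuous and is the response variable, while seating position (front, middle, back) is categorical and is the explanatory variable.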

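Now consider the case where the explanatory variable is also continuous, i.e. we examine the association between two continuous variables (correlation by itself does not establish causation). What we care about most is the strength and direction of the relationship.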

First, we consider linear association and compute Pearson's correlation coefficient.

Computing Pearson's Correlation Coefficient

cor.test(x, y)

The output gives the correlation coefficient (cor), a confidence interval, and a p-value.
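As a rough illustration, the sketch below borrows the small h and w vectors from the slope example in these notes, computes Pearson's r directly from its definition, and compares it with what cor.test reports. The variable names r_manual, res, and r_test are made up for this example.

```r
# Data borrowed from the slope example in these notes
h <- c(67, 62, 64, 65)
w <- c(120, 172, 167, 145)

# Pearson's r from its definition:
# sum of cross-deviations divided by the root of the product of squared deviations
r_manual <- sum((h - mean(h)) * (w - mean(w))) /
  sqrt(sum((h - mean(h))^2) * sum((w - mean(w))^2))

# cor.test uses method = "pearson" by default
res <- cor.test(h, w)
r_test <- unname(res$estimate)  # the "cor" value in the printed output

print(c(manual = r_manual, cor.test = r_test))
print(res$conf.int)  # confidence interval for the correlation
print(res$p.value)   # p-value for H0: correlation equals 0
```

With only four observations the confidence interval is very wide, so the numbers are useful mainly for checking that the manual formula and cor.test agree.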

Computing the slope (least-squares fit)

> h=c(67, 62, 64, 65)
> w=c(120, 172, 167, 145)
> lm(w~h)

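Note: the model is specified with a formula of the form y ~ x (here w ~ h, so w is the response and h is the explanatory variable); other formulas can be used as well. In the output, the slope is -10.85.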
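To make the slope concrete, here is a minimal sketch that extracts the coefficients from the fit and checks the slope against the textbook formula cov(x, y) / var(x). The names fit, slope_lm, slope_manual, and fit_rev are chosen for this example only.

```r
h <- c(67, 62, 64, 65)
w <- c(120, 172, 167, 145)

# Least-squares fit of w on h
fit <- lm(w ~ h)
print(coef(fit))  # intercept and slope

slope_lm <- unname(coef(fit)["h"])

# The least-squares slope equals cov(x, y) / var(x)
slope_manual <- cov(h, w) / var(h)
print(round(slope_manual, 2))

# A different formula: swapping the roles of the two variables
# gives a different line, so the choice of response matters
fit_rev <- lm(h ~ w)
print(coef(fit_rev))
```

The slope has the same sign as Pearson's r, because both are built from the same covariance term.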

R Output

Spearman's Rank Correlation

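Replace the values of both variables with their ranks and compute Pearson's correlation on the ranks; the result is Spearman's rank correlation.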
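The rank-then-Pearson description can be checked directly. The sketch below reuses the h and w vectors from the slope example; rho_ranks and rho_builtin are names invented for this illustration.

```r
h <- c(67, 62, 64, 65)
w <- c(120, 172, 167, 145)

# Pearson's correlation computed on the ranks
rho_ranks <- cor(rank(h), rank(w))

# Built-in Spearman correlation
rho_builtin <- cor(h, w, method = "spearman")

print(c(ranks = rho_ranks, builtin = rho_builtin))
```

For this data both values are -1: w decreases every time h increases, so the relationship is perfectly monotone even though it is not perfectly linear. cor.test(h, w, method = "spearman") would additionally report a p-value.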

Kendall's Tau Rank Correlation

Kendall's tau measures association by counting the number of concordant and discordant pairs.

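Concordant pairs: a pair of observations (x_i, y_i) and (x_j, y_j) is concordant if x_i - x_j and y_i - y_j have the same sign, and discordant if they have opposite signs.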
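The counting idea can be sketched as follows, again on the h and w vectors from the slope example. It assumes there are no ties, in which case tau is (concordant - discordant) divided by the total number of pairs. The names idx_pairs, s, n_conc, n_disc, tau_manual, and tau_builtin are illustrative only.

```r
h <- c(67, 62, 64, 65)
w <- c(120, 172, 167, 145)

# All pairs of observation indices; each column is one pair (i, j)
idx_pairs <- combn(length(h), 2)
i <- idx_pairs[1, ]
j <- idx_pairs[2, ]

# A pair is concordant when the h-difference and w-difference share a sign,
# discordant when the signs differ
s <- sign(h[i] - h[j]) * sign(w[i] - w[j])
n_conc <- sum(s > 0)
n_disc <- sum(s < 0)

# Kendall's tau without ties
tau_manual <- (n_conc - n_disc) / ncol(idx_pairs)

# Built-in version for comparison
tau_builtin <- cor(h, w, method = "kendall")

print(c(concordant = n_conc, discordant = n_disc))
print(c(manual = tau_manual, builtin = tau_builtin))
```

Here all 6 pairs are discordant, so tau is -1. With tied values the simple denominator no longer applies and the built-in function should be preferred.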

Bootstrap

A sample drawn from the data with replacement is called a bootstrap sample.

boot_sample = sample(data, 10, replace=TRUE)  # draw 10 values from data with replacement

Steps for Creating a Bootstrap Estimate of Correlation

1. Gather a bootstrap sample of size n (Think carefully about how to do this).

2. Calculate the sample correlation, r_i, from the bootstrap sample.

3. Repeat steps (1)-(2) B times. You typically want B to be larger than 100. I would say B = 1000 is a good number.

4. To find the (1-α)100% CI for ρ, order the B bootstrap correlations r_1, ..., r_B and take the α/2 and 1-α/2 percentiles as the lower and upper bounds.
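The steps above can be sketched in R as follows. This is one possible reading, not the course's own code: it assumes that step 1 means resampling the (x, y) pairs together by drawing row indices with replacement, so that each x stays attached to its y. The data are simulated purely for illustration, and the names x, y, B, alpha, boot_r, and ci are choices made for this example.

```r
set.seed(464)  # arbitrary seed so the run is reproducible

# Simulated paired data (hypothetical, not from the course)
n <- 30
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)

B <- 1000      # number of bootstrap samples
alpha <- 0.05  # for a 95% interval

boot_r <- replicate(B, {
  # Step 1: resample indices with replacement to keep (x, y) pairs intact
  idx <- sample(n, n, replace = TRUE)
  # Step 2: sample correlation of this bootstrap sample
  cor(x[idx], y[idx])
})

# Step 4: percentile interval from the B bootstrap correlations
ci <- quantile(boot_r, probs = c(alpha / 2, 1 - alpha / 2))

print(cor(x, y))  # correlation in the original sample
print(ci)         # bootstrap percentile confidence interval
```

Resampling x and y independently would destroy the pairing and push the correlations toward zero, which is why the indices are resampled instead. With very small samples a bootstrap sample can contain a single repeated observation, in which case cor returns NA; the simulated n = 30 makes that practically impossible here.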