Harvard Video Course: Exercise 1


First, load the data file, msleep_ggplot2.csv:

> setwd("F:/研究生/课程/哈佛视频课/1")
> tab= read.csv("msleep_ggplot2.csv")
> class(tab)
[1] "data.frame"
> head(tab)
> dim(tab)
[1] 83 11

> View(tab)
> c(tab$sleep_total,1000)

> plot(tab$brainwt,tab$sleep_total)
> plot(tab$brainwt,tab$sleep_total,log="x")

Compute summary statistics (not the total) for the sleep_total column:
> summary(tab$sleep_total)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.90    7.85   10.10   10.43   13.75   19.90 

All contents of rows 1 and 2 of the table (the index before the comma selects rows, not columns):
> tab[c(1,2),]

All columns for the animals whose sleep_total is greater than 18:

> tab[tab$sleep_total>18,]

The sleep_total of the animals in rows 1 and 2:
> tab$sleep_total[c(1,2)]
[1] 12.1 17.0

The mean of the sleep_total values that are greater than 18:
> mean(tab$sleep_total[tab$sleep_total>18])
[1] 19.275

Use which() to find the rows (positions) of the entries with sleep_total > 18:

> which(tab$sleep_total>18)
[1] 22 37 43 62

The sleep_total value of the first entry with sleep_total > 18:

> tab$sleep_total[which(tab$sleep_total>18)[1]]
[1] 19.7

What is the row number of the animal which has more than 18 hours of total sleep and less than 3 hours of REM sleep? In R, conditions are combined with &, not with the word "and":

> which(tab$sleep_total>18 & tab$sleep_rem<3)
[1] 43
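
The same logical-indexing pattern can be sketched on a small made-up data frame. The data frame `toy` and all of its values are hypothetical and are not taken from msleep_ggplot2.csv; the sketch only illustrates how a comparison, which() and & fit together.

```r
# Hypothetical toy data; not the msleep data set.
toy <- data.frame(
  name        = c("A", "B", "C", "D"),
  sleep_total = c(19.5, 8.0, 18.5, 20.0),
  sleep_rem   = c(3.5, 1.0, 2.0, 4.0)
)

# A comparison yields a logical vector, one element per row.
long_sleepers <- toy$sleep_total > 18

# which() turns the TRUE positions into row numbers.
idx <- which(long_sleepers)

# Element-wise conditions are combined with &.
both <- which(toy$sleep_total > 18 & toy$sleep_rem < 3)

# A logical vector can also be used directly as an index.
long_mean <- mean(toy$sleep_total[long_sleepers])

print(idx)
print(both)
print(long_mean)
```

Indexing with the logical vector and indexing with the result of which() select the same elements here; which() is mainly useful when the row numbers themselves are wanted.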

sort() simply gives back the numeric values themselves, sorted from smallest to largest:
> sort(tab$sleep_total)

order() also works from smallest to largest, but it returns positions (rows) rather than values: the index, in the original vector, of the smallest value, then the next smallest, and so on:

> order(tab$sleep_total)

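tab$sleep_total[order(tab$sleep_total)] returns the values, so it is equivalent to sort(tab$sleep_total).

rank() returns, for each element in its original position, that element's rank in the sorted order (tied values get the average rank by default):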

> rank(tab$sleep_total)
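
A minimal sketch of how sort(), order() and rank() relate, using a short made-up vector `x` (hypothetical values, chosen without ties so the ranks are whole numbers):

```r
# Hypothetical values; not from the msleep data.
x <- c(30, 10, 20)

sorted <- sort(x)    # the values, smallest first
ord    <- order(x)   # positions in x of the smallest, next smallest, ...
rnk    <- rank(x)    # rank of each element, in the original order

print(sorted)
print(ord)
print(rnk)

# Indexing by order() reproduces sort().
print(identical(x[ord], sorted))
```

Here order() answers "where does the k-th smallest value live?", while rank() answers "what place does the element at position i take?".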


match() looks up each given name, in the order you supply them, and returns its position (row) in tab$name:
> match(c("Cow","Owl monkey","Cheetah"),tab$name)


Find the row number for "Cotton rat" in the tab data frame, that is, look up where a given entry is located:

> match(c("Cotton rat"),tab$name)
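
As a small illustration of match(), the sketch below uses a made-up name vector (`animal_names` is hypothetical and much shorter than tab$name). It also shows what happens for a name that is not present.

```r
# Hypothetical lookup table; not the real tab$name column.
animal_names <- c("Cow", "Dog", "Cheetah", "Owl monkey")

# Positions are returned in the order of the query, not of the table.
pos <- match(c("Owl monkey", "Cow", "Tiger"), animal_names)

print(pos)
```

A name with no match comes back as NA, so the result can double as a membership check.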

> vec=c("red","blue","green","green","yellow","orange")
> fac=factor(vec)
> fac
[1] red    blue   green  green  yellow orange
Levels: blue green orange red yellow
> levels(fac)
[1] "blue"   "green"  "orange" "red"    "yellow"
> vec=="blue"
[1] FALSE  TRUE FALSE FALSE FALSE FALSE
> fac2=factor(vec,levels=c("blue","green","yellow","orange","red"))
> fac2
[1] red    blue   green  green  yellow orange
Levels: blue green yellow orange red
> levels(fac2)
[1] "blue"   "green"  "yellow" "orange" "red" 

table() counts how often each value occurs:
> table(tab$order)
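
To see table() without the data file, the sketch below reuses the colour vector from the factor example above. It suggests that table() reports counts in the order of the factor levels, so a factor with custom levels changes the order of the output.

```r
# Same colour vector as in the factor example.
vec  <- c("red", "blue", "green", "green", "yellow", "orange")
fac2 <- factor(vec, levels = c("blue", "green", "yellow", "orange", "red"))

counts_default <- table(vec)   # character input: levels sorted alphabetically
counts_custom  <- table(fac2)  # factor input: levels kept in the given order

print(counts_default)
print(counts_custom)
```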

split() is a function which takes a vector and splits it into a list, by grouping the vector according to a factor.
Here the sleep_total values are grouped by the order column:

> s=split(tab$sleep_total,tab$order)




The mean sleep_total for Rodentia:

> mean(s[["Rodentia"]])
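
A minimal sketch of what split() produces, on made-up data (`hours` and `group` are hypothetical stand-ins for tab$sleep_total and tab$order):

```r
# Hypothetical values; not from the msleep data.
hours <- c(10, 12, 4, 6, 8)
group <- c("Rodentia", "Rodentia", "Primates", "Primates", "Primates")

# One list element per group, named after the group.
s_toy <- split(hours, group)
print(s_toy)

# [[ ]] extracts a single group's values as a plain vector.
rodent_mean <- mean(s_toy[["Rodentia"]])
print(rodent_mean)
```

The list elements come out in the order of the factor levels (alphabetical here), not in the order in which the groups first appear in the data.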

lapply() and sapply() are useful functions for applying a function repeatedly to a vector or list. lapply() returns a list, while sapply() tries to "simplify", returning a vector if possible:

> lapply(s,mean)
$Afrosoricida
[1] 15.6

$Artiodactyla
[1] 4.516667

$Carnivora
[1] 10.11667

$Cetacea
[1] 4.5

$Chiroptera
[1] 19.8

$Cingulata
[1] 17.75

$Didelphimorphia
[1] 18.7

$Diprotodontia
[1] 12.4

$Erinaceomorpha
[1] 10.2

$Hyracoidea
[1] 5.666667

$Lagomorpha
[1] 8.4

$Monotremata
[1] 8.6

$Perissodactyla
[1] 3.466667

$Pilosa
[1] 14.4

$Primates
[1] 10.5

$Proboscidea
[1] 3.6

$Rodentia
[1] 12.46818

$Scandentia
[1] 8.9

$Soricomorpha
[1] 11.1

> sapply(s,mean)
   Afrosoricida    Artiodactyla       Carnivora         Cetacea      Chiroptera       Cingulata 
      15.600000        4.516667       10.116667        4.500000       19.800000       17.750000 
Didelphimorphia   Diprotodontia  Erinaceomorpha      Hyracoidea      Lagomorpha     Monotremata 
      18.700000       12.400000       10.200000        5.666667        8.400000        8.600000 
 Perissodactyla          Pilosa        Primates     Proboscidea        Rodentia      Scandentia 
       3.466667       14.400000       10.500000        3.600000       12.468182        8.900000 
   Soricomorpha 
      11.100000

tapply() is more concise: it does the work of split() and sapply() in a single call:

> tapply(tab$sleep_total,tab$order,mean)


The standard deviation of sleep_total for "Primates":
Use either lapply(s, sd), sapply(s, sd) or tapply(tab$sleep_total, tab$order, sd), and read off the Primates entry.
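
The three routes can be compared on made-up data. In this sketch `hours` and `group` are hypothetical stand-ins for tab$sleep_total and tab$order, and the toy Primates values are chosen so that the standard deviation is easy to check by hand.

```r
# Hypothetical values; not from the msleep data.
hours <- c(10, 12, 4, 6, 8)
group <- c("Rodentia", "Rodentia", "Primates", "Primates", "Primates")
s_toy <- split(hours, group)

# Route 1: lapply() returns a list, so use [[ ]] to pull out one group.
sd_lapply <- lapply(s_toy, sd)[["Primates"]]

# Route 2: sapply() simplifies to a named numeric vector.
sd_sapply <- as.numeric(sapply(s_toy, sd)["Primates"])

# Route 3: tapply() groups and applies in one call.
sd_tapply <- as.numeric(tapply(hours, group, sd)["Primates"])

print(c(sd_lapply, sd_sapply, sd_tapply))
```

All three give the same number; the choice is mostly about whether a list, a named vector, or a one-step call is more convenient.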
