R Language Basics: Data Objects

1. Basic data types: numeric (stored as double or integer), logical, character and complex; NA marks a missing value

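2. Date variables

Common functions

Sys.Date() returns the current system date; Sys.time() returns the current system date and time; date() returns the current system date and time as a character string

as.Date(): converts a date stored as a character string into a Date object, as.Date(x,format="",...)

as.POSIXlt(): converts a character string into a date-time object that also carries the time and time zone, as.POSIXlt(x,tz="",format)

strptime(): converts a character string into a date-time object (class POSIXlt), strptime(x,format,tz="")

strftime(): converts a date-time object into a character string in a specified format, strftime(x,format)

format(): converts a date object into a character string in a specified format, format(x,...)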
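A minimal sketch of the conversions above. The date strings and the choice of the UTC time zone are arbitrary values picked for illustration; the Sys.Date(), Sys.time() and date() results depend on when the code runs.

```r
# Current date / date-time (values depend on when this runs)
today <- Sys.Date()   # class "Date"
now   <- Sys.time()   # class "POSIXct" "POSIXt"
date()                # a character string, e.g. "Fri Mar 15 08:30:00 2024"

# Character -> Date, with an explicit format (illustrative input)
d <- as.Date("15/03/2024", format = "%d/%m/%Y")
print(d)              # "2024-03-15"

# Character -> date-time that carries a time zone (UTC assumed here)
lt <- as.POSIXlt("2024-03-15 08:30:00", tz = "UTC")
pt <- strptime("2024-03-15 08:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Date / date-time -> character in a chosen format
format(d, "%Y/%m/%d")              # "2024/03/15"
strftime(pt, "%H:%M", tz = "UTC")  # "08:30"
```

Passing tz = "UTC" to strftime() keeps the output independent of the machine's local time zone.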

3. Checking an object's type

class()、mode()、typeof()
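The three functions answer slightly different questions: class() gives the object's class (used for method dispatch), mode() gives its mode in S-compatible terms, and typeof() gives R's internal storage type. A small sketch comparing them on three illustrative objects:

```r
x <- 1:3                  # integer vector
y <- c(1.5, 2)            # double vector
f <- factor(c("a", "b"))  # factor

class(x); mode(x); typeof(x)  # "integer" "numeric" "integer"
class(y); mode(y); typeof(y)  # "numeric" "numeric" "double"
class(f); mode(f); typeof(f)  # "factor"  "numeric" "integer"
```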

4. Data structures

(1) Vectors

Creating vectors: the c() function combines values into a vector

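Vector indexing: #by position: vector<-c(1,2,3,4); vector[1]; vector[c(1:3)]

                 #by name: names(vector)<-c("one","two","three","four"); vector[c("one","two")]

                 #with which(): which(vector==1); which(vector %in% c(1,2)); which.max(vector) (not vector==c(1,2), which compares element by element with recycling)

                 #with subset(): subset(vector, vector>2 & vector<4)

                 #with %in%: c(1,5) %in% vector returns TRUE FALSE (whether each value occurs in vector); vector[vector %in% c(1,5)] uses the result as an index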
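A runnable sketch of the indexing styles above. The vector is named v here rather than vector only to avoid confusion with base R's vector() function; the last lines show why `v == c(1, 2)` is not a membership test.

```r
v <- c(1, 2, 3, 4)
names(v) <- c("one", "two", "three", "four")

v[1]                      # by position: one = 1
v[c("one", "two")]        # by name
which(v == 1)             # position 1 (named "one")
which(v %in% c(1, 2))     # positions of values equal to 1 or 2: 1 2
which.max(v)              # position of the maximum: 4 (named "four")
subset(v, v > 2 & v < 4)  # three = 3
v[v %in% c(1, 5)]         # %in% yields a logical vector usable as an index

# Pitfall: == recycles c(1, 2) to c(1, 2, 1, 2) and compares element by element
v == c(1, 2)              # TRUE TRUE FALSE FALSE
```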

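Vector editing: #extending: (x<-c(x,c(5,6,7))); #deleting one element: x<-x[-1]; #deleting several elements: (x<-x[-c(3:5)]) (x[c(3:5)] would instead keep only elements 3 to 5)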
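A short sketch of the editing steps, starting from an illustrative x; negative indices remove elements, positive ones keep them.

```r
x <- c(1, 2, 3, 4)
(x <- c(x, c(5, 6, 7)))  # extend: 1 2 3 4 5 6 7
x <- x[-1]               # drop the first element: 2 3 4 5 6 7
x <- x[-c(3:5)]          # drop elements 3 to 5: 2 3 7
x
```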

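Sorting: sort(x, decreasing = FALSE, na.last = NA, ...) sorts in ascending order and, by default, drops NA values; rev() reverses the order of the elements without sorting (rev(sort(x)) gives descending order)

Arithmetic sequences: seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)), length.out = NULL, ...), e.g. seq(1,-9,by = -2) gives 1 -1 -3 -5 -7 -9

Repeated sequences: rep(x,times=1,length.out=NA,each=1), e.g. rep(1:3, each=2, times=2) gives 1 1 2 2 3 3 1 1 2 2 3 3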
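A sketch of sort(), rev(), seq() and rep() on small illustrative inputs, including how NA is handled by sort():

```r
x <- c(3, NA, 1, 2)
sort(x)                                     # 1 2 3 (NA dropped: na.last = NA by default)
sort(x, decreasing = TRUE, na.last = TRUE)  # 3 2 1 NA
rev(x)                                      # 2 1 NA 3 (reversed, not sorted)

seq(1, -9, by = -2)                         # 1 -1 -3 -5 -7 -9
seq(0, 1, length.out = 5)                   # 0.00 0.25 0.50 0.75 1.00

rep(1:3, times = 2)                         # 1 2 3 1 2 3
rep(1:3, each = 2)                          # 1 1 2 2 3 3
rep(1:3, each = 2, times = 2)               # 1 1 2 2 3 3 1 1 2 2 3 3
rep(1:3, length.out = 5)                    # 1 2 3 1 2
```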

(2) Matrices

Creating a matrix: matrix(data=NA,nrow=1,ncol=1,byrow=FALSE,dimnames=NULL)

x<-c(1:10)

a<-matrix(x,nrow=5,ncol=2,byrow=FALSE,dimnames=list(c("r1","r2","r3","r4","r5"),c("c1","c2")))

Converting a matrix to a vector: as.vector(); the elements are read column by column

Matrix indexing: #by position: a[2,1]

                 #by row and column names: a["r2","c2"]

                 #by one dimension only (here the whole second column): a[,2]

                 #with a numeric vector: a[c(3:5),2]
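A sketch of matrix creation and indexing. It assumes 10 input values so that the 5 x 2 matrix is filled exactly; with 9 values R recycles the data and warns that its length is not a multiple of the number of rows.

```r
x <- 1:10
a <- matrix(x, nrow = 5, ncol = 2, byrow = FALSE,
            dimnames = list(c("r1", "r2", "r3", "r4", "r5"), c("c1", "c2")))

a[2, 1]         # 2 (row 2, column 1)
a["r2", "c2"]   # 7
a[, 2]          # the whole second column: 6 7 8 9 10 (named r1 to r5)
a[3:5, 2]       # 8 9 10
a[7]            # a single (linear) index counts down the columns: 7
as.vector(a)    # 1 2 ... 10, read column by column
```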

Matrix editing: #combining: add a row (a1<-rbind(a,c(11,12))); add a column (a2<-cbind(a,c(11:15)))

                #deleting elements: a5<-a[-1,] #removes the first row of the matrix
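A sketch of the editing operations on an illustrative 5 x 2 matrix. The appended vector's length should match the dimension being extended: rbind(a, 11:15) on a 2-column matrix, for example, produces a warning because 5 values do not fit a row of 2. drop = FALSE is shown as one way to keep a matrix when only one column remains.

```r
a  <- matrix(1:10, nrow = 5, ncol = 2)
a1 <- rbind(a, c(11, 12))     # append a row: 6 x 2
a2 <- cbind(a, 11:15)         # append a column: 5 x 3
a5 <- a[-1, ]                 # drop the first row: 4 x 2
a6 <- a[, -2, drop = FALSE]   # drop the second column but stay a matrix: 5 x 1
dim(a1); dim(a2); dim(a5); dim(a6)
```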

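Matrix operations: colSums() sums each column, rowSums() sums each row, colMeans() averages each column, rowMeans() averages each row

                   t() transposes a matrix, det() computes the determinant, crossprod(x,y) computes t(x) %*% y (the inner products of the columns), outer() computes the outer product, %*% performs matrix multiplication

                   diag() extracts the diagonal elements, solve() computes the inverse, eigen() computes the eigenvalues and eigenvectors (det(), solve() and eigen() require a square matrix)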
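A sketch of these functions on a small symmetric 2 x 2 matrix chosen so the results are easy to check by hand (det = 2*3 - 1*1 = 5, eigenvalues (5 ± sqrt(5))/2):

```r
m <- matrix(c(2, 1, 1, 3), nrow = 2)   # columns (2, 1) and (1, 3)

colSums(m); rowSums(m)     # 3 4 and 3 4
colMeans(m); rowMeans(m)   # 1.5 2 and 1.5 2
t(m)                       # transpose (m is symmetric, so unchanged)
det(m)                     # 5
crossprod(m, m)            # same as t(m) %*% m
outer(1:2, 1:3)            # 2 x 3 matrix whose [i, j] entry is i * j
m %*% solve(m)             # identity matrix, up to rounding
diag(m)                    # 2 3
eigen(m)$values            # about 3.618 and 1.382, in decreasing order
```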

(3) Arrays

Creating an array: array(data,dim=length(data),dimnames=NULL)

x<-c(1:9)

dim1<-c("A1","A2","A3")

dim2<-c("B1","B2","B3","B4","B5")

dim3<-c("C1","C2")

a<-array(x,dim=c(3,5,2),dimnames=list(dim1,dim2,dim3))

Array indexing: #by subscript: a[2,4,2]

                #by dimension names: a["A2","B3","C1"]

                #view the dimensions of the array: dim(a)
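A runnable version of the array example. Note that the 9 values in x are recycled to fill the 3 x 5 x 2 = 30 cells, so different positions can hold the same value:

```r
x <- 1:9
a <- array(x, dim = c(3, 5, 2),
           dimnames = list(c("A1", "A2", "A3"),
                           c("B1", "B2", "B3", "B4", "B5"),
                           c("C1", "C2")))

dim(a)               # 3 5 2
a[2, 4, 2]           # 8 (cell 26, and x is recycled)
a["A2", "B3", "C1"]  # 8 (cell 8)
a[, , "C1"]          # the first 3 x 5 layer as a matrix
```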

(4) Data frames

Creating a data frame: data.frame()

#from vectors

data_iris<-data.frame(s.length=c(1,1,1,1),s.width=c(2,2,2,2),w.length=c(3,3,3,3),w.width=c(4,4,4,4))

#from a matrix

data_matrix<-matrix(c(1:8),nrow=4,ncol=2)

data_iris2<-data.frame(data_matrix)
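A sketch of both construction routes; when the matrix has no column names, data.frame() names the columns X1, X2, and so on.

```r
data_iris <- data.frame(s.length = c(1, 1, 1, 1), s.width = c(2, 2, 2, 2),
                        w.length = c(3, 3, 3, 3), w.width = c(4, 4, 4, 4))
dim(data_iris)        # 4 4

data_matrix <- matrix(1:8, nrow = 4, ncol = 2)
data_iris2 <- data.frame(data_matrix)
names(data_iris2)     # "X1" "X2"
str(data_iris2)
```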

Data frame indexing: #columns: data_iris[,1] or data_iris$s.length (both return a vector), or data_iris["s.length"] (returns a one-column data frame)

                     #rows: data_iris[1,] or data_iris[1:3,]

                     #elements: data_iris[1,1], data_iris$s.length[1] or data_iris[["s.length"]][1]

                     #with subset(): subset(data_iris, s.length==1)

                     #with sqldf() (requires the sqldf package): library(sqldf); newdf<-sqldf("select * from mtcars where carb=1 order by mpg",row.names=TRUE)
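A base-R sketch of the indexing forms (sqldf is left out so the example needs no extra package). The s.length values are changed to 1 to 4 here, purely for illustration, so that subset() has something to filter.

```r
data_iris <- data.frame(s.length = c(1, 2, 3, 4), s.width = c(2, 2, 2, 2))

data_iris[, 1]            # a vector: 1 2 3 4
data_iris$s.length        # the same vector
data_iris["s.length"]     # a one-column data frame, not a vector
data_iris[["s.length"]]   # a vector again
data_iris[1:3, ]          # the first three rows
data_iris[1, "s.length"]  # a single element: 1
data_iris$s.length[1]     # 1
subset(data_iris, s.length > 2)                     # rows where s.length is 3 or 4
subset(data_iris, s.length == 1, select = s.width)  # filter rows and pick a column
```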

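Data frame editing: #add a new observation (row): data_iris<-rbind(data_iris,list(9,9,9,9))

                    #add a new variable (column): data_iris<-cbind(data_iris,Species=rep(7,5))

                    #view the column names with names(data_iris); assign to names(data_iris) to rename them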
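A sketch of the three edits in sequence. The new column name "sepal.length" is a hypothetical name chosen only to illustrate renaming.

```r
data_iris <- data.frame(s.length = c(1, 1, 1, 1), s.width = c(2, 2, 2, 2),
                        w.length = c(3, 3, 3, 3), w.width = c(4, 4, 4, 4))

data_iris <- rbind(data_iris, list(9, 9, 9, 9))     # add a row: now 5 rows
data_iris <- cbind(data_iris, Species = rep(7, 5))  # add a column of length 5
names(data_iris)                                    # view the column names
names(data_iris)[1] <- "sepal.length"               # rename the first column (hypothetical name)
data_iris
```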

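(5) Factors

Creating factors:

Split "statistics" into single letters and store them as a factor whose levels are the 26 lowercase letters: (ff<-factor(substring("statistics",1:10,1:10),levels=letters))

Drop the levels that do not occur in the data: f<-factor(ff)

#Create a factor with a single label "letter", which R expands to the levels letter1, letter2, ..., letter20: factor(letters[1:20],labels="letter")

#Create an ordered factor: z<-factor(LETTERS[1:4],ordered=TRUE)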

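Creating factors with gl(): gl(n,k,length=n*k,labels=seq_len(n),ordered=FALSE)

n - the number of factor levels

k - the number of repetitions of each level

length - the length of the resulting sequence

labels - a vector of labels for the n levels

ordered - a logical value; TRUE gives an ordered factor, FALSE an unordered one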

gl(2,3,labels=c("T","F"))
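A runnable sketch of the factor-creation calls above, with the resulting levels shown as comments:

```r
ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)
nlevels(ff)     # 26: unused letters are kept as levels
f <- factor(ff) # re-factoring keeps only the levels that occur
levels(f)       # "a" "c" "i" "s" "t"

factor(letters[1:3], labels = "letter")   # levels letter1 letter2 letter3
z <- factor(LETTERS[1:4], ordered = TRUE) # levels A < B < C < D
gl(2, 3, labels = c("T", "F"))            # T T T F F F, levels T F
```

droplevels(ff) is another way to discard unused levels.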

How factors are stored:

> status<-c("Poor","Improved","Excellent","Poor")
> class(status) # check the type of the vector
[1] "character"
> s<-factor(status,ordered=TRUE)
> s
[1] Poor Improved Excellent Poor
Levels: Excellent < Improved < Poor
> class(s) # check the class of the object
[1] "ordered" "factor"
> storage.mode(s) # check the storage mode: factors are stored as integers
[1] "integer"
> as.numeric(s) # convert to a numeric vector of level codes
[1] 3 2 1 3
> levels(s) # view the factor levels
[1] "Excellent" "Improved" "Poor"
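In the transcript the levels are sorted alphabetically, which gives the semantically odd order Excellent < Improved < Poor. A minimal sketch of supplying the levels explicitly so the integer codes and comparisons follow the intended ranking:

```r
status <- c("Poor", "Improved", "Excellent", "Poor")
s <- factor(status, levels = c("Poor", "Improved", "Excellent"), ordered = TRUE)

s                 # Levels: Poor < Improved < Excellent
as.integer(s)     # 1 2 3 1
s >= "Improved"   # FALSE TRUE TRUE FALSE
```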

(6) Lists

Creating a list: list(object1,object2,...)

data<-list(a=c(1,2,3,4),b=c("true","false"),c=c("one","two","three","four"),d=(1+3i))

List indexing: #by component: data[[1]], data$a or data[["a"]]

               #by element: data[[1]][1]

List editing: lists are edited much like vectors; use c() to combine them.

#add a component named e

data1<-c(data,list(e=c(3,4,5)))

or data1<-c(data,e=list(c(3,4,5)))
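A sketch of list indexing and editing. The difference between [[ ]] (returns the component itself) and [ ] (returns a sub-list) is shown explicitly; setting a component to NULL is one way to remove it.

```r
data <- list(a = c(1, 2, 3, 4), b = c("true", "false"),
             c = c("one", "two", "three", "four"), d = 1 + 3i)

data[[1]]      # the component a: 1 2 3 4
data$a         # the same vector
data[["a"]]    # the same vector
data[1]        # a list of length 1 containing a, not the vector itself
data[[1]][1]   # first element of a: 1

data1 <- c(data, list(e = c(3, 4, 5)))  # add a component named e
data2 <- c(data, e = list(c(3, 4, 5)))  # equivalent
data3 <- data1
data3$e <- NULL                         # remove component e again
```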