[R read error] Fix: Can't bind data because some arguments have the same name

While reading a data file recently, I ran into the error shown in the title.

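library(dplyr)  # provides %>% and select()
args <- commandArgs(trailingOnly = TRUE)  # assumption: args holds the script's command-line arguments
args[1] <- "RT_10-VS-RT_0"  # file name prefix, hard-coded here for testing
all <- read.delim(paste0(args[1], ".xls"), header = TRUE, check.names = FALSE)
dat <- all %>% dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)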

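This happens because select(), in the dplyr version used here, cannot handle a data frame with duplicated column names. The error is raised even if the duplicated columns are not among those selected. read.delim kept both copies of the name because check.names = FALSE disables its name repair.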

The following snippet lists the duplicated column names:

# Check for duplicated column names
> tibble::enframe(names(all)) %>% dplyr::count(value) %>% dplyr::filter(n > 1)
# A tibble: 1 x 2
  value          n
  <chr>      <int>
1 Protein_ID     2

So there are two columns named Protein_ID.
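The same check can also be sketched in base R, together with a rename that would let you keep read.delim. This is an alternative, not the approach used in the rest of this post. The toy data frame `df` and its values are made up for illustration; only the column name Protein_ID comes from the real file.

```r
# Toy data frame standing in for the real file (hypothetical values)
df <- data.frame(a = c("P1", "P2"), b = c(1.2, 0.8), c = c("P1", "P2"))
names(df) <- c("Protein_ID", "Ratio_x", "Protein_ID")

# Names that occur more than once
dup_names <- unique(names(df)[duplicated(names(df))])
print(dup_names)

# Append a numeric suffix to the repeated names
names(df) <- make.unique(names(df), sep = "_")
print(names(df))
```

With sep = "_", the second Protein_ID becomes Protein_ID_1, which is the same name readr reports in the warning shown further down.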

How do we fix it? Reading the file with readr works, because it deduplicates the column names automatically.

all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t") %>%
  dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)

Parsed with column specification:
cols(
  .default = col_character(),
  No. = col_double(),
  Mass = col_double(),
  Protein_Coverage = col_double(),
  `Mean_Ratio_RT_10_118/RT_0_117` = col_double(),
  `Tremble Identity` = col_double(),
  `Tremble E-value` = col_double()
)
See spec(...) for full column specifications.
Warning: 29 parsing failures.
 row                           col expected actual                file
1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
.... ............................. ........ ...... ...................
See problems(...) for more details.

Warning message:
Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]

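The warnings list the rows and columns where parsing failed (the guessed column type was col_double) and also point out the duplicated Protein_ID column. To get rid of the long "Parsed with column specification" message, either specify the column types when reading, or pass col_types = cols() with its default arguments: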

all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t", col_types = readr::cols()) %>%
  dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)

Warning: 29 parsing failures.
 row                           col expected actual                file
1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
.... ............................. ........ ...... ...................
See problems(...) for more details.

Warning message:
Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14] 

The warnings are still printed, and it is best to keep them.
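The parsing failures above all involve the strings - and n/a in numeric columns. If those really are placeholders for missing values (an assumption about this file, not something I verified), one option is to declare them through the na argument of read_delim. Here is a minimal sketch on a made-up three-row file; the column Mean_Ratio and all values are hypothetical.

```r
library(readr)

# Toy tab-separated file using the placeholders seen in the warnings
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "Protein_ID\tTremble Identity\tMean_Ratio",
  "P1\t98.5\t1.20",
  "P2\t-\tn/a",
  "P3\t77.0\t0.85"
), path)

# Treat "-" and "n/a" as missing so the numeric columns parse cleanly
toy <- read_delim(path, delim = "\t",
                  na = c("", "NA", "-", "n/a"),
                  col_types = cols())

print(toy)
print(problems(toy))  # lists any remaining parsing failures
```

Call problems() on the object returned by read_delim, before piping it into select(), so that the parsing record is still attached to it.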

Ref: https://github.com/tidyverse/readr/issues/954

Original post: https://www.cnblogs.com/jessepeng/p/12452211.html