plink 阈性状(质量性状)GWAS分析

plink进行阈性状(质量性状)关联分析有两种方法

--assoc 和 --logistic,

这两种方法又分别有分是否校正:

--assoc --adjust 和 --logistic --adjust

--logistic方法有可以选择是否添加协变量:

--logistic --covar 和 --logistic --adjust --covar

这么算的话一共有6种组合。

1、测试数据

root@PC1:/home/test3# ls
qc3.map  qc3.ped
root@PC1:/home/test3# wc -l *   ## 288样本, 417105位点
   417105 qc3.map
      288 qc3.ped
   426702 total
root@PC1:/home/test3# head -n 5 qc3.map
1       1:3174  0       3174
1       1:32399 0       32399
1       1:32402 0       32402
1       1:32406 0       32406
1       1:32443 0       32443
root@PC1:/home/test3# cut -f 1-10 qc3.ped | head -n 5      ## ped文件的第5列性别,第6列是表型数据,1或者2
1       1       0       0       1       1       T       C       T       C
2       2       0       0       1       1       C       C       C       C
3       3       0       0       1       1       C       C       C       C
4       4       0       0       1       2       C       C       C       C
5       5       0       0       1       1       C       C       C       C

2、 --assoc

root@PC1:/home/test3# ls
qc3.map  qc3.ped
root@PC1:/home/test3# plink --file qc3 --assoc --out test    ## 直接使用assoc命令
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to test.log.
Options in effect:
  --assoc
  --file qc3
  --out test

15974 MB RAM detected; reserving 7987 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (417105 variants, 288 people).
--file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
417105 variants loaded from .bim file.
288 people (288 males, 0 females) loaded from .fam.
288 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 288 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.948564.
417105 variants and 288 people pass filters and QC.
Among remaining phenotypes, 75 are cases and 213 are controls.
Writing C/C --assoc report to test.assoc ... done.
root@PC1:/home/test3# ls
qc3.map  qc3.ped  test.assoc  test.log
root@PC1:/home/test3# wc -l *
   417105 qc3.map
      288 qc3.ped
   417106 test.assoc
       27 test.log
   834526 total
root@PC1:/home/test3# head -n 3 test.assoc  ## 查看结果
 CHR         SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR
   1      1:3174       3174    T   0.1351   0.1332    C     0.003612       0.9521        1.017
   1     1:32399      32399    T   0.1324   0.1288    C       0.0114        0.915        1.032

3、--assoc --adjust命令

root@PC1:/home/test3# ls
qc3.map  qc3.ped
root@PC1:/home/test3# plink --file qc3 --assoc --adjust --out test   ## 直接使用--assoc  --adjust
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to test.log.
Options in effect:
  --adjust
  --assoc
  --file qc3
  --out test

15974 MB RAM detected; reserving 7987 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (417105 variants, 288 people).
--file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
417105 variants loaded from .bim file.
288 people (288 males, 0 females) loaded from .fam.
288 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 288 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.948564.
417105 variants and 288 people pass filters and QC.
Among remaining phenotypes, 75 are cases and 213 are controls.
Writing C/C --assoc report to test.assoc ... done.
--adjust: Genomic inflation est. lambda (based on median chisq) = 1.18741.
--adjust values (417105 variants) written to test.assoc.adjusted .
root@PC1:/home/test3# ls
qc3.map  qc3.ped  test.assoc  test.assoc.adjusted  test.log
root@PC1:/home/test3# wc -l *
   417105 qc3.map
      288 qc3.ped
   417106 test.assoc
   417106 test.assoc.adjusted
       30 test.log
  1251635 total
root@PC1:/home/test3# head -n 3 test.assoc    ## 查看结果
 CHR         SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR
   1      1:3174       3174    T   0.1351   0.1332    C     0.003612       0.9521        1.017
   1     1:32399      32399    T   0.1324   0.1288    C       0.0114        0.915        1.032
root@PC1:/home/test3# head -n 3 test.assoc.adjusted   ## 查看结果
 CHR         SNP      UNADJ         GC       BONF       HOLM   SIDAK_SS   SIDAK_SD     FDR_BH     FDR_BY
  20 20:11254823   7.14e-48  1.357e-40  2.978e-42  2.978e-42        INF        INF  1.489e-42  2.013e-41
  20 20:11269211   7.14e-48  1.357e-40  2.978e-42  2.978e-42        INF        INF  1.489e-42  2.013e-41

4、--logistic

root@PC1:/home/test3# ls
qc3.map  qc3.ped
root@PC1:/home/test3# plink --file qc3 --logistic --out test    ## 直接使用logistic
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to test.log.
Options in effect:
  --file qc3
  --logistic
  --out test

15974 MB RAM detected; reserving 7987 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (417105 variants, 288 people).
--file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
417105 variants loaded from .bim file.
288 people (288 males, 0 females) loaded from .fam.
288 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 288 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.948564.
417105 variants and 288 people pass filters and QC.
Among remaining phenotypes, 75 are cases and 213 are controls.
Writing logistic model association results to test.assoc.logistic ... done.
root@PC1:/home/test3# ls
qc3.map  qc3.ped  test.assoc.logistic  test.log
root@PC1:/home/test3# wc -l *
   417105 qc3.map
      288 qc3.ped
   417106 test.assoc.logistic
       27 test.log
   834526 total
root@PC1:/home/test3# head test.assoc.logistic   ## 查看结果
 CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P
   1      1:3174       3174    T        ADD      273      1.019      0.06304       0.9497
   1     1:32399      32399    T        ADD      266      1.029       0.1025       0.9183
   1     1:32402      32402    C        ADD      267     0.9198      -0.4321       0.6657
   1     1:32406      32406    C        ADD      267     0.8815      -0.6626       0.5076
   1     1:32443      32443    C        ADD      268       1.16       0.6713        0.502
   1     1:32548      32548    T        ADD      269      1.038        0.147       0.8831
   1     1:45044      45044    A        ADD      286     0.8389      -0.7204       0.4713
   1     1:45047      45047    T        ADD      283     0.8288      -0.6577       0.5107
   1     1:45085      45085    C        ADD      283      1.007      0.03192       0.9745

5、--logistic --adjust命令

root@PC1:/home/test3# ls
qc3.map  qc3.ped
root@PC1:/home/test3#
root@PC1:/home/test3#
root@PC1:/home/test3# ls
qc3.map  qc3.ped
root@PC1:/home/test3# plink --file qc3 --logistic --adjust --out test   ## 直接运行
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to test.log.
Options in effect:
  --adjust
  --file qc3
  --logistic
  --out test

15974 MB RAM detected; reserving 7987 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (417105 variants, 288 people).
--file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
417105 variants loaded from .bim file.
288 people (288 males, 0 females) loaded from .fam.
288 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 288 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.948564.
417105 variants and 288 people pass filters and QC.
Among remaining phenotypes, 75 are cases and 213 are controls.
Writing logistic model association results to test.assoc.logistic ... done.
--adjust: Genomic inflation est. lambda (based on median chisq) = 1.12802.
--adjust values (417105 variants) written to test.assoc.logistic.adjusted .
root@PC1:/home/test3# ls  ## 查看生成的结果
qc3.map  qc3.ped  test.assoc.logistic  test.assoc.logistic.adjusted  test.log
root@PC1:/home/test3# wc -l *
   417105 qc3.map
      288 qc3.ped
   417106 test.assoc.logistic
   417106 test.assoc.logistic.adjusted
       30 test.log
  1251635 total
root@PC1:/home/test3# head -n 3 test.assoc.logistic   ## 查看结果
 CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P
   1      1:3174       3174    T        ADD      273      1.019      0.06304       0.9497
   1     1:32399      32399    T        ADD      266      1.029       0.1025       0.9183
root@PC1:/home/test3# head -n 3 test.assoc.logistic.adjusted    ## 查看结果
 CHR         SNP      UNADJ         GC       BONF       HOLM   SIDAK_SS   SIDAK_SD     FDR_BH     FDR_BY
  20 20:11020547  1.411e-23  4.399e-21  5.887e-18  5.887e-18        INF        INF  5.887e-18  7.959e-17
  20 20:11403719   4.34e-22  9.205e-20   1.81e-16   1.81e-16   2.22e-16   2.22e-16  9.052e-17  1.224e-15

6、加入协变量, 如何生成协变量

step1:

root@PC1:/home/test3# ls
qc3.map  qc3.ped
root@PC1:/home/test3# plink --file qc3 --genome --out cov   ## 直接使用--genome命令
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to cov.log.
Options in effect:
  --file qc3
  --genome
  --out cov

15974 MB RAM detected; reserving 7987 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (417105 variants, 288 people).
--file: cov-temporary.bed + cov-temporary.bim + cov-temporary.fam written.
417105 variants loaded from .bim file.
288 people (288 males, 0 females) loaded from .fam.
288 phenotype values loaded from .fam.
Using up to 8 threads (change this with --threads).
Before main variant filters, 288 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.948564.
417105 variants and 288 people pass filters and QC.
Among remaining phenotypes, 75 are cases and 213 are controls.
IBD calculations complete.
Finished writing cov.genome .
root@PC1:/home/test3# ls
cov.genome  cov.log  qc3.map  qc3.ped

step2:

root@PC1:/home/test3# ls
cov.genome  cov.log  qc3.map  qc3.ped
root@PC1:/home/test3# plink --file qc3 --read-genome cov.genome --cluster --mds-plot 10 --out cov2 
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to cov2.log.
Options in effect:
  --cluster
  --file qc3
  --mds-plot 10
  --out cov2
  --read-genome cov.genome

15974 MB RAM detected; reserving 7987 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (417105 variants, 288 people).
--file: cov2-temporary.bed + cov2-temporary.bim + cov2-temporary.fam written.
417105 variants loaded from .bim file.
288 people (288 males, 0 females) loaded from .fam.
288 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 288 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.948564.
417105 variants and 288 people pass filters and QC.
Among remaining phenotypes, 75 are cases and 213 are controls.
Clustering... done.
Cluster solution written to cov2.cluster1 , cov2.cluster2 , and cov2.cluster3 .
Performing multidimensional scaling analysis (SVD algorithm, 10
dimensions)... done.
MDS solution written to cov2.mds .
root@PC1:/home/test3# ls  ## 查看结果 cov2.mds为下一步要使用的结果
cov2.cluster1  cov2.cluster2  cov2.cluster3  cov2.log  cov2.mds  cov.genome  cov.log  qc3.map  qc3.ped

step3:

root@PC1:/home/test3# ls
cov2.cluster1  cov2.cluster2  cov2.cluster3  cov2.log  cov2.mds  cov.genome  cov.log  qc3.map  qc3.ped
root@PC1:/home/test3# awk '{print$1, $2, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13}' cov2.mds > cov3.txt  ## 整理格式
root@PC1:/home/test3# ls
cov2.cluster1  cov2.cluster2  cov2.cluster3  cov2.log  cov2.mds  cov3.txt  cov.genome  cov.log  qc3.map  qc3.ped
root@PC1:/home/test3# wc -l cov3.txt
289 cov3.txt
root@PC1:/home/test3# head -n 3 cov3.txt  ## 查看格式
FID IID C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
1 1 0.0111045 -0.0260147 0.0157171 -0.0128374 -0.00871872 0.015319 -0.0206926 0.0229361 -0.0262635 0.0180982
2 2 -0.00213807 0.0639865 -0.0298421 -0.0296622 0.0138335 0.057727 -0.0341887 -0.0108086 0.0270301 0.0114068

8、--logistic --covar

root@PC1:/home/test3# ls
cov3.txt  qc3.map  qc3.ped
root@PC1:/home/test3# plink --file qc3 --covar cov3.txt --logistic --hide-covar --out test  ##添加协变量
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to test.log.
Options in effect:
  --covar cov3.txt
  --file qc3
  --hide-covar
  --logistic
  --out test

Note: --hide-covar flag deprecated.  Use e.g. "--linear hide-covar".
15974 MB RAM detected; reserving 7987 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (417105 variants, 288 people).
--file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
417105 variants loaded from .bim file.
288 people (288 males, 0 females) loaded from .fam.
288 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
--covar: 10 covariates loaded.
Before main variant filters, 288 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.948564.
417105 variants and 288 people pass filters and QC.
Among remaining phenotypes, 75 are cases and 213 are controls.
Writing logistic model association results to test.assoc.logistic ... done.
root@PC1:/home/test3# ls  ## 查看生成结果
cov3.txt  qc3.map  qc3.ped  test.assoc.logistic  test.log
root@PC1:/home/test3# wc -l *
      289 cov3.txt
   417105 qc3.map
      288 qc3.ped
   417106 test.assoc.logistic
       31 test.log
   834819 total
root@PC1:/home/test3# head test.assoc.logistic -n 3   ## 查看结果
 CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P
   1      1:3174       3174    T        ADD      273     0.9015      -0.3232       0.7465
   1     1:32399      32399    T        ADD      266     0.9416      -0.1934       0.8467

9、--logistic --adjust --covar

root@PC1:/home/test3# ls
cov3.txt  qc3.map  qc3.ped
root@PC1:/home/test3# plink --file qc3 --covar cov3.txt --adjust --logistic --hide-covar --out test
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to test.log.
Options in effect:
  --adjust
  --covar cov3.txt
  --file qc3
  --hide-covar
  --logistic
  --out test

Note: --hide-covar flag deprecated.  Use e.g. "--linear hide-covar".
15974 MB RAM detected; reserving 7987 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (417105 variants, 288 people).
--file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
417105 variants loaded from .bim file.
288 people (288 males, 0 females) loaded from .fam.
288 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
--covar: 10 covariates loaded.
Before main variant filters, 288 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.948564.
417105 variants and 288 people pass filters and QC.
Among remaining phenotypes, 75 are cases and 213 are controls.
Writing logistic model association results to test.assoc.logistic ... done.
--adjust: Genomic inflation est. lambda (based on median chisq) = 1.17784.
--adjust values (417105 variants) written to test.assoc.logistic.adjusted .
root@PC1:/home/test3# ls   ### 查看生成结果
cov3.txt  qc3.map  qc3.ped  test.assoc.logistic  test.assoc.logistic.adjusted  test.log
root@PC1:/home/test3# wc -l *
      289 cov3.txt
   417105 qc3.map
      288 qc3.ped
   417106 test.assoc.logistic
   417106 test.assoc.logistic.adjusted
       34 test.log
  1251928 total
root@PC1:/home/test3# head test.assoc.logistic -n 3  ## 查看结果
 CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P
   1      1:3174       3174    T        ADD      273     0.9015      -0.3232       0.7465
   1     1:32399      32399    T        ADD      266     0.9416      -0.1934       0.8467
root@PC1:/home/test3# head test.assoc.logistic.adjusted -n 3  ## 查看结果
 CHR         SNP      UNADJ         GC       BONF       HOLM   SIDAK_SS   SIDAK_SD     FDR_BH     FDR_BY
  20 20:11403719  7.848e-19  3.194e-16  3.274e-13  3.274e-13  3.274e-13  3.274e-13  3.274e-13  4.425e-12
  20 20:11020547  2.313e-18  8.011e-16  9.646e-13  9.646e-13  9.646e-13  9.646e-13   3.76e-13  5.082e-12

参考:

(1)https://github.com/MareesAT/GWA_tutorial/

(2)https://blog.csdn.net/yijiaobani/article/details/105294261

原文地址:https://www.cnblogs.com/liujiaxin2018/p/15704168.html