SNPsnap | 筛选最佳匹配的SNP | 富集分析 | CP loci

一个矛盾:

GWAS得到的SNP做富集分析的话,通常都会有强的偏向性。

co-localization of GWAS signals to gene-dense and high linkage disequilibrium (LD) regions, and correlations of gene size, location and function

数据库使用注意:

  • 一次最多只能输入200-300个SNP
  • SNP必须以rs id格式输入,否则基本不识别

SNPsnap: a Web-based tool for identification and annotation of matched SNPs 

providing matched sets of SNPs that can be used to calibrate background expectations.

基于:allele frequency, number of SNPs in LD, distance to nearest gene and gene density

根据条件,选出类似的SNP:

  1. Minor allele frequency : we partitioned SNPs into minor allele frequency bins (using 1–2, 2–3, … , 49–50% strata).
  2. LD buddies : for each SNP, we counted the number of ‘buddy’ SNPs in LD at various thresholds (r 2 > 0.1, 0.2, … , 0.9) [using PLINK v.1.07 ( Purcell et al. , 2007 ) to compute LD].
  3. Distance to nearest gene : we computed the distance to the nearest 5′ start site using Ensembl gene coordinates ( Flicek et al. , 2014 ). If the SNP was within a gene, we used the distance to that gene’s start site.
  4. Gene density : we counted the number of genes in loci around the SNP, using LD (r 2 > 0.1, 0.2, … , 0.9) and physical distance (100, 200, … , 1000 kb) to define loci.

这里我们就要根据这个工具来筛选T0的SNP。

a) the number of T0 loci was set to be the same as that of the T1 loci (associated with a single trait);

b) the length distribution of T0 loci was set to be the same as that of the T1 loci;

c) the T0 loci should not include the ENCODE blacklist regions and human leukocyte antigen (HLA) regions; and

d) they should be randomly selected from autosomal regions.

画这个图的脚本:

head=T2
bedfile=../sort.CP.region.T2.bed

# cat CP.region.T0.bed | bedtools sort -g ../genome.txt > sort.CP.region.T0.bed
# cat CP.region.T2.bed | bedtools sort -g ../genome.txt > sort.CP.region.T2.bed
# cat CP.region.T3.bed | bedtools sort -g ../genome.txt > sort.CP.region.T3.bed

bedtools intersect -a ../../UCSC.anno/CDS.bed -b $bedfile -wa | bedtools merge > $head.CDS.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.CDS.bed -wa > $head.CDS.cons.bed &&

bedtools intersect -a ../../UCSC.anno/UTR3.bed -b $bedfile -wa | bedtools merge > $head.UTR3.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR3.bed -wa > $head.UTR3.cons.bed &&

bedtools intersect -a ../../UCSC.anno/UTR5.bed -b $bedfile -wa | bedtools merge > $head.UTR5.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR5.bed -wa > $head.UTR5.cons.bed &&

bedtools intersect -a ../../UCSC.anno/Down2K.bed -b $bedfile -wa | bedtools merge > $head.Down2K.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Down2K.bed -wa > $head.Down2K.cons.bed &&

bedtools intersect -a ../../UCSC.anno/Up2K.bed -b $bedfile -wa | bedtools merge > $head.Up2K.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Up2K.bed -wa > $head.Up2K.cons.bed &&

bedtools intersect -a ../../UCSC.anno/Intron.bed -b $bedfile -wa | bedtools merge > $head.Intron.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Intron.bed -wa > $head.Intron.cons.bed &&

bedtools intersect -a ../../UCSC.anno/intergenic.bed -b $bedfile -wa | bedtools merge > $head.intergenic.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.intergenic.bed -wa > $head.intergenic.cons.bed &&

echo done!


# awk '{ total += $4 } END { print total/NR }' T2.CDS.cons.bed

  

批量求均值

awk '{ total += $4 } END { print total/NR }' T*.CDS.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.UTR3.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.UTR5.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Down2K.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Up2K.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Intron.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.intergenic.cons.bed

 

按CP loci来分别统计平均分,bedtools的特殊功能

for i in CDS UTR3 UTR5 Down2K Up2K Intron intergenic
do
# bedtools map -a sort.CP.region.T0.bed -b T0/T0.CDS.cons.bed -c 4 -o mean | cut -f4
echo $i
#
# echo $i > CPmerge/$i.T0.score
# bedtools map -a sort.CP.region.T0.bed -b T0/T0.$i.cons.bed -c 4 -o mean | cut -f4 >> CPmerge/$i.T0.score
#
echo $i > CPmerge/$i.T1.score
bedtools map -a sort.CP.region.T1.bed -b T1/T1.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T1.score
#
echo $i > CPmerge/$i.T2.score
bedtools map -a sort.CP.region.T2.bed -b T2/T2.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T2.score
#
echo $i > CPmerge/$i.T00.score
bedtools map -a sort.SNPsnap.bed -b SNPsnap/SNPsnap.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T00.score
#
done

#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T0.* > T0.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T1.* > T1.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T2.* > T2.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T00.* > T00.score

  

 

待续

原文地址:https://www.cnblogs.com/leezx/p/11822163.html