segemehl 生成sam文件的后续处理——生成methylation table

1.使用callMajorMethyl.pl处理sam文件

使用以下代码得到正链甲基化情况

 1 #!/bin/bash
 2 if [ $UID -ne 0 ];
 3     then
 4     echo "Please run as root"
 5     exit 1
 6 fi
 7 ##定义.fa文件的路径
 8 fa_path="/home/dklv/UCSC/Sequences/chromFa.mm9/"
 9 ##定义.sam文件的路径
10 sam_path="/home/cjch/projects/monoclone/monoclone.sam"
11 ##在sam文件中匹配所需要条目
12 samtools view -hS  ${sam_path} | awk '/^@/ || /XB:Z:F..CT/' > tmp.ct.sam
13 ##将sam文件转换成bam文件
14 samtools view -bS tmp.ct.sam > tmp.ct.bam
15 ##将bam文件sort处理
16 samtools sort tmp.ct.bam tmp.ct.sorted
17 ##对fa文件夹中不同染色体进行比对
18 for fa in ${fa_path}*.fa;do
19     name=${fa##*/}
20     samtools mpileup -Bf ${fa} tmp.ct.sorted.bam > tmp
21     perl  /home/cjch/projects/monoclone/callMajorMethyl.pl -s 1 -i tmp -o ${name}.txt
22     echo "${name} has been processed "
23 done
24 ##删除临时文件
25 rm -rf tmp*
26 rm -rf ${fa_path}*.fai
27 echo ""
28 echo ""
29 echo ""
30 echo "=========================================="
31 echo "All .fa file have been processed"   
32 echo ""

使用以下代码得到负链链甲基化情况

 1 #!/bin/bash
 2 if [ $UID -ne 0 ];
 3     then
 4     echo "Please run as root"
 5     exit 1
 6 fi
 7 ##定义.fa文件的路径
 8 fa_path="/home/dklv/UCSC/Sequences/chromFa.mm9/"
 9 ##定义.sam文件的路径
10 sam_path="/home/cjch/projects/monoclone/monoclone.sam"
11 ##在sam文件中匹配所需要条目
12 samtools view -hS  ${sam_path} | awk '/^@/ || /XB:Z:F..GA/' > tmp.ct.sam
13 ##将sam文件转换成bam文件
14 samtools view -bS tmp.ct.sam > tmp.ct.bam
15 ##将bam文件sort处理
16 samtools sort tmp.ct.bam tmp.ct.sorted
17 ##对fa文件夹中不同染色体进行比对
18 for fa in ${fa_path}*.fa;do
19     name=${fa##*/}
20     samtools mpileup -Bf ${fa} tmp.ct.sorted.bam > tmp
21     perl  /home/cjch/projects/monoclone/callMajorMethyl.pl -s 2 -i tmp -o ${name}.txt
22     echo "${name} has been processed "
23 done
24 ##删除临时文件
25 rm -rf tmp*
26 rm -rf ${fa_path}*.fai
27 echo ""
28 echo ""
29 echo ""
30 echo "=========================================="
31 echo "All .fa file have been processed"   
32 echo ""

tips:

可以分别建立两个文件夹独立处理,节省时间的同时也方便后续处理

2.处理结果得到bw文件

首先将g2a得到所有文件追加一个.ga后缀

1 for name in *.txt ;do
2     new=${name}".ga"
3     mv  ${name}  ${new}
4 done

将c2t和g2a处理得到的文件放到一个文件下,运行get_methylation_table.sh脚本,脚本代码如下

 1 #!/bin/bash
 2 
 3 echo "Now we will get a Wigfile"
 4 for flct in *.txt;do
 5 flga=$flct".ga"
 6 awk -v name=$flct 'BEGIN{split(name,chr,".");print "variableStep chrom="chr[1] } FNR==NR{print $2"	"$6};FNR!=NR{print $2"	-"$6}' $flct $flga | sort -k1,1n | sed 's/-0.00/0.00/g'  >> rlt.wig
 7 done
 8 
 9 echo "Now We Got A Wigfile"
10 echo `date`
11 
12 
13 ./wigToBigWig rlt.wig /home/dklv/UCSC/genome/mm9.chrom.sizes mm9_methylation_segemehl_M1_H1.bw
14 
15 
16 echo "Now We Got A Wigfile"
17 echo "==============================="
18 echo `date`
19 echo "The mission had been finished "
20 echo "==============================="
原文地址:https://www.cnblogs.com/lyhonk/p/3940602.html