【software】变异注释工具：annovar

annovar提供三种注释方式

一，基于基因的注释给定变异，看变异是否影响编码蛋白的改变支持基因定义系统：RefSeq genes, UCSC genes, ENSEMBL genes, GENCODE genes, AceView genes, or many other gene definition systems.

二、基于区域的注释看变异是否在基因组的特定区域

三、基于过滤的注释看变异是否在特定数据库中，如dbSNP

annotate_variation.pl 主程序
table_annovar.pl 常用注释程序

关键参数

—-buildver    数据库版本, 如hg19
—-protocol    refGene, exac03

anno_refgene.sh

#!/bin/bash

name=$1
/rawdata/Zhangyu/software/ANNOVAR_latest/annovar/table_annovar.pl $name 
/rawdata/Zhangyu/software/ANNOVAR_latest/annovar/humandb 
-buildver hg19 
-otherinfo 
-remove 
-nastring . 
-protocol refGene 
-operation g 
--outfile ${name}.annovar

13 20797176 21105944 0 - comments: a 342kb deletion encompassing GJB6, associated with hearing loss

annovar_latest.sh

#!/bin/bash

name=huaxi-7.varscan.vcf
/rawdata/Zhangyu/software/ANNOVAR_latest/annovar/table_annovar.pl $name 
/rawdata/Zhangyu/software/ANNOVAR_latest/annovar/humandb 
-buildver hg19 
-otherinfo 
-remove 
-nastring . 
-protocol refGene,SAOdbSNP150,cosmic83,gnomad_exome,NovoDb_WES_2573,NovoDb_WGS_568,clinvar_20170905,HGMD,genomicSuperDups,gff3 
-operation g,f,f,f,f,f,f,f,r,r 大专栏  【software】变异注释工具：annovaran class="se">
--gff3dbfile hg19_rmsk.gff 
--vcfinput 
--outfile ${name}.annovar

TJ集群annovar注释

name=$1
/PUBLIC/software/HUMAN/ANNOVAR_2017Jun08/table_annovar.pl ./${name} 
/TJNAS01/OBD/Zhangyu/software/annovar_new_db_all_2019.1.8 
-buildver hg19 
-otherinfo 
-remove 
-nastring . 
-protocol refGene,avsnp150,SAOdbSNP150,1000g2015aug_all,exac03,esp6500siv2_all,gnomad_exome,NovoDb_WES_2573,NovoDb_WGS_568,cosmic83,clinvar_20170905,HGMD,ljb26_pp2hvar,ljb26_pp2hdiv,ljb26_sift,gerp++gt2,caddgt10,genomicSuperDups,gff3 
-operation g,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,r,r 
--gff3dbfile hg19_rmsk.gff 
--outfile ${name}.annovar

TJ集群annovar注释, input为VCF

name=$1
/PUBLIC/software/HUMAN/ANNOVAR_2017Jun08/table_annovar.pl ./${name}.vcf 
/TJNAS01/OBD/Zhangyu/software/annovar_new_db_all_2019.1.8 
-buildver hg19 
-otherinfo 
-remove 
-nastring . 
-protocol refGene,avsnp150,SAOdbSNP150,1000g2015aug_all,exac03,esp6500siv2_all,gnomad_exome,NovoDb_WES_2573,NovoDb_WGS_568,cosmic83,clinvar_20170905,HGMD,ljb26_pp2hvar,ljb26_pp2hdiv,ljb26_sift,gerp++gt2,caddgt10,genomicSuperDups,gff3 
-operation g,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,r,r 
--gff3dbfile hg19_rmsk.gff 
--vcfinput 
--outfile ${name}.annovar

关于indel的处理

目前没有描述indel的普遍认同的方式。

作为使用者我们应该这样做：（1）分割VCF文件确保每一行只包含一个变异；（2）left-normalize所有的VCF文件；（3）使用ANNOVAR注释

所以使用命令：

bcftools norm -m-both -o ex1.step1.vcf ex1.vcf.gz

bcftools norm -f human_g1k_v37.fasta -o ex1.step2.vcf ex1.step1.vcf

第一个命令分割多allels变异检出为单独的行，第二个命令运行真正的 left-normalization。（有时候第一个命令可能出现没有变异能被分解，尽管在文件中存在这些变异，这种情况下，你可以使用vt program 代替）

参考：

ANNOVAR进行突变注释

-END-