python学习——通过命令行参数根据fasta文件中染色体id提取染色体序列

提取fasta文件genome_test.fa中第14号染色体的序列,其内容如下:

>chr1
ATATATATAT
>chr2
ATATATATATCGCGCGCGCG
>chr3
ATATATATATCGCGCGCGCGATATATATAT
>chr4
ATATATATATCGCGCGCGCGATATATATATCGCGCGCGCG
>chr5
ATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATAT
>chr6
ATCGATGCAGCATG
>chr7
TATCGCGCGCGCGATATAT
>chr8
ATATCGCGCGCGCGATATATATATCGCG
>chr9
ATCGCGCGCGCGATATATATATCGCG
>chr10
GCGCGCGATATAT
>chr11
CGCGATATATATATC
>chr12
ATATATCGCGCGCGCGATATAT
>chr13
ATATATCGCGCGCGCGATATATGCGATATATATATC
>chr15
ATATATGCGAT
>chr14
GCGCGCGCGATATATGCGAT
>chr16
GCGATATATGCGATATATATATC
>chr17
GCGCGCGCGATATATATATCGCGCGCGCGATATATATAT
>chr18
GCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGATATATATATC
>chr19
ATATGCGATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr20
TATGCGATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr21
TATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr22
ATATATATCGCGCGCGCGATATATATATATATGCGA
>chrX
CGCGCGCGATATATATATATATGCGA
>chrY
CGCGCGCGATATATATATATATGCGACGCGCGCGATATATATATATATGCGACGCGCGCGATATATATATATATGCGA

用python以及命令行参数实现

新建.py文件“”GetSeqFromChrID.py”,

python脚本如下:

 1 import argparse
 2 
 3 def read_fasta(input):
 4 
 5     with open(input, 'r') as f:
 6         fasta = {}
 7         for line in f:
 8             line = line.strip()
 9             if line[0] == '>':
10                 header = line[1:]
11             else:
12                 sequence = line
13                 fasta[header] = fasta.get(header, '') + sequence
14 
15     return fasta
16 
17 
18 if __name__ == '__main__':
19     # read arguments
20     parser = argparse.ArgumentParser(description="this program is used to extract a single "
21                                                  "sequence from genome")
22     parser.add_argument('--input', '-i',
23                         type=str,
24                         help='input file in fasta format')
25     parser.add_argument('--output', '-o',
26                         type=str,
27                         help='output file')
28     parser.add_argument('seq_id',
29                         type=str,
30                         help='sequence id')
31     args = parser.parse_args()
32 
33     fasta = read_fasta(args.input)
34     with open(args.output, 'w') as f:
35         f.write('>{:s}
{:s}
'.format(args.seq_id,fasta.get(args.seq_id, 'can not found this sequence')))

命令行参数输入如下:红色字体是输入部分

1 (base) e:15_pythonDEBUG>python GetSeqFromChrID.py -i genome_test.fa -o chr14.fa chr14

结果如下:

  

原文地址:https://www.cnblogs.com/caicai2019/p/10867405.html