Working with FASTA/FASTQ files using the FASTX-Toolkit (1)

Prepare a test file, test.fq, containing 4 FASTQ records whose base qualities are encoded as Phred+64:

@FC12044_91407_8_200_406_24
NTTAGCTCCCACCTTAAGATGTTTA
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
@FC12044_91407_8_200_345_133
GATTTTTTAACAATAAACGTACATA
+FC12044_91407_8_200_345_133
OQTOOSFORTFFFIIOFFFFFFFFF
@FC12044_91407_8_200_106_131
GTTGCCCAGGCTCGTCTTGAACTCC
+FC12044_91407_8_200_106_131
XXXXXXXXXXXXXXSXXXXISTXQS 
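The quality strings above are easier to reason about once decoded into numbers. The following is a minimal sketch, assuming the usual Phred+64 convention (score = ASCII code minus 64); the helper name `phred64_scores` is made up for illustration and is not part of the FASTX-Toolkit.

```python
def phred64_scores(quality):
    """Decode a Phred+64 quality string into a list of integer scores."""
    return [ord(ch) - 64 for ch in quality]


# Quality line of the first record in test.fq
quality = "SXXTXXXXXXXXXTTSUXSSXKTMQ"
scores = phred64_scores(quality)

print(scores[:4])               # [19, 24, 24, 20]
print(min(scores), max(scores))  # 11 24
```

In this test file the highest quality character is `X`, which decodes to 24.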

1) fastq_to_fasta: convert a FASTQ file to a FASTA file

Command:

fastq_to_fasta -i test.fq -o test.fa

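Output (the first read is missing because it contains an N; by default fastq_to_fasta discards such reads, and the -n option keeps them):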

cat test.fa
>FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
>FC12044_91407_8_200_345_133
GATTTTTTAACAATAAACGTACATA
>FC12044_91407_8_200_106_131
GTTGCCCAGGCTCGTCTTGAACTCC
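The conversion can be sketched in a few lines of Python. This is only an illustration of the behaviour seen above, not the tool's actual implementation: it assumes strict 4-line FASTQ records, and the function name and the `keep_n` parameter are hypothetical stand-ins for the default behaviour and the -n option.

```python
def fastq_to_fasta(fastq_text, keep_n=False):
    """Convert 4-line FASTQ records to FASTA, dropping reads with N by default."""
    lines = fastq_text.strip().splitlines()
    out = []
    for i in range(0, len(lines), 4):
        header, seq = lines[i], lines[i + 1]
        if not keep_n and "N" in seq.upper():
            continue  # mimic the default: discard reads with unknown bases
        out.append(">" + header[1:])  # replace the leading '@' with '>'
        out.append(seq)
    return "\n".join(out) + "\n"


fastq = """@FC12044_91407_8_200_406_24
NTTAGCTCCCACCTTAAGATGTTTA
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
"""

print(fastq_to_fasta(fastq), end="")
```

With the default `keep_n=False`, only the second record is printed, which matches the missing first read in test.fa.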

2) fastx_trimmer: trim FASTQ sequences, keeping only the bases between a given first and last position

Command: trim each sequence to its first 10 bp

fastx_trimmer -f 1 -l 10 -i test.fq -o test.trim.fq

Output:

cat test.trim.fq 
@FC12044_91407_8_200_406_24
NTTAGCTCCC
+FC12044_91407_8_200_406_24
SXXTXXXXXX
@FC12044_91407_8_200_720_610
CTCTGTGGCA
+FC12044_91407_8_200_720_610
OXXXXXXXXX
@FC12044_91407_8_200_345_133
GATTTTTTAA
+FC12044_91407_8_200_345_133
OQTOOSFORT
@FC12044_91407_8_200_106_131
GTTGCCCAGG
+FC12044_91407_8_200_106_131
XXXXXXXXXX
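The -f and -l positions are 1-based and inclusive, which maps onto a Python slice as follows. This is a sketch of the arithmetic only; `trim_record` and its parameter names are hypothetical.

```python
def trim_record(seq, qual, first=1, last=None):
    """Keep bases first..last (1-based, inclusive) of a read and its qualities."""
    start = first - 1  # convert the 1-based start to a 0-based slice index
    return seq[start:last], qual[start:last]


seq = "NTTAGCTCCCACCTTAAGATGTTTA"
qual = "SXXTXXXXXXXXXTTSUXSSXKTMQ"

print(trim_record(seq, qual, first=1, last=10))
# ('NTTAGCTCCC', 'SXXTXXXXXX')
```

The sequence and the quality string are always cut with the same slice so that they stay the same length.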

3) fastx_renamer: rename sequence identifiers, for example replacing them with a running count

Command:

fastx_renamer -n COUNT -i test.fq -o test.renamer.fq

Output:

cat test.renamer.fq 
@1
NTTAGCTCCCACCTTAAGATGTTTA
+1
SXXTXXXXXXXXXTTSUXSSXKTMQ
@2
CTCTGTGGCACCCCATCCCTCACTT
+2
OXXXXXXXXXXXXXXXXXTSXQTXU
@3
GATTTTTTAACAATAAACGTACATA
+3
OQTOOSFORTFFFIIOFFFFFFFFF
@4
GTTGCCCAGGCTCGTCTTGAACTCC
+4
XXXXXXXXXXXXXXSXXXXISTXQS
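The effect of renaming by count can be imitated with a short Python sketch. It assumes strict 4-line records and writes the counter on both the `@` and `+` lines, as in the output above; the function name is hypothetical.

```python
def rename_with_count(fastq_text):
    """Replace each FASTQ identifier with its 1-based record number."""
    lines = fastq_text.strip().splitlines()
    out = []
    for n, i in enumerate(range(0, len(lines), 4), start=1):
        seq, qual = lines[i + 1], lines[i + 3]
        out.extend(["@%d" % n, seq, "+%d" % n, qual])
    return "\n".join(out) + "\n"


fastq = """@FC12044_91407_8_200_406_24
NTTAGCTCCCACCTTAAGATGTTTA
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
"""

print(rename_with_count(fastq), end="")
```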

4) fasta_formatter: reformat a FASTA file by setting the maximum number of characters per sequence line

Command: set the maximum line width to 10 characters

fasta_formatter -w 10 -i test.fa -o test.formatter.fa

Output:

cat test.formatter.fa 
>FC12044_91407_8_200_720_610
CTCTGTGGCA
CCCCATCCCT
CACTT
>FC12044_91407_8_200_345_133
GATTTTTTAA
CAATAAACGT
ACATA
>FC12044_91407_8_200_106_131
GTTGCCCAGG
CTCGTCTTGA
ACTCC
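The line wrapping shown above amounts to cutting each sequence into fixed-width chunks. A minimal sketch, with a hypothetical helper name:

```python
def wrap_sequence(seq, width):
    """Split a sequence into consecutive chunks of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


for line in wrap_sequence("CTCTGTGGCACCCCATCCCTCACTT", 10):
    print(line)
# CTCTGTGGCA
# CCCCATCCCT
# CACTT
```

The last chunk is simply shorter when the sequence length is not a multiple of the width.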

5) fastq_masker: mask bases whose quality is below a given threshold

Command: mask bases with a quality score below 40 (masked bases are replaced with N by default)

fastq_masker -q 40 -i test.fq -o test.masker.fq

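Output (every base is masked, because the highest Phred+64 score in test.fq is 24, well below the threshold of 40):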

cat test.masker.fq 
@FC12044_91407_8_200_406_24
NNNNNNNNNNNNNNNNNNNNNNNNN
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
NNNNNNNNNNNNNNNNNNNNNNNNN
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
@FC12044_91407_8_200_345_133
NNNNNNNNNNNNNNNNNNNNNNNNN
+FC12044_91407_8_200_345_133
OQTOOSFORTFFFIIOFFFFFFFFF
@FC12044_91407_8_200_106_131
NNNNNNNNNNNNNNNNNNNNNNNNN
+FC12044_91407_8_200_106_131
XXXXXXXXXXXXXXSXXXXISTXQS  
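To see why every base ends up as N, the masking rule can be sketched in Python. This assumes Phred+64 qualities and that a base is masked when its score is strictly below the threshold; the function and parameter names are hypothetical and this is not the tool's own code.

```python
def mask_low_quality(seq, qual, threshold, offset=64, mask_char="N"):
    """Replace every base whose decoded quality is below `threshold`."""
    return "".join(
        base if ord(q) - offset >= threshold else mask_char
        for base, q in zip(seq, qual)
    )


seq = "CTCTGTGGCACCCCATCCCTCACTT"
qual = "OXXXXXXXXXXXXXXXXXTSXQTXU"

# With a threshold of 40 nothing survives, since the best score here is 24.
print(mask_low_quality(seq, qual, threshold=40))
# NNNNNNNNNNNNNNNNNNNNNNNNN
```

A lower threshold would leave the higher-quality bases untouched, so on data like this a threshold somewhere below 24 is needed to retain any bases.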

 

 

Original post: https://www.cnblogs.com/xudongliang/p/5081518.html