Contents
Introduction
Installation
Calling variants
Filtering results
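Introduction
VarScan employs a robust heuristic/statistical approach to call variants that meet the desired thresholds for read depth, base quality, variant allele frequency, and statistical significance.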
It can be used to detect different types of variation:
Germline variants (SNPs and indels) in individual samples or pools of samples.
Multi-sample variants (shared or private) in multi-sample datasets (with mpileup).
Somatic mutations, LOH events, and germline variants in tumor-normal pairs.
Somatic copy number alterations (CNAs) in tumor-normal exome data.
Official site: http://varscan.sourceforge.net/germline-calling.html
This post tries using VarScan to call somatic mutations (SNPs and indels) in individual samples. Whether this approach is feasible remains to be confirmed.
Installation
No installation step is needed: download the jar file and run it with Java. Download: https://sourceforge.net/projects/varscan/files/
Calling variants
Input
VarScan2 takes as input the mpileup file produced by samtools mpileup. The other samtools mpileup parameters can be adjusted as needed; the defaults are usually fine.
Step 1: Generate a pileup file
    # samtools mpileup -B -f [reference sequence] [BAM file] > myData.pileup
    samtools mpileup -B -f Homo_sapiens_assembly19.fasta S2000332-L1.NRAS.C-T.gencore.sort.bam > S2000332-L1.NRAS.C-T.gencore.sort.pileup
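To make the pileup input more concrete, here is a minimal Python sketch that reads one pileup line and counts reference and non-reference base observations. It assumes the standard six-column samtools pileup layout (chromosome, 1-based position, reference base, depth, read bases, base qualities). The function names and the example line are hypothetical, and indels, deletions (`*`) and reference skips are skipped rather than counted.

```python
import re

def count_pileup_bases(bases):
    """Count reference matches per strand and mismatches per allele."""
    counts = {"ref_fwd": 0, "ref_rev": 0, "alt": {}}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Start of a read: the next character encodes mapping quality.
            i += 2
            continue
        if c == "$":
            # End of a read.
            i += 1
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            m = re.match(r"\d+", bases[i + 1:])
            i += 1 + len(m.group()) + int(m.group())
            continue
        if c == ".":
            counts["ref_fwd"] += 1      # reference match, forward strand
        elif c == ",":
            counts["ref_rev"] += 1      # reference match, reverse strand
        elif c.upper() in "ACGTN":
            allele = c.upper()          # mismatch on either strand
            counts["alt"][allele] = counts["alt"].get(allele, 0) + 1
        # '*', '<' and '>' are ignored in this sketch.
        i += 1
    return counts

def parse_pileup_line(line):
    """Split a pileup line and summarise its read-bases column."""
    chrom, pos, ref, depth, bases, _quals = line.rstrip("\n").split("\t")[:6]
    return chrom, int(pos), ref.upper(), int(depth), count_pileup_bases(bases)

# Hypothetical example line: 10 reads, two of them supporting a C>T change.
example = "1\t100\tC\t10\t.,.,T.t^F.,$.+2AG\tIIIIIIIIII\n"
print(parse_pileup_line(example))
```

Counts like these are what the depth and supporting-read thresholds are later applied to, so the sketch can help when checking a surprising call by hand.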
Methods
VarScan2 provides three commands for calling variants: mpileup2snp/pileup2snp (SNPs only), mpileup2indel/pileup2indel (indels only), and mpileup2cns/pileup2cns (SNPs and indels together). The commands are run as follows:
    java -jar VarScan.v2.4.4.jar mpileup2cns -h
    Warning: No p-value threshold provided, so p-values will not be calculated
    Min coverage:   8
    Min reads2:     2
    Min var freq:   0.2
    Min avg qual:   15
    P-value thresh: 0.01
    USAGE: java -jar VarScan.jar mpileup2cns [pileup file] OPTIONS
        mpileup file - The SAMtools mpileup file
    OPTIONS:
        --min-coverage      Minimum read depth at a position to make a call [8]
        --min-reads2        Minimum supporting reads at a position to call variants [2]
        --min-avg-qual      Minimum base quality at a position to count a read [15]
        --min-var-freq      Minimum variant allele frequency threshold [0.01]
        --min-freq-for-hom  Minimum frequency to call homozygote [0.75]
        --p-value           Default p-value threshold for calling variants [99e-02]
        --strand-filter     Ignore variants with >90% support on one strand [1]
        --output-vcf        If set to 1, outputs in VCF format
        --vcf-sample-list   For VCF output, a list of sample names in order, one per line
        --variants          Report only variant (SNP/indel) positions [0]
Step 2: Call CNS
    java -jar VarScan.v2.4.4.jar mpileup2cns \
        S2000332-L1.NRAS.C-T.gencore.sort.pileup \
        --min-var-freq 0.01 \
        --p-value 0.01 \
        --output-vcf 1 \
        --variants > S2000332-L1.NRAS.C-T.snv.vcf
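As a rough illustration of how the calling thresholds interact, the Python sketch below classifies a single position from its read counts. The option names and default values come from the help output above. The decision logic itself is a simplification written for this post, not VarScan's source: it ignores the p-value test and the strand filter, and it assumes `depth` and `reads2` were counted after discarding bases below `--min-avg-qual`.

```python
def classify_position(depth, reads2,
                      min_coverage=8, min_reads2=2,
                      min_var_freq=0.2, min_freq_for_hom=0.75):
    """Return 'no_call', 'ref', 'het' or 'hom' for one position.

    depth  -- number of quality-passing reads covering the position
    reads2 -- number of those reads supporting the variant allele
    """
    if depth < min_coverage:
        return "no_call"                 # too shallow to say anything
    var_freq = reads2 / depth
    if reads2 < min_reads2 or var_freq < min_var_freq:
        return "ref"                     # not enough variant evidence
    if var_freq >= min_freq_for_hom:
        return "hom"                     # variant allele dominates
    return "het"

# 5 variant reads out of 100 is a 5% allele frequency:
print(classify_position(100, 5))                     # below the 0.2 default
print(classify_position(100, 5, min_var_freq=0.01))  # threshold used in step 2
```

The second call shows why `--min-var-freq` is lowered to 0.01 in step 2: with the 0.2 default, a low-frequency variant never reaches the output.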
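Filtering results
As described above and on the official site, calling variants with VarScan2 is fairly simple and takes only two steps: (1) convert the BAM file to pileup format; (2) feed the pileup file to VarScan2 for variant detection. Parameters can be changed as needed; the defaults are usually used.
Although the calling step is simple, the resulting VCF file tends to be large and to contain many variants, so filtering becomes the most important part of the workflow. How can reliable true-positive mutations be picked out of so many calls? (Before filtering, learn what each VCF field means: the INFO and FORMAT columns provide the genotype depth, the site quality score, the minor allele frequency, and the total genotype call depth.)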
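The two steps can be chained in a small wrapper. The Python sketch below only assembles the same samtools and VarScan command lines used earlier in this post and hands them to `subprocess`. The function names and the paths in the usage lines are hypothetical, and `run_pipeline` assumes that `samtools` and `java` are available on the PATH.

```python
import subprocess

def build_mpileup_cmd(reference, bam):
    """Step 1: samtools mpileup with BAQ disabled (-B)."""
    return ["samtools", "mpileup", "-B", "-f", reference, bam]

def build_varscan_cmd(jar, pileup, min_var_freq=0.01, p_value=0.01):
    """Step 2: VarScan mpileup2cns writing variant-only VCF to stdout."""
    return ["java", "-jar", jar, "mpileup2cns", pileup,
            "--min-var-freq", str(min_var_freq),
            "--p-value", str(p_value),
            "--output-vcf", "1",
            "--variants"]

def run_pipeline(reference, bam, jar, pileup_path, vcf_path):
    """Run both steps, redirecting each tool's stdout to a file."""
    with open(pileup_path, "w") as out:
        subprocess.run(build_mpileup_cmd(reference, bam), stdout=out, check=True)
    with open(vcf_path, "w") as out:
        subprocess.run(build_varscan_cmd(jar, pileup_path), stdout=out, check=True)

# Inspect the commands without running anything (placeholder file names):
print(" ".join(build_mpileup_cmd("ref.fasta", "sample.bam")))
print(" ".join(build_varscan_cmd("VarScan.v2.4.4.jar", "sample.pileup")))
```

Building each command as a list rather than a shell string avoids quoting problems with unusual file names, and `check=True` stops the run if either tool exits with an error.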
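Reading the per-sample fields is the first step toward any filter. The Python sketch below splits one VCF data line and pairs the FORMAT keys with the sample values. The FORMAT keys in the example (`DP`, `AD`, `FREQ`, `ADF`, `ADR` and so on) are what VarScan 2.4 VCF output is assumed to contain; check them against the `##FORMAT` header lines of your own file. The record itself is made up for illustration.

```python
def parse_vcf_record(line):
    """Parse a single-sample VCF data line into a small dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    keys = fields[8].split(":")       # FORMAT column
    values = fields[9].split(":")     # first (only) sample column
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "sample": dict(zip(keys, values)),
    }

def variant_allele_fraction(sample):
    """Convert a percentage string such as '6%' to a fraction (0.06)."""
    return float(sample["FREQ"].rstrip("%")) / 100.0

# Hypothetical record, not taken from real data.
record = parse_vcf_record(
    "1\t100\t.\tC\tT\t.\tPASS\tADP=200;WT=0;HET=1;HOM=0;NC=0\t"
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR\t"
    "0/1:40:205:200:188:12:6%:1.2E-4:38:37:95:93:7:5\n"
)
print(record["sample"]["DP"], record["sample"]["AD"],
      variant_allele_fraction(record["sample"]))
```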
1. Restrict the variants to the regions covered by the panel
    bedtools intersect -a NRAS.C-T.bed -b S2000332-L1.NRAS.C-T.snv.vcf -wb
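For readers who want to see what the interval restriction does, here is a minimal Python sketch of the same idea: keep only VCF records whose position falls inside a panel region. It is not a replacement for bedtools. It checks only the record's POS (so long indels that merely overlap a region edge are not handled), and all function names are hypothetical. The key detail is the coordinate convention: BED intervals are 0-based and half-open, while VCF positions are 1-based.

```python
def load_bed(lines):
    """Read BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions

def in_panel(regions, chrom, pos):
    """True if 1-based position `pos` lies in any BED interval on `chrom`."""
    # BED [start, end) covers the 1-based positions start+1 .. end.
    return any(start < pos <= end for start, end in regions.get(chrom, []))

def filter_vcf_lines(vcf_lines, regions):
    """Yield header lines plus the records that fall inside the panel."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_panel(regions, chrom, int(pos)):
            yield line

# Hypothetical panel region and records.
panel = load_bed(["1\t99\t110\n"])
vcf = ["##fileformat=VCFv4.1\n", "1\t100\t.\tC\tT\n", "1\t500\t.\tG\tA\n"]
print(list(filter_vcf_lines(vcf, panel)))
```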
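2. General filtering, based on:
a. Minimum average coverage depth at the variant position
b. Minimum number of reads supporting the alt allele
c. Strand balance (variant seen on one strand or on both strands)
d. Average base quality of the alt-supporting reads
e. VAF, the variant allele frequency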
These criteria map onto the options of the VarScan filter command, whose help output is:
    Min coverage:   10
    Min reads2:     2
    Min strands2:   1
    Min var freq:   0.2
    Min avg qual:   15
    P-value thresh: 0.1
    USAGE: java -jar VarScan.jar filter [variant file] OPTIONS
        variant file - A file of SNPs or indels
    OPTIONS:
        --min-coverage  Minimum read depth at a position to make a call [10]
        --min-reads2    Minimum supporting reads at a position to call variants [2]
        --min-strands2  Minimum # of strands on which variant observed (1 or 2) [1]
        --min-avg-qual  Minimum average base quality for variant-supporting reads [20]
        --min-var-freq  Minimum variant allele frequency threshold [0.20]
        --p-value       Default p-value threshold for calling variants [1e-01]
        --indel-file    File of indels for filtering nearby SNPs
        --output-file   File to contain variants passing filters
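Criteria a to e can be sketched as one function that reports which thresholds a call fails. This is an illustrative reimplementation written for this post, not VarScan's own filter code, and it leaves out the p-value and the nearby-indel filter. The `Call` class is hypothetical. Its fields are assumed to correspond to the `DP`, `ADF`, `ADR` and `ABQ` values of a VarScan VCF record, and the defaults follow the settings printed at the top of the help output.

```python
from dataclasses import dataclass

@dataclass
class Call:
    depth: int            # quality-passing read depth (assumed: DP)
    alt_fwd: int          # alt-supporting reads, forward strand (assumed: ADF)
    alt_rev: int          # alt-supporting reads, reverse strand (assumed: ADR)
    alt_avg_qual: float   # mean base quality of alt reads (assumed: ABQ)

    @property
    def reads2(self):
        return self.alt_fwd + self.alt_rev

    @property
    def strands2(self):
        # Number of strands (0, 1 or 2) on which the variant was seen.
        return int(self.alt_fwd > 0) + int(self.alt_rev > 0)

    @property
    def var_freq(self):
        return self.reads2 / self.depth if self.depth else 0.0

def failed_filters(call, min_coverage=10, min_reads2=2, min_strands2=1,
                   min_avg_qual=15, min_var_freq=0.2):
    """Return the names of the thresholds this call does not meet."""
    checks = [
        ("min-coverage", call.depth >= min_coverage),
        ("min-reads2", call.reads2 >= min_reads2),
        ("min-strands2", call.strands2 >= min_strands2),
        ("min-avg-qual", call.alt_avg_qual >= min_avg_qual),
        ("min-var-freq", call.var_freq >= min_var_freq),
    ]
    return [name for name, ok in checks if not ok]

# A 6% variant seen on both strands: passes once min_var_freq is lowered.
call = Call(depth=200, alt_fwd=7, alt_rev=5, alt_avg_qual=37.0)
print(failed_filters(call))
print(failed_filters(call, min_var_freq=0.01))
```

Returning the list of failed thresholds, rather than a single pass/fail flag, makes it easier to see which criterion removes most calls when tuning the cutoffs for a panel.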
References:
SNP Filtering Tutorial http://www.ddocent.com/filtering/
https://blog.csdn.net/u012110870/article/details/102804559
https://blog.csdn.net/u012110870/article/details/102804561?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-0&spm=1001.2101.3001.4242
https://blog.csdn.net/Gossie/article/details/109320960?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-3.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-3.control




