The slides uploaded in Session V for "Variant Annotation and Prioritization" have been updated to include instructions on how to create a database (slide 13). In class, we provided the database for you in the interest of time; as you can imagine, compiling that much information can take a while.

If you plan on trying this, please note that you will need to re-create a snpEff-annotated VCF:

snpEff -Xmx2G -i vcf -o vcf -classic -formatEff -dataDir /home/mm573/HBC-NGS/var-calling/reference/snpeff/ hg19  na12878_q20_annot.vcf  > na12878_new_snpEff.vcf


The additional parameters -classic and -formatEff are required because of a version incompatibility. The new version of snpEff stores all annotations in the ANN field of your VCF INFO column. GEMINI has not been updated to handle this change in output and expects the information to be stored in a field called EFF. Adding -classic and -formatEff writes the results in the old format, using EFF.
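Before loading, it may be worth confirming which annotation field the new VCF actually declares. The sketch below is one way to check; the function name vcf_annotation_field is hypothetical, and it assumes snpEff declares its annotation field in the header with a line beginning ##INFO=<ID=EFF, or ##INFO=<ID=ANN,.

```shell
# Hypothetical helper (not part of snpEff or GEMINI): report which snpEff
# annotation field a VCF header declares.
# Assumption: the header line starts with ##INFO=<ID=EFF, or ##INFO=<ID=ANN,
vcf_annotation_field() {
    local vcf="$1"
    if grep -q '^##INFO=<ID=EFF,' "$vcf"; then
        echo "EFF"      # old-style format, the one GEMINI expects here
    elif grep -q '^##INFO=<ID=ANN,' "$vcf"; then
        echo "ANN"      # new-style format, re-run snpEff with -classic -formatEff
    else
        echo "none"     # no snpEff annotation header found
    fi
}
```

For example, vcf_annotation_field na12878_new_snpEff.vcf should print EFF if the command above ran with -classic and -formatEff.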

To load your VCF into GEMINI:

gemini load -v na12878_new_snpEff.vcf -t snpEff na12878_new.db

For larger (full) datasets, you will also want to add the parameter that specifies the number of cores for multi-threading:

--cores 8 
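Put together, the full load command might look like the sketch below. The wrapper function gemini_load_cmd is hypothetical and only prints the command so you can review it first; the default of 8 cores simply mirrors the example above and should be adjusted to your machine.

```shell
# Hypothetical wrapper: assemble the gemini load command with a core count.
# Arguments: input VCF, output database, number of cores (defaults to 8).
gemini_load_cmd() {
    local vcf="$1" db="$2" cores="${3:-8}"
    echo "gemini load -v $vcf -t snpEff --cores $cores $db"
}

# Print the command for review; copy it to the prompt to actually run it.
gemini_load_cmd na12878_new_snpEff.vcf na12878_new.db 8
```

Printing instead of executing keeps the sketch safe to try on a machine where GEMINI is not installed.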

