Whole-Seq analysis


Demo Report

Figure 1. Experimental pipeline for genomic DNA sequencing

Genomic DNA is fragmented by sonication, and fragments of a suitable size are excised. After purification with a Qiagen kit, the fragments are end-repaired and A-tailed using the polymerase activity of the Klenow fragment. Indexed adapters are then ligated to the fragments with DNA ligase, followed by 10 to 18 cycles of PCR to enrich the adapter-modified fragments. After the libraries are validated by qPCR, Experion, and Qubit, they can be sequenced on an Illumina HiSeq™ 2500.

The first step in the trimming process was to convert each quality score (Q) to an error probability. Next, for every base a new value was calculated:

This value was negative for low-quality bases, where the error probability was high. For every base, the running sum of this value was calculated; whenever the sum dropped below zero, it was reset to zero. The part of the sequence between the first positive value of the running sum and its highest value was retained, and everything before and after this region was trimmed off. In addition, reads shorter than 35 bp after trimming were discarded.
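The trimming rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce this report. It assumes the standard Phred relation p = 10^(-Q/10) and assumes that the per-base value is a threshold minus the error probability, as in the modified Mott trimming algorithm. The function names and the default `limit=0.05` are hypothetical; the report does not state the threshold that was used.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def trim_read(quals, limit=0.05, min_len=35):
    """Return the retained region as a half-open (start, end) pair, or None.

    quals   : list of per-base Phred scores
    limit   : ASSUMED threshold; the per-base value is limit - p_error
    min_len : reads shorter than this after trimming are discarded
    """
    running = 0.0   # running sum, reset to zero whenever it goes negative
    best = 0.0      # highest running sum seen so far
    start = None    # index of the first positive running sum
    end = None      # one past the index of the highest running sum
    for i, q in enumerate(quals):
        running = max(0.0, running + (limit - phred_to_error(q)))
        if start is None and running > 0:
            start = i
        if running > best:
            best = running
            end = i + 1
    if start is None or end - start < min_len:
        return None  # nothing retained, or too short after trimming
    return start, end


# Two poor bases, forty good bases, two poor bases.
print(trim_read([2, 2] + [30] * 40 + [2, 2]))  # (2, 42)
```

A read consisting only of low-quality bases never produces a positive running sum, so the function returns `None` and the read is dropped.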

Table 1. Sequencing summary

Metric                           Biological Sample-1   Biological Sample-2   Biological Sample-3
Total reads                      63862713              153548220             103168428
Original read length (bp)        126                   126                   126
Total bases                      7842564582            21484550110           13223284292
Total reads after QT             61949948              153481468             103118442
Average length after trimming    125.48                125.42                125.86
Total bases after QT             7921673139            20452781629           13060106517
Percentage trimming              99.75%                99.64%                99.89%

QT: quality trimming.

 

 

The analysis follows the best practices recommended by the Broad Institute. Reads were aligned to the human reference genome (GRCh37) with the BWA aligner (Li and Durbin, 2009). The exome SNP calls were produced using the GATK SNP calling pipeline (DePristo et al., 2011).
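As a rough illustration of how such an alignment and variant-calling run might be scripted, the sketch below only builds the command lines and does not execute them. It rests on several assumptions that the report does not confirm: BWA-MEM for alignment, samtools for sorting and indexing, and a GATK4-style `HaplotypeCaller` invocation. All file names and the `build_commands` helper are hypothetical, and steps such as duplicate marking and base quality recalibration are left out for brevity.

```python
import shlex


def build_commands(ref, fq1, fq2, sample, threads=4):
    """Build argv lists for a simplified align-sort-index-call workflow.

    ASSUMPTIONS: tool choices and output names are illustrative only.
    The first command is meant to be piped into the second.
    """
    # Read group string; bwa expands the literal "\t" sequences itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, fq1, fq2],
        ["samtools", "sort", "-o", bam, "-"],
        ["samtools", "index", bam],
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", vcf],
    ]


for cmd in build_commands("GRCh37.fa", "S1_R1.fq.gz", "S1_R2.fq.gz", "S1"):
    # Print shell-quoted commands for inspection instead of running them.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the commands as lists keeps quoting explicit and lets each step be reviewed before it is passed to something like `subprocess.run`. The reference FASTA would also need its BWA index, `.fai` index, and sequence dictionary in place before these tools could run.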

 

 

Figure 3. Distribution of variants

Variant annotation was carried out using the Variant Effect Predictor, which adds information such as the gene in which the variant lies, the consequence of the mutation (nonsynonymous, nonsense, etc.), and annotations from resources such as PolyPhen-2, SIFT, dbSNP, and COSMIC (McLaren et al., 2010; Adzhubei et al., 2010; Kumar et al., 2009; Sherry et al., 1999; Forbes et al., 2011).
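For readers who want to tally consequences themselves, the following sketch counts consequence terms in VEP output, assuming the annotation was written in VCF format with a `CSQ` INFO field whose layout is declared in the header. The helper names and the demo records (including the gene names) are invented for illustration and are not taken from this report's data.

```python
from collections import Counter


def csq_fields(header_line):
    """Extract the CSQ sub-field names from the VEP INFO header line."""
    fmt = header_line.split("Format: ", 1)[1]
    return fmt.split('"', 1)[0].strip().split("|")


def count_consequences(vcf_lines):
    """Count consequence terms over all transcript-level annotations."""
    fields = None
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            fields = csq_fields(line)
            continue
        if not line or line.startswith("#"):
            continue
        if fields is None:
            raise ValueError("CSQ header not found before first record")
        info = line.split("\t")[7]
        for entry in info.split(";"):
            if not entry.startswith("CSQ="):
                continue
            # One annotation per transcript, separated by commas.
            for annotation in entry[4:].split(","):
                values = dict(zip(fields, annotation.split("|")))
                # Several terms for one transcript are joined with "&".
                for term in values.get("Consequence", "").split("&"):
                    if term:
                        counts[term] += 1
    return counts


# Invented demo records, for illustration only.
demo = [
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|SIFT|PolyPhen">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tDP=30;CSQ=G|missense_variant|GENE1|"
    "deleterious(0.01)|probably_damaging(0.99),G|synonymous_variant|GENE1||",
    "1\t200\t.\tC\tT\t60\tPASS\tCSQ=T|missense_variant&splice_region_variant|"
    "GENE2|tolerated(0.4)|benign(0.1)",
]
print(count_consequences(demo))
```

Note that this counts one entry per transcript annotation, so a variant overlapping several transcripts contributes more than once; a per-variant summary would need to pick one annotation per record first.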

 

Figure 4. Summary of calculated variation consequences

Figure 5. Summary of SIFT prediction

Figure 6. Summary of PolyPhen prediction

  1. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.
  2. DePristo M, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43: 491–498.
  3. McLaren W, et al. (2010) Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26: 2069–2070. doi: 10.1093/bioinformatics/btq330
  4. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, et al. (2010) A method and server for predicting damaging missense mutations. Nature Methods 7: 248–249. doi: 10.1038/nmeth0410-248
  5. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4: 1073–1081. doi: 10.1038/nprot.2009.86
  6. Sherry ST, Ward M, Sirotkin K (1999) dbSNP - Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Research 9: 677–679.
  7. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, et al. (2011) COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research 39: D945–D950. doi: 10.1093/nar/gkq929

 

  • analysis/variant_call_add_annotation.xlsx