Development of the variant calling algorithm, ADIScan, and its use to estimate discordant sequences …
Page information: posted by Administrator on 2018-06-11; 18,605 views
Yangrae Cho, Sunho Lee, Jong Hui Hong, Byong Joon Kim, Wonn-Young Hong, Jongcheol Jung, Hyang Burm Lee, Joohon Sung, Han-Na Kim, Hyung-Lae Kim, Jongsun Jung
Syntekabio Incorporated, Techno-2ro B-512, Yuseong-gu, Daejeon, 34025, Republic of Korea
DFTBA, CALS, Chonnam National University, Gwangju 61186, Republic of Korea
Complex Disease and Genome Epidemiology Branch, Department of Epidemiology, School of Public Health, Seoul National University, Seoul 08826, Republic of Korea
Department of Biochemistry, School of Medicine, Ewha Womans University, Seoul 07985, Republic of Korea
Calling variants from next-generation sequencing (NGS) data and discovering discordant sequences between two NGS data sets are both challenging. We developed a computer algorithm, ADIScan1, that calls variants by comparing the fractions of allelic reads in a tester to the universal reference genome. We then created ADIScan2 by modifying the algorithm to compare two sets of NGS data directly and predict discordant sequences between two testers. ADIScan1 detected >99.7% of the variants called by GATK, plus an additional 724,393 SNVs. ADIScan2 identified ∼500 candidate discordant sequences in each of two pairs of monozygotic twins. About 200 of these candidates were among the ∼2800 predicted by VarScan2. We verified 66 true discordant sequences among the candidates that ADIScan2 and VarScan2 exclusively predicted. ADIScan2 detected many discordant sequences overlooked by VarScan2 and Mutect, which specialize in detecting low-frequency mutations in genetically heterogeneous cancerous tissues. The number of verified sequences alone was more than five times that expected from recently estimated mutation rates based on whole-genome sequences. The post-zygotic mutation rate estimated in this study was 1.68 × 10⁻⁷. ADIScan1 and ADIScan2 would complement existing tools in screening for causative mutations of diverse genetic diseases and in comparing two sets of genome sequences, respectively.
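The abstract describes the approach only at a high level: allelic read fractions are compared, either between a tester and the reference or between two testers. As a rough illustration of that general idea, and not the ADIScan implementation itself, the Python sketch below flags a single genomic site as discordant when the fraction of reads supporting some allele differs between two samples. The function names, the depth cutoff (`min_depth`), and the fraction-difference cutoff (`min_diff`) are all hypothetical and chosen only for the example.

```python
from collections import Counter


def allele_fractions(bases):
    """Return the fraction of reads supporting each base at one site."""
    counts = Counter(bases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


def is_discordant(bases_a, bases_b, min_depth=10, min_diff=0.3):
    """Flag a site whose allele fractions differ between two samples.

    bases_a, bases_b: the bases observed in the reads covering the site
    in each sample, e.g. "AAAAGG".
    min_depth, min_diff: hypothetical cutoffs for this illustration only;
    they are not taken from the paper.
    """
    # Skip sites that are too shallow in either sample to compare.
    if len(bases_a) < min_depth or len(bases_b) < min_depth:
        return False
    frac_a = allele_fractions(bases_a)
    frac_b = allele_fractions(bases_b)
    alleles = set(frac_a) | set(frac_b)
    # Discordant if any allele's read fraction differs by at least min_diff.
    return any(
        abs(frac_a.get(allele, 0.0) - frac_b.get(allele, 0.0)) >= min_diff
        for allele in alleles
    )


# Toy data: twin A shows only "A" reads; twin B carries "G" in 9 of 20 reads.
twin_a = "A" * 20
twin_b = "A" * 11 + "G" * 9
print(is_discordant(twin_a, twin_b))
```

A real caller would also have to account for base and mapping quality, strand bias, and sequencing error, none of which this sketch models.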