A 1204-single nucleotide polymorphism and insertion–deletion polymorphism panel for massively parallel sequencing analysis of DNA mixtures

Hsiao-Lin Hwa, Wan-Chia Chung, Pei-Lung Chen, Chih-Peng Lin, Huei-Ying Li, Hsiang-I Yin, James Chun-I Lee.

Journal: Forensic Science International: Genetics (January 2018)

Massively parallel sequencing (MPS) technology enables the simultaneous analysis of a huge number of single nucleotide polymorphisms (SNPs) and insertion–deletion polymorphisms (indels). MPS also enables the detection of the alleles of minor contributors in a highly unbalanced DNA mixture. In this study, we established a 1204-marker panel optimized for MPS consisting of 987 autosomal markers (964 SNPs and 23 indels), 27 X-chromosome SNPs, 61 Y-chromosome markers (56 SNPs and 5 indels), and 129 mitochondrial SNPs. The DNA samples of six unrelated individuals (two men and four women), 26 nondegraded DNA mixtures (with minor to major ratios of 1:29, 1:39, 1:79, and 1:99), and eight highly artificially degraded DNA mixtures (with minor to major ratios of 1:29, 1:39, 1:79, and 1:99) were analyzed through MPS by using the panel. A scoring system was developed to determine the minor contributors in DNA mixtures based on the genotypes identified using MPS. The genotypes of the 1204 markers were successfully profiled through MPS by using the custom-designed panel. The efficiency of MPS for analyzing these highly degraded samples was lower than that for analyzing nondegraded samples. All minor contributors in the 26 nondegraded and 8 degraded DNA mixtures were accurately assigned using this scoring system based on 964 autosomal SNPs. An association between the observed reads ratio and theoretical ratio of the minor component was noted for nondegraded mixtures. In conclusion, we established a 1204-marker individual identification panel for MPS that successfully analyzed autosomal, X-chromosome, Y-chromosome, and mitochondrial SNPs and indels simultaneously. In combination with the newly developed scoring system, the panel can accurately identify minor contributors in nondegraded and highly degraded DNA mixtures.
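As a rough illustration of the "theoretical ratio of the minor component" mentioned above, the mixture ratios can be converted into expected minor-contributor fractions. This is only a sketch: the helper name `minor_fraction` is hypothetical, and it assumes the ratios describe relative amounts of template DNA. The abstract does not describe how the scoring system itself works, so that is not reproduced here.

```python
from fractions import Fraction

def minor_fraction(minor_parts: int, major_parts: int) -> Fraction:
    """Share of total template DNA contributed by the minor donor (assumed model)."""
    return Fraction(minor_parts, minor_parts + major_parts)

# Minor-to-major ratios listed in the abstract.
ratios = [(1, 29), (1, 39), (1, 79), (1, 99)]
expected = {f"{m}:{M}": minor_fraction(m, M) for m, M in ratios}

for label, frac in expected.items():
    print(f"{label} -> {float(frac):.2%} of template DNA from the minor contributor")
```

Under this assumption, a 1:99 mixture gives a minor contributor share of 1% of the template, which indicates the read depth needed before minor alleles can be distinguished from sequencing noise.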

Identification of a c.544C>T mutation in WDR34 as a deleterious recessive allele of short rib-polydactyly syndrome

Shu-Han You, Yun-Shien Lee, Chueh-Pai Lee, Chih-Peng Lin, Chiao-Yun Lin, Chia-Lung Tsai, Yao-Lung Chang, Po-Jen Cheng, Tzu-Hao Wang, Shuenn-Dyh Chang

Journal: Taiwanese Journal of Obstetrics & Gynecology (December 2017)


Single-nucleotide polymorphism (SNP) microarrays and whole-exome sequencing (WES) are tools to precisely diagnose rare autosomal recessive (AR) diseases. In this study, SNP chip and WES were used to identify a mutated location in WDR34 in a baby born to consanguineous parents.

Case report:

The baby, born at 36 gestational weeks, had a small thoracic cage, symmetric short proximal bones, and polydactyly. Radiography showed short ribs with reduced lung volume and pulmonary opacities, compatible with asphyxiating thoracic dystrophy or short rib-polydactyly syndrome (SRPS). At 4 months of age, she died of pulmonary hypoplasia and sepsis. SNP microarray and evaluation tool confirmed WDR34 as the candidate gene. WES detected an AR mutation at c.544C > T [p.Arg182Trp] in WDR34.


This study was the first to identify the c.544C > T [p.Arg182Trp] mutation in WDR34 in a patient with SRPS. According to the database, the homozygous c.544C > T mutation in WDR34 was deleterious, and the prevalence of the heterozygous mutation was relatively higher in the Asian population. More studies of this mutation in patients with SRPS are required.

Hepatocellular carcinoma-associated single-nucleotide variants and deletions identified by the use of genome-wide high-throughput analysis of hepatitis B virus.

Liu WC, Wu IC, Lee YC, Lin CP, Cheng JH, Lin YJ, Yen CJ, Cheng PN, Li PF, Cheng YT, Cheng PW, Sun KT, Yan SL, Lin JJ, Yang JC, Chang KC, Ho CH, Tseng VS, Chang BC, Wu JC, Chang TT.

Journal: J Pathol. (2017 Jul 11)

This study investigated hepatitis B virus (HBV) single-nucleotide variants (SNVs) and deletion mutations linked with hepatocellular carcinoma (HCC). Ninety-three HCC patients and 108 non-HCC patients were enrolled for HBV genome-wide next-generation sequencing (NGS) analysis. A systematic literature review and a meta-analysis were performed to validate NGS-defined HCC-associated SNVs and deletions. The experimental results identified 60 NGS-defined HCC-associated SNVs, including 41 novel SNVs, and their pathogenic frequencies. Each SNV was specific for either genotype B (n = 24) or genotype C (n = 34), except for nt53C, which was present in both genotypes. The pathogenic frequencies of these HCC-associated SNVs showed a distinct U-shaped distribution pattern. According to the meta-analysis and literature review, 167 HBV variants from 109 publications were categorized into four levels (A-D) of supporting evidence that they are associated with HCC. The proportion of NGS-defined HCC-associated SNVs among these HBV variants declined significantly from 75% of 12 HCC-associated variants by meta-analysis (Level A) to 0% of 10 HCC-unassociated variants by meta-analysis (Level D) (P < 0.0001). PreS deletions were significantly associated with HCC, in terms of deletion index, for both genotypes B (P = 0.030) and C (P = 0.049). For genotype C, preS deletions involving a specific fragment (nt2977-3013) were significantly associated with HCC (HCC versus non-HCC, 6/34 versus 0/32, P = 0.025). Meta-analysis of preS deletions showed significant association with HCC (summary odds ratio 3.0; 95% confidence interval 2.3-3.9). Transfection of Huh7 cells showed that all of the five novel NGS-defined HCC-associated SNVs in the small surface region influenced hepatocarcinogenesis pathways, including endoplasmic reticulum-stress and DNA repair systems, as shown by microarray, real-time polymerase chain reaction and western blot analysis. 
Their carcinogenic mechanisms are worthy of further research.
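The 6/34 versus 0/32 comparison reported above for the nt2977-3013 preS deletion can be checked with a two-sided Fisher's exact test. This is a sketch under an assumption: the abstract does not name the test used, and the function `fisher_exact_two_sided` is written here for illustration only, using the standard library.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(col1, row1)
    p_obs = prob(a)
    # Sum all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# HCC: 6 of 34 with the deletion; non-HCC: 0 of 32 with the deletion.
p = fisher_exact_two_sided(6, 28, 0, 32)
print(f"two-sided p = {p:.3f}")  # prints: two-sided p = 0.025
```

With these counts the computed value agrees with the reported P = 0.025, which suggests, but does not confirm, that an exact test of this kind was used.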

Deguelin exerts potent nematocidal activity via the mitochondrial respiratory chain.

Preston S, Korhonen PK, Mouchiroud L, Cornaglia M, McGee SL, Young ND, Davis RA, Crawford S, Nowell C, Ansell BRE, Fisher GM, Andrews KT, Chang BCH, Gijs MAM, Sternberg PW, Auwerx J, Baell J, Hofmann A, Jabbar A, Gasser RB.

Journal: FASEB J. (2017 Jul 7)

As a result of limited classes of anthelmintics and an over-reliance on chemical control, there is a great need to discover new compounds to combat drug resistance in parasitic nematodes. Here, we show that deguelin, a plant-derived rotenoid, selectively and potently inhibits the motility and development of nematodes, which supports its potential as a lead candidate for drug development. Furthermore, we demonstrate that deguelin treatment significantly increases gene transcription that is associated with energy metabolism, particularly oxidative phosphorylation and mitoribosomal protein production before inhibiting motility. Mitochondrial tracking confirmed enhanced oxidative phosphorylation. In accordance, real-time measurements of oxidative phosphorylation in response to deguelin treatment demonstrated an immediate decrease in oxygen consumption in both parasitic (Haemonchus contortus) and free-living (Caenorhabditis elegans) nematodes. Consequently, we hypothesize that deguelin is exerting its toxic effect on nematodes as a modulator of oxidative phosphorylation. This study highlights the dynamic biologic response of multicellular organisms to deguelin perturbation.

Whipworm kinomes reflect a unique biology and adaptation to the host animal.

Stroehlein AJ, Young ND, Korhonen PK, Chang BCH, Nejsum P, Pozio E, La Rosa G, Sternberg PW, Gasser RB.

Journal: Int J Parasitol. (2017 Jun 10)

Roundworms belong to a diverse phylum (Nematoda) which is comprised of many parasitic species including whipworms (genus Trichuris). These worms have adapted to a biological niche within the host and exhibit unique morphological characteristics compared with other nematodes. Although these adaptations are known, the underlying molecular mechanisms remain elusive. The availability of genomes and transcriptomes of some whipworms now enables detailed studies of their molecular biology. Here, we defined and curated the full complement of an important class of enzymes, the protein kinases (kinomes) of two species of Trichuris, using an advanced and integrated bioinformatic pipeline. We investigated the transcription of Trichuris suis kinase genes across developmental stages, sexes and tissues, and reveal that selectively transcribed genes can be linked to central roles in developmental and reproductive processes. We also classified and functionally annotated the curated kinomes by integrating evidence from structural modelling and pathway analyses, and compared them with other curated kinomes of phylogenetically diverse nematode species. Our findings suggest unique adaptations in signalling processes governing worm morphology and biology, and provide an important resource that should facilitate experimental investigations of kinases and the biology of signalling pathways in nematodes.

A panel of 130 autosomal single-nucleotide polymorphisms for ancestry assignment in five Asian populations and in Caucasians.

Hwa HL, Lin CP, Huang TY, Kuo PH, Hsieh WH, Lin CY, Yin HI, Tseng LH, Lee JC.

Journal: Forensic Sci Med Pathol. (2017 Jun)

Ancestry informative single-nucleotide polymorphism (AISNP) panels for differentiating between East and Southeast Asian populations are scarce. This study aimed to identify AISNPs for ancestry assignment of five East and Southeast Asian populations, and Caucasians. We analyzed 145 autosomal SNPs of the 627 DNA samples from individuals of six populations (234 Taiwanese Han, 91 Filipinos, 79 Indonesians, 60 Thais, 71 Vietnamese, and 92 Caucasians) using arrays. The multiple logistic regression model and a multi-tier approach were used for ancestry classification. We observed that 130 AISNPs were effective for classifying the ethnic origins with fair accuracy. Among the 130 AISNPs, 122 were useful for stratification between these five Asian populations and 64 were effective for differentiating between Caucasians and these Asian populations. For differentiation between Caucasians and Asians, an accuracy rate of 100% was achieved in these 627 subjects with 50 optimal AISNPs among the 64 effective SNPs. For classification of the five Asian populations, the accuracy rates of ancestry inference using 20 to 57 SNPs for each of the two Asian populations ranged from 74.1% to 100%. Another 14 degraded DNA samples with incomplete profiling were analyzed, and the ancestry of 12 (85.7%) of those subjects was accurately assigned. We developed a 130-AISNP panel for ethnic origin differentiation between the five East and Southeast Asian populations and Caucasians. This AISNP set may be helpful for individual ancestral assignment of these populations in forensic casework.

The genome and transcriptome of Phalaenopsis yield insights into floral organ development and flowering regulation

Jian-Zhi Huang, Chih-Peng Lin, Ting-Chi Cheng, Ya-Wen Huang, Yi-Jung Tsai, Shu-Yun Cheng, Yi-Wen Chen, Chueh-Pai Lee, Wan-Chia Chung, Bill Chia-Han Chang, Shih-Wen Chin, Chen-Yu Lee, Fure-Chyi Chen.

Journal: PeerJ (2016)

The Phalaenopsis orchid is an important potted flower of high economic value around the world. We report the 3.1 Gb draft genome assembly of an important winter flowering Phalaenopsis ‘KHM190’ cultivar. We generated 89.5 Gb RNA-seq and 113 million sRNA-seq reads to use these data to identify 41,153 protein-coding genes and 188 miRNA families. We also generated a draft genome for Phalaenopsis pulcherrima ‘B8802,’ a summer flowering species, via resequencing. Comparison of genome data between the two Phalaenopsis cultivars allowed the identification of 691,532 single-nucleotide polymorphisms. In this study, we reveal that the key role of PhAGL6b in the regulation of labellum organ development involves alternative splicing in the big lip mutant. Petal or sepal overexpressing PhAGL6b leads to the conversion into a lip-like structure. We also discovered that the gibberellin pathway that regulates the expression of flowering time genes during the reproductive phase change is induced by cool temperature. Our work thus depicted a valuable resource for the flowering control, flower architecture development, and breeding of the Phalaenopsis orchids.

Phylogenomic and biogeographic reconstruction of the Trichinella complex.

Pasi K. Korhonen, Edoardo Pozio, Giuseppe La Rosa, Bill C. H. Chang, Anson V. Koehler, Eric P. Hoberg, Peter R. Boag, Patrick Tan, Aaron R. Jex, Andreas Hofmann, Paul W. Sternberg, Neil D. Young & Robin B. Gasser.

Journal: Nature Communications (2016)

Trichinellosis is a globally important food-borne parasitic disease of humans caused by roundworms of the Trichinella complex. Extensive biological diversity is reflected in substantial ecological and genetic variability within and among Trichinella taxa, and major controversy surrounds the systematics of this complex. Here we report the sequencing and assembly of 16 draft genomes representing all 12 recognized Trichinella species and genotypes, define protein-coding gene sets and assess genetic differences among these taxa. Using thousands of shared single-copy orthologous gene sequences, we fully reconstruct, for the first time, a phylogeny and biogeography for the Trichinella complex, and show that encapsulated and non-encapsulated Trichinella taxa diverged from their most recent common ancestor ~21 million years ago (mya), with taxon diversifications commencing ~10–7 mya.

A De Novo Floral Transcriptome Reveals Clues into Phalaenopsis Orchid Flower Development

Jian-Zhi Huang, Chih-Peng Lin, Ting-Chi Cheng, Bill Chia-Han Chang, Shu-Yu Cheng, Yi-Wen Chen, Chen-Yu Lee, Shih-Wen Chin, and Fure-Chyi Chen

Journal: PLoS One (2015)

Phalaenopsis has a zygomorphic floral structure, including three outer tepals, two lateral inner tepals and a highly modified inner median tepal called labellum or lip; however, the regulation of its organ development remains unelucidated. We generated RNA-seq reads with the Illumina platform for floral organs of the Phalaenopsis wild-type and peloric mutant with a lip-like petal. A total of 43,552 contigs were obtained after de novo assembly. We used differentially expressed gene profiling to compare the transcriptional changes in floral organs for both the wild-type and peloric mutant. Pair-wise comparison of sepals, petals and labellum between peloric mutant and its wild-type revealed 1,838, 758 and 1,147 contigs, respectively, with significant differential expression. PhAGL6a (CUFF.17763), PhAGL6b (CUFF.17763.1), PhMADS1 (CUFF.36625.1), PhMADS4 (CUFF.25909) and PhMADS5 (CUFF.39479.1) were significantly upregulated in the lip-like petal of the peloric mutant. We used real-time PCR analysis of lip-like petals, lip-like sepals and the big lip of peloric mutants to confirm the five genes’ expression patterns. PhAGL6a, PhAGL6b and PhMADS4 were strongly expressed in the labellum and significantly upregulated in lip-like petals and lip-like sepals of peloric-mutant flowers. In addition, PhAGL6b was significantly downregulated in the labellum of the big lip mutant, with no change in expression of PhAGL6a. We provide a comprehensive transcript profile and functional analysis of Phalaenopsis floral organs. PhAGL6a, PhAGL6b, and PhMADS4 might play crucial roles in the development of the labellum in Phalaenopsis. Our study provides new insights into how the orchid labellum differs and why the petal or sepal converts to a labellum in Phalaenopsis floral mutants.

Aligning to the sample-specific reference sequence to optimize the accuracy of next-generation sequencing analysis for hepatitis B virus. 

Wen-Chun Liu, Chih-Peng Lin, Chun-Pei Cheng, Cheng-Hsun Ho, Kuo-Lun Lan, Ji-Hong Cheng, Chia-Jui Yen, Pin-Nan Cheng, I-Chin Wu, I-Chen Li, Bill Chia-Han Chang, Vincent S. Tseng, Yen-Cheng Chiu, Ting-Tsung Chang.

Journal: Hepatology International (2015)


Hepatitis B virus (HBV) quasispecies are crucial in the pathogenesis of chronic liver disease. Next-generation sequencing (NGS) is powerful for identifying viral quasispecies. To improve mapping quality and single nucleotide variant (SNV) calling accuracy in the NGS analysis of HBV, we compared different mapping references, including the sample-specific reference sequence, same genotype sequences and different genotype sequences, according to the sample.


Real Illumina HBV datasets from 86 patients, and simulated datasets from 158 HBV strains in the GenBank database, were used to assess mapping quality. SNV calling accuracy was evaluated using different mapping references to align real Illumina datasets from a single HBV clone.


Using the sample-specific reference sequence as a mapping reference produced the largest number of mappable reads and the highest coverage. With a different-genotype mapping reference, the consensus sequence derived from the real Illumina datasets of the single HBV clone showed 21 false SNV calls in the polymerase and surface genes, the regions most divergent between the mapping reference and this HBV clone. A ~6% coverage of most of these false SNVs was yielded even with a same-genotype mapping reference, but none with the sample-specific reference sequence.


Using sample-specific reference sequences as a mapping reference in NGS analysis optimized mapping quality and the SNV calling accuracy for HBV quasispecies.

Whole-Genome Sequence of an Epidemic Strain of Burkholderia pseudomallei vgh07 in Taiwan. 

Yao-Shen Chen, Hsi-Hsun Lin, Pei-Tan Hsueh, Pei-Ju Liu, Wen-Fan Ni, Wan-Chia Chung, Chih-Peng Lin, and Ya-Lei Chen

Journal: Genome Announc. (2015)

Here, we report the complete genome sequence of B. pseudomallei vgh07. This is an epidemic strain that was isolated from a melioidosis patient with arthro-osteomyelitis in Taiwan.

Evaluation and Application of the Strand-Specific Protocol for Next-Generation Sequencing.   

Kuo-Wang Tsai, Bill Chang, Cheng-Tsung Pan, Wei-Chen Lin, Ting-Wen Chen, and Sung-Chou Li

Journal: BioMed Research International (2015)

Next-generation sequencing (NGS) has become a powerful sequencing tool, applied in a wide range of biological studies. However, the traditional sample preparation protocol for NGS is non-strand-specific (NSS), leading to biased estimates of expression for transcripts overlapped at the antisense strand. Strand-specific (SS) protocols have recently been developed. In this study, we prepared the same RNA sample by using the SS and NSS protocols, followed by sequencing with Illumina HiSeq platform. Using real-time quantitative PCR as a standard, we first proved that the SS protocol more precisely estimates gene expressions compared with the NSS protocol, particularly for those overlapped at the antisense strand. In addition, we also showed that the sequence reads from the SS protocol are comparable with those from conventional NSS protocols in many aspects. Finally, we also mapped a fraction of sequence reads back to the antisense strand of the known genes, originally without annotated genes located. Using sequence assembly and PCR validation, we succeeded in identifying and characterizing the novel antisense genes. Our results show that the SS protocol performs more accurately than the traditional NSS protocol and can be applied in future studies.

ViQuaS: An improved reconstruction pipeline for viral quasispecies spectra generated by next-generation sequencing. 

Duleepa Jayasundara, I. Saeed, Suhinthan Maheswararajah, B.C. Chang, S-L. Tang and Saman K. Halgamuge

Journal: Bioinformatics (2015)


The combined effect of a high replication rate and the low fidelity of the viral polymerase in most RNA viruses and some DNA viruses results in the formation of a viral quasispecies. Uncovering information about quasispecies populations significantly benefits the study of disease progression, antiviral drug design, vaccine design and viral pathogenesis. We present a new analysis pipeline called ViQuaS for viral quasispecies spectrum reconstruction using short next-generation sequencing reads. ViQuaS is based on a novel reference-assisted de novo assembly algorithm for constructing local haplotypes. A significantly extended version of an existing global strain reconstruction algorithm is also used.


Benchmarking results showed that ViQuaS outperformed three other previously published methods named ShoRAH, QuRe and PredictHaplo, with improvements of at least 3.1-53.9% in recall, 0-12.1% in precision and 0-38.2% in F-score in terms of strain sequence assembly and improvements of at least 0.006-0.143 in KL-divergence and 0.001-0.035 in root mean-squared error in terms of strain frequency estimation, over the next-best algorithm under various simulation settings. We also applied ViQuaS on a real read set derived from an in vitro human immunodeficiency virus (HIV)-1 population, two independent datasets of foot-and-mouth-disease virus derived from the same biological sample and a real HIV-1 dataset and demonstrated better results than other methods available.
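The benchmarking metrics named above (recall, precision, F-score, KL-divergence and root mean-squared error) can be sketched as follows. This is an illustrative sketch only, not the ViQuaS evaluation code: it assumes a reconstructed strain counts as correct only when it matches a true strain exactly, and the function names and toy data are hypothetical.

```python
from math import log, sqrt

def precision_recall_f(true_strains, predicted_strains):
    """Precision, recall and F-score, treating exact sequence matches as correct."""
    true_set, pred_set = set(true_strains), set(predicted_strains)
    tp = len(true_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between true (p) and estimated (q) strain frequencies."""
    return sum(pi * log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def rmse(p, q):
    """Root mean-squared error between two equally long frequency vectors."""
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p))

# Toy example: three true strains, two of which are recovered.
truth = ["ACGT", "ACGA", "TCGA"]
predicted = ["ACGT", "ACGA", "GGGG"]
precision, recall, f_score = precision_recall_f(truth, predicted)
print(precision, recall, f_score)
print(kl_divergence([0.6, 0.4], [0.5, 0.5]), rmse([0.6, 0.4], [0.5, 0.5]))
```

Sequence-assembly quality and frequency-estimation quality are kept as separate functions because the abstract reports them separately: the first three metrics score which strains were reconstructed, the last two score how well their abundances were estimated.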

Mitochondrial genomes of Trichinella species and genotypes - a basis for diagnosis, and systematic and epidemiological explorations. 

Namitha Mohandas, Edoardo Pozio, Giuseppe La Rosa, Pasi K. Korhonen, Neil D. Young, Anson V. Koehler, Ross S. Hall, Paul W. Sternberg, Peter R. Boag, Aaron R. Jex, Bill Chang, Robin B. Gasser 

Journal: Int J Parasitol. (2014)

In the present study we sequenced or re-sequenced, assembled and annotated 15 mitochondrial genomes representing the 12 currently recognised taxa of Trichinella using a deep sequencing-coupled approach. We then defined and compared the gene order in individual mitochondrial genomes (14 to 17.7 kb), evaluated genetic differences among species/genotypes and re-assessed the relationships among these taxa using the mitochondrial nucleic acid or amino acid sequence data sets. In addition, a rich source of mitochondrial genetic markers was defined that could be used in future systematic, epidemiological and population genetic studies of Trichinella. The sequencing-bioinformatic approach employed herein should be applicable to a wide range of eukaryotic parasites.

Short-Term Exposure to Fluconazole Induces Chromosome Loss in Candida albicans: An Approach to Produce Haploid Cells. 

Fang-Mo Chang, Tsong-Yih Ou, Wei-Ning Cheng, Ming-Li Chou, Kai-Cheng Lee, Yi-Ping Chin, Chih-Peng Lin, Kai-Di Chang, Che-Tong Lin, Ching-Hua Su

Journal: Fungal Genet Biol. (2014)

Candida albicans is considered to be an obligate diploid fungus. Here, we describe an approach to isolate aneuploids or haploids induced by the short-term (12-16 h) exposure of diploid reference strains SC5314 and CAI4 to the most commonly used antifungal drug, fluconazole, followed by repeated single-cell separation among small morphologically distinct colonies in the inhibition zone. The isolated strains had altered cell morphology and LOH events in the MTL and other marker alleles of the analyzed loci at 8 chromosomes of C. albicans with decreased DNA content. The present study employed next-generation sequencing (NGS) combined with flow cytometry analysis of the DNA content to analyze the haploid, autodiploid, and aneuploid strains that arose from the fluconazole treatment instead of using the conventional single nucleotide polymorphism/comparative genome hybridization (SNP/CGH) method. A multiple-alignment tool was also developed based on sequenced data from NGS to establish haplotype mapping for each chromosome of the selected strains. These findings revealed that C. albicans experiences 'concerted chromosome loss' to form strains with homozygous alleles and that it even has a haploid status after short-term exposure to fluconazole. Additionally, we developed a new platform to analyze chromosome copy number using NGS.

AlgaePath: comprehensive analysis of metabolic pathways using transcript abundance data from next-generation sequencing in green algae. 

Han-Qin Zheng, Yi-Fan Chiang-Hsieh, Chia-Hung Chien, Bo-Kai Justin Hsu, Tsung-Lin Liu, Ching-Nen Nathan Chen and Wen-Chi Chang

Journal: BMC Genomics (2014)


Algae are important non-vascular plants that have many research applications, including high species diversity, biofuel sources, and adsorption of heavy metals and, following processing, are used as ingredients in health supplements. The increasing availability of next-generation sequencing (NGS) data for algae genomes and transcriptomes has made the development of an integrated resource for retrieving gene expression data and metabolic pathway essential for functional analysis and systems biology. In a currently available resource, gene expression profiles and biological pathways are displayed separately, making it impossible to easily search current databases to identify the cellular response mechanisms. Therefore, in this work the novel AlgaePath database was developed to retrieve transcript abundance profiles efficiently under various conditions in numerous metabolic pathways.


AlgaePath is a web-based database that integrates gene information, biological pathways, and NGS datasets for the green algae Chlamydomonas reinhardtii and Neodesmus sp. UTEX 2219-4. Users can search this database to identify transcript abundance profiles and pathway information using five query pages (Gene Search, Pathway Search, Differentially Expressed Genes (DEGs) Search, Gene Group Analysis, and Co-expression Analysis). The transcript abundance data of 45 and four samples from C. reinhardtii and Neodesmus sp. UTEX 2219-4, respectively, can be obtained directly on pathway maps. Genes that are differentially expressed between two conditions can be identified using Folds Search. The Gene Group Analysis page includes a pathway enrichment analysis, and can be used to easily compare the transcript abundance profiles of functionally related genes on a map. Finally, the Co-expression Analysis page can be used to search for co-expressed transcripts of a target gene. The results of the searches will provide a valuable reference for designing further experiments and for elucidating critical mechanisms from high-throughput data.


AlgaePath is an effective interface that can be used to clarify the transcript response mechanisms in different metabolic pathways under various conditions. Importantly, AlgaePath can be mined to identify critical mechanisms based on high-throughput sequencing. To our knowledge, AlgaePath is the most comprehensive resource for integrating numerous databases and analysis tools in algae. The system can be accessed freely online at http://algaepath.itps.ncku.edu.tw.

De Novo Transcriptome Sequence Assembly from Coconut Leaves and Seeds with a Focus on Factors Involved in RNA-Directed DNA Methylation. 

Ya-Yi Huang, Chueh-Pai Lee, Jason L. Fu, Bill Chia-Han Chang, Antonius J.M. Matzke and Marjori Matzke

Journal: Genes Genomes Genetics (Bethesda) (2014)

Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop.

The Complete Plastid Genome Sequence of Madagascar Periwinkle Catharanthus roseus (L.) G. Don: Plastid Genome Evolution, Molecular Marker Identification, and Phylogenetic Implications in Asterids 

Chuan Ku, Wan-Chia Chung, Ling-Ling Chen, Chih-Horng Kuo

Journal: PLoS One (2013)

The Madagascar periwinkle (Catharanthus roseus in the family Apocynaceae) is an important medicinal plant and is the source of several widely marketed chemotherapeutic drugs. It is also commonly grown for its ornamental values and, due to ease of infection and distinctiveness of symptoms, is often used as the host for studies on phytoplasmas, an important group of uncultivated plant pathogens. To gain insights into the characteristics of apocynaceous plastid genomes (plastomes), we used a reference-assisted approach to assemble the complete plastome of C. roseus, which could be applied to other C. roseus-related studies. The C. roseus plastome is the second completely sequenced plastome in the asterid order Gentianales. We performed comparative analyses with two other representative sequences in the same order, including the complete plastome of Coffea arabica (from the basal Gentianales family Rubiaceae) and the nearly complete plastome of Asclepias syriaca (Apocynaceae). The results demonstrated considerable variations in gene content and plastome organization within Apocynaceae, including the presence/absence of three essential genes (i.e., accD, clpP, and ycf1) and large size changes in non-coding regions (e.g., rps2-rpoC2 and IRb-ndhF). To find plastome markers of potential utility for Catharanthus breeding and phylogenetic analyses, we identified 41 C. roseus-specific simple sequence repeats. Furthermore, five intergenic regions with high divergence between C. roseus and three other euasterids I taxa were identified as candidate markers. To resolve the euasterids I interordinal relationships, 82 plastome genes were used for phylogenetic inference. With the addition of representatives from Apocynaceae and sampling of most other asterid orders, a sister relationship between Gentianales and Solanales is supported.

Comparative Analysis of the Peanut Witches'-Broom Phytoplasma Genome Reveals Horizontal Transfer of Potential Mobile Units and Effectors

Wan-Chia Chung, Ling-Ling Chen, Wen-Sui Lo, Chan-Pin Lin, Chih-Horng Kuo

Journal: PLoS One (2013)

Phytoplasmas are a group of bacteria that are associated with hundreds of plant diseases. Due to their economical importance and the difficulties involved in the experimental study of these obligate pathogens, genome sequencing and comparative analysis have been utilized as powerful tools to understand phytoplasma biology. To date four complete phytoplasma genome sequences have been published. However, these four strains represent limited phylogenetic diversity. In this study, we report the shotgun sequencing and evolutionary analysis of a peanut witches'-broom (PnWB) phytoplasma genome. The availability of this genome provides the first representative of the 16SrII group and substantially improves the taxon sampling to investigate genome evolution. The draft genome assembly contains 13 chromosomal contigs with a total size of 562,473 bp, covering ∼90% of the chromosome. Additionally, a complete plasmid sequence is included. Comparisons among the five available phytoplasma genomes reveal the differentiations in gene content and metabolic capacity. Notably, phylogenetic inferences of the potential mobile units (PMUs) in these genomes indicate that horizontal transfer may have occurred between divergent phytoplasma lineages. Because many effectors are associated with PMUs, the horizontal transfer of these transposon-like elements can contribute to the adaptation and diversification of these pathogens. In summary, the findings from this study highlight the importance of improving taxon sampling when investigating genome evolution. Moreover, the currently available sequences are inadequate to fully characterize the pan-genome of phytoplasmas. Future genome sequencing efforts to expand phylogenetic diversity are essential in improving our understanding of phytoplasma evolution.

Complete Genome Sequence of Serratia marcescens WW4

Wan-Chia Chung, Ling-Ling Chen, Wen-Sui Lo, Pei-An Kuo, Jenn Tu, Chih-Horng Kuo

Journal: Genome Announc. (2013)

Serratia marcescens WW4 is a biofilm-forming bacterium isolated from paper machine aggregates. Under conditions of phosphate limitation, this bacterium exhibits intergeneric inhibition of Pseudomonas aeruginosa. Here, the complete genome sequence of S. marcescens WW4, which consists of one circular chromosome (5,241,455 bp) and one plasmid (pSmWW4; 3,248 bp), was determined.

Comparative genome analysis of Spiroplasma melliferum IPMB4A, a honeybee-associated bacterium

Wen-Sui Lo, Ling-Ling Chen, Wan-Chia Chung, Gail E Gasparich, Chih-Horng Kuo

Journal: BMC Genomics (2013)


The genus Spiroplasma contains a group of helical, motile, and wall-less bacteria in the class Mollicutes. Similar to other members of this class, such as the animal-pathogenic Mycoplasma and the plant-pathogenic 'Candidatus Phytoplasma', all characterized Spiroplasma species were found to be associated with eukaryotic hosts. While most of the Spiroplasma species appeared to be harmless commensals of insects, a small number of species have evolved pathogenicity toward various arthropods and plants. In this study, we isolated a novel strain of honeybee-associated S. melliferum and investigated its genetic composition and evolutionary history by whole-genome shotgun sequencing and comparative analysis with other Mollicutes genomes.


The whole-genome shotgun sequencing of S. melliferum IPMB4A produced a draft assembly that was ~1.1 Mb in size and covered ~80% of the chromosome. Similar to other Spiroplasma genomes that have been studied to date, we found that this genome contains abundant repetitive sequences that originated from plectrovirus insertions. These phage fragments represented a major obstacle in obtaining a complete genome sequence of Spiroplasma with the current sequencing technology. Comparative analysis of S. melliferum IPMB4A with other Spiroplasma genomes revealed that these phages may have facilitated extensive genome rearrangements in these bacteria and contributed to horizontal gene transfers that led to species-specific adaptation to different eukaryotic hosts. In addition, comparison of gene content with other Mollicutes suggested that the common ancestor of the SEM (Spiroplasma, Entomoplasma, and Mycoplasma) clade may have had a relatively large genome and flexible metabolic capacity; the extremely reduced genomes of present-day Mycoplasma and 'Candidatus Phytoplasma' species are likely to be the result of independent gene losses in these lineages.


The findings in this study highlighted the significance of phage insertions and horizontal gene transfer in the evolution of bacterial genomes and acquisition of pathogenicity. Furthermore, the inclusion of Spiroplasma in comparative analysis has improved our understanding of genome evolution in Mollicutes. Future improvements in the taxon sampling of available genome sequences in this group are required to provide further insights into the evolution of these important pathogens of humans, animals, and plants.