The genome and transcriptome of Phalaenopsis yield insights into floral organ development and flowering regulation

Jian-Zhi Huang, Chih-Peng Lin, Ting-Chi Cheng, Ya-Wen Huang, Yi-Jung Tsai, Shu-Yun Cheng, Yi-Wen Chen, Chueh-Pai Lee, Wan-Chia Chung, Bill Chia-Han Chang, Shih-Wen Chin, Chen-Yu Lee, Fure-Chyi Chen.

Journal: PEER-REVIEWED (2016)

The Phalaenopsis orchid is an important potted flower of high economic value around the world. We report the 3.1 Gb draft genome assembly of an important winter flowering Phalaenopsis ‘KHM190’ cultivar. We generated 89.5 Gb of RNA-seq and 113 million sRNA-seq reads and used these data to identify 41,153 protein-coding genes and 188 miRNA families. We also generated a draft genome for Phalaenopsis pulcherrima ‘B8802,’ a summer flowering species, via resequencing. Comparison of genome data between the two Phalaenopsis cultivars allowed the identification of 691,532 single-nucleotide polymorphisms. In this study, we reveal that the key role of PhAGL6b in the regulation of labellum organ development involves alternative splicing in the big lip mutant. Petals or sepals overexpressing PhAGL6b are converted into lip-like structures. We also discovered that the gibberellin pathway that regulates the expression of flowering time genes during the reproductive phase change is induced by cool temperature. Our work thus provides a valuable resource for flowering control, flower architecture development, and breeding of Phalaenopsis orchids.

Phylogenomic and biogeographic reconstruction of the Trichinella complex.

Pasi K. Korhonen, Edoardo Pozio, Giuseppe La Rosa, Bill C. H. Chang, Anson V. Koehler, Eric P. Hoberg, Peter R. Boag, Patrick Tan, Aaron R. Jex, Andreas Hofmann, Paul W. Sternberg, Neil D. Young & Robin B. Gasser.

Journal: Nature Communications (2016)

Trichinellosis is a globally important food-borne parasitic disease of humans caused by roundworms of the Trichinella complex. Extensive biological diversity is reflected in substantial ecological and genetic variability within and among Trichinella taxa, and major controversy surrounds the systematics of this complex. Here we report the sequencing and assembly of 16 draft genomes representing all 12 recognized Trichinella species and genotypes, define protein-coding gene sets and assess genetic differences among these taxa. Using thousands of shared single-copy orthologous gene sequences, we fully reconstruct, for the first time, a phylogeny and biogeography for the Trichinella complex, and show that encapsulated and non-encapsulated Trichinella taxa diverged from their most recent common ancestor ~21 million years ago (mya), with taxon diversifications commencing ~10–7 mya.

A De Novo Floral Transcriptome Reveals Clues into Phalaenopsis Orchid Flower Development

Jian-Zhi Huang, Chih-Peng Lin, Ting-Chi Cheng, Bill Chia-Han Chang, Shu-Yu Cheng, Yi-Wen Chen, Chen-Yu Lee, Shih-Wen Chin, and Fure-Chyi Chen

Journal: PLoS One (2015)

Phalaenopsis has a zygomorphic floral structure, including three outer tepals, two lateral inner tepals and a highly modified inner median tepal called the labellum or lip; however, the regulation of its organ development remains unelucidated. We generated RNA-seq reads with the Illumina platform for floral organs of the Phalaenopsis wild-type and peloric mutant with a lip-like petal. A total of 43,552 contigs were obtained after de novo assembly. We used differentially expressed gene profiling to compare the transcriptional changes in floral organs for both the wild-type and peloric mutant. Pair-wise comparison of sepals, petals and labellum between the peloric mutant and its wild-type revealed 1,838, 758 and 1,147 contigs, respectively, with significant differential expression. PhAGL6a (CUFF.17763), PhAGL6b (CUFF.17763.1), PhMADS1 (CUFF.36625.1), PhMADS4 (CUFF.25909) and PhMADS5 (CUFF.39479.1) were significantly upregulated in the lip-like petal of the peloric mutant. We used real-time PCR analysis of lip-like petals, lip-like sepals and the big lip of peloric mutants to confirm the five genes’ expression patterns. PhAGL6a, PhAGL6b and PhMADS4 were strongly expressed in the labellum and significantly upregulated in lip-like petals and lip-like sepals of peloric-mutant flowers. In addition, PhAGL6b was significantly downregulated in the labellum of the big lip mutant, with no change in expression of PhAGL6a. We provide a comprehensive transcript profile and functional analysis of Phalaenopsis floral organs. PhAGL6a, PhAGL6b, and PhMADS4 might play crucial roles in the development of the labellum in Phalaenopsis. Our study provides new insights into how the orchid labellum differs and why the petal or sepal converts to a labellum in Phalaenopsis floral mutants.

Aligning to the sample-specific reference sequence to optimize the accuracy of next-generation sequencing analysis for hepatitis B virus. 

Wen-Chun Liu, Chih-Peng Lin, Chun-Pei Cheng, Cheng-Hsun Ho, Kuo-Lun Lan, Ji-Hong Cheng, Chia-Jui Yen, Pin-Nan Cheng, I-Chin Wu, I-Chen Li, Bill Chia-Han Chang, Vincent S. Tseng, Yen-Cheng Chiu, Ting-Tsung Chang.

Journal: Hepatology International (2015)


Hepatitis B virus (HBV) quasispecies are crucial in the pathogenesis of chronic liver disease. Next-generation sequencing (NGS) is powerful for identifying viral quasispecies. To improve mapping quality and single nucleotide variant (SNV) calling accuracy in the NGS analysis of HBV, we compared different mapping references, including the sample-specific reference sequence, same genotype sequences and different genotype sequences, according to the sample.


Real Illumina HBV datasets from 86 patients, and simulated datasets from 158 HBV strains in the GenBank database, were used to assess mapping quality. SNV calling accuracy was evaluated using different mapping references to align real Illumina datasets from a single HBV clone.


Using the sample-specific reference sequence as a mapping reference produced the largest number of mappable reads and the highest coverage. With a different genotype mapping reference, the consensus sequence derived from the real Illumina datasets of the single HBV clone showed 21 false SNV calls in the polymerase and surface genes, the regions most divergent between the mapping reference and this HBV clone. A ~6% coverage of most of these false SNVs was yielded even with a same genotype mapping reference, but none with the sample-specific reference sequence.


Using sample-specific reference sequences as a mapping reference in NGS analysis optimized mapping quality and the SNV calling accuracy for HBV quasispecies.

Whole-Genome Sequence of an Epidemic Strain of Burkholderia pseudomallei vgh07 in Taiwan. 

Yao-Shen Chen, Hsi-Hsun Lin, Pei-Tan Hsueh, Pei-Ju Liu, Wen-Fan Ni, Wan-Chia Chung, Chih-Peng Lin, and Ya-Lei Chen

Journal: Genome Announc. (2015)

Here, we report the complete genome sequence of B. pseudomallei vgh07. This is an epidemic strain that was isolated from a melioidosis patient with arthro-osteomyelitis in Taiwan.

Evaluation and Application of the Strand-Specific Protocol for Next-Generation Sequencing.   

Kuo-Wang Tsai, Bill Chang, Cheng-Tsung Pan, Wei-Chen Lin, Ting-Wen Chen, and Sung-Chou Li

Journal: BioMed Research International (2015)

Next-generation sequencing (NGS) has become a powerful sequencing tool, applied in a wide range of biological studies. However, the traditional sample preparation protocol for NGS is non-strand-specific (NSS), leading to biased estimates of expression for transcripts overlapped at the antisense strand. Strand-specific (SS) protocols have recently been developed. In this study, we prepared the same RNA sample by using the SS and NSS protocols, followed by sequencing with the Illumina HiSeq platform. Using real-time quantitative PCR as a standard, we first showed that the SS protocol estimates gene expression more precisely than the NSS protocol, particularly for genes overlapped at the antisense strand. In addition, we showed that the sequence reads from the SS protocol are comparable with those from conventional NSS protocols in many aspects. Finally, we mapped a fraction of sequence reads back to the antisense strand of known genes, in regions where no genes had previously been annotated. Using sequence assembly and PCR validation, we succeeded in identifying and characterizing the novel antisense genes. Our results show that the SS protocol performs more accurately than the traditional NSS protocol and can be applied in future studies.

ViQuaS: An improved reconstruction pipeline for viral quasispecies spectra generated by next-generation sequencing. 

Duleepa Jayasundara, I. Saeed, Suhinthan Maheswararajah, B.C. Chang, S-L. Tang and Saman K. Halgamuge

Journal: Bioinformatics (2015)


The combined effect of a high replication rate and the low fidelity of the viral polymerase in most RNA viruses and some DNA viruses results in the formation of a viral quasispecies. Uncovering information about quasispecies populations significantly benefits the study of disease progression, antiviral drug design, vaccine design and viral pathogenesis. We present a new analysis pipeline called ViQuaS for viral quasispecies spectrum reconstruction using short next-generation sequencing reads. ViQuaS is based on a novel reference-assisted de novo assembly algorithm for constructing local haplotypes. A significantly extended version of an existing global strain reconstruction algorithm is also used.


Benchmarking results showed that ViQuaS outperformed three other previously published methods named ShoRAH, QuRe and PredictHaplo, with improvements of at least 3.1-53.9% in recall, 0-12.1% in precision and 0-38.2% in F-score in terms of strain sequence assembly and improvements of at least 0.006-0.143 in KL-divergence and 0.001-0.035 in root mean-squared error in terms of strain frequency estimation, over the next-best algorithm under various simulation settings. We also applied ViQuaS on a real read set derived from an in vitro human immunodeficiency virus (HIV)-1 population, two independent datasets of foot-and-mouth-disease virus derived from the same biological sample and a real HIV-1 dataset and demonstrated better results than other methods available.
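The benchmark above is reported in terms of recall, precision and F-score for strain sequence assembly, and KL-divergence and root mean-squared error for strain frequency estimation. The following is a minimal Python sketch of how such metrics can be computed in general. It is not the ViQuaS evaluation code: the function names are hypothetical, and the exact-match criterion for counting a reconstructed strain as correct is an assumption that may differ from the definitions used in the paper.

```python
import math

def precision_recall_f(true_strains, predicted_strains):
    """Score reconstructed strains against known strains.

    Assumption: a predicted strain counts as correct only if its
    sequence matches a true strain exactly.
    """
    true_set, pred_set = set(true_strains), set(predicted_strains)
    tp = len(true_set & pred_set)  # correctly reconstructed strains
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between true (p) and estimated (q) frequencies.

    eps guards against division by zero when an estimated frequency is 0.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def rmse(p, q):
    """Root mean-squared error between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))
```

For example, `precision_recall_f(["ACGT", "ACGA"], ["ACGT", "ACGG"])` scores one of two predictions as correct, and `kl_divergence` and `rmse` take the true and estimated frequency vectors in the same strain order.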

Mitochondrial genomes of Trichinella species and genotypes - a basis for diagnosis, and systematic and epidemiological explorations. 

Namitha Mohandas, Edoardo Pozio, Giuseppe La Rosa, Pasi K. Korhonen, Neil D. Young, Anson V. Koehler, Ross S. Hall, Paul W. Sternberg, Peter R. Boag, Aaron R. Jex, Bill Chang, Robin B. Gasser 

Journal: Int J Parasitol. (2014)

 In the present study we sequenced or re-sequenced, assembled and annotated 15 mitochondrial genomes representing the 12 currently recognised taxa of Trichinella using a deep sequencing-coupled approach. We then defined and compared the gene order in individual mitochondrial genomes (14 to 17.7 kb), evaluated genetic differences among species/genotypes and re-assessed the relationships among these taxa using the mitochondrial nucleic acid or amino acid sequence data sets. In addition, a rich source of mitochondrial genetic markers was defined that could be used in future systematic, epidemiological and population genetic studies of Trichinella. The sequencing-bioinformatic approach employed herein should be applicable to a wide range of eukaryotic parasites.

Short-Term Exposure to Fluconazole Induces Chromosome Loss in Candida albicans: An Approach to Produce Haploid Cells. 

Fang-Mo Chang, Tsong-Yih Ou, Wei-Ning Cheng, Ming-Li Chou, Kai-Cheng Lee, Yi-Ping Chin, Chih-Peng Lin, Kai-Di Chang, Che-Tong Lin, Ching-Hua Su

Journal: Fungal Genet Biol. (2014)

Candida albicans is considered to be an obligate diploid fungus. Here, we describe an approach to isolate aneuploids or haploids induced by the short-term (12-16 h) exposure of diploid reference strains SC5314 and CAI4 to the most commonly used antifungal drug, fluconazole, followed by repeated single-cell separation among small morphologically distinct colonies in the inhibition zone. The isolated strains had altered cell morphology and LOH events in the MTL and other marker alleles of the analyzed loci at 8 chromosomes of C. albicans with decreased DNA content. The present study employed next-generation sequencing (NGS) combined with flow cytometry analysis of the DNA content to analyze the haploid, autodiploid, and aneuploid strains that arose from the fluconazole treatment instead of using the conventional single nucleotide polymorphism/comparative genome hybridization (SNP/CGH) method. A multiple-alignment tool was also developed based on sequenced data from NGS to establish haplotype mapping for each chromosome of the selected strains. These findings revealed that C. albicans experiences 'concerted chromosome loss' to form strains with homozygous alleles and that it even has a haploid status after short-term exposure to fluconazole. Additionally, we developed a new platform to analyze chromosome copy number using NGS.

AlgaePath: comprehensive analysis of metabolic pathways using transcript abundance data from next-generation sequencing in green algae. 

Han-Qin Zheng, Yi-Fan Chiang-Hsieh, Chia-Hung Chien, Bo-Kai Justin Hsu, Tsung-Lin Liu, Ching-Nen Nathan Chen and Wen-Chi Chang

Journal: BMC Genomics (2014)


Algae are important non-vascular plants that have many research applications, including high species diversity, biofuel sources, and adsorption of heavy metals and, following processing, are used as ingredients in health supplements. The increasing availability of next-generation sequencing (NGS) data for algae genomes and transcriptomes has made the development of an integrated resource for retrieving gene expression data and metabolic pathway essential for functional analysis and systems biology. In a currently available resource, gene expression profiles and biological pathways are displayed separately, making it impossible to easily search current databases to identify the cellular response mechanisms. Therefore, in this work the novel AlgaePath database was developed to retrieve transcript abundance profiles efficiently under various conditions in numerous metabolic pathways.


AlgaePath is a web-based database that integrates gene information, biological pathways, and NGS datasets for the green algae Chlamydomonas reinhardtii and Neodesmus sp. UTEX 2219-4. Users can search this database to identify transcript abundance profiles and pathway information using five query pages (Gene Search, Pathway Search, Differentially Expressed Genes (DEGs) Search, Gene Group Analysis, and Co-expression Analysis). The transcript abundance data of 45 and four samples from C. reinhardtii and Neodesmus sp. UTEX 2219-4, respectively, can be obtained directly on pathway maps. Genes that are differentially expressed between two conditions can be identified using Folds Search. The Gene Group Analysis page includes a pathway enrichment analysis, and can be used to easily compare the transcript abundance profiles of functionally related genes on a map. Finally, the Co-expression Analysis page can be used to search for co-expressed transcripts of a target gene. The results of the searches will provide a valuable reference for designing further experiments and for elucidating critical mechanisms from high-throughput data.


AlgaePath is an effective interface that can be used to clarify the transcript response mechanisms in different metabolic pathways under various conditions. Importantly, AlgaePath can be mined to identify critical mechanisms based on high-throughput sequencing. To our knowledge, AlgaePath is the most comprehensive resource for integrating numerous databases and analysis tools in algae. The system can be accessed freely online at http://algaepath.itps.ncku.edu.tw.

De Novo Transcriptome Sequence Assembly from Coconut Leaves and Seeds with a Focus on Factors Involved in RNA-Directed DNA Methylation. 

Ya-Yi Huang, Chueh-Pai Lee, Jason L. Fu, Bill Chia-Han Chang, Antonius J.M. Matzke and Marjori Matzke

Journal: Genes Genomes Genetics (Bethesda) (2014)

Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop.


The Complete Plastid Genome Sequence of Madagascar Periwinkle Catharanthus roseus (L.) G. Don: Plastid Genome Evolution, Molecular Marker Identification, and Phylogenetic Implications in Asterids 

Chuan Ku, Wan-Chia Chung, Ling-Ling Chen, Chih-Horng Kuo

Journal: PLoS One (2013)

The Madagascar periwinkle (Catharanthus roseus in the family Apocynaceae) is an important medicinal plant and is the source of several widely marketed chemotherapeutic drugs. It is also commonly grown for its ornamental values and, due to ease of infection and distinctiveness of symptoms, is often used as the host for studies on phytoplasmas, an important group of uncultivated plant pathogens. To gain insights into the characteristics of apocynaceous plastid genomes (plastomes), we used a reference-assisted approach to assemble the complete plastome of C. roseus, which could be applied to other C. roseus-related studies. The C. roseus plastome is the second completely sequenced plastome in the asterid order Gentianales. We performed comparative analyses with two other representative sequences in the same order, including the complete plastome of Coffea arabica (from the basal Gentianales family Rubiaceae) and the nearly complete plastome of Asclepias syriaca (Apocynaceae). The results demonstrated considerable variations in gene content and plastome organization within Apocynaceae, including the presence/absence of three essential genes (i.e., accD, clpP, and ycf1) and large size changes in non-coding regions (e.g., rps2-rpoC2 and IRb-ndhF). To find plastome markers of potential utility for Catharanthus breeding and phylogenetic analyses, we identified 41 C. roseus-specific simple sequence repeats. Furthermore, five intergenic regions with high divergence between C. roseus and three other euasterids I taxa were identified as candidate markers. To resolve the euasterids I interordinal relationships, 82 plastome genes were used for phylogenetic inference. With the addition of representatives from Apocynaceae and sampling of most other asterid orders, a sister relationship between Gentianales and Solanales is supported.

Comparative Analysis of the Peanut Witches'-Broom Phytoplasma Genome Reveals Horizontal Transfer of Potential Mobile Units and Effectors

Wan-Chia Chung, Ling-Ling Chen, Wen-Sui Lo, Chan-Pin Lin, Chih-Horng Kuo

Journal: PLoS One (2013)

Phytoplasmas are a group of bacteria that are associated with hundreds of plant diseases. Due to their economic importance and the difficulties involved in the experimental study of these obligate pathogens, genome sequencing and comparative analysis have been utilized as powerful tools to understand phytoplasma biology. To date, four complete phytoplasma genome sequences have been published. However, these four strains represent limited phylogenetic diversity. In this study, we report the shotgun sequencing and evolutionary analysis of a peanut witches'-broom (PnWB) phytoplasma genome. The availability of this genome provides the first representative of the 16SrII group and substantially improves the taxon sampling to investigate genome evolution. The draft genome assembly contains 13 chromosomal contigs with a total size of 562,473 bp, covering ∼90% of the chromosome. Additionally, a complete plasmid sequence is included. Comparisons among the five available phytoplasma genomes reveal differences in gene content and metabolic capacity. Notably, phylogenetic inferences of the potential mobile units (PMUs) in these genomes indicate that horizontal transfer may have occurred between divergent phytoplasma lineages. Because many effectors are associated with PMUs, the horizontal transfer of these transposon-like elements can contribute to the adaptation and diversification of these pathogens. In summary, the findings from this study highlight the importance of improving taxon sampling when investigating genome evolution. Moreover, the currently available sequences are inadequate to fully characterize the pan-genome of phytoplasmas. Future genome sequencing efforts to expand phylogenetic diversity are essential in improving our understanding of phytoplasma evolution.

Complete Genome Sequence of Serratia marcescens WW4.  

Wan-Chia Chung, Ling-Ling Chen, Wen-Sui Lo, Pei-An Kuo, Jenn Tu, Chih-Horng Kuo

Journal: Genome Announc. (2013)

Serratia marcescens WW4 is a biofilm-forming bacterium isolated from paper machine aggregates. Under conditions of phosphate limitation, this bacterium exhibits intergeneric inhibition of Pseudomonas aeruginosa. Here, the complete genome sequence of S. marcescens WW4, which consists of one circular chromosome (5,241,455 bp) and one plasmid (pSmWW4; 3,248 bp), was determined.

Comparative genome analysis of Spiroplasma melliferum IPMB4A, a honeybee-associated bacterium.

Wen-Sui Lo, Ling-Ling Chen, Wan-Chia Chung, Gail E Gasparich, Chih-Horng Kuo

Journal: BMC Genomics (2013)


The genus Spiroplasma contains a group of helical, motile, and wall-less bacteria in the class Mollicutes. Similar to other members of this class, such as the animal-pathogenic Mycoplasma and the plant-pathogenic 'Candidatus Phytoplasma', all characterized Spiroplasma species were found to be associated with eukaryotic hosts. While most of the Spiroplasma species appeared to be harmless commensals of insects, a small number of species have evolved pathogenicity toward various arthropods and plants. In this study, we isolated a novel strain of honeybee-associated S. melliferum and investigated its genetic composition and evolutionary history by whole-genome shotgun sequencing and comparative analysis with other Mollicutes genomes.


The whole-genome shotgun sequencing of S. melliferum IPMB4A produced a draft assembly that was ~1.1 Mb in size and covered ~80% of the chromosome. Similar to other Spiroplasma genomes that have been studied to date, we found that this genome contains abundant repetitive sequences that originated from plectrovirus insertions. These phage fragments represented a major obstacle in obtaining a complete genome sequence of Spiroplasma with the current sequencing technology. Comparative analysis of S. melliferum IPMB4A with other Spiroplasma genomes revealed that these phages may have facilitated extensive genome rearrangements in these bacteria and contributed to horizontal gene transfers that led to species-specific adaptation to different eukaryotic hosts. In addition, comparison of gene content with other Mollicutes suggested that the common ancestor of the SEM (Spiroplasma, Entomoplasma, and Mycoplasma) clade may have had a relatively large genome and flexible metabolic capacity; the extremely reduced genomes of present day Mycoplasma and 'Candidatus Phytoplasma' species are likely to be the result of independent gene losses in these lineages.


The findings in this study highlighted the significance of phage insertions and horizontal gene transfer in the evolution of bacterial genomes and acquisition of pathogenicity. Furthermore, the inclusion of Spiroplasma in comparative analysis has improved our understanding of genome evolution in Mollicutes. Future improvements in the taxon sampling of available genome sequences in this group are required to provide further insights into the evolution of these important pathogens of humans, animals, and plants.