[1] JIN Jianfeng, ZHANG Feng. An improved genome of the soil important animal Folsomia candida[J]. Journal of Nanjing Agricultural University, 2020, 43(6): 1042-1048. [doi:10.7685/jnau.201912054]

An improved genome of the soil important animal Folsomia candida
JIN Jianfeng, ZHANG Feng
College of Plant Protection, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
Folsomia candida; PacBio; genome assembly; gene annotation; gene family evolution
[Objectives] This study reassembled and re-annotated the genome of Folsomia candida from publicly available PacBio sequencing data, aiming to improve the contiguity of the assembly and the completeness of the annotated gene set. [Methods] The genome was assembled with Flye and Falcon, and the two resulting assemblies were merged with quickmerge. Gene annotation was carried out with the MAKER pipeline, which integrated ab initio predictions, transcript evidence, and protein homology evidence; the transcript evidence came from StringTie assemblies. [Results] The new assembly is 221.09 Mb in size and consists of 113 scaffolds, with a longest scaffold of 30.07 Mb and an N50 length of 13.5 Mb. It captures 96.5% of the complete arthropod Benchmarking Universal Single-Copy Orthologs (BUSCO, n=1 066). The MAKER pipeline predicted 20 080 protein-coding genes, of which 80.56% were supported by transcriptome evidence and 96% by UniProt protein data; the BUSCO completeness of the protein-coding gene set was 92.4%. Structural annotation also identified 253 665 repetitive sequences (22.3%) and 661 noncoding RNAs. Gene family evolution analysis identified 8 876 gene families, of which 48 underwent rapid evolution (expansion or contraction). [Conclusions] The new assembly and annotation improve markedly on the previously published version: scaffold N50 increased from 6.5 Mb to 13.5 Mb, and the completeness of the protein-coding genes increased from 84% to 92.4%. The gene family evolution analysis provides fundamental data and new perspectives for hexapod evolution and soil ecotoxicology.
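The abstract's headline contiguity metric is scaffold N50. As a minimal sketch of how that statistic is commonly defined (this is not the authors' code, and the function name `n50` and the toy lengths are illustrative assumptions), N50 is the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration; it follows the usual
    definition of N50, not any specific tool's implementation.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to at least
        # half of the assembly size defines N50.
        if running >= half_total:
            return length


# Toy lengths in Mb (made-up values, not the F. candida scaffolds).
toy_lengths = [30, 20, 15, 10, 5]
print(n50(toy_lengths))  # total is 80, half is 40, so N50 is 20
```

Under this definition, a rise in N50 from 6.5 Mb to 13.5 Mb means that half of the assembled bases now sit in scaffolds of at least 13.5 Mb.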



