[1] JIN Jianfeng, ZHANG Feng. An improved genome of the soil important animal Folsomia candida[J]. Journal of Nanjing Agricultural University, 2020, 43(6): 1042-1048. [doi:10.7685/jnau.201912054]

An improved genome of the soil important animal Folsomia candida
JIN Jianfeng, ZHANG Feng
College of Plant Protection, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China
Folsomia candida; PacBio; genome assembly; gene annotation; gene family evolution
[Objectives] To improve the contiguity of the Folsomia candida genome assembly and the completeness of its annotated gene set, this study reassembled and re-annotated the genome using publicly available PacBio sequencing data. [Methods] The genome was assembled with Flye and Falcon, and the two resulting assemblies were merged with quickmerge. Genes were annotated with the MAKER pipeline, which integrated ab initio predictions, transcript evidence, and protein homology evidence; the transcript evidence came from a StringTie transcriptome assembly. [Results] The new assembly is 221.09 Mb in 113 scaffolds, with a longest scaffold of 30.07 Mb and a scaffold N50 of 13.5 Mb. It captures 96.5% of complete arthropod Benchmarking Universal Single-Copy Orthologs (BUSCO, n=1 066). MAKER predicted 20 080 protein-coding genes, of which 80.56% were supported by transcriptome evidence and 96% by UniProt protein data; BUSCO completeness of the protein-coding gene set was 92.4%. Structural annotation also identified 253 665 repeats (22.3%) and 661 noncoding RNAs. Gene family evolution analysis identified 8 876 gene families, 48 of which underwent rapid evolution (expansion or contraction). [Conclusions] Compared with the published version, the new assembly and annotation are markedly improved: scaffold N50 rose from 6.5 Mb to 13.5 Mb, and the completeness of protein-coding genes rose from 84% to 92.4%. The gene family analysis provides basic data and new perspectives for hexapod evolution and soil ecotoxicology.
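The abstract reports contiguity as scaffold N50. As a reading aid, the following minimal sketch shows how N50 is conventionally computed from a list of scaffold lengths: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not the authors' code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (in Mb), not taken from the paper.
toy_lengths = [30, 20, 15, 10, 5]
print(n50(toy_lengths))
```

With the toy input, the total is 80 and the running sum first reaches 40 at the second-longest scaffold, so the function returns 20.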


