
Chromosome-level genome assembly of the freshwater snail Bellamya purificata (Caenogastropoda)

2022-08-05 Wu Jin, Xiao-Juan Cao, Xue-Yan Ma

Zoological Research, 2022, Issue 4

DEAR EDITOR,

Bellamya purificata (Gastropoda: Caenogastropoda; Architaenioglossa; Viviparidae: Bellamyinae; Sinotaia), a homotypic synonym of S. purificata, is widely distributed in freshwater habitats in Asia. It is an economically important edible snail and plays a vital role in freshwater wetland ecology. However, genomic resources for this snail are lacking, and no reference genome has been released. In this study, we constructed the first chromosome-level genome of B. purificata using PacBio long-read sequencing and high-throughput chromosome conformation capture (Hi-C) technology. In total, 33.64 Gb of circular consensus sequencing reads were generated. The preliminary genome assembly was 1.01 Gb in size, with a contig N50 of 45.14 Mb. Using Hi-C data, the assembled sequences were anchored to eight pseudochromosomes. After Hi-C correction, the final genome was 984.33 Mb, with a contig N50 of 37.21 Mb and a scaffold N50 of 141.97 Mb. The chromosome anchoring rate was 99.28%. A total of 22 125 protein-coding genes were predicted. Phylogenetic analyses indicated that B. purificata diverged from Pomacea canaliculata approximately 288 million years ago (Ma). We identified 34 expanded and 26 contracted gene families in B. purificata compared with its most recent common ancestor. Four protein-coding genes under positive selection were identified in B. purificata (false discovery rate (FDR) < 0.05). These genomic data provide a valuable resource for ecological and evolutionary studies of the family Viviparidae and for the genetic improvement of B. purificata.

Bellamya purificata is the largest snail species in Bellamyinae (Viviparidae, Caenogastropoda) and is widely distributed in freshwater habitats across Asia. It feeds on microalgae and organic detritus, serving as an important consumer in wetland ecosystems (Zhao et al., 2014). During gill respiration, B. purificata can capture Microcystis particles from the water and bind them into mucus-encapsulated Microcystis clusters. This behavior reduces the Microcystis biomass suspended in the water for a short period and temporarily alleviates bloom conditions (Qu et al., 2010). In addition, B. purificata can scrape and consume suspended particles adsorbed on submerged plants (Li et al., 2007), thereby reducing total nitrogen and ammonia nitrogen levels in the water (Zhao et al., 2014).

Bellamya purificata is also an economically important edible snail in Asia (Lydeard & Cummings, 2019). Over the past several years, “snail rice noodle” dishes (i.e., rice noodles with snails) have become a “net red food” (i.e., popular online) in China, generating more than 10 billion yuan in economic benefits each year. The annual demand for freshwater Viviparidae snails in China is about 1.5 million tons (Jin et al., 2022). However, according to the 2021 China Fishery Statistical Yearbook, the annual yield of cultured Viviparidae snails in China is only 90 640 tons, indicating a large gap between supply and demand. To date, most cultured freshwater snails derive from slow-growing wild populations. Therefore, to promote the development of snail aquaculture, freshwater snail breeding programs, especially for B. purificata, have become a priority.

Currently, genomic resources for B. purificata are lacking: only a single transcriptome and proteome analysis has been performed, to screen for shell color-related genes (Huang et al., 2021). However, a detailed reference genome of B. purificata is required for ecological and evolutionary research and for genetic improvement. Third-generation sequencing technologies produce much longer reads and provide superior platforms for generating complete, high-quality genomes. In this study, PacBio long-read sequencing and Hi-C technology were used to assemble a high-quality chromosome-level genome of B. purificata. Comparative genomic analysis of B. purificata with eight other species was performed to explore the evolution of B. purificata in Asia.

To estimate the genome size of B. purificata, we performed k-mer (k = 17) frequency distribution analysis using 181.6 Gb of clean data (Figure 1A). In this process, 17-bp k-mers (17-mers) were extracted from the sequencing data and their frequencies were calculated. The 17-mer frequency distribution approximately followed a Poisson distribution, giving an estimated genome size of 939 Mb.
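The genome-size estimate rests on a simple relationship: if each position of a genome of size G is covered by k-mers at a depth of about C, the total number of k-mers counted is roughly G × C, so G ≈ total k-mers / peak depth. The Python sketch below illustrates that calculation on toy data. It is not the pipeline used in the study, which the text does not name. The function names are hypothetical, reads are assumed to be uppercase ACGTN, and low-depth k-mers are simply discarded as errors. Heterozygosity and repeat peaks are ignored here, whereas dedicated tools such as GenomeScope model them explicitly.

```python
from collections import Counter

# Translation table for base complementation (A<->T, C<->G); assumes uppercase reads.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def count_kmers(reads, k=17):
    """Count canonical k-mers across reads, skipping k-mers that contain 'N'."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def depth_histogram(counts):
    """Map k-mer depth -> number of distinct k-mers observed at that depth."""
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak depth).

    Depths below `min_depth` are treated as sequencing errors and excluded
    from both the peak search and the total (a simplifying assumption).
    Returns (estimated_size, peak_depth).
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak = max(usable, key=usable.get)
    total = sum(depth * n for depth, n in usable.items())
    return total / peak, peak


# Toy histogram: an error spike at depth 1 and a main peak at depth 10.
size, peak = estimate_genome_size({1: 500, 9: 100, 10: 300, 11: 100})
print(size, peak)  # → 500.0 10
```

On real data the histogram comes from a dedicated k-mer counter rather than a Python loop. The division by the peak depth is the same.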

Figure 1 Characterization of chromosome-level genome of Bellamya purificata

PacBio sequencing data were obtained from the PacBio Sequel II platform (Supplementary Table S1). The data were assembled using Hifiasm (v0.16.1) and then polished. After removal of redundant sequences and haplotigs, the assembly had a genome size of 1 006 Mb, consistent with the size estimated by k-mer analysis. The assembly comprised 265 contigs, with a contig N50 of 45.14 Mb. For comparison, the genome sizes of three other gastropod species (Biomphalaria glabrata, Achatina fulica, and Lottia gigantea) range from 359 Mb to 2.12 Gb (Adema et al., 2017; Guo et al., 2019; Simakov et al., 2013). In total, 100 786 genomic fragments (with a step length of 1 kb) were randomly selected and aligned to the NCBI Nucleotide (NT) database, and more than 80% of these fragments aligned to mollusk genomes. Based on Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis, 100% (954/954) of BUSCO genes were detected in the assembly (Supplementary Table S2), including 94.9% complete single-copy BUSCOs and 1.9% complete duplicated BUSCOs. The Circos plot is shown in Figure 1B. Overall, these results indicate that the B. purificata genome assembly is of high quality.
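Contig N50, the main contiguity statistic reported here, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows. The `nx` helper is illustrative and is not part of Hifiasm or any other assembler.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths summed;
    the length of the sequence at which the running sum first reaches
    `fraction` of the total is returned (fraction=0.5 gives N50).
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length


contigs = [100, 80, 60, 40, 20]   # toy contig lengths (total 300)
print(nx(contigs))                # → 80  (100 + 80 = 180 >= 150)
print(nx(contigs, 0.9))           # → 40  (100 + 80 + 60 + 40 = 280 >= 270)
```

Applying the same function to scaffold lengths instead of contig lengths gives the scaffold N50.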

The genome contigs were further anchored and oriented onto chromosomes by Hi-C scaffolding. The Hi-C library generated 124.37 Gb of clean data, of which 88.25% were valid read pairs. Using LACHESIS (v0.1.19), 99.28% of the assembled sequences were anchored to eight pseudochromosomes. The eight pseudochromosomes were clearly distinguishable in the Hi-C heatmap, and interactions within each pseudochromosome were strong (Figure 1C), indicating high-quality anchoring. The final assembly yielded a high-quality genome of 984.33 Mb, with a contig N50 of 37.21 Mb and a scaffold N50 of 141.97 Mb (Supplementary Table S3).
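Two of the Hi-C results reduce to simple ratios. The anchoring rate is the fraction of assembled bases placed on pseudochromosomes. A well-scaffolded assembly also shows most valid read pairs falling within the same pseudochromosome, which produces the strong diagonal blocks in a Hi-C heatmap. The sketch below computes both quantities on toy inputs. It does not reproduce LACHESIS, which clusters, orders, and orients contigs. The data structures and function names are hypothetical, and the text does not report an intra-chromosomal contact fraction.

```python
def anchoring_rate(contig_lengths, anchored):
    """Fraction of assembled bases on contigs that were placed on pseudochromosomes.

    contig_lengths: dict of contig ID -> length (bp)
    anchored: set of contig IDs assigned to a pseudochromosome
    """
    total = sum(contig_lengths.values())
    placed = sum(length for cid, length in contig_lengths.items() if cid in anchored)
    return placed / total


def intra_chromosomal_fraction(pairs):
    """Fraction of valid Hi-C read pairs whose two ends map to the same pseudochromosome.

    pairs: iterable of (chromosome_of_read1, chromosome_of_read2)
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no read pairs given")
    same = sum(1 for a, b in pairs if a == b)
    return same / len(pairs)


lengths = {"ctg1": 60, "ctg2": 30, "ctg3": 10}
print(anchoring_rate(lengths, {"ctg1", "ctg2"}))  # → 0.9
pairs = [("chr1", "chr1"), ("chr1", "chr1"), ("chr1", "chr2"), ("chr2", "chr2")]
print(intra_chromosomal_fraction(pairs))  # → 0.75
```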

A total of 482.49 Mb of repeat sequences were annotated, accounting for 47.93% of the genome (Supplementary Table S4), approximately the same proportion as estimated in the genome survey. The major repetitive elements were DNA transposons (20.97%), long terminal repeats (LTRs; 8.48%), and long interspersed nuclear elements (LINEs; 5.38%) (Supplementary Table S5). De novo and homology-based methods predicted 22 125 protein-coding genes. The average gene length, CDS length, exon length, intron length, and exon number per gene were 23 810 bp, 1 547 bp, 400.32 bp, 2 739 bp, and 8.46, respectively (Supplementary Table S6). BUSCO evaluation of the predicted genes identified 923 orthologous genes, including 905 complete single-copy and 18 complete duplicated BUSCOs (Supplementary Table S2). In total, 18 958 genes (85.69% of all predicted genes) were functionally annotated using public databases (Supplementary Table S7). For non-coding RNAs, we annotated 39 microRNAs (miRNAs), 202 transfer RNAs (tRNAs), 77 ribosomal RNAs (rRNAs), and 145 small nuclear RNAs (snRNAs), with average lengths of 82, 76, 333, and 142 bp, respectively (Supplementary Table S8).
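The average gene-structure values can be cross-checked against each other. A gene with n exons has n - 1 introns, so the mean gene length should be close to (mean exon number × mean exon length) + (mean exon number - 1) × mean intron length. This is only approximate, because a mean of products is not in general the product of means. The short Python sketch below applies the check to the reported values and recomputes two of the percentages. The helper name is illustrative. The BUSCO percentage assumes the gene-set evaluation used the same 954-gene dataset as the assembly evaluation, which the text does not state explicitly.

```python
def approx_mean_gene_length(mean_exons, mean_exon_len, mean_intron_len):
    """Approximate mean gene length (bp) from mean exon count and mean feature lengths.

    Assumes each gene has (exons - 1) introns and uses means in place of
    per-gene values, so the result is an approximation.
    """
    return mean_exons * mean_exon_len + (mean_exons - 1) * mean_intron_len


# Values reported in the text (Supplementary Table S6).
print(round(approx_mean_gene_length(8.46, 400.32, 2739)))  # → 23820 (reported: 23 810 bp)

# Share of predicted genes with a functional annotation, in percent.
print(round(18958 / 22125 * 100, 2))  # → 85.69

# Complete BUSCOs (single-copy + duplicated) out of an assumed 954-gene dataset, in percent.
print(round((905 + 18) / 954 * 100, 1))  # → 96.8
```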

Comparison of the B. purificata genome with those of eight other species (the gastropods Pomacea canaliculata, Biomphalaria glabrata, A. fulica, and Lottia gigantea; the bivalves Crassostrea gigas, Mytilus galloprovincialis, and Patinopecten yessoensis; and the brachiopod Lingula anatina, class Lingulata) revealed 8 943 gene families and 990 single-copy genes. The B. purificata genome contained a total of 22 125 genes clustered into 17 659 gene families, including 1 857 unique families. Across the nine species, the average number of genes per family ranged from 1.385 (B. purificata) to 4.048 (M. galloprovincialis) (Supplementary Table S9).
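Once genes have been clustered into families, the shared and single-copy sets used for phylogenetics and the per-species genes-per-family averages follow from simple counting. The Python sketch below assumes a hypothetical in-memory table that maps each family to the genes contributed by each species. The clustering step itself is not shown; it is done with dedicated orthology tools, which the text does not name. The exact definitions used in the study may also differ from these.

```python
def shared_families(families, species):
    """Family IDs containing at least one gene from every listed species."""
    return [fam for fam, members in families.items()
            if all(members.get(sp) for sp in species)]


def single_copy_families(families, species):
    """Family IDs in which every listed species contributes exactly one gene."""
    return [fam for fam, members in families.items()
            if all(len(members.get(sp, [])) == 1 for sp in species)]


def mean_genes_per_family(families, sp):
    """Average number of genes per family, over families that contain species `sp`."""
    sizes = [len(members[sp]) for members in families.values() if members.get(sp)]
    return sum(sizes) / len(sizes)


# Toy table: family ID -> {species: [gene IDs]}
families = {
    "F1": {"A": ["a1"], "B": ["b1"]},
    "F2": {"A": ["a2", "a3"], "B": ["b2"]},
    "F3": {"A": ["a4"]},
}
print(shared_families(families, ["A", "B"]))       # → ['F1', 'F2']
print(single_copy_families(families, ["A", "B"]))  # → ['F1']
print(mean_genes_per_family(families, "A"))        # → 1.3333333333333333
```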

Based on the protein sequences of the single-copy genes, we constructed a phylogenetic tree, which showed that the divergence time between B. purificata and the other shellfish species was about 465.0 (288.0–619.3) Ma (Figure 1D), consistent with previous studies (Sun et al., 2020). In addition, we identified 34 expanded gene families (339 genes) and 26 contracted gene families (72 genes) by comparing the B. purificata genome with that of its most recent common ancestor (Figure 1E). Four protein-coding genes under positive selection were identified in B. purificata (FDR < 0.05; Supplementary Table S10). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses showed that the positively selected genes were enriched in G protein-coupled receptor activity, acetylcholine receptor activity, and taste transduction (Figure 1F, G).
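The text reports positive selection at FDR < 0.05 but does not describe the test. A common approach, assumed here for illustration only, is a per-gene branch-site likelihood-ratio test, in which 2ΔlnL is compared with a χ² distribution with 1 degree of freedom, followed by Benjamini–Hochberg adjustment across genes. The Python sketch below implements those two steps with the standard library. The function names and the toy log-likelihoods are hypothetical. Some studies instead use a 50:50 mixture of a point mass at zero and χ²₁, which halves the p-value.

```python
import math


def lrt_pvalue(lnl_alt, lnl_null):
    """P-value of a likelihood-ratio test with 1 degree of freedom.

    The statistic 2 * (lnL_alt - lnL_null) is compared with a chi-square
    distribution with df = 1, whose survival function is erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy per-gene log-likelihoods: (alternative model, null model).
lnls = [(-1000.0, -1006.0), (-2000.0, -2001.0), (-1500.0, -1504.5), (-800.0, -800.2)]
pvals = [lrt_pvalue(alt, null) for alt, null in lnls]
qvals = benjamini_hochberg(pvals)
selected = [i for i, q in enumerate(qvals) if q < 0.05]
print(selected)  # → [0, 2]
```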

In summary, we assembled the first chromosome-level genome of B. purificata by integrating PacBio long reads and Hi-C data. The assembled genome was 984.33 Mb, similar to the size estimated by k-mer analysis. Owing to the long read lengths of PacBio sequencing, the assembly achieved a contig N50 of 37.21 Mb and a scaffold N50 of 141.97 Mb. This genome assembly and analysis provide important data for further study of Viviparidae species and lay a solid foundation for future breeding, conservation, and phylogenetic studies of Bellamya.

DATA AVAILABILITY

The whole-genome assembly of B. purificata was submitted to NCBI, GSA, and Science Data Bank under PRJNA818874, CRA007090, and CSTR:31253.11.sciencedb.01820, respectively.

SUPPLEMENTARY DATA

Supplementary data to this article can be found online.

COMPETING INTERESTS

The authors declare that they have no competing interests.

AUTHORS’ CONTRIBUTIONS

W.J. and X.J.C. conceived and managed the project. X.Y.M., G.H.L., G.C.X., P.X., and B.S. collected the sequencing samples. W.J. and X.J.C. wrote the manuscript. D.P.X. and H.B.W. contributed to revision of the manuscript. All authors read and approved the final version of the manuscript.

ACKNOWLEDGEMENTS

We are grateful for the constructive comments from three anonymous reviewers.

Wu Jin1,2,#, Xiao-Juan Cao3,#, Xue-Yan Ma1,2, Guo-Hua Lv1, Gang-Chun Xu1,2, Pao Xu1,2, Bing Sun3, Dong-Po Xu1,2,*, Hai-Bo Wen1,2,*

1Key Laboratory of Integrated Rice-Fish Farming Ecology, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi, Jiangsu 214081, China

2Wuxi Fisheries College, Nanjing Agricultural University, Wuxi, Jiangsu 214128, China

3College of Fisheries, Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Huazhong Agricultural University, Wuhan, Hubei 430070, China

#Authors contributed equally to this work

*Corresponding authors, E-mail: xudp@ffrc.cn; wenhb@ffrc.cn