
Evolutionary genetics of wheat mitochondrial genomes

2023-12-25 Hui-Lin Hu, Fan Zhang, Pei Wang, Fu-Hao Lu

The Crop Journal, 2023, Issue 6

Hui-Lin Hu, Fan Zhang, Pei Wang, Fu-Hao Lu*

a State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, Henan, China

b Henan Key Laboratory of Big Data Analysis and Processing, School of Computer and Information Engineering, Henan University, Kaifeng 475004, Henan, China

c School of Mathematics and Statistics, Henan University, Kaifeng 475004, Henan, China

Keywords: Wheat; Mitochondrion; mtDNA; Comparative genomics; Polyploidization

ABSTRACT The Triticum-Aegilops complex provides ideal models for the study of polyploidization, and mitochondrial genomes (mtDNA) can be used to trace cytoplasmic inheritance and energy production following polyploidization. In this study, gapless mitochondrial genomes for 19 accessions of five Triticum or Aegilops species were assembled. Comparative genomics confirmed that the BB-genome progenitor donated mtDNA to tetraploid T. turgidum (genome formula AABB), and that this mtDNA was then passed on to the hexaploid T. aestivum (AABBDD). T. urartu (AA) was the paternal parent of T. timopheevii (AAGG), and an earlier Ae. tauschii (DD) was the maternal parent of Ae. cylindrica (CCDD). Genic sequences were highly conserved within species, but frequent rearrangements and nuclear or chloroplast DNA insertions occurred during speciation. Four highly variable mitochondrial genes (atp6, cob, nad6, and nad9) were established as marker genes for Triticum and Aegilops species identification. The BB/GG-specific atp6 and cob genes, which were imported from the nuclear genome, could facilitate identification of their diploid progenitors. Genic haplotypes and repeat-sequence patterns indicated that BB was much closer to GG than to Ae. speltoides (SS). These findings provide novel insights into the polyploid evolution of the Triticum/Aegilops complex from the perspective of mtDNA, advancing understanding of energy supply and adaptation in wheat species.

1. Introduction

Polyploidization is a natural phenomenon in plant evolution and a driving force for plant speciation, diversification, and adaptation [1]. It is particularly widespread among angiosperms and monocots, and recurrent polyploidization events have been found in all plant species [2]. Allopolyploidy and autopolyploidy are the two primary types of polyploidy, arising from different parental species or from the same parental species, respectively [3]. Polyploids possess a robust genetic background and generally show greater fitness than monoploids, making them more likely to survive dramatic climate changes. Plants can obtain new genetic resources through artificial polyploidy via crossbreeding of the parental species. Synthesized hexaploid wheat [4] and tetraploid rice [5] are examples of such resynthesized plants that have proven successful in improving crop yield and establishing new cultivars.

Wheat species (genera Triticum and Aegilops) are excellent models for studying polyploidization. Hexaploid bread wheat (T. aestivum L., AABBDD), one of the top three staple crops worldwide, was generated by two successive allopolyploidization events [6]. The first occurred 0.5–0.8 million years ago, between T. urartu (2n = 2x = 14, AA) and an unidentified BB ancestor, forming the tetraploid T. turgidum L. (2n = 4x = 28, AABB) [7,8]. The second event occurred ~8000 years ago between tetraploid T. turgidum ssp. dicoccum and diploid goatgrass, Ae. tauschii (2n = 2x = 14, DD), and gave rise to hexaploid bread wheat [9]. The tetraploid T. timopheevii (AAGG) likely evolved less than 0.4 million years ago by hybridization between T. urartu and the GG ancestor [10].

Sequence analysis [11] of numerous nuclear genes has suggested that T. urartu and Ae. tauschii are the paternal ancestors of hexaploid wheat. The unknown BB and GG ancestors have been hypothesized [12,13] to be diploid species in the section Sitopsis, evolutionarily close to Ae. speltoides (2n = 2x = 14, SS). An evolutionary analysis [10] based on wheat chloroplast genomes revealed that SS diverged before AA and DD, and introgression analysis [14] showed that Ae. longissima, Ae. sharonensis, Ae. searsii, and Ae. bicornis (members of the Sitopsis section) were closely related to the DD lineage.

Some Aegilops species have become widespread weeds in the field. Jointed goatgrass, Ae. cylindrica Host (2n = 4x = 28, CCDD), was formed by amphidiploidization of a hybrid between two diploid species: Ae. tauschii Coss. (2n = 2x = 14, DD) and Ae. markgrafii (Greuter) Hammer (syn. Ae. caudata L., 2n = 2x = 14, CC) [15,16]. The plasmons (which include mitochondrial [mt] DNA and chloroplast [cp] DNA) of both Ae. tauschii Coss. (syn. Ae. squarrosa L.) and Ae. cylindrica Host reportedly belong to the D type [17], whereas a special accession of Ae. cylindrica was reported to contain a C-type plastome derived from Ae. markgrafii [15].

Analysis of the plasmon permits tracing cytoplasmic DNA evolution following polyploidization in wheat species. Both cpDNA and mtDNA are strictly maternally inherited, making them ideal references for species identification using characteristic markers such as 18S ribosomal RNA. mtDNA undergoes faster structural rearrangement and slower nucleotide sequence mutation than cpDNA [18]. Many mitochondrial genes, including atp1, atp4, atp6, atp9, cob, cox3, rpl5, and many other uncharacterized open reading frames (ORFs), may be closely related to cytoplasmic male sterility (CMS) [19,20]. The purpose of the present study was to sequence and assemble 19 mitochondrial genomes from Triticum and Aegilops species to gain insights into polyploidization in wheat species from a mitochondrial perspective. These data are expected to further our understanding of adaptations in the mitochondria, the cellular power source, in wheat species during their long polyploid history.

2. Materials and methods

2.1. Plant materials and cultivation

The study employed 19 accessions of five Triticum or Aegilops species: three from T. aestivum (AK58, Lumai 1, and KN9204), five from T. turgidum (Langdon, 81086A, SCAUP, Hoh501, and DM4), five from T. urartu (IX0944 [also known as G1812] [21], IX0946, IX0957, IX0969, and IX-0976), five from Ae. tauschii (AL8/78, CIae23, Y199, SQ523, and Kaifeng 1), and one from Ae. cylindrica (Jinan 190529). Seedlings were cultured at 21 °C in a dark growth chamber at 60% relative humidity. Etiolated seedlings were harvested at two weeks of age.

2.2. Mitochondrial DNA isolation and sequencing

mtDNA was isolated from 3 to 15 g of etiolated leaves per sample following Hao et al. [22] with slight modifications. Crude mitochondrion extract was incubated with DNase I (Beijing Solarbio Science & Technology Co., Ltd., Cat. D8071) for 3 h to further remove nuclear genomic DNA. After the resulting extracts were lysed with proteinase K for 1 h at 50 °C, RNase (Solarbio, Cat. R1030) was added to digest mitochondrial RNA for 1 h at 37 °C. Finally, the Dimer Eraser Kit (Beijing Tsingke Biotech Co., Ltd., Cat. TSP301) was used to remove small DNA and RNA fragments and to purify the mtDNA. mtDNA integrity and concentrations were measured with a Fragment Analyzer 5400 (Agilent, Santa Clara, CA, USA). Libraries with fragment sizes of ~350 bp were then sequenced on the NovaSeq-6000 platform (Illumina, San Diego, CA, USA) by the NovoGene Company (Beijing, China) following standard protocols.

2.3. Mitochondrial genome assembly

Raw sequence reads were first assessed for quality with MultiQC v1.12 [23]. Adapter sequences and low-quality bases were removed with Trimmomatic v0.39 [24]. K-mer depth was evaluated using custom scripts (Suppl. File 1: Additional Methods). Clean reads were assembled into contigs using SPAdes v3.14.1 [25] with default parameters and multiple k-mer sets (47, 63, 89, 99, and 107). The resulting de Bruijn graphs were resolved and integrated into pseudomolecules using graphical fragment assembly with Bandage v0.9.0 [26]. Assembly errors were further identified and corrected by mapping sequencing reads back to the pseudomolecules (more assembly details in Suppl. File 1).
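The k-mer depth evaluation mentioned above can be illustrated with a minimal sketch. The authors' custom scripts are described in Suppl. File 1 and are not reproduced here; the function names kmer_counts and depth_histogram, the toy reads, and k = 4 are assumptions made only for illustration. The sketch also ignores reverse complements, which a production k-mer counter would normally merge.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer of length k across a list of read strings."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def depth_histogram(counts):
    """Map each observed depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


# Toy reads (hypothetical, not from the study).
reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]
counts = kmer_counts(reads, k=4)
hist = depth_histogram(counts)
```

In an mtDNA-enriched library, k-mers from the mitochondrial genome would be expected to cluster at a much higher depth than those from residual nuclear DNA, which is what makes such a histogram useful for judging enrichment.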

2.4. Mitochondrial genome annotation

The assembled mtDNA sequences were annotated using both ab initio predictions and homology-based alignments. For ab initio predictions, GeSeq [27] was used to identify protein-coding genes, tRNAscan-SE v2.0.9 [28] was used to identify tRNAs, and Barrnap v0.9 (https://github.com/tseemann/barrnap) was used to annotate rRNAs. For homology-based alignments, known mitochondrial genes from closely related species were collected from the NCBI Organelle Genome Resources database (Table S1). Gene evidence from prediction and alignment data was integrated by manual curation. Pseudogenes were identified based on BLAST hits of regions ≥100 bp. Mitochondrial genome maps were plotted with OGDraw v1.1.1 [29].
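The ≥100 bp criterion for pseudogene candidates can be sketched as a simple length filter over BLAST tabular output. This is an illustration under assumptions, not the authors' pipeline: it assumes the standard 12-column tabular format (BLAST+ -outfmt 6), in which the fourth column is the alignment length, and the function name filter_hits and the example hit lines are hypothetical.

```python
def filter_hits(lines, min_len=100):
    """Keep tabular BLAST hits whose alignment length is at least min_len.

    Each line is expected in the standard 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        length = int(fields[3])
        if length >= min_len:
            kept.append((fields[0], fields[1], float(fields[2]), length))
    return kept


# Hypothetical hits: one 152-bp alignment and one 80-bp alignment.
hits = [
    "orfX\trps19\t98.5\t152\t2\t0\t1\t152\t10\t161\t1e-70\t270",
    "orfY\trpl2\t95.0\t80\t4\t0\t1\t80\t5\t84\t1e-30\t130",
]
candidates = filter_hits(hits)
```

Only the first hit passes the threshold; candidates would still need the manual curation described above before being called pseudogenes.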

2.5. Comparative mtDNA analysis

Whole mtDNA sequences were aligned with MUMmer [30] to identify sequence variations between and among species. Genic regions were then aligned with MUSCLE v5.1 [31]. To identify intergenic sequence differences, we developed a new program, RepFinder (see Data availability), that reveals shared and species-specific repeats among all assemblies. Tandem repeats were identified with trf v4.09 [32]. Haplotypes were calculated with DnaSP v6.12.03 [33] and plotted with Network v10.1.0.0.
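The idea behind haplotype calling can be shown with a minimal sketch: accessions whose aligned sequences are identical share a haplotype. This is not DnaSP's implementation, which handles gaps, missing data, and further statistics; the function name group_haplotypes and the accession labels acc1 to acc4 are hypothetical.

```python
from collections import defaultdict


def group_haplotypes(aligned):
    """Group accession names by identical aligned sequence (case-insensitive)."""
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[seq.upper()].append(name)
    # Largest haplotype first; ties broken by sequence for a stable order.
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Toy alignment (hypothetical sequences, equal length).
aligned = {
    "acc1": "ACGTAC",
    "acc2": "ACGTAC",
    "acc3": "ACGTTC",
    "acc4": "acgtac",
}
haplotypes = group_haplotypes(aligned)
```

Here three accessions collapse into one haplotype and the fourth forms its own, separated by a single substitution; a network program then connects such haplotypes by their mutational steps.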

2.6. Mitochondrial gene evolution in Triticum and Aegilops

The assembled sequences were searched against the online NCBI nucleotide database with default parameters. All closely related mtDNA sequences were retrieved: T. aestivum MH051716 and NC_036024, Ae. speltoides AP013107, T. timopheevii AP013106, Elymus sibiricus MZ202552, Hordeum vulgare MN127982, and Thinopyrum obtusiflorum OK120846. Twenty-seven single-copy protein-coding genes were selected for phylogenetic tree construction. Sequence alignment and tree construction were performed with MEGA v11.0.13 [34]. The tree was built using the maximum likelihood method with 1,000 bootstrap replicates. The consensus tree was visualized using iTOL v6.5.2 [35].
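Multi-gene trees are commonly built from a concatenated alignment (a supermatrix). The text does not state how the 27 genes were combined, so the following is only a sketch under that assumption; the function name concatenate, the gene names, and the taxon labels are hypothetical.

```python
def concatenate(gene_alignments, taxa):
    """Join per-gene alignments into one sequence per taxon.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    A taxon missing from a gene is padded with gaps of that gene's width.
    """
    parts = {taxon: [] for taxon in taxa}
    for gene in sorted(gene_alignments):  # fixed gene order for reproducibility
        alignment = gene_alignments[gene]
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy input: tax3 lacks geneA and is padded with gaps.
genes = {
    "geneA": {"tax1": "ATG-A", "tax2": "ATGCA"},
    "geneB": {"tax1": "GGT", "tax2": "GGC", "tax3": "GGT"},
}
matrix = concatenate(genes, ["tax1", "tax2", "tax3"])
```

Restricting the analysis to single-copy genes, as the authors did, avoids the question of which paralog to place in each column of such a matrix.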

3. Results

3.1. Sequence assembly of Triticum and Aegilops mitochondrial genomes

We sequenced mtDNA from 19 Triticum/Aegilops plants, namely one Ae. cylindrica (CCDD), three T. aestivum (AABBDD), five T. turgidum (AABB), five T. urartu (AA), and five Ae. tauschii (DD) accessions (Table 1). The NovaSeq 6000 platform was used for sequencing, generating 2×150-nt paired-end reads totaling >2 Gb of data. Of the raw reads, 30%–50% were verified to be mtDNA, whereas the remaining reads were chloroplast or nuclear DNA contaminants. These results were confirmed with a preliminary k-mer frequency analysis (Suppl. File 1: Additional Methods; Suppl. File 2; Fig. S1). Two accessions, KN9204 (AABBDD) and Y199 (DD), had low mtDNA read yield, owing primarily to the small number of etiolated seedling leaves (Suppl. File 3).

Table 1 Triticum and Aegilops mitochondrial genome assemblies generated for this study.

A multi-k-mer assembly strategy was used to assemble the reads into contigs. To further resolve the complicated Y forks in the de Bruijn graph caused by repeated mitochondrial sequences, the distance information for each read pair was used to correctly link adjacent contigs within a distance of <500 nt (Suppl. File 1: Additional Methods). This resulted in 19 gapless mtDNA assemblies (Table 1), all of which were in circular conformations, as supported by read pairs mapped to the starting and ending positions. Assembly errors were further evaluated based on k-mer depths and fragment sizes (Suppl. File 1: Additional Methods, Figs. S2–S4). Metadata for the final assemblies are presented in Table 1 (see details in Suppl. File 3; Figs. S5–S9). Syntenic sequences were plotted against T. aestivum cv. Chinese Spring (AABBDD) mtDNA (Fig. 1).
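Support for a circular conformation from read pairs can be sketched as follows. When a circular molecule is written as a linear sequence, a pair spanning the cut point has its forward mate near the end and its reverse mate near the start. The coordinate convention, the function names, and the 500-nt cutoff (borrowed from the linking distance above) are assumptions for illustration, not the authors' procedure.

```python
def junction_insert_size(fwd_start, rev_end, genome_len):
    """Fragment length for a pair that wraps around the end of a linearized circle.

    fwd_start: 0-based start of the forward mate (near the sequence end).
    rev_end:   0-based exclusive end of the reverse mate (near the sequence start).
    """
    return (genome_len - fwd_start) + rev_end


def count_junction_pairs(pairs, genome_len, max_insert=500):
    """Count pairs whose mates sit on opposite ends with a plausible fragment size."""
    support = 0
    for fwd_start, rev_end in pairs:
        wraps = fwd_start >= rev_end  # mate order is inverted on linear coordinates
        if wraps and junction_insert_size(fwd_start, rev_end, genome_len) <= max_insert:
            support += 1
    return support


# Toy pairs on a hypothetical 10,000-nt assembly.
pairs = [(9800, 150), (9900, 120), (4000, 4350), (9000, 900)]
n_support = count_junction_pairs(pairs, genome_len=10000)
```

The third pair is an ordinary internal pair and the fourth implies a 1900-nt fragment, so only the first two count as junction support.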

3.2. Mitochondrial genome annotations

To compare gene sequences across and within species, the gene annotations for our de novo assemblies and previously published mtDNA sequences (Suppl. File 4: Table S1) were manually curated and verified against existing mtDNA genes in the NCBI Organelle Genome Resources database. The total gene number in the mitochondrial genomes varied between species, owing primarily to differences in the copy numbers of atp6, atp8, ccmFN, rrn18, rrn5, rrn26, trnK, trnL, trnfM, trnP, trnQ, and some derivative pseudogenes: rpl2-p, rps19-p, and rrn26-p (Suppl. File 5). Presumed gene flow following speciation and polyploidization events is depicted in Fig. 2. Most gene sequences were highly conserved, with 100% identity among accessions of the same species (Suppl. Files 6–8). The most variable genes were atp6, cob, nad6, and nad9 (Suppl. File 9), which would accordingly serve as ideal marker genes for species identification. Differences in atp6, cob, and nad6 were caused by nuclear replacements (Tables S2–S4), whereas variations in nad9 were caused by an upstream four-nucleotide insertion that extended the open reading frame (ORF) at the 5′ end (Suppl. File 9). The single-exon gene nad6 showed variability at the 3′ end and could be used to distinguish between species in the Triticum/Aegilops complex (Suppl. File 9). We did not observe species-specific variations in rrn18, which is commonly used as a marker gene in evolutionary analyses (Suppl. File 7).

Fig. 1. MUMmerplot of mitochondrial sequences in Triticum and Aegilops. Red and blue lines indicate the forward and reverse directions of the alignments, respectively, and gray vertical lines mark the positions of specific genes. atp6, cob, and nad6 were the most highly variable genes in the mitochondrial (mt) genomes of Triticum and Aegilops species. Rearrangements appear to have occurred in atp6-2, whereas transfer of nuclear DNA fragments brought new copies of cob, atp6-1, and nad6 into the BB mtDNA. A four-nucleotide (GT)n insertion upstream of nad9 changed the start codon, making the open reading frame longer in the SS/BB/GG lineage. The mtDNAs of T. urartu (AA), Ae. cylindrica (CCDD), Ae. tauschii (DD), T. turgidum (AABB), and T. aestivum (AABBDD) were assembled in this study. The reference mtDNA sequences of T. aestivum cv. Chinese Spring, T. timopheevii (AAGG), Ae. speltoides (SS), Ae. longissima, Elymus sibiricus, Thinopyrum obtusiflorum, and Hordeum vulgare were retrieved from NCBI.

Fig. 2. Proposed mitochondrial gene flow following polyploidization events. These hypotheses are based on copy-number changes and sequence variations in mitochondrial genes. Symbols: -, gene loss; +, gene gain; *, gene copy number change with the gene name and number change in parentheses; #, large insertion/deletion, rearrangement, or new gene importation. At (7), a second copy of atp8 was imported from the nuclear genome; mitochondrial nad6 was replaced by a nuclear version, and a 4-bp (GT)n insertion made the nad9 open reading frame longer. At (9) and (10), mitochondrial atp6 was damaged by a mitochondrial DNA rearrangement, then replaced by a nuclear version; at (10), a second copy of atp6 (atp6-1) was inserted, meaning that the BB mitochondrial genome carried two copies each of atp6 and atp8.

3.3. mtDNA inheritance from AABB to AABBDD

The mtDNA sequences were essentially identical between tetraploid and hexaploid wheat species and encoded the same proteins (Suppl. Files 7 and 8), indicating high evolutionary conservation among wheat mitochondrial genomes (Figs. 1, S10, and S11). This identity showed that the tetraploid AABB provided cytoplasm for hexaploid AABBDD wheat formation during the second allopolyploidization event ~8000 years ago. Synteny searching revealed 10 blocks in the AABB and AABBDD mitochondrial genomes that were derived from the nuclear genome (Fig. S12; Table S5). BLAST searches suggested that these transfer events occurred in the BB progenitor (Table S5). Transfer of these fragments brought the new genes atp6, orf359, atp8-2, rrn26-p, trnQ-3, and trnK-3 into the mtDNA (Fig. S13). Sequence comparisons with SS and AAGG suggested that the original mitochondrial atp6 gene in BB was damaged by a rearrangement at the 5′ end (Fig. S12). However, it was later repaired with a nuclear version of the gene (atp6-2 in Fig. S13: Block 1), causing BB to harbor a mitochondrial copy of atp6 distinct from those of AA, DD, and SS species (Fig. S14). A second copy of atp6 was imported into the mitochondrion by another nuclear insertion (atp6-1 in Fig. S13: Block 5), although the order in which these copies were incorporated is unclear. The repaired copies of atp6 retained the conserved membrane-bound "ATP-synt_A" domain at the C-terminus, but had an altered N-terminal sequence, which was predicted to be located outside of the membrane (Fig. S14).

The mtDNA of the AABB accession Hoh501 was distinctive within the AABB/AABBDD group. Compared with the mtDNA sequence of Chinese Spring, Hoh501 retained 115 of the 116 variations in the AABB/AABBDD groups (Suppl. File 6), suggesting large differences in the mitochondria. It contained three single-nucleotide polymorphisms (SNPs) in cox2 and nad2 that led to changes in the amino acid sequences, and the pseudogene rps19-p was shorter than in other accessions in the AABB and AABBDD groups. Synteny analysis showed that a specific 2653-bp region was replaced by a 9540-bp nuclear fragment (Fig. S15), which was highly similar (with 99.15% sequence identity) to a portion of a long terminal repeat retrotransposon on chromosome 5B. This replacement meant that Hoh501 had a larger mitochondrial genome than other accessions in this group. The distinctiveness of Hoh501 was supported by the phylogenetic tree and repeat patterns (Figs. 3, 4), which suggested that allopolyploidization between AA and BB may have independently occurred at least twice in nature. The nuclear insertion also suggests genetic exchange between the nucleus and the mitochondria.

3.4. mtDNA inheritance from BB/GG to tetraploid AABB/AAGG

Fig. 3. Phylogenetic tree based on 27 single-copy protein-coding mtDNA genes. Three clear clusters are shown: AA, DD, and BB. The mitochondrial genomes of Triticum turgidum (AABB) and T. timopheevii (AAGG) originated from diploids in the BB/SS/GG lineage, which diverged earlier than the AA or DD lineages. An earlier ancestor of DD species donated cytoplasm to Aegilops cylindrica (CCDD), which diverged earlier than Ae. tauschii. Mya: million years ago.

Because the true diploid BB or GG species had not yet been clearly identified, we sought to confirm that the AABB and AAGG mtDNAs were not derived from the AA species. The AA mitochondrial genome was much smaller (392,977 bp) than the AABB (452,522 bp) or AAGG (443,419 bp) genomes (Table 1). Only two SNPs were found in five AA accessions originating from different geographic locations, suggesting high conservation among the AA group (Fig. S16). Sequence alignment based on the MUMmer max-match strategy revealed short, scattered synteny between the AA and AABB/AAGG mtDNAs (Fig. 1). The AA mtDNA also displayed many species-specific features, including gene lengths, gene copy numbers, gene absence/presence variations, genic sequence variations, and alternative stop codons in atp4, atp6 (Fig. S14), atp8, cob (Figs. S17 and S18), nad6 (Fig. S19), nad9 (Figs. S20 and S21), rpl5, rps19-p, rrn5, rrn18, trnA, trnK, trnN2, trnP, and trnQ (Suppl. Files 6–8). The phylogenetic tree showed that these species diverged onto two separate evolutionary paths, with the SS/GG/BB lineage diverging prior to AA speciation (Fig. 3). The types of repeat sequences were also largely distinct between these two lineages (Fig. 4). Thus, the diploid BB/GG progenitors were likely the maternal parents of AABB/AAGG and AA was the paternal parent.

3.5. BB was much closer to GG than to SS

Fig. 4. Repeat-sequence patterns in Triticum and Aegilops mitochondrial genomes. The T. turgidum (AABB)/T. aestivum (AABBDD) group at left represents BB mitochondrial genomes (mtDNA). It shows a pattern distinct from those of the AA and DD groups, indicating that the mtDNAs of the tetraploids T. turgidum (AABB) and T. timopheevii (AAGG) are unlikely to have been derived from T. urartu (AA).

Sequence similarity and genic variation showed that the AABB was much more closely related to T. timopheevii (AAGG) than to the Ae. speltoides (SS) species, which was previously thought to be the diploid species phylogenetically closest to the true BB donor among extant plants. Genic sequence alignments showed that, compared to BB, AAGG had only 15 protein-coding variations in the mtDNA, whereas SS had 32 (Fig. 5; Suppl. File 7). The genic variation analysis based on chloroplast genes supported this finding as well (Suppl. File 7). The mitochondrial atp6 gene sequences were identical between AAGG (atp6) and AABB (atp6-1 and atp6-2) (Fig. S14; Suppl. Files 7 and 9). This atp6 gene likely resulted from homologous recombination from the GG nuclear genome, because the AA nuclear genome showed no similar sequence. This atp6 gene was also not found in the SS nuclear genome (Table S2), but at least five atp6-like genes with complete ORFs were found in the BB subgenome of tetraploid/hexaploid wheats. Searches against the NCBI NT and NR databases confirmed that this version of atp6 was specific to BB and GG species, although a similar copy was found in Th. obtusiflorum mtDNA, but with 31-nucleotide or 21-amino-acid variations.
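Counts such as "15 variations versus 32" come down to tallying differing positions between aligned sequences. The following is a minimal sketch of that tally, not the authors' method: it treats every mismatching column, including gap columns, as one variation, whereas a real analysis might merge adjacent gap columns into a single indel. The function name count_differences and the toy sequences are hypothetical.

```python
def count_differences(reference, other):
    """Count alignment columns at which two aligned sequences differ."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(reference.upper(), other.upper()) if a != b)


# Toy aligned sequences (hypothetical): one substitution and one gap column.
ref_seq = "ATGGCA-TT"
query_seq = "ATGACAGTT"
n_diff = count_differences(ref_seq, query_seq)
```

Applying the same tally to each candidate relative against a common reference gives directly comparable distances, which is the logic behind ranking GG as closer to BB than SS.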

Fig. 5. Haplotype network of mitochondrial genes in the Triticum/Aegilops complex. An earlier ancestor of Ae. tauschii (DD) donated cytoplasm to form the tetraploid Ae. cylindrica (CCDD), and Ae. tauschii (DD) was divided into three subgroups. The GG ancestor of T. timopheevii (AAGG) was more closely related to the BB ancestor of T. turgidum (AABB) and T. aestivum (AABBDD) than Ae. speltoides (SS) was. AA, T. urartu.

Seven mitochondrial genes were found to have been transferred from the chloroplast genome. The transfers of most of these genes (trnC, trnS-GGA, trnW, and trnF) occurred early, before the divergence of barley (Fig. 2; Suppl. File 5). A fragment of 4523 bp carrying rps7-ct and ndhB-ct was found to have been transferred from the cpDNA to the mtDNA. This event occurred before the SS or AA divergence, leaving all subsequently diverging species with mitochondrial copies of the two genes (Fig. S22). A 1833-bp chloroplast fragment carrying trnA was also copied into the mtDNA of the common ancestor of the SS/BB/GG lineage (Fig. S23). BB and GG species retained two copies of this gene owing to a long sequence duplication in the cpDNA, whereas SS retained only one. This difference demonstrated a closer relationship between the BB and GG chloroplast genomes (Fig. S23).

3.6. An earlier DD contributed mtDNA to form the tetraploid CCDD

The DD group showed high conservation of genic regions in the mtDNA, but more intergenic variations than the AA or BB groups (Suppl. File 7). Fragment-length analysis confirmed two sequence rearrangements in the Ae. tauschii accessions SQ523 and CIae23 (Fig. S24). These findings suggested that mtDNA evolved faster in DD species than in AA or BB species and that subspeciation was responsible. mtDNA sequence synteny confirmed that Ae. tauschii was closely related to Ae. longissima (Fig. S25).

An earlier ancestor carrying the DD genome may have donated cytoplasm to form the tetraploid Ae. cylindrica (CCDD). This notion was supported by the genic variations in CCDD mtDNA, most of which showed consistency with those in the DD mtDNA (Suppl. File 7). However, some AA alleles were still retained in atp4, atp8, ccmFN, and cox2. The DD mtDNA contained a copy of rpl5 and of a unique trnR that were absent in CCDD and AA. trnR-ACG was thus specific to accessions in the DD group and to Ae. longissima (Suppl. File 5). Finally, the haplotype network (Fig. 5) and phylogenetic tree (Fig. 3) showed that the CCDD mtDNA donor diverged earlier than the DD lineage.

Six blocks of mtDNA regions in CCDD species were not present in DD mtDNA (Fig. S26) and were likely transferred from the nuclear genome (Table S3). These fragments, which were transferred after the CCDD polyploidization event, brought nuclear versions of the genes cob, cox1, cox3, orf256, trnV, atp1-p, and orf359-p into the mitochondrial genome (Fig. S27). The cob genes in SS, AAGG (cob-A), and CCDD were longer at the 3′ end than those in AA, DD, and Hordeum vulgare. The BB copy of cob was also a long version (Fig. S17), but an 8-bp deletion downstream of the conserved domain restored it to the same size as those in the AA and DD groups (Fig. S18). Patterns of sequence variations for this gene (Figs. S17 and S18) suggested that transfer events occurred in diploid BB, SS, and GG independently. The long version of cob was found in the nuclear genome of only the SS/BB/GG lineage (Table S3), suggesting that the CC donor, Ae. markgrafii, may be closely related to this lineage.

4. Discussion

In this study, we successfully sequenced and assembled 19 mitochondrial genomes using next-generation sequencing, then characterized the gene properties to uncover the evolutionary history of wheat species following polyploidization events.The most challenging component of this study was preparation of pure mtDNA for sequencing library construction.Because the large amounts of pure DNA required for third-generation sequencing could not be obtained, we used an improved strategy on Illumina paired-end reads to resolve de Bruijn graphs and generate gapless mtDNA assemblies.Contaminating nuclear and chloroplast reads could then be easily excluded using read-depth filtering and similarity searches, greatly facilitating gene annotation.

Many studies have shown that mtDNA evolves faster in structure but slower in sequence than cpDNA or nuclear DNA [18], making mtDNA a good tracer for genome evolution. Because the mtDNA in AABB was inherited from BB, we could use it to understand the cell energy supply in BB based on comparative analysis. In the present study, the genic regions of mtDNA were highly conserved across accessions in the AA, DD, and BB groups, although the intergenic regions showed higher species-specific variation. Because mtDNA encodes only an essential subset of proteins required for rapid responses to energy requirements [36], most of the sequence rearrangements and foreign insertions were found in intergenic regions, especially in BB mtDNA. In most cases, intergenic regions can remain damaged with minimal consequences, increasing mtDNA diversity between species.

Mutations or rearrangements can also occur in genic regions. When such occurrences are not lethal, they may result in CMS [37]. Nuclear replacements for damaged genes can be imported into the mitochondrial genome to restore gene function. The mitochondrial copies of atp6 in AABB and AAGG species likely resulted from such nuclear replacements, and numerous studies have indicated that this gene is closely associated with CMS [38]. We accordingly hypothesize that diploid BB and GG species may have suffered from CMS for some time prior to nuclear gene replacement. CMS would have dramatically reduced the BB/GG population size, leaving these diploid species undiscovered to date.

ATP6 functions as subunit a of the Fo complex, which is the rotator portion of ATP synthase (ATPase). In maize, alterations to this protein increased the amount of free F1 complex in the mitochondrial matrix [39]. In pepper (Capsicum annuum L.), atp6 silencing increased ATP hydrolysis activity of mitochondrial F1Fo-ATPase and caused pollen abortion [40]. These findings suggest that precise mitochondrial gene expression is necessary for normal oxygen utilization and ATP formation. The BB ancestor likely retained two copies each of atp6 and atp8 in the mtDNA, which may have stabilized the F1 head in the inner mitochondrial membrane. This might have endowed BB species with the capacity to respond quickly to changes in energy supply requirements and further enhanced phenotypic traits (such as seed size) or adaptability [20,41].

Aegilops cylindrica (CCDD) showed similarities with the BB/GG/SS lineage in some of the most highly variable mitochondrial genes. In particular, cob and nad9 were either imported from the Ae. markgrafii (CC) genome or horizontally transferred from the BB/GG/SS lineage. Based on genic variations of these two genes, we believe the former explanation is more likely, i.e., the CCDD mitochondrial cob gene, present mainly in the SS/BB/GG lineage, originated in the CC nuclear genome. If so, the SS/BB/GG lineage may be more widespread than previously thought, and there may be other as yet undiscovered members of this lineage.

Given that the SS/BB/GG lineage diverged earlier than the AA or DD groups, species close to the DD lineage, such as Ae. bicornis, Ae. longissima, Ae. searsii, and Ae. sharonensis [14], should be excluded from the search for diploid BB species. SS was not the BB donor, but was the diploid most closely related to BB among currently known species. Although we could not identify the true BB and GG ancestors, primarily due to limited resources, discovery of the BB-specific mitochondrial atp6 and cob genes would bring us to their true ancestors.

Data availability

All raw sequencing data, assembled mitochondrial DNA sequences, and annotations have been deposited in the EBI European Nucleotide Archive (ENA) under accession numbers PRJEB50328 (Triticum aestivum), PRJEB50415 (T. dicoccum and T. turgidum), PRJEB50416 (T. urartu), and PRJEB50417 (Aegilops tauschii and Ae. cylindrica). All scripts and supplementary files used in this study are available at https://github.com/lufuhao/mitochondria-simulator (RRID: SCR_023430). The source code for RepFinder is available at https://github.com/lufuhao/repfinder (RRID: SCR_023429).

CRediT authorship contribution statement

Hui-Lin Hu: experiments, data analysis; Fan Zhang: programming design; Pei Wang: algorithm design; and Fu-Hao Lu: project administration, data curation, writing the manuscript.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We are grateful to A. Bently (Consultative Group for International Agricultural Research), R. Horsnell (The National Institute of Agricultural Botany), and F. Longin (University of Hohenheim) for information about the origin of Triticum turgidum Hoh501. We thank A. Levy and M. Feldman (The Weizmann Institute of Science) for species identification of Aegilops cylindrica Jinan 190529 and H.-Q. Ling (Chinese Academy of Sciences) for Triticum urartu seeds. This work was supported by the 111 Project (D16014).

Appendix A. Supplementary data

Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2023.09.011.