A case study of a micro-inversion event in dark brown fibre cotton (Gossypium hirsutum)
2020-08-26
Tianwang Wen, Tian Yao, Chunyuan You, Zhongxu Lin
a National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, Hubei, China
b Cotton Research Institute, Shihezi Academy of Agriculture Science, Shihezi 832003, Xinjiang, China
ABSTRACT Structural variation is a major type of genetic variation that can potentially induce powerful genetic effects. In this study, we examined the Inv(A07)p1.09p2.23 genetic inversion in brown fibre cotton at the individual and population genetics levels. A dark-brown fibre mutant that resulted from a distant hybridization between Gossypium barbadense and G. hirsutum, and a natural population including 30 dark-brown, 70 light-brown and 21 white fibre cotton accessions, were collected to perform a functional study of this micro-inversion. The results showed that Inv(A07)p1.09p2.23 can be detected by high-throughput resequencing, and that it induces micro-deletion, gene disruption (Ghir_A07G000980) and abnormal gene expression in the breakpoint regions. Inv(A07)p1.09p2.23 existed only in dark-brown fibre cotton, had undergone negative selection in elite brown fibre cultivars, and was significantly associated with fibre colour and nine fibre traits. In the Inv(A07)p1.09p2.23 region, nucleotide diversity was lower, recombination was absent, and linkage disequilibrium was higher. Overall, this inversion event in dark-brown fibre cotton produced significant genetic effects, and this study will help us to better understand the genetic effects of inversion events in dark-brown fibre cotton.
1. Introduction
Genetic variants generally fall into three types: single nucleotide variants (SNVs), insertions/deletions (InDels) and structural variants (SVs), all of which are common in humans and contribute to complex traits [1]. Chromosomal inversion, one type of SV, is usually defined as a simple event involving two breakpoint rearrangements that changes the gene order on a chromosome [2]. Such inversions affect environmental adaptation, genome divergence and evolution, and complex traits [3]. In humans, a number of cases of complex disease have been confirmed to carry large inversions; genomic fragment deletion and duplication, and gene disruption, have also been found in inversion cases [4-7]. In animals, inversions can induce the differentiation and adaptation of species to environments because the inverted segment suppresses recombination [8,9]; for example, a mega-base inversion was found to be associated with the trans-oceanic divergence of Atlantic cod ecotypes [10]. In plants, a recent inversion was found to have captured multiple linked quantitative trait loci (QTL) under selection in Boechera stricta [11]. However, relatively few inversion studies have been reported in crops. With the development of reference genomes, whole genome resequencing technology and computational methodology [12-14], it has become possible to conduct functional studies of genetic inversion in crops. For example, the LUMPY software can detect structural variations from signals in whole genome resequencing data, including read-pair, split-read and read-depth signals [15].
In cotton, genetic inversions have been roughly detected by high-density genetic linkage maps with simple sequence repeat (SSR) markers [16-18]. Presently, the next generation sequencing (NGS) method provides a promising and accurate approach to detect SVs. With the publication of the reference genomes of two diploid (G. arboreum and G. raimondii) and two tetraploid cottons (G. hirsutum and G. barbadense) [19-21], comparative genomics allows the detection of large-scale rearrangements to explore the history of genome evolution among G. arboreum (A2), G. raimondii (D5), G. hirsutum (AD1), and G. barbadense (AD2). Ultra-dense genetic and physical maps constructed by the NGS method were used to identify 19 inversions in an interspecific population of tetraploid cotton [22]. Because a number of genetic inversions have been discovered in cotton, it is important and interesting to study their genetic effects.
As previously reported, genetic inversion is common during interspecific genome recombination in cotton [22]. Lc1 is a dominant locus controlling fibre colour, quality and yield in brown fibre cotton [23], and a genomic inversion (named Inv(A07)p1.09p2.23) has been identified around Lc1 [24], but the detection process and the genetic effects of Inv(A07)p1.09p2.23 have not been well demonstrated. Understanding the genetic effects of this inversion on Lc1 is key to dissecting the genetic architecture of brown fibre cotton.
In this study, we characterized this micro-inversion in the genetic background of a dark-brown fibre mutant and of natural dark-brown, light-brown and white fibre cotton lines: (1) Inv(A07)p1.09p2.23 was identified by high-throughput resequencing, inversion markers were designed, and the distribution of Inv(A07)p1.09p2.23 was tested; (2) the physical position and breakpoints of Inv(A07)p1.09p2.23 were fine-mapped; (3) the expression levels of genes in Inv(A07)p1.09p2.23 were compared between brown fibre and white fibre cotton; and (4) the nucleotide diversity and linkage disequilibrium (LD) of the inversion region and the normal region in the brown fibre cotton population were analyzed. This study improves our understanding of the genetic effects of inversion in cotton.
2. Materials and methods
2.1. Materials
To examine the inversion event and its genetic effects, HD208 (G. hirsutum), Pima90-53 (G. barbadense), and a dark-brown fibre mutant (ys) were collected. The ys mutant was generated from the hybridization between Pima90-53 and HD208. Previously, an F2 population between the ys mutant and HD208 was developed and a genetic linkage map of the brown fibre locus was constructed [23]. In addition, to study the effects of this inversion at the population genetics level, a 121-accession panel including 70 light-brown fibre, 30 dark-brown fibre, and 21 white fibre accessions was genotyped with inversion markers and phenotyped in multiple environments in our previous study (Table S1) [23].
2.2. Detection of genetic variations
Genomic DNA was extracted from young leaves of three accessions (HD208, Pima90-53 and ys). We resequenced these three accessions at 30× genomic coverage on the HiSeq 2000 platform. The clean data were deposited in the NCBI databank (PRJNA412456) and aligned to the G. hirsutum reference genome [21] with BWA v.0.7.10. The Binary Alignment/Map (BAM) files were processed with the modules of GATK v.3.1.1 [25], and the resulting SNPs and InDels were filtered. Lumpy v0.2.13 [15] and SVtyper (https://github.com/hall-lab/svtyper) were applied to call and genotype the structural variations, and VCFtools v.0.1.14 was applied to filter the structural variations (minQ >200). The main SV analysis pipeline is available at https://github.com/tianwangwen888/SV-analysis/blob/master/SV_calling.sh. The BAM files at the breakpoint borders of the inversion were viewed with the Integrative Genomics Viewer (IGV) software [26].
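The genetic inversion nomenclature starts with “Inv”, followed by the chromosome name in parentheses and the physical locations of the two breakpoints, each prefixed by the arm on which it lies, short arm (p) or long arm (q). For example, Inv(A07)p1.09p2.23 denotes an inversion on chromosome A07 with breakpoints at 1.09 Mb and 2.23 Mb.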
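The naming rule can be illustrated with a short parser. This is a minimal sketch, not part of the published pipeline: the names `Inversion` and `parse_inversion_name` are hypothetical, and the positions are assumed to be in megabases, as in Inv(A07)p1.09p2.23.

```python
import re
from typing import NamedTuple


class Inversion(NamedTuple):
    """Fields recovered from a name such as Inv(A07)p1.09p2.23 (hypothetical type)."""
    chromosome: str
    arm1: str        # 'p' (short arm) or 'q' (long arm) of the first breakpoint
    start_mb: float  # first breakpoint, assumed to be in Mb
    arm2: str        # arm of the second breakpoint
    end_mb: float    # second breakpoint, assumed to be in Mb


# "Inv", chromosome in parentheses, then two arm-prefixed positions.
_NAME = re.compile(
    r"^Inv\((?P<chrom>[^)]+)\)"
    r"(?P<arm1>[pq])(?P<pos1>\d+(?:\.\d+)?)"
    r"(?P<arm2>[pq])(?P<pos2>\d+(?:\.\d+)?)$"
)


def parse_inversion_name(name: str) -> Inversion:
    """Split an inversion name into chromosome, arms and breakpoint positions."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid inversion name: {name!r}")
    return Inversion(
        chromosome=match["chrom"],
        arm1=match["arm1"],
        start_mb=float(match["pos1"]),
        arm2=match["arm2"],
        end_mb=float(match["pos2"]),
    )
```

Calling `parse_inversion_name("Inv(A07)p1.09p2.23")` yields chromosome A07 with both breakpoints on the short arm, at 1.09 and 2.23 Mb.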
2.3. Inversion and transcript analyses
DNA was extracted from young leaves of HD208, Pima90-53, ys, and the 121 accessions in the brown fibre cotton population by the modified CTAB method [23]. The polymorphic inversion marker primers (Table S2) were used to genotype the inversion event, and the PCR products were visualized on agarose gel. The PCR products amplified from ys by the inversion marker primers were cloned into the pGEM-T (Promega, WI) vector and sequenced by the Sanger method. Sequence alignment analysis was performed using DNAMAN 6.0 software.
To test the transcript levels of the gene disrupted by the inversion, total RNAs were extracted from fibres at 5 days post anthesis (DPA) using a DP432 plant RNA kit (Tiangen Biotech, Beijing). Approximately 3 μg of RNA was reverse-transcribed by SuperScript III reverse transcriptase (Invitrogen, Cat. No. 18080-093, Waltham, MA, USA) in a 20 μL reaction mixture. The primers designed to test the transcript levels by reverse transcription PCR are listed in Table S2. In addition, to check gene expression around the inversion region, the published RNA-seq data of 8 DPA (SRR3715274 and SRR3715275) and 20 DPA (SRR3715276 and SRR3715277) samples of brown fibre and white fibre cotton were downloaded from the NCBI database [24]. TopHat 2.0.13 [27] was applied to align the clean reads to the TM-1 reference genome [21], the mapped BAM files were filtered and processed with SAMtools 1.3 [28], and HTSeq 0.8.0 [29] was applied to calculate the read counts of each gene in each sample. Finally, genes with significant expression differences between brown and white fibre at the two stages were identified from the read counts (read count >50 in at least one sample) and fold change (absolute value of log2(fold change) >2).
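The two thresholds used to flag genes with significant expression differences can be sketched as follows. This is an illustrative sketch under stated assumptions, not the script used in this study: the function name `is_differential` is hypothetical, the counts of the two samples are assumed to be directly comparable, and a pseudocount of 1 is added (an assumption) so that the ratio is defined when one count is zero.

```python
import math


def is_differential(
    brown_count: float,
    white_count: float,
    min_count: float = 50.0,       # read count > 50 in at least one sample
    min_abs_log2fc: float = 2.0,   # |log2(fold change)| > 2
    pseudocount: float = 1.0,      # assumption: avoids division by zero
) -> bool:
    """Return True if a gene passes both the read-count and fold-change filters."""
    # Filter 1: at least one of the two samples must exceed the count threshold.
    if max(brown_count, white_count) <= min_count:
        return False
    # Filter 2: absolute log2 fold change between brown and white fibre.
    log2fc = math.log2((brown_count + pseudocount) / (white_count + pseudocount))
    return abs(log2fc) > min_abs_log2fc
```

A gene with counts of 400 (brown) and 40 (white) passes both filters, whereas a gene with counts of 100 and 60 passes the count filter but fails the fold-change filter.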
2.4. Population diversity and LD analyses
The re-sequencing data of the brown fibre cotton population were deposited under the accession number PRJNA412456 in the NCBI databank. The high-quality clean reads were aligned to the TM-1 reference genome [21] using BWA software. SNP calling was conducted by GATK v.3.1.1, SAMtools and BCFtools software as described in our previous report [30]. A subset of SNPs in the inversion region was extracted from the total SNPs with minor allele frequency (MAF) above 0.05 in the 121-accession panel [30], and the nucleotide diversity of the target region was calculated by VCFtools 4.0. The LD of the inversion and normal groups in the target region was calculated and plotted by PopLDdecay [31].
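For readers unfamiliar with nucleotide diversity, the quantity can be sketched as the average proportion of pairwise differences between sampled alleles. The functions below are a minimal illustration of this standard estimator and are not necessarily the exact computation performed by VCFtools; the names `site_pi` and `window_pi` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def site_pi(alleles: Sequence[str]) -> float:
    """Proportion of allele pairs that differ at a single site."""
    n = len(alleles)
    if n < 2:
        return 0.0
    counts = Counter(alleles)
    # Ordered pairs of identical alleles; n * (n - 1) is the number of ordered pairs.
    identical_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - identical_pairs / (n * (n - 1))


def window_pi(sites: Sequence[Sequence[str]], window_length_bp: int) -> float:
    """Sum of per-site diversity over variable sites, divided by window length."""
    if window_length_bp <= 0:
        raise ValueError("window_length_bp must be positive")
    return sum(site_pi(site) for site in sites) / window_length_bp
```

Here each element of `sites` holds the alleles observed at one SNP across the sampled chromosomes; invariant positions contribute zero and are covered by `window_length_bp`.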
3. Results
3.1. Identification of genetic variations by high-throughput resequencing
A total of six kinds of genetic variations were identified among HD208, Pima90-53 and the ys mutant by high-throughput resequencing. In summary, 13,869,521 SNPs, 3,615,501 InDels and 52,258 SVs (Inversion, Duplication, Deletion and Break-end) were identified (Fig. 1, Table 1). SNPs and InDels were common in the genome, whereas SVs were relatively rare at the genome-wide scale. We focused on the 134 inversions and found an inversion event at A07: 1.09 Mb (Table S3), located near the brown fibre locus (Lc1). This inversion has been reported previously [24], but neither the detection method nor the precise breakpoint locations were described. Here, the results confirmed the efficacy of the SV detection method and provided a physical location (A07: 1.09-2.23 Mb) for the inversion event (named Inv(A07)p1.09p2.23) (Fig. 1).
Fig. 1 - High-density genetic variation map among HD208, Pima90-53 and ys mutant. (a) The karyotype of the G. hirsutum reference genome and band mark of Inv(A07)p1.09p2.23. (b) SNP distribution at the genome-wide scale. (c) InDel distribution at the genome-wide scale. (d) SV distribution at the genome-wide scale.
3.2. Micro-deletion, gene loss and abnormal gene expression associated with Inv(A07)p1.09p2.23 in brown fibre cotton
Two pairs of primers (Pair1 and Pair2 in Table S2) were designed to cover the inverted breakpoints, and the distribution of the inversion event in natural cotton accessions was checked with multiple combinations of inversion markers (Pair1-Pair6). The inversion marker combination Pair5 and Pair6 amplified genomic fragments from the dark-brown fibre mutant (ys), but no fragments were amplified from Pima90-53 and HD208. Pair3 and Pair4 showed no amplification products in ys, Pima90-53 and HD208. These results showed that the inverted segment existed in ys and was rotated 180° relative to the reference genome, so that the sense strand in the reference genome became the antisense strand in ys (Fig. 2a). The PCR products amplified by Pair5 and Pair6 in ys were cloned and sequenced, and the sequences were aligned with the reference genome. The results showed that the left breakpoint was located at A07: 1,091,606-1,092,243, and a 638-bp segment was deleted, while the right breakpoint was at A07: 2,230,195-2,230,279, and a 67-bp segment was deleted (Fig. 2a).
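The strand swap described above, in which the inverted segment reads as the reverse complement of the reference, can be illustrated on a toy sequence. This is a minimal sketch: the function names and the example sequence are hypothetical, coordinates are 0-based with an exclusive end, and the micro-deletions at the breakpoints are not modelled.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_inversion(reference: str, start: int, end: int) -> str:
    """Replace reference[start:end] with its reverse complement.

    After the inversion, the segment that was on the sense strand is read
    from the antisense strand, which is why primers facing each other on
    the reference no longer amplify across a breakpoint.
    """
    if not 0 <= start <= end <= len(reference):
        raise ValueError("start and end must satisfy 0 <= start <= end <= len(reference)")
    inverted = reverse_complement(reference[start:end])
    return reference[:start] + inverted + reference[end:]
```

For the toy reference `AAACCGTTT`, inverting positions 3 to 6 turns the segment `CCG` into `CGG`; applying the same inversion a second time restores the original sequence.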
Table 1 - Six kinds of genetic variations measured at the genome-wide scale among HD208, Pima90-53 and ys mutant.
Fig. 2 - Inversion structure of Inv(A07)p1.09p2.23 and gene disruption of Ghir_A07G000980. (a) Inversion structure in the dark-brown fibre mutant. “+” and “-” indicate the sense and antisense strands in the reference genome, respectively. The stars indicate the truncated fragments. The green colour indicates the inverted fragment. The numbers indicate the physical positions in the genome. The arrows indicate the direction of alignment with the reference genome. (b) Structure of Ghir_A07G000980 disrupted by breakpoints and micro-deletion. (c) Transcript of Ghir_A07G000980 derived from RNA of HD208, ys, and Pima90-53. Gh_ub7 was used as the control.
Analysis of the inversion region revealed that the right breakpoint border of the inversion lay in an intergenic region, while the left breakpoint border and the deleted 638-bp fragment lay in the body of a gene (Ghir_A07G000980) (Fig. 2a, b). We also observed in the IGV view that re-sequencing reads covering the breakpoint regions were absent in the ys mutant (Fig. S1). Primers were designed to check the transcript expression of Ghir_A07G000980 in ys, Pima90-53 and HD208 (Table S2). Ghir_A07G000980 showed significant transcript level differences between ys and HD208/Pima90-53 (Fig. 2c). PCR product clones also showed that no transcript of Ghir_A07G000980 was detected in ys. In addition, transcript analysis of the published RNA-seq data of brown fibre and white fibre samples at 8 DPA and 20 DPA showed that the genes around the breakpoints were significantly affected. Considering the read count and fold change, the significantly affected genes in the left border included Ghir_A07G000950 (pectin lyase-like superfamily protein in Arabidopsis), Ghir_A07G000960 (oxidative stress 3 in Arabidopsis), Ghir_A07G000970 (MAP kinase kinase 6 in Arabidopsis) and Ghir_A07G000980 (protein of unknown function). Ghir_A07G000950 may have an important role in fibre production, because it is related to cell wall modification. In the right border, the affected gene was Ghir_A07G002090 (TT2 gene), which has previously been identified as the dominant gene for brown fibre [23] (Table S4).
3.3. Genetic relationships revealed by genetic variations in Inv(A07)p1.09p2.23 in a population
Based on the genetic variations of the inversion region, we investigated the genomic consistency of the inverted fragment in natural cotton accessions. Principal component analysis (PCA) and phylogenetic analysis were performed with 30 dark-brown, 70 light-brown and 21 white fibre cotton accessions. Both PCA and the phylogenetic tree showed that the dark-brown, light-brown and white fibre accessions were clearly separated (Fig. 3), and the dark-brown fibre cotton group had a more distant relationship from the light-brown and white fibre groups (Fig. 3b), which indicated that genetic variations in Inv(A07)p1.09p2.23 can generally distinguish different fibre colour groups.
Fig. 3 - Genetic analyses of dark-brown, light-brown and white fibre cotton accessions based on the Inv(A07)p1.09p2.23 region. (a) Principal component analysis of the natural cotton accessions. (b) Phylogenetic tree of the natural cotton accessions. The ovals indicate the dark-brown fibre cotton clusters.
3.4. Fibre colour, quality, yield and negative selection associated with Inv(A07)p1.09p2.23 in a population
To analyse the distribution of the inversion event in natural cotton accessions, the polymorphic inversion markers were applied to genotype the 121-accession panel. The population genotypes demonstrated that the inverted segment was harboured only by dark-brown fibre accessions, specifically by 27 of the 30 (Table S1, Fig. 4a). Comparison of nine fibre traits across the 121-accession panel showed that the accessions in the inversion group had lower quality and yield than the accessions in the normal group (P <0.01, Student’s t-test) (Fig. 4b). An examination of eighteen elite brown fibre cultivars showed that only one dark-brown fibre cultivar, Zhongmian 81, harboured the inverted segment; the other 17 elite cultivars did not (Table S1). The inversion was thus present in 27% of the brown fibre accessions in the 121-accession panel (27 of 100) but in only about 5.6% of the elite cultivars (1 of 18). This distribution of the inverted fragment in brown fibre cottons suggested that the mega-base inversion event had undergone negative selection in the breeding of elite brown fibre cultivars.
Fig.4-Distribution and phenotypic effects of Inv(A07)p1.09p2.23 in cotton.(a)Distribution of the inversion in the 121-accession panel.(b) Differences in nine fibre traits between the inversion and normal groups.
3.5. Genetic effects of Inv(A07)p1.09p2.23 in the population
In our previous study, we identified a co-segregated region around Lc1 in a linkage population constructed from the ys mutant and HD208. We therefore mapped the co-segregated markers to the reference genome and found that they were located in the Inv(A07)p1.09p2.23 region (Table S5). This suggested that no recombination event occurred in the inverted region in the linkage population. To analyse the genetic effects of Inv(A07)p1.09p2.23 in the natural brown fibre cotton population, a subset of 8534 SNPs (MAF >0.05) in A07: 0-4.0 Mb was extracted to perform genetic diversity analysis. The 121 accessions were classified into an inversion group, containing 27 dark-brown fibre accessions, and a normal group with 3 dark-brown, 70 light-brown and 21 white fibre accessions. The nucleotide diversity around the inversion region was lower in the inversion group than in the normal group, whereas the nucleotide diversity in A07: 3.5-4.0 Mb, which is outside the inverted region, was higher in the inversion group (Fig. 5a). The nucleotide diversity ratio between the inversion and normal groups was consistent with this trend (Fig. 5b).
LD was compared between the inversion and normal groups in the region of A07: 1.09-2.23 Mb, and the LD decay curve of the inversion group always lay above that of the normal group. LD decayed to r² = 0.1 at approximately 61 kb in the normal group and at approximately 192 kb in the inversion group (Fig. 5c).
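The r² statistic underlying an LD decay curve can be sketched for a pair of biallelic SNPs. This is an illustrative sketch of the standard definition and not PopLDdecay's implementation: it assumes phased haplotypes coded 0 (reference allele) and 1 (alternative allele), and the function name `r_squared` is hypothetical.

```python
from typing import Sequence


def r_squared(locus_a: Sequence[int], locus_b: Sequence[int]) -> float:
    """Squared correlation between alleles at two biallelic loci.

    Each sequence holds one 0/1 allele per haplotype, in the same order.
    Returns NaN when either locus is monomorphic in the sample.
    """
    if len(locus_a) != len(locus_b) or len(locus_a) == 0:
        raise ValueError("loci must be non-empty and of equal length")
    n = len(locus_a)
    p_a = sum(locus_a) / n                                      # allele 1 frequency at A
    p_b = sum(locus_b) / n                                      # allele 1 frequency at B
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n     # frequency of the 1-1 haplotype
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return float("nan")
    d = p_ab - p_a * p_b                                        # disequilibrium coefficient D
    return d * d / denominator
```

Averaging such pairwise values in bins of physical distance gives a decay curve, and the distance at which the curve falls to a chosen threshold (r² = 0.1 here) summarizes the extent of LD.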
Fig. 5 - Nucleotide diversity and linkage disequilibrium (LD) between the inversion and normal groups. (a) Nucleotide diversity in the inversion and normal groups. The red fitted curve represents the inversion group. The black fitted curve represents the normal group. Pi, nucleotide diversity. (b) Nucleotide diversity ratios in the inversion and normal groups. The pink curve is fitted to the single nucleotide diversity ratio. (c) LD comparison between the inversion and normal groups.
4. Discussion
4.1. Concurrent genetic variations accompanied by inversion events in dark-brown fibre cotton
According to published reports, SNPs and InDels have been identified in a MYB gene (Ghir_A07G002090) that is related to the brown fibre phenotype [23,32] and located near the Inv(A07)p1.09p2.23 breakpoint (Table S3). Thus, dark-brown fibre cotton generally harbours multiple kinds of genetic variations, including SNPs, InDels, large fragment losses and an inversion (Fig. 2). In this study, we found that 27 out of 30 dark-brown fibre accessions harboured the inverted fragment; the 3 exceptional dark-brown fibre accessions may have resulted from a different mutation event. The ys mutant was derived from a distant hybridization between G. hirsutum cultivar HD208 and G. barbadense acc. Pima90-53 [23], and dark-brown fibre cottons harbour the genomic signature of G. barbadense, suggesting that distant hybridization may induce diverse genetic variations in this region. However, a recombination hotspot was also found near the inversion region [23]. Therefore, this inversion may induce abnormal recombination in the linkage population. Distant hybridizations are known to induce frequent double-strand breaks (DSBs) during recombination [33]. These DSBs can be repaired by two major repair systems: homologous recombination and nonhomologous DNA end joining (NHEJ) [34]. Considering the major genetic effects of the variations in Lc1, the NHEJ repair process might be implicated in DSB repair at this locus. NHEJ is a major pathway of DSB repair in higher plants; it functions independently of homologous sequences and can induce genetic variations.
4.2. Relationship between the inversion event and fibre traits of dark-brown fibre cottons
Fibre colour has been considered to have a negative relationship with agronomic traits; fibre length, fibre uniformity, fibre strength, lint percentage and weight decreased significantly with increasing colour intensity in brown fibre accessions [23,35]. Here, the inversion group showed a significant difference from the normal group in the genome (Fig. 3) and in nine fibre traits (Fig. 4); thus, negative selection existed in the dark-brown fibre cotton. In humans, one inversion is rare in Africans but is common and undergoing positive selection in Europeans [36]; in addition, loss of function in the AP3B1 gene when it is disrupted by a chromosome 5 inversion has been reported to induce Hermansky-Pudlak syndrome 2 [37]. Interestingly, the inversion induced the abnormal expression of genes (Table S4) and Ghir_A07G000980 transcript loss (Fig. 2b, c) in the breakpoint region of the dark-brown fibre accessions, which may indicate loss-of-function mutations in dark-brown fibre cotton. In the future, the CRISPR/Cas9 system could be applied to create targeted inversion mutations to confirm the biological function of Inv(A07)p1.09p2.23 [38].
4.3. Nucleotide diversity is reduced and LD is increased within the inversion
Nucleotide diversity is defined as variation in a specific locus in a pairwise comparison, and it is an important parameter in populations and species [39]. Various factors (natural selection, artificial domestication and physical location of nucleotides in the genome) can shape nucleotide diversity [40-43]. LD, which can determine the association mapping resolution to uncover genes related to complex traits, is another important attribute for population genetics [44]. In higher plants, LD is principally determined by population size, inbreeding, genetic isolation, recombination rate, and natural and artificial selection [45].
Here, the inversion group showed lower nucleotide diversity and higher LD than the normal group around the inversion region (Fig. 5), and the inversion region did not undergo recombination (Table S5), suggesting that the recombination rate is the key factor for nucleotide diversity and LD. An inversion can induce recombination suppression; for example, crossing-over has not been observed in a region adjacent to inversion breakpoints in Drosophila pseudoobscura [46]. In this study, the inversion occurred in A07: 1.09-2.23 Mb, and the nucleotide diversity decreased in this region, suggesting that this inversion event could affect the diversity of adjacent regions because an inversion loop forms in individuals heterozygous for the inversion, which suppresses recombination relative to the same segment in homozygous normal controls [47,48].
In conclusion, the inversion event Inv(A07)p1.09p2.23 was significantly associated with fibre colour and agronomic traits. Because Inv(A07)p1.09p2.23 induced a series of genetic effects, such as the abnormal expression of genes in the breakpoint region, decreased nucleotide diversity and increased LD, this inversion has undergone negative selection in breeding of elite brown fibre cultivars. Uncovering the genetic effects of Inv(A07)p1.09p2.23 is beneficial for breeding better brown fibre cottons.
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2020.02.002.
Declaration of competing interest
The authors declare no conflict of interest.
Acknowledgments
This work was supported by the Fundamental Research Funds for the Central Universities (2662015PY097) and the Breeding of New Early Maturing and High-quality Coloured Cotton Varieties (2016HZ09).
Author contributions
LZX designed this study and revised the manuscript; YCY from Shihezi Academy of Agriculture Science provided 100 brown fibre accessions; WTW and YT performed the experiments and data analyses; and WTW drafted the manuscript. All the authors read and approved the final version of the manuscript.