
Numeral Nomenclature Proposed for Microhaplotype Alleles to Exert Efficient Forensic Applications

2022-12-22

刑事技术 (Forensic Science and Technology), 2022, Issue 6

SONG Jiaojiao, ZHANG Chi, KANG Kelai, CHEN Qingfeng, JI Anquan, YE Jian, WANG Le

(National Engineering Laboratory for Forensic Science, MPS’ Key Laboratory of Forensic Genetics, Institute of Forensic Science, Ministry of Public Security (MPS), Beijing 100038, China)

ABSTRACT: Microhaplotypes, an emerging type of forensic genetic marker, are increasingly used for individual identification, ancestry inference and mixture deconvolution as they continue to be explored and developed. Nevertheless, although standardized nomenclatures have been proposed for microhaplotype loci, concise names for microhaplotype alleles have not yet been suggested. Here, a proposal is put forward for discussion in which microhaplotype alleles are designated with Arabic numerals. For a microhaplotype consisting of single nucleotide polymorphisms (SNPs), the SNPs are ordered by their positions in the human genome, and the corresponding RefSNP alleles in the dbSNP database are accepted as their possible genotypes. Microhaplotype alleles are listed as every possible combination of the RefSNP alleles and then arranged in alphabetical order. The ordered alleles are subsequently named with consecutive positive integers starting from 1. Such a nomenclature would be convenient for forensic applications, especially mixture deconvolution, and the numeral names can be used directly as input to software for forensic genetic calculations, including PowerStats, Arlequin and STRUCTURE.

KEY WORDS: forensic DNA; microhaplotype; allele; nomenclature

1 Introduction

Microhaplotypes are combinations of two or more closely linked single nucleotide polymorphisms (SNPs) located within a DNA segment of no more than 300 bp[1-3]. As an emerging type of genetic marker, microhaplotypes have potential for multiple forensic applications, e.g., individual identification[1-2], family/clan relationship analysis[1,4], ancestry inference[1,5-9] and mixture deconvolution[5,9-10].

In 2016, Kidd proposed a nomenclature for microhaplotype loci[11]. Instead of the cumbersome rs numbers of the constituent SNPs, it names each microhaplotype with a standardized symbol consisting of the letters “mh” followed by the chromosome number, a lab designation and a lab-specific number. Chen and colleagues introduced a different nomenclature for microhaplotype loci while stating that they recognized Kidd’s proposal[12]. A widely accepted nomenclature with a unique name for each locus will certainly assist microhaplotype research and data handling. However, no concise naming scheme for microhaplotype alleles has been suggested to date.
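The locus-naming format described above can be checked mechanically. The following is a minimal sketch that splits a name such as mh21KK-320 into its three parts. The pattern (a two-digit chromosome number, uppercase lab letters, a hyphen, then digits) is an assumption inferred from the locus names used in this paper, not a formal grammar taken from Kidd's proposal; sex-chromosome loci are not handled, and `parse_locus_name` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: "mh" + two-digit chromosome + lab letters + "-" + lab-specific number.
KIDD_PATTERN = re.compile(r"^mh(?P<chrom>\d{2})(?P<lab>[A-Z]+)-(?P<number>\d+)$")


class LocusName(NamedTuple):
    chromosome: str
    lab: str
    number: str  # kept as a string so leading zeros (e.g. "081") survive


def parse_locus_name(name: str) -> Optional[LocusName]:
    """Split a Kidd-style locus name into its parts; return None if it does not match."""
    match = KIDD_PATTERN.match(name)
    if match is None:
        return None
    return LocusName(match["chrom"], match["lab"], match["number"])


print(parse_locus_name("mh21KK-320"))  # LocusName(chromosome='21', lab='KK', number='320')
print(parse_locus_name("mh07KK-081"))  # LocusName(chromosome='07', lab='KK', number='081')
```

Keeping the lab-specific number as text rather than an integer preserves names such as mh07KK-081 exactly.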

2 Materials and methods

2.1 Samples

This work used four DNA samples: two donated samples and two commercial genomic DNA products, 9947A (Thermo Fisher Scientific, Waltham, MA, USA) and 2800M (Promega, Madison, WI, USA). The study was approved by the Ethical Review Board of the Institute of Forensic Science, Ministry of Public Security of China, and the DNA donors gave written informed consent.

2.2 Sequencing experiments

mh21KK-320 was amplified with the forward primer 5′-TGACTGGGAGGCTGTGGAGA-3′ and the reverse primer 5′-TGCTGGAATTAGAGGCGTGA-3′. Libraries were prepared from 1 ng of input genomic DNA with the TruSeq DNA PCR-Free HT Sample Preparation Kit (Illumina, San Diego, CA, USA), diluted to 20 pM and sequenced in a single run with the MiSeq Reagent Nano Kit (Illumina) on a MiSeq FGx instrument (Illumina), generating reads of 250 bases. The MHTyper software was used for microhaplotype allele calling and read counting[13], with the sequencing depth threshold set at 50 reads.
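The 50-read depth threshold amounts to a simple filter on the read counts of called alleles. The sketch below is illustrative only and is not MHTyper's implementation; it assumes that calls with at least 50 reads are retained, and `filter_by_depth` and the allele strings in the example are hypothetical.

```python
from typing import Dict

DEPTH_THRESHOLD = 50  # reads; the threshold used in this study


def filter_by_depth(read_counts: Dict[str, int], threshold: int = DEPTH_THRESHOLD) -> Dict[str, int]:
    """Keep only allele calls supported by at least `threshold` reads (assumed inclusive)."""
    return {allele: reads for allele, reads in read_counts.items() if reads >= threshold}


# Hypothetical read counts for two genotype-based allele names.
print(filter_by_depth({"ACG": 812, "ATG": 37}))  # {'ACG': 812}
```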

2.3 Proposal for nomenclature of microhaplotype alleles

Microhaplotype alleles are tentatively designated with Arabic numerals as follows. Microhaplotypes are aligned to the forward strand of the human genome, with GRCh38, the most up-to-date assembly at the time of writing, taken as the reference sequence. The nomenclature then (a) lists the genotype-based alleles of a microhaplotype locus, (b) establishes a unified order of the alleles, and (c) names the alleles with Arabic numerals. Concretely, the SNPs constituting a microhaplotype are ordered by their positions in the human genome, and the RefSNP alleles in the dbSNP database (https://www.ncbi.nlm.nih.gov/SNP/index.html) are accepted as their possible genotypes; all microhaplotype alleles are then listed as every possible combination of these RefSNP alleles. The listed alleles are arranged in alphabetical order and named with consecutive positive integers starting from 1. Rare microhaplotype alleles, which carry SNP genotypes not listed as RefSNP alleles in the database, are still denoted by the combination of their constituent SNP genotypes and placed after all the numeral-named alleles of the locus.
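Steps (a) to (c) can be expressed compactly. The following sketch assumes single-base RefSNP alleles supplied per SNP, with the SNPs already in genomic order; it does not model insertion-deletion sites, and the function names and the three-SNP example locus are hypothetical. Sorting rare alleles alphabetically among themselves is also an assumption, since only their placement after the numeral-named alleles is specified.

```python
from itertools import product
from typing import Dict, List, Sequence


def name_alleles(refsnp_alleles: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign numeral names to all combinations of RefSNP alleles.

    refsnp_alleles holds one list of single-base RefSNP alleles per SNP, with the
    SNPs already ordered by position on the GRCh38 forward strand.
    """
    # (a) list every combination of RefSNP alleles as a genotype-based allele name
    haplotypes = ["".join(combo) for combo in product(*refsnp_alleles)]
    # (b) establish a unified order: alphabetical
    haplotypes.sort()
    # (c) name the ordered alleles with consecutive integers starting from 1
    return {hap: number for number, hap in enumerate(haplotypes, start=1)}


def ordered_labels(observed: Sequence[str], names: Dict[str, int]) -> List[str]:
    """Label observed alleles: numeral names in numeric order, then rare alleles
    (not among the RefSNP combinations) under their genotype-based names."""
    listed = sorted((h for h in observed if h in names), key=names.__getitem__)
    rare = sorted(h for h in observed if h not in names)
    return [str(names[h]) for h in listed] + rare


names = name_alleles([["A", "G"], ["C", "T"], ["A", "G"]])  # hypothetical 3-SNP locus
print(names["ACA"], names["GTG"])  # 1 8
print(ordered_labels(["GTG", "ATC", "ACA"], names))  # ['1', '8', 'ATC']
```

In the example, "ATC" carries a C at the third SNP, which is not among that SNP's listed alleles, so it keeps its base-combination name and is placed last.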

3 Results

Because microhaplotypes are combinations of SNPs, their alleles are usually named by concatenating the genotypes of the constituent SNPs, as in previous publications[1,12,14-17]. Such genotype-based allelic names are adequate when a microhaplotype contains only two SNPs. As the number of SNPs grows, however, the allelic names become long and complicated, and different alleles can look very similar. As an example, a simulated DNA mixture was prepared from 9947A, 2800M and two donated samples, R1 and R2 (9947A∶R1∶R2∶2800M = 1∶2∶4∶8). The five samples (9947A, 2800M, R1, R2 and their mixture) were sequenced and genotyped at the locus mh21KK-320. As illustrated in Fig.1A, the mixture was difficult to analyze with genotype-based allelic names even though the sequencing data were of good quality. Because each allele is written as a string of base symbols, alleles cannot be compared reliably at a glance, so working out the allelic composition is time-consuming and error-prone.
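The mixing ratio fixes each contributor's expected share of the template DNA, which is what an analyst tries to read back from the allele read counts. A minimal sketch, assuming diploid contributors and read counts proportional to template amount (no stutter, dropout or amplification bias); the genotypes in the example are hypothetical numeral names, not the observed genotypes of 9947A, 2800M, R1 or R2, and both functions are hypothetical helpers.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Tuple


def contributor_fractions(ratio: Dict[str, int]) -> Dict[str, Fraction]:
    """Convert a mixing ratio such as 1:2:4:8 into each contributor's share of the DNA."""
    total = sum(ratio.values())
    return {name: Fraction(part, total) for name, part in ratio.items()}


def expected_allele_fractions(ratio: Dict[str, int],
                              genotypes: Dict[str, Tuple[int, int]]) -> Dict[int, Fraction]:
    """Expected share of reads per numeral-named allele, assuming each diploid
    contributor's two allele copies are read in proportion to its DNA share."""
    shares = contributor_fractions(ratio)
    expected: Dict[int, Fraction] = defaultdict(Fraction)
    for name, (allele_1, allele_2) in genotypes.items():
        for allele in (allele_1, allele_2):
            expected[allele] += shares[name] / 2  # each copy carries half the contributor's share
    return dict(expected)


ratio = {"9947A": 1, "R1": 2, "R2": 4, "2800M": 8}
# Hypothetical genotypes in numeral names, for illustration only.
genotypes = {"9947A": (1, 3), "R1": (2, 3), "R2": (3, 5), "2800M": (1, 5)}
print(expected_allele_fractions(ratio, genotypes))
```

With a total of 15 parts, 2800M contributes 8/15 of the DNA; under these hypothetical genotypes allele 5 would be expected to account for 2/5 of the reads, a comparison that is much easier to make with short numeral labels than with strings of bases.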

Multiallelism is an essential advantage of microhaplotypes over SNPs. The effective number of alleles (Ae) and informativeness (In) tend to be higher for microhaplotype loci that cover more SNPs (Fig.2). Such loci are more useful in forensic applications, yet their genotype-based allele names are impractically complicated.
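The formulas behind Fig.2 are not stated here. The sketch below assumes the common definitions: Ae is the reciprocal of the sum of squared allele frequencies, and In is Rosenberg's informativeness for assignment, In = Σ_j [−p̄_j ln p̄_j + (1/K) Σ_i p_ij ln p_ij], where p_ij is the frequency of allele j in population i of K populations and p̄_j is its mean across populations. The natural logarithm is an assumption; another base only rescales In.

```python
import math
from typing import Sequence


def effective_number_of_alleles(freqs: Sequence[float]) -> float:
    """Ae = 1 / sum(p_i^2) over the allele frequencies of one population."""
    return 1.0 / sum(p * p for p in freqs)


def _plogp(p: float) -> float:
    """p * ln(p), taking 0 * ln(0) as 0."""
    return p * math.log(p) if p > 0 else 0.0


def informativeness(freq_table: Sequence[Sequence[float]]) -> float:
    """Rosenberg's In; freq_table[i][j] is the frequency of allele j in population i."""
    k = len(freq_table)
    total = 0.0
    for j in range(len(freq_table[0])):
        column = [population[j] for population in freq_table]
        mean_p = sum(column) / k
        total += -_plogp(mean_p) + sum(_plogp(p) for p in column) / k
    return total


print(effective_number_of_alleles([0.25, 0.25, 0.25, 0.25]))  # 4.0
print(informativeness([[1.0, 0.0], [0.0, 1.0]]))  # ln 2, about 0.693
```

Four equally frequent alleles give Ae = 4, and two populations fixed for different alleles give In = ln 2, the upper bound ln K for K = 2; loci with more SNPs can carry more alleles and therefore reach higher values of both statistics.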

Fig.2 Histograms of Ae (A) and In (B) for microhaplotypes consisting of 2 to 5 SNPs (Data for 130 microhaplotypes[18] were used for plotting. Error bars: standard deviations)

Tables 1 and 2 list the alleles of loci mh21KK-320 and mh07KK-081, respectively, together with their numeral names; the latter locus contains an insertion-deletion site. The nomenclature can generate unique allele names for any microhaplotype locus once the RefSNP alleles of its constituent SNPs have been looked up in the dbSNP database. For convenience, the allelic names for another 130 reported microhaplotypes[18-19] are presented in the supplementary materials (Table S1). In addition, the nomenclature has been integrated into MHTyper, the software for microhaplotype data analysis[13], so that genotype-based allelic names can be converted to numeral names conveniently.
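When every RefSNP allele is a single base, all allele names at a locus have the same length, so alphabetical order coincides with lexicographic order over the per-SNP allele indices. The numeral name is then a mixed-radix number, and conversion in either direction needs only the RefSNP allele lists, without enumerating every combination. This is an equivalent shortcut offered as a sketch, not the method used in MHTyper; the functions are hypothetical and loci containing insertion-deletion sites are not handled.

```python
from math import prod
from typing import Optional, Sequence


def to_numeral(haplotype: str, refsnp_alleles: Sequence[Sequence[str]]) -> Optional[int]:
    """Numeral name of a genotype-based allele, or None if it is a rare (unlisted) allele."""
    if len(haplotype) != len(refsnp_alleles):
        return None
    index = 0
    for base, alleles in zip(haplotype, refsnp_alleles):
        ordered = sorted(alleles)
        if base not in ordered:
            return None  # genotype not among RefSNP alleles: keep the base-combination name
        index = index * len(ordered) + ordered.index(base)  # one mixed-radix digit per SNP
    return index + 1  # numeral names start from 1


def from_numeral(numeral: int, refsnp_alleles: Sequence[Sequence[str]]) -> str:
    """Genotype-based allele name for a numeral name (inverse of to_numeral)."""
    total = prod(len(alleles) for alleles in refsnp_alleles)
    if not 1 <= numeral <= total:
        raise ValueError(f"numeral must be between 1 and {total}")
    index = numeral - 1
    bases = []
    for alleles in reversed(refsnp_alleles):  # peel digits off from the last SNP
        ordered = sorted(alleles)
        index, digit = divmod(index, len(ordered))
        bases.append(ordered[digit])
    return "".join(reversed(bases))


locus = [["A", "G"], ["C", "T"], ["A", "G"]]  # hypothetical RefSNP alleles of 3 SNPs
print(to_numeral("ATA", locus), from_numeral(3, locus))  # 3 ATA
```

A return value of None marks a rare allele, which keeps its genotype-based name under the proposed nomenclature.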

Table 1 Proposed numeral names of mh21KK-320 alleles

Table 2 Proposed numeral names of mh07KK-081 alleles

Following the proposed nomenclature, the genotype-based allelic names in Fig.1A were replaced with the numeral names in Fig.1B. Although the mh21KK-320 genotypes of the four individual samples are relatively simple to show with genotype-based names (Fig.1A), their mixture is time-consuming and confusing to interpret that way. With numeral names, in contrast, the genotype of each sample is easy to recognize (Fig.1B).

Fig.1 Histograms of sequencing depths for the locus mh21KK-320, with alleles named by bases (A) and by numerals (B) (Horizontal axis: read counts)

4 Discussion

Proper and concise allele names are essential for forensic genetic markers to be applied effectively. With the forensic application of next-generation sequencing, STRs have come to be treated as sequence-based rather than length-based markers, which has created demand for a nomenclature for sequence-based STR alleles, a topic that has been discussed extensively[20]. Microhaplotypes, as a new and promising type of genetic marker, require well-accepted nomenclatures for both loci and alleles. In this work, a numeral nomenclature for microhaplotype alleles was proposed. Its suitability for forensic applications rests on the following reasons. Firstly, it avoids complex combinations of SNP genotypes, which is particularly advantageous in mixture deconvolution. Secondly, the alleles of each microhaplotype locus can be arranged in the order of their Arabic numerals, giving a unified allele order within each locus for data presentation and communication, as in Fig.1 and Table S1. Thirdly, numeral allelic designations should be readily accepted by forensic scientists because they resemble the familiar STR allele names. Finally, population data on genetic polymorphism are necessary for microhaplotypes to be applied efficiently in forensics, and current population genetics software, e.g., PowerStats[21], Arlequin[22] and STRUCTURE[23], accepts only numbers as imported allele names for statistical analysis[24]. It should be noted that the nomenclature described here is designed for microhaplotypes whose constituent SNPs are already defined. When new variants are discovered within an existing microhaplotype sequence, the updated microhaplotype alleles and their numeral names should be specified accordingly.
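Numeral names allow microhaplotype genotypes to be written directly into the integer-coded input that population genetics programs expect. The sketch below produces one row per individual with two allele columns per locus, a layout STRUCTURE can read when configured for it (its ONEROWPERIND setting). The missing-data code -9 is an assumption that must match the MISSING value set for the run, `structure_rows` is a hypothetical helper, and the sample genotypes are hypothetical. Rare alleles without numeral names would need integer codes before export; this sketch assumes all calls are already numeral-named.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[Tuple[int, int]]  # pair of numeral allele names; None means no call

MISSING = -9  # assumed missing-data code; must match the MISSING value configured in STRUCTURE


def structure_rows(samples: Dict[str, Dict[str, Genotype]], loci: Sequence[str]) -> List[str]:
    """One row per individual: sample label followed by two integer columns per locus."""
    rows = []
    for sample, calls in samples.items():
        fields = [sample]
        for locus in loci:
            genotype = calls.get(locus)
            allele_1, allele_2 = genotype if genotype is not None else (MISSING, MISSING)
            fields += [str(allele_1), str(allele_2)]
        rows.append(" ".join(fields))
    return rows


# Hypothetical numeral genotypes; S2 has no call at mh07KK-081.
samples = {
    "S1": {"mh21KK-320": (1, 5), "mh07KK-081": (2, 2)},
    "S2": {"mh21KK-320": (3, 5)},
}
for row in structure_rows(samples, ["mh21KK-320", "mh07KK-081"]):
    print(row)
```

With genotype-based names, each allele string would first have to be recoded to an integer by hand; numeral names make that step unnecessary.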

Supplementary materials

For the supplementary materials of this article, please see: http://www.xsjs-cifs.com/CN/abstract/abstract6830.shtml.