JIN Yuanyuan (金媛媛), LI Dan (李丹), YANG Ming (杨明)
Abstract: To address the lack of effective use of evaluation data when selecting participants for academic competitions, a mining algorithm based on entropy-weighted clustering is proposed. The algorithm clusters the subject data set to support a scientific and reasonable talent selection mechanism. The data are collected by manual statistics and normalized during preprocessing, and sparsity scores are used for feature selection so that non-essential clustering features are filtered out. The entropy-weighted clustering algorithm is then used to mine the optimal allocation scheme for competition members. The example analysis shows that the entropy-weighted clustering algorithm runs more efficiently than the standard Apriori algorithm, which supports the rationality and effectiveness of the proposed method.
Keywords: cluster analysis; talent assessment; entropy weighting; data mining; normalization preprocessing; data feature selection
CLC number: TN911.1-34; TP309 Document code: A Article ID: 1004-373X(2019)19-0112-03
0 Introduction
1 Data preprocessing
1.1 Normalization of statistical data
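The normalization step named in this subsection can be illustrated with a minimal sketch. The formula used by the paper is not shown here, so the example assumes ordinary min-max scaling of each indicator to [0, 1]. The function name `min_max_normalize` and the sample scores are hypothetical and serve only as an illustration.

```python
import numpy as np


def min_max_normalize(scores: np.ndarray) -> np.ndarray:
    """Scale each column (indicator) of `scores` to the range [0, 1].

    Assumption: min-max scaling. The paper's own normalization
    formula is not available in this excerpt.
    """
    scores = np.asarray(scores, dtype=float)
    col_min = scores.min(axis=0)
    col_range = scores.max(axis=0) - col_min
    # A constant column has zero range; divide by 1 instead so it maps to 0.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (scores - col_min) / safe_range


# Hypothetical data: 4 students x 3 manually collected indicators.
raw = np.array([
    [85.0, 3.0, 120.0],
    [92.0, 5.0, 120.0],
    [70.0, 1.0, 120.0],
    [78.0, 4.0, 120.0],
])
normalized = min_max_normalize(raw)
```

Scaling each indicator separately keeps indicators with large numeric ranges from dominating the distance computations in a later clustering step. A constant indicator, such as the third column above, maps to zero and contributes nothing to those distances.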
4 Conclusion