
Journal of Software (Ruan Jian Xue Bao) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software, 2022,33(4):1338−1353 [doi: 10.13328/j.cnki.jos.006466] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
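Totally Adaptive 2D Feature Selection Algorithm Based on Discernibility Matrix ∗
XIE Juan-Ying, WU Zhao-Zhong
(School of Computer Science, Shaanxi Normal University, Xi’an 710119, Shaanxi, China)
Corresponding author: XIE Juan-Ying, E-mail: xiejuany@snnu.edu.cn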
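Abstract: To address the need for manual intervention when choosing feature subsets in FSIP (feature selection based on information gain and
Pearson correlation coefficient), this paper proposes DFSIP (discernibility based FSIP), a totally adaptive 2D feature selection
algorithm based on the discernibility matrix. DFSIP discovers the feature subset fully adaptively: at each step it selects the most
important of the current features, uses that feature to reduce the discernibility matrix, and discards redundant features, until the optimal
feature subset is obtained adaptively. A K-ELM classifier is built on the optimal feature subset to evaluate its class discrimination
capability. Experiments on gene datasets, comparisons with the FSIP, mRMR, LLE Score, DRJMIM, AVC, and AMID algorithms, and statistical
significance tests show that DFSIP automatically selects feature subsets with stronger discrimination capability, and that classifiers
built on these subsets achieve very good classification performance.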
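The greedy loop summarized in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `build_discernibility_cells` and `greedy_select` are hypothetical, feature values are assumed to be discrete, and `score` is a hypothetical precomputed importance per feature (in FSIP the importance is derived from information gain and the Pearson correlation coefficient, which is not reproduced here).

```python
from itertools import combinations


def build_discernibility_cells(X, y):
    """Return one cell per pair of samples that have different class labels.

    Each cell is the set of feature indices on which the two samples differ.
    Assumption: feature values are discrete, so equality is meaningful.
    """
    cells = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:
            cell = frozenset(k for k in range(len(X[i])) if X[i][k] != X[j][k])
            if cell:  # an empty cell cannot be covered by any feature
                cells.append(cell)
    return cells


def greedy_select(X, y, score):
    """Select features until the candidate set is empty.

    `score` (hypothetical) maps a feature index to a fixed importance value.
    """
    cells = build_discernibility_cells(X, y)
    candidates = set(range(len(X[0])))  # start from all features
    selected = []
    while candidates:
        # Most important candidate; ties go to the smallest index.
        best = max(sorted(candidates), key=lambda k: score[k])
        selected.append(best)
        # Reduce the matrix: drop every cell already covered by `best`.
        cells = [c for c in cells if best not in c]
        # New candidates are the union of the remaining cells.
        candidates = set().union(*cells)
    return selected


X = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
y = [0, 0, 1, 1]
print(greedy_select(X, y, {0: 0.9, 1: 0.1, 2: 0.8}))  # [0]
```

In this toy input, feature 0 alone separates every pair of samples from different classes, so all cells are removed after the first pick and features 1 and 2 are never selected. This mirrors how reducing the matrix discards redundant features without a manually chosen cutoff.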
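Key words: discernibility matrix; feature discernibility; feature independence; feature selection; information gain; Pearson correlation coefficient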
Chinese Library Classification: TP18
Citation format (Chinese): 谢娟英, 吴肇中. 基于可辨识矩阵的完全自适应 2D 特征选择算法. 软件学报, 2022, 33(4): 1338–1353.
http://www.jos.org.cn/1000-9825/6466.htm
Citation format (English): Xie JY, Wu ZZ. Totally Adaptive 2D Feature Selection Algorithm Based on Discernibility Matrix. Ruan Jian Xue
Bao/Journal of Software, 2022, 33(4): 1338−1353 (in Chinese). http://www.jos.org.cn/1000-9825/6466.htm
Totally Adaptive 2D Feature Selection Algorithm Based on Discernibility Matrix
XIE Juan-Ying, WU Zhao-Zhong
(School of Computer Science, Shaanxi Normal University, Xi’an 710119, China)
Abstract: To overcome the limitation of the FSIP (feature selection based on information gain and Pearson correlation coefficient)
feature selection algorithm, which needs a human to determine the borderline used to detect feature subsets, this study proposes a totally
adaptive 2D feature selection algorithm based on the discernibility matrix, referred to as DFSIP (discernibility based FSIP). DFSIP
introduces the discernibility matrix into the feature selection process of FSIP. It first initializes the candidate feature set with all
features and constructs the initial discernibility matrix. It then detects the most significant feature in the current candidate feature set,
adds it to the feature subset, and uses it to reduce the discernibility matrix. Next, the candidate feature set is updated to the union
of the cells of the reduced discernibility matrix, and the most significant feature is again detected, added to the feature subset, and
used to reduce the discernibility matrix. This process
repeats until no feature is left in the candidate feature set. The power of DFSIP is tested on well-known gene expression datasets,
and its performance is compared with that of popular feature selection algorithms, including FSIP, mRMR, LLE Score, DRJMIM, AVC,
and AMID, by comparing the performance of the K-ELM classifiers built on the feature subsets detected by these feature selection
algorithms. In addition, significance tests are conducted to verify whether there is a significant difference between DFSIP and FSIP as
well as the other compared feature selection algorithms. The experimental results demonstrate that DFSIP is superior to the compared algorithms;
in particular, it differs significantly from the LLE Score, DRJMIM, and AMID feature selection algorithms. Although there is not
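∗ Funding: National Natural Science Foundation of China (62076159, 61673251, 12031010); National Key Research and Development Program of China (2016YFC0901900);
Fundamental Research Funds for the Central Universities (GK202105003); Innovation Funds for Graduate Training (2016CSY009, 2018TS078)
This paper was recommended by Prof. CHEN En-Hong, Assoc. Prof. LI Yu-Feng, and Prof. ZOU Quan, guest editors of the special issue "Robust Machine Learning for Open Scenarios".
Received: 2021-03-10; Revised: 2021-07-16; Accepted: 2021-08-27; Published online by jos: 2021-10-26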