LIU Yueting. Imbalanced dataset classification algorithm based on NDSVM[J]. Journal of Yanbian University, 2018, 44(01): 43-48.
SVM classification algorithm for imbalanced datasets improved by neighbor density
- Title:
- Imbalanced dataset classification algorithm based on NDSVM
- Keywords:
- support vector machine; imbalanced dataset; neighbor density; uneven distribution; boundary
- CLC number:
- TP391
- Document code:
- A
- Abstract:
- To address the uneven data distribution and blurred class boundaries typical of imbalanced datasets, an imbalanced dataset classification algorithm based on a neighbor density improved support vector machine (NDSVM) is proposed. The algorithm first computes the neighbor density value of each sample in the majority class. Based on these density values, it then selects two groups of majority class samples, those located in the boundary region and those close to the boundary region, each equal in number to the minority class, and trains an initial SVM classifier on each group together with the minority class. Finally, the resulting support vector machines and the remaining majority class samples are used to iteratively optimize the initial classifier. Experiments on artificial and UCI datasets show that, compared with WSVM, ALSMOTE-SVM and the basic SVM, NDSVM achieves good classification results and effectively improves the classification performance of SVM on datasets with uneven distribution and blurred boundaries.
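
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formula for neighbor density, so the definition used here is an assumption: the density of a majority sample is taken as the fraction of majority class points among its `k` nearest neighbors, and samples with lower density are treated as lying nearer the class boundary. The function names, the parameter `k`, and the tie-break by distance to the nearest minority sample are all hypothetical. SVM training and the iterative optimization are omitted; each selected group would be paired with the minority class and passed to an SVM library of choice.

```python
import math

def neighbor_density(majority, minority, k=3):
    """Assumed density: share of majority points among the k nearest neighbors.

    This is an illustrative definition, not the formula from the paper.
    """
    # Majority samples come first, so index i in `majority` is index i here.
    labeled = [(p, 1) for p in majority] + [(p, 0) for p in minority]
    densities = []
    for i, x in enumerate(majority):
        # Distance from x to every other sample, with its class flag.
        others = [(math.dist(x, p), is_major)
                  for j, (p, is_major) in enumerate(labeled) if j != i]
        others.sort(key=lambda t: t[0])
        densities.append(sum(flag for _, flag in others[:k]) / k)
    return densities

def select_boundary_groups(majority, minority, k=3):
    """Split the majority class into boundary, near-boundary, and remaining samples.

    The first two groups each have as many samples as the minority class.
    """
    dens = neighbor_density(majority, minority, k)

    def key(i):
        # Hypothetical tie-break: distance to the closest minority sample.
        nearest_minority = min(math.dist(majority[i], q) for q in minority)
        return (dens[i], nearest_minority)

    order = sorted(range(len(majority)), key=key)
    n = len(minority)
    boundary = [majority[i] for i in order[:n]]
    near_boundary = [majority[i] for i in order[n:2 * n]]
    remaining = [majority[i] for i in order[2 * n:]]
    return boundary, near_boundary, remaining

# Toy data: six majority points on a line, two minority points to the right.
majority = [(0.0, 0.0), (1.1, 0.0), (2.3, 0.0), (3.6, 0.0), (5.0, 0.0), (6.5, 0.0)]
minority = [(7.0, 0.0), (7.4, 0.0)]
boundary, near_boundary, remaining = select_boundary_groups(majority, minority)
print(boundary, near_boundary, remaining)
```

On this toy data the majority points nearest the minority cluster end up in the boundary group, which is the behavior the abstract describes qualitatively; a different density definition would change the ranking.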
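Memo
Received: 2017-09-19
Funding: 2015 Gansu Province Higher Education Scientific Research Project (2015B-132)
About the author: LIU Yueting (b. 1979), female, associate professor; research interests: electronics and automatic control theory.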