GUO Xiaohui. The improved method based on LDA topic model for emotion classification of text corpus[J]. Journal of Yanbian University, 2018, 44(03): 266-273.
An improved sentiment classification method for text corpora based on the LDA topic model
- Title:
- The improved method based on LDA topic model for emotion classification of text corpus
- CLC number:
- TP309.3
- Document code:
- A
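- Abstract (Chinese version):
- To address the inability of the traditional LDA topic model to capture the order of and associations between words, an improved weighted W-LDA sentiment classification method is proposed. First, an average weight is introduced into the model's topic sampling and into the computation of its expected distributions, which keeps words closely tied to a topic from being drowned out by high-frequency words and thereby increases the discrimination between topics. Then, building on the high-quality document-topic distribution and topic-word vectors extracted in this way, the support vector machine (SVM) algorithm is introduced to construct a sentiment analysis method for text corpora that combines sentiment word analysis and extraction, topic distribution computation, and sentiment classification. Finally, the effectiveness of the method is verified on real teaching evaluation data and a public review dataset. The results show that the proposed method, in terms of topic discrimination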
- Abstract:
- An improved weighted LDA method (W-LDA) for emotion classification is proposed to address the inability of the traditional LDA topic model to reflect the order of and relevance among words. First, an average weight is introduced into the model's topic sampling and distribution expectation calculation, which prevents important topic-related words from being drowned out by high-frequency words and thus improves the degree of discrimination among topics. Second, based on the extracted high-quality document-topic distribution and topic-word vectors, and with the support vector machine (SVM) algorithm introduced, an emotion classification method for commentary corpora is proposed. Its functions include the analysis and extraction of emotion words, topic distribution computation, and emotion classification. Finally, experiments are performed on real teaching evaluation data and public comment data. The experimental results show that the proposed method has advantages over the classical SVM and the method of reference [15] in topic discrimination, classification accuracy, and F1-measure.
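The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual weight formula, so the weight used here (a log-damped inverse corpus frequency, rescaled to mean 1) is purely an assumption for illustration, as are the function names `word_weights` and `topic_word_expectation` and the toy counts. The sketch only shows how multiplying topic-word counts by a per-word weight before computing the expected topic-word distribution can stop a high-frequency word from dominating every topic.

```python
import math


def word_weights(topic_word_counts):
    """Hypothetical per-word weights (NOT the paper's formula).

    Uses log(total tokens / word tokens), rescaled so the mean weight is 1.
    Frequent words get weights below 1, rare words get weights above 1.
    """
    vocab_size = len(topic_word_counts[0])
    totals = [sum(row[w] for row in topic_word_counts) for w in range(vocab_size)]
    grand_total = sum(totals)
    raw = [math.log(grand_total / t) if t > 0 else 0.0 for t in totals]
    mean_raw = sum(raw) / vocab_size
    if mean_raw == 0:
        # Degenerate corpus (for example a single word type): fall back to no weighting.
        return [1.0] * vocab_size
    return [r / mean_raw for r in raw]


def topic_word_expectation(topic_word_counts, weights, beta=0.01):
    """Expected topic-word distribution from weighted counts.

    With all weights equal to 1 this reduces to the usual smoothed LDA estimate
    (n_kw + beta) / (n_k + V * beta).
    """
    vocab_size = len(weights)
    phi = []
    for row in topic_word_counts:
        weighted = [weights[w] * row[w] for w in range(vocab_size)]
        denom = sum(weighted) + vocab_size * beta
        phi.append([(x + beta) / denom for x in weighted])
    return phi


# Toy counts (made up): word 0 is frequent in both topics,
# word 1 is specific to topic 0, word 2 is specific to topic 1.
counts = [[50, 8, 1],
          [50, 1, 8]]

plain = topic_word_expectation(counts, [1.0, 1.0, 1.0])
weighted = topic_word_expectation(counts, word_weights(counts))

for k in range(len(counts)):
    print("topic", k, "top word unweighted:", plain[k].index(max(plain[k])),
          "weighted:", weighted[k].index(max(weighted[k])))
```

In this toy case the unweighted estimate ranks the shared frequent word first in both topics, while the weighted estimate ranks each topic's specific word first. The resulting distributions are the kind of feature that a downstream classifier such as an SVM could consume.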
References:
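[1] Zhuang Lirong, Ye Dongyi. Text sentiment classification based on CSLSTM network[J]. Computer Systems & Applications, 2018,27(2):230-235. (in Chinese)
[2] Zhou Hongqing, Wu Yangyang. Extraction and clustering of object features in Chinese customer reviews[J]. Microcomputer & Its Applications, 2014,33(15):69-71. (in Chinese)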
[3] Hai Zhen, Chang Kuiyu, Kim J. Implicit feature identification via co-occurrence association rule mining[C]//Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing, 2011:393-404.
[4] Wang Wei, Xu Hua, Wan Wei. Implicit feature identification via hybrid association rule mining[J]. Expert Systems with Applications, 2013,40(9):3518-3531.
[5] Chinsha T C, Joseph S. A syntactic approach for aspect based opinion mining[C]//2015 IEEE International Conference on Semantic Computing, 2015:24-31.
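[6] Zhang Qingqing, Liu Xilin. Research on text sentiment classification based on deep belief networks[J]. Journal of Northwestern Polytechnical University (Social Sciences), 2016,36(1):62-66. (in Chinese)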
[7] Tang D Y, Qin B, Liu T, et al. Document modeling with gated recurrent neural network for sentiment classification[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal, 2015:1422-1432.
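[8] Tang Xiaobo, Zhu Juan, Yang Fenghua. Research on sentiment classification of online reviews based on sentiment ontology and the KNN algorithm[J]. Information Studies: Theory & Application, 2016,39(6):110-114. (in Chinese)
[9] Liu Hongyu, Zhao Yanyan, Qin Bing, et al. Extraction of opinion targets and analysis of their sentiment orientation[J]. Journal of Chinese Information Processing, 2010,24(1):84-88. (in Chinese)
[10] Yin Pei, Wang Hongwei. Product-feature-oriented sentiment classification of Chinese online reviews: an ontology modeling approach[J]. Journal of Systems & Management, 2016,25(1):103-114. (in Chinese)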
[11] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003,3(4):993-1022.
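[12] Xia Huosong, Liu Jian, Zhu Huiyi. A comparative study of key preprocessing techniques for Chinese sentiment classification mining[J]. Journal of Intelligence, 2011,30(9):160-163. (in Chinese)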
[13] Jin J, Liu Y, Ji P, et al. Understanding big consumer opinion data for market-driven product design[J]. International Journal of Production Research, 2016,54(10):3019.
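[14] Li Shi, Ye Qiang, Li Yijun, et al. Research on product feature mining methods for Chinese online customer reviews[J]. Journal of Management Sciences in China, 2009,12(2):185-189. (in Chinese)
[15] Li Jie, Li Huan. Research on product feature extraction and sentiment classification of short-text reviews based on deep learning[J]. Information Studies: Theory & Application, 2018,41(2):141-146. (in Chinese)
[16] Yang Fengkai, Yuan Haijing. A Gibbs sampling algorithm for change-point estimation in the robust Student's t regression model[J]. Statistics & Decision, 2017,22(16):10-14. (in Chinese)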
[17] LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015,521(7553):436-444.
Memo
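Received: 2018-05-21. Funding: Research Project of the Fujian Provincial Department of Education (JA15631).
About the author: GUO Xiaohui (born 1984), female, lecturer; research interests: personalized recommendation algorithms and data mining.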