LIU Wenting. The extraction of micro-blog new sentiment word based on improved mutual information[J]. Journal of Yanbian University, 2019, 45(04): 349-355.
基于改进互信息的微博新情感词提取
- Title:
- The extraction of micro-blog new sentiment word based on improved mutual information
- Article ID:
- 1004-4353(2019)04-0349-07
- Keywords:
- micro-blog; new sentiment words; N-gram segmentation; multiword mutual information; sentiment similarity between words
- CLC number:
- TP391
- Document code:
- A
- Abstract:
- To analyze the sentiment tendency of new words in micro-blogs, a method for extracting new sentiment words based on improved mutual information is proposed. First, N-gram segmentation is applied to the preprocessed micro-blog data to obtain candidate strings. Then, the candidate strings are expanded and filtered by calculating the multiword mutual information (MMI) and the left and right adjacency entropy to obtain candidate new words, and the candidate new words are compared with the corresponding dictionaries to obtain new words. Finally, the sentiment tendency value of each new word is calculated from the sentiment similarity between words (SW), which yields the new sentiment words. The experimental results show that the precision, recall, and F1 value of the method for recognizing the sentiment tendency of new words are 13.14%, 5.81%, and 8.59% higher, respectively, than those of the method in reference [4]. The method therefore has good application value.
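The candidate-generation steps named in the abstract (N-gram segmentation, a mutual-information filter, an adjacency-entropy filter, and a dictionary comparison) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give the MMI formula, so the sketch substitutes the lowest pointwise mutual information over all two-part splits of a string, and it omits the string-expansion step and the SW sentiment scoring entirely. All function names, parameters, and thresholds below are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(texts, max_n=3):
    """Count every character n-gram of length 1..max_n in the corpus."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def min_split_pmi(gram, counts, total_chars):
    """Lowest PMI over all two-part splits of `gram`.

    Assumption: a stand-in for the paper's MMI, whose formula is not
    given in the abstract. Probabilities are rough estimates obtained
    by dividing each count by the total number of characters.
    """
    p_gram = counts[gram] / total_chars
    scores = []
    for k in range(1, len(gram)):
        p_left = counts[gram[:k]] / total_chars
        p_right = counts[gram[k:]] / total_chars
        scores.append(math.log2(p_gram / (p_left * p_right)))
    return min(scores)


def adjacency_entropy(gram, texts):
    """Return (left, right) Shannon entropy of the characters next to `gram`."""
    left, right = Counter(), Counter()
    for text in texts:
        start = text.find(gram)
        while start != -1:
            if start > 0:
                left[text[start - 1]] += 1
            end = start + len(gram)
            if end < len(text):
                right[text[end]] += 1
            start = text.find(gram, start + 1)

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    return entropy(left), entropy(right)


def candidate_new_words(texts, known_words, max_n=3,
                        min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Keep n-grams that pass the frequency, PMI, entropy, and dictionary checks."""
    counts = ngram_counts(texts, max_n)
    total_chars = sum(len(t) for t in texts)
    result = []
    for gram, count in counts.items():
        # Skip single characters, rare strings, and words already in a dictionary.
        if len(gram) < 2 or count < min_count or gram in known_words:
            continue
        # Internal cohesion: the weakest split must still be strongly associated.
        if min_split_pmi(gram, counts, total_chars) < min_pmi:
            continue
        # Boundary freedom: both sides must be followed by varied characters.
        if min(adjacency_entropy(gram, texts)) < min_entropy:
            continue
        result.append(gram)
    return sorted(result)
```

Taking the lowest PMI over all splits and the lower of the two adjacency entropies makes both filters conservative: a string is kept only if it is cohesive at every internal boundary and appears in varied contexts on both sides. The threshold values are placeholders and would need tuning on real micro-blog data.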
References:
[1] HUANG M L, YE B R, WANG Y C, et al. New word detection for sentiment analysis[C]//52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, USA: ACL, 2014:531-541.
[2] TANG Z, FU Z M, GONG Z R, et al. A parallel conditional random fields model based on spark computing environment[J]. Journal of Grid Computing, 2017,15(3):1-20.
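[3] LEI Y M, LIU Y, HUO H. A new word discovery method for network language based on microblog corpus[J]. Computer Engineering and Design, 2017(3):789-794. (in Chinese)
[4] WANG F. Research on sentiment new word discovery based on microblog[J]. Journal of Software, 2015,36(11):6-8. (in Chinese)
[5] LI Y G, ZHOU X G, SUN Y, et al. Research and implementation of Chinese microblog sentiment analysis[J]. Journal of Software, 2017,28(12):3183-3205. (in Chinese)
[6] TANG X B, LIU G C. A review of fine-grained sentiment analysis research[J]. Library and Information Service, 2017,61(5):132-140. (in Chinese)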
[7] LI W W, LI Y Q, WANG Y. Chinese microblog sentiment analysis based on sentiment features[C]//Asia-Pacific Web Conference. Suzhou, China: Springer, 2016,9932:385-388.
[8] ZHAO C J, WANG S G, LI D Y. Exploiting social and local contexts propagation for inducing Chinese microblog-specific sentiment lexicons[J]. Computer Speech and Language, 2019,55:57-81.
[9] HAO Z F, CAI R C, YANG Y Y, et al. A dynamic conditional random field based framework for sentence-level sentiment analysis of Chinese microblog[C]//IEEE International Conference on Computational Science & Engineering. Guangzhou, China: IEEE, 2017:135-142.
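[10] ZHANG Y S, ZHENG J, HUANG G J, et al. Microblog sentiment analysis method based on a double attention model[J]. Journal of Tsinghua University (Science and Technology), 2018,58(2):122-130. (in Chinese)
[11] ZHANG J, HUANG K Y, LIANG C, et al. Unsupervised new word identification for Chinese social media corpora[J]. Journal of Chinese Information Processing, 2018,32(3):17-25. (in Chinese)
[12] ZHANG H P, GAO K, HUANG H Y, et al. Big data search and mining[M]. Beijing: Science Press, 2014:107. (in Chinese)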
Memo
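Received: 2019-08-14. About the author: LIU Wenting (born 1996), female, master's student; research interests: natural language processing and data mining.
Funding: Anhui Provincial Program for the Cultivation of Top-notch Talents in Universities (gxbjZD15); General Program of the Anhui Provincial Natural Science Foundation (1908085MF189).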