LIU Baochao, CUI Rongyi*. Mutual incentive entity verification algorithm based on the max Jaccard similarity[J]. Journal of Yanbian University, 2015, 41(01): 42-45.
Mutual incentive entity verification algorithm based on the max Jaccard similarity
- CLC number:
- TP391.1
- Document code:
- A
- Abstract:
- A mutual incentive entity verification algorithm is proposed for rule-based information extraction. The algorithm retains the advantages of the mutual incentive approach to information extraction and, on that basis, introduces an entity waiting queue that stores entities which have not yet been successfully verified; entities are then verified on the principle of maximum Jaccard similarity. Experimental results show that, when the algorithm is applied to rule-based named entity extraction from bibliographic references, its extraction precision is about 15% higher than that of the SemreX system and about 40% higher than that of the Para Tools system.
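The abstract names two ingredients, a waiting queue for unverified entities and verification by maximum Jaccard similarity, without giving their details. The following is only a minimal sketch of how those two pieces could fit together. It assumes that entities are compared as sets of whitespace-separated tokens, that candidates are matched against a list of already known entities, and that a match counts as verified above a threshold; the names `jaccard`, `verify`, `known_entities`, and `threshold` are hypothetical, and the mutual incentive step itself is not modeled.

```python
from collections import deque


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def verify(candidates, known_entities, threshold=0.5):
    """Match each candidate to its most similar known entity (assumed procedure).

    Candidates whose best score is below the (assumed) threshold are placed
    in a waiting queue instead of being discarded.
    """
    verified = []
    waiting = deque()
    for cand in candidates:
        tokens = set(cand.split())
        best, best_score = None, 0.0
        for ent in known_entities:
            score = jaccard(tokens, set(ent.split()))
            if score > best_score:
                best, best_score = ent, score
        if best is not None and best_score >= threshold:
            verified.append((cand, best, best_score))
        else:
            waiting.append(cand)
    return verified, waiting
```

In this sketch, entities left in `waiting` could be retried after more entities have been verified; how the published algorithm actually revisits the queue is not stated in the abstract.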
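Memo
Received: 2014-12-07. Funding: Jilin Province Science and Technology Development Plan Project (20140101186JC). *Corresponding author: CUI Rongyi (born 1962), male, Ph.D., professor; research interests: pattern recognition and intelligent computing.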