YI Zhiwei, ZHAO Yahui*, CUI Rongyi. Implementation of a multilingual abstractive automatic summarization method[J]. Journal of Yanbian University, 2019, 45(3): 254-259.
- Title:
- Implementation of a multilingual abstractive automatic summarization method
- Article ID:
- 1004-4353(2019)03-0254-06
- Keywords:
- abstractive; automatic summarization; multilingual; joint training
- CLC number:
- TP391.1
- Document code:
- A
- Abstract:
- In order to realize multilingual abstractive automatic summarization, a method based on the sequence-to-sequence (Seq2Seq) model is proposed. First, following the traditional multilingual summarization approach, the Chinese, English and Korean corpora were trained separately, yielding three models whose performance was observed on the test sets. Second, following the method proposed in this paper, the Chinese, English and Korean corpora were combined to train a single model, which was then run on the Chinese, English and Korean test sets to observe its performance. Finally, the same test sets were used to compare the summaries generated before and after the improvement. The experimental results show that the proposed method generates multilingual summaries of quality similar to the traditional method, but realizes multilingual summarization with only one model, and is therefore more widely applicable.
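The core of the proposed method is that all three corpora share one model rather than three. A minimal sketch of that data-preparation step is shown below — it merges toy Chinese, English and Korean text-summary pairs into one shared character-level vocabulary, so a single encoder-decoder could be trained on the concatenated data. This is an illustrative assumption, not the paper's actual code; the example corpora, function names, and character-level tokenization are all hypothetical.

```python
# Illustrative sketch (not the paper's code) of joint multilingual training data:
# merge corpora from several languages into one shared vocabulary so that a
# single Seq2Seq model can be trained over all of them at once.

from collections import Counter

def build_joint_vocab(corpora, specials=("<pad>", "<s>", "</s>", "<unk>")):
    """Build one character-level vocabulary over all languages combined."""
    counts = Counter()
    for pairs in corpora.values():          # corpora: {lang: [(text, summary), ...]}
        for text, summary in pairs:
            counts.update(text)
            counts.update(summary)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for ch, _ in counts.most_common():
        vocab.setdefault(ch, len(vocab))
    return vocab

def encode(s, vocab):
    """Map a string to ids in the shared space, with sentence delimiters."""
    unk = vocab["<unk>"]
    return [vocab["<s>"]] + [vocab.get(ch, unk) for ch in s] + [vocab["</s>"]]

# Toy mixed-language (text, summary) pairs — hypothetical examples.
corpora = {
    "zh": [("今天天气很好", "天气好")],
    "en": [("the weather is nice today", "nice weather")],
    "ko": [("오늘 날씨가 좋다", "날씨 좋음")],
}

vocab = build_joint_vocab(corpora)
# All three languages now map into the same id space, so the pairs can be
# concatenated into one training set for a single encoder-decoder model.
batch = [(encode(t, vocab), encode(s, vocab))
         for pairs in corpora.values() for (t, s) in pairs]
```

Because every language is encoded against the same vocabulary, the downstream Seq2Seq model needs no language-specific components — which is what lets one model replace the three separately trained ones.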
References:
[1] LUHN H P. The automatic creation of literature abstracts[J]. IBM Journal of Research & Development, 1958,2(2):159-165.
[2] MIHALCEA R, TARAU P. TextRank: bringing order into texts[C]//Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing(EMNLP). Stroudsburg, PA: ACL, 2004:404-411.
[3] ERKAN G, RADEV D R. LexRank: graph-based lexical centrality as salience in text summarization[J]. Journal of Artificial Intelligence Research, 2004,22:457-479.
[4] PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: bringing order to the web[J]. Stanford Digital Libraries Working Paper, 1998,9(1):1-14.
[5] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[EB/OL].[2019-3-17]. https://arxiv.org/pdf/1409.3215.pdf.
[6] CHOPRA S, AULI M, RUSH A M. Abstractive sentence summarization with attentive recurrent neural networks[C]//Proceedings of NAACL-HLT. San Diego: NAACL, 2016:93-98.
[7] ZHOU Qingyu, YANG Nan, WEI Furu, et al. Selective encoding for abstractive sentence summarization[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2017:1095-1104.
[8] MA Shuming, SUN Xu, LIN Junyang, et al. A hierarchical end-to-end model for jointly improving text summarization and sentiment classification[EB/OL].[2019-3-17]. https://arxiv.org/pdf/1805.01089.pdf.
[9] SATHASIVAM S, ABDULLAH W A T W. Logic learning in hopfield networks[EB/OL].[2019-3-17]. https://arxiv.org/ftp/arxiv/papers/0804/0804.4075.pdf.
[10] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997,9(8):1735-1780.
[11] CHO K, GULCEHRE C, BAHDANAU D, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL].[2019-3-17]. https://arxiv.org/pdf/1406.1078.pdf.
[12] TAO Lei, ZHANG Yu, WANG Sida, et al. Simple recurrent units for highly parallelizable recurrence[EB/OL].[2019-3-17]. https://arxiv.org/pdf/1709.02755.pdf.
[13] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Proceedings of Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL. Stroudsburg, PA: ACL, 2004.
Memo:
Received: 2019-04-17
*Corresponding author: ZHAO Yahui (born 1974), female, associate professor; research interest: natural language text processing.