XU Bowen, JIN Xiaofeng*. Korean speech retrieval method based on speech recognition[J]. Journal of Yanbian University, 2021, 47(03): 273-278.
Korean speech retrieval method based on speech recognition
- Article ID:
- 1004-4353(2021)03-0273-06
- CLC number:
- TP391.42
- Document code:
- A
- Abstract:
- To address the heavy dependence of recognition-based speech retrieval methods on the language model, a new Korean speech retrieval method is proposed that builds on an improved acoustic model learning framework. First, the network model of the KoSpeech framework is modified and trained to obtain a Korean acoustic model. Second, a speech document index library is constructed using a speech document segmentation method. Finally, speech retrieval is performed by edit distance (Levenshtein distance) matching. Experimental results show that the improved Korean acoustic model learning framework reduces the retrieval method's dependence on the language model and its need for a large-scale dataset. Under the top-k evaluation method with k = 9, the mean average precision (mAP) reaches 86.74% and the recall reaches 95.25%, which indicates that the proposed method is effective and has practical application value.
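The final matching step named in the abstract, ranking indexed speech segments by edit distance to a query, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `levenshtein` and `retrieve_top_k`, the sample index, and the choice to compare text at the level of precomposed Hangul syllables are all assumptions, since the abstract does not specify the index structure or the unit of comparison.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def retrieve_top_k(query: str, index: Dict[str, str], k: int) -> List[Tuple[str, int]]:
    """Rank indexed segments by edit distance to the query; return the k closest."""
    scored = [(seg_id, levenshtein(query, text)) for seg_id, text in index.items()]
    scored.sort(key=lambda item: (item[1], item[0]))  # ties broken by segment id
    return scored[:k]


# Hypothetical index: segment id -> text recognized for that segment.
index = {
    "doc1_seg0": "안녕하세요",
    "doc1_seg1": "감사합니다",
    "doc2_seg0": "안녕하십니까",
}
results = retrieve_top_k("안녕하세요", index, k=2)
```

In a real system the query and the indexed segments would both come from the acoustic model's output, so recognition errors appear on both sides; edit distance tolerates such errors without a language model, which is in line with the motivation stated in the abstract.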
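Memo
Received: 2021-04-07
Funding: "13th Five-Year Plan" Science and Technology Project of the Jilin Provincial Department of Education (JJKH20191126KJ); Yanbian University World-Class Discipline Construction Project in Foreign Languages and Literatures (18YLPY14)
*Corresponding author: JIN Xiaofeng (born 1970), male, master's degree, professor; research interests include speech information processing, computer vision, and robotics.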