LIU Shuangjun, JIN Xiaofeng, CUI Rongyi*. Research on speech similarity based on frame symbolization[J]. Journal of Yanbian University, 2014, 40(01): 45-48.
A speech similarity measurement method based on frame symbolization
- Title:
- Research on speech similarity based on frame symbolization
- Classification number:
- TP391.41
- Document code:
- A
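- Abstract (translated from the Chinese):
- A method is proposed for measuring speech similarity after symbolizing speech frames. First, the silent portions of each speech segment are removed and the MFCC parameters of each frame are extracted. Next, the MFCC parameters undergo k-means clustering and KNN classification, and the speech signal is symbolized according to the classification results. Finally, the edit distance is used to compute the similarity between speech segments. Experiments show that after symbolization the audio samples are more clearly distinguishable and the recognition rate improves markedly.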
- Abstract:
- We present a method for measuring speech similarity based on frame symbolization. First, silent parts are removed from the speech segments and MFCC coefficients are extracted from each frame. Second, the MFCC coefficients are classified with a KNN classifier built on k-means clustering results, and the speech signal is symbolized according to the classification. Finally, the similarity between speech segments is computed using the Levenshtein distance. Experimental results show that frame symbolization makes different speech recordings more clearly distinguishable and significantly improves the recognition rate.
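
The symbolization and edit-distance steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the per-frame feature vectors (for example MFCCs) and the cluster centroids are already available, it replaces the k-means plus KNN stage with a simple nearest-centroid lookup, and the normalization of the edit distance into a similarity score in [0, 1] is an assumption, since the abstract does not say how the distance is turned into a similarity. All function names and the toy 2-D data are hypothetical.

```python
import math


def nearest_symbol(frame, centroids):
    """Return the index of the centroid closest to one frame's feature vector."""
    return min(range(len(centroids)), key=lambda i: math.dist(frame, centroids[i]))


def symbolize(frames, centroids):
    """Map a sequence of per-frame feature vectors to a list of integer symbols."""
    return [nearest_symbol(f, centroids) for f in frames]


def levenshtein(a, b):
    """Edit distance between two sequences with unit-cost insert, delete, substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute, or match at cost 0
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical sequences, 0.0 for maximal distance."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Toy 2-D "features" standing in for MFCC vectors; the centroids are made up.
    centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
    seg_a = [(0.1, -0.2), (4.8, 5.1), (9.7, 0.3)]
    seg_b = [(0.3, 0.1), (5.2, 4.9), (5.1, 5.3)]
    sym_a = symbolize(seg_a, centroids)  # [0, 1, 2]
    sym_b = symbolize(seg_b, centroids)  # [0, 1, 1]
    print(sym_a, sym_b, levenshtein(sym_a, sym_b), similarity(sym_a, sym_b))
```

Working on symbol sequences rather than raw feature vectors means two segments of different lengths can be compared directly, because the edit distance accounts for inserted and deleted frames.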
Memo
Received: 2013-12-24. *Corresponding author: CUI Rongyi (born 1962), male, Ph.D., professor; research interests: pattern recognition and intelligent computing.