Total thesis length: 19,893 characters
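Abstract (Chinese)
A voiceprint is the set of distinctive acoustic characteristics by which a person or an object can be identified; it is the spectrum of a speech-bearing sound wave as displayed by electro-acoustic instruments. Voiceprint recognition, also known as speaker recognition, processes a speaker's voice and then automatically determines whether that speaker is enrolled in the corpus and, if so, which enrolled speaker it is. A speech signal carries both the linguistic content and the speaker's individual characteristics, and these are intertwined in a disordered way, which makes it very difficult to isolate purely speaker-specific traits. Extracting feature parameters that accurately reflect the speaker is nevertheless the key to voiceprint recognition. This thesis therefore focuses on how to extract features effectively for voiceprint recognition.
This thesis studies feature extraction for text-independent voiceprint recognition. According to the constraints placed on what the speaker says, researchers divide voiceprint recognition systems into three types: text-dependent, text-independent, and text-prompted. Text-independent voiceprint recognition places no restriction on the speech content in either the training phase or the recognition phase, so it operates on speech produced fairly freely. Although it is harder to implement, it has distinct advantages: it is convenient and unconstrained for users and applies to a wide range of scenarios, which has earned it a place in fields such as forensic identification and security monitoring. Text-independent recognition is the setting chosen for this thesis.
This thesis uses Mel-frequency cepstral coefficients (MFCC) as acoustic features. Researchers have proposed many kinds of acoustic features, and the three most common (LPCC, MFCC, and GFCC) are introduced here. MFCC is the most basic feature in speech recognition; subtracting the cepstral mean computed over a given time window from the MFCCs gives a simple way to reduce noise and compensate for transmission-channel distortion. MFCC is therefore the feature chosen in this thesis.
This thesis studies a speaker recognition system based on the Gaussian mixture model (GMM). A GMM is composed of several Gaussian components and therefore partitions the feature space fairly finely, which meets the experiment's need to distinguish many classes. This makes it suitable for modeling complex, heterogeneous data, where it adapts well and can handle cases that are difficult for other models. The work here concentrates on the EM algorithm for GMM training and on the choice of initial values.
Keywords: voiceprint recognition; text-independent; feature extraction; MFCC; GMM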
Research on Feature Extraction Method of Text Independent Voiceprint Recognition
Abstract
A voiceprint is the set of distinctive acoustic characteristics that can identify a person or an object. It is the spectrum of a sound wave carrying speech information, as displayed by electro-acoustic instruments. Voiceprint recognition is also commonly called speaker recognition: after a series of processing steps applied to a speaker's voice, the system automatically determines whether the speaker is in the corpus and, if so, which enrolled person it is. A speech signal contains both the spoken content and the speaker's own characteristics, and these elements are combined in a disordered way, so extracting purely speaker-specific traits is very difficult. Deriving feature parameters that accurately reflect the speaker is nevertheless the key to voiceprint recognition. This thesis therefore focuses on effective feature extraction for voiceprint recognition.
This thesis studies feature extraction methods for text-independent voiceprint recognition. According to the constraints placed on the spoken text, voiceprint recognition systems can be divided into three categories: text-dependent, text-independent, and text-prompted. Text-independent voiceprint recognition does not prescribe the speech content in either the training mode or the recognition mode; that is, it works on speech produced in a relatively unconstrained state. Although text-independent recognition is difficult to implement, it has distinct advantages: it is convenient for users and applicable in a wide range of settings. These advantages have given it a place in fields such as forensic identification and security monitoring. Text-independent voiceprint recognition is the setting chosen for this thesis.
In this thesis, Mel-frequency cepstral coefficients (MFCC) are used as acoustic features. Researchers have studied many kinds of acoustic features, and the three most common are described here. MFCC is the most basic feature in speech recognition. Subtracting the cepstral mean computed over a given time range from the MFCCs provides a simple way to reduce noise and to compensate for distortion introduced by signal transmission. MFCC is therefore the feature chosen in this thesis.
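The cepstral mean subtraction mentioned above can be illustrated with a minimal sketch. It assumes the MFCCs are already available as a list of frames and that the mean is taken over the whole utterance rather than a sliding window; the function name and the toy values are hypothetical and are not taken from the thesis. The idea is that a fixed channel appears as a roughly constant additive offset in the cepstral domain, so removing the per-coefficient mean removes that offset.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean over all frames from every frame."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the time axis.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy "MFCC" matrix: 3 frames x 2 coefficients (hypothetical values whose
# per-coefficient mean is already zero).
clean = [[0.5, 1.0], [-0.5, 0.0], [0.0, -1.0]]

# A constant offset stands in for a fixed transmission channel.
offset = [1.0, -2.0]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

# After mean subtraction the constant offset is gone.
normalized = cepstral_mean_subtraction(shifted)
```

In this toy case `normalized` matches `clean` because the clean frames were chosen to have zero mean; on real speech the subtraction also removes the utterance's own average cepstrum, which is the usual trade-off of this normalization.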
This thesis studies a speaker recognition system based on the Gaussian mixture model (GMM). A GMM combines several Gaussian components, so its partition of the feature space is relatively fine and meets the experiment's requirement to distinguish many classes. It is therefore suitable for modeling complex data, where it adapts well and can handle cases that are difficult for other models. This thesis mainly studies the EM algorithm and the choice of initial values for the GMM.
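As a rough illustration of EM training for a GMM, the sketch below fits a diagonal-covariance mixture in plain Python. Several choices are assumptions made for the example and are not taken from the thesis: the diagonal covariance, the variance floor `VAR_FLOOR`, the fixed iteration count, and in particular the initialization in `init_params` (evenly spaced samples as means, global variance, equal weights), which merely stands in for whatever initialization the thesis actually uses.

```python
import math
from typing import List, Tuple

Vector = List[float]
Params = Tuple[List[float], List[Vector], List[Vector]]  # weights, means, variances

VAR_FLOOR = 1e-6  # assumed floor that keeps every variance strictly positive


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def init_params(data: List[Vector], k: int) -> Params:
    """Hypothetical initialization: evenly spaced samples as means,
    the global variance for every component, and equal weights."""
    n, d = len(data), len(data[0])
    means = [list(data[(i * (n - 1)) // max(k - 1, 1)]) for i in range(k)]
    g_mean = [sum(x[j] for x in data) / n for j in range(d)]
    g_var = [max(sum((x[j] - g_mean[j]) ** 2 for x in data) / n, VAR_FLOOR)
             for j in range(d)]
    return [1.0 / k] * k, means, [list(g_var) for _ in range(k)]


def em_step(data: List[Vector], params: Params) -> Tuple[Params, float]:
    """One EM iteration. Returns the updated parameters and the average
    log-likelihood of the data under the parameters passed in."""
    weights, means, variances = params
    n, d, k = len(data), len(data[0]), len(weights)
    # E-step: responsibilities, computed in the log domain for stability.
    resp, total_ll = [], 0.0
    for x in data:
        logs = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                for c in range(k)]
        top = max(logs)
        norm = top + math.log(sum(math.exp(v - top) for v in logs))
        total_ll += norm
        resp.append([math.exp(v - norm) for v in logs])
    # M-step: re-estimate weights, means, and variances.
    new_w, new_m, new_v = [], [], []
    for c in range(k):
        n_c = max(sum(r[c] for r in resp), 1e-12)  # guard against empty components
        mean = [sum(r[c] * x[j] for r, x in zip(resp, data)) / n_c for j in range(d)]
        var = [max(sum(r[c] * (x[j] - mean[j]) ** 2 for r, x in zip(resp, data)) / n_c,
                   VAR_FLOOR) for j in range(d)]
        new_w.append(n_c / n)
        new_m.append(mean)
        new_v.append(var)
    return (new_w, new_m, new_v), total_ll / n


def fit_gmm(data: List[Vector], k: int, n_iter: int = 30) -> Tuple[Params, List[float]]:
    """Run a fixed number of EM iterations and record the log-likelihood."""
    params = init_params(data, k)
    history = []
    for _ in range(n_iter):
        params, avg_ll = em_step(data, params)
        history.append(avg_ll)
    return params, history


# Toy 2-D data standing in for feature vectors: two well-separated clusters.
cluster_a = [[-0.2, 0.1], [0.1, -0.1], [0.0, 0.2], [0.2, 0.0], [-0.1, -0.2]]
cluster_b = [[4.8, 5.1], [5.1, 4.9], [5.0, 5.2], [5.2, 5.0], [4.9, 4.8]]
(weights, means, variances), history = fit_gmm(cluster_a + cluster_b, k=2)
```

In a speaker identification setting one such model would typically be trained per enrolled speaker, and a test utterance would be assigned to the model that gives its feature vectors the highest average log-likelihood. The result of EM depends on the starting point, which is why the choice of initial values matters.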
Keywords: voiceprint recognition; text-independent; feature extraction; MFCC; GMM
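Contents
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 State of Research in China and Abroad
1.3 Main Research Content and Organization of the Thesis
1.4 Chapter Summary
Chapter 2 Analysis of Voiceprint Recognition Technology
2.1 Principles of Voiceprint Recognition
2.2 Speech Data Preprocessing
2.3 Voiceprint Feature Extraction
2.3.1 Extraction of LPCC Parameters
2.3.2 Extraction of MFCC Parameters
2.3.3 Extraction of GFCC Parameters
2.4 Chapter Summary
Chapter 3 Voiceprint Recognition Based on the Gaussian Mixture Model
3.1 Overview of GMM-Based Voiceprint Recognition
3.2 GMM Parameter Description
3.2.1 GMM Parameter Estimation
3.2.2 GMM Parameter Initialization
3.3 Chapter Summary
Chapter 4 System Testing and Analysis
4.1 Experimental Environment
4.1.1 Hardware and Software Environment
4.1.2 Speech Corpus
4.2 System Implementation
4.2.1 Implementation of the Voiceprint Enrollment Module
4.2.2 Implementation of the Voiceprint Recognition Module
4.3 Chapter Summary
Chapter 5 Conclusions and Outlook
5.1 Summary of Work
5.2 Future Research
Acknowledgments
References
Appendix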
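Chapter 1 Introduction
1.1 Research Background and Significance
With the advance of modern technology we rely more and more on computers, and AI has become a research hotspot in recent years. Speech recognition has been a long-sought technology ever since computers were invented; in older science-fiction films, people already gave commands to computers by voice. Giving computers the five senses is no longer empty talk, and voiceprint technology now gives them a sense of hearing. Although voiceprint technology has a rich body of academic research behind it, in everyday life it is still poorly understood by the public and narrowly applied. This is the challenge we face, and at the same time a rare opportunity.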