Total thesis length: 48,571 characters
Abstract (Chinese)
Emotion recognition is an important research direction in affective computing: by analyzing the external manifestations of emotion, a computer can perceive human emotional information, understand and express it better, and thereby achieve efficient and harmonious human-computer interaction. In recent years, multimodal emotion recognition has become a research hotspot. Compared with the single emotional representation offered by one modality, multiple modalities can fuse more emotional information so that the different cues complement one another, representing emotional states better. Facial expressions and speech features are among the most overt and most readily analyzed carriers of human emotion. This thesis therefore focuses on feature extraction and classification methods for speech emotion recognition and facial expression recognition, and on a bimodal fusion method. The main work is as follows:
Speech and facial feature extraction. For speech, the openSMILE toolbox is used to extract the INTERSPEECH 2010 standard feature set, 1582 dimensions in total. For faces, dynamic 3D geometric features are extracted; slow feature analysis is used to detect the expression peak automatically, which improves efficiency, and the variable-length dynamic features are normalized to a fixed length.
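The thesis's exact procedure for normalizing the variable-length dynamic features is not given in this summary; below is a minimal sketch of one common choice, linear interpolation along the time axis. The function name, feature dimensionality, and target length are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def normalize_length(frames: np.ndarray, target_len: int = 16) -> np.ndarray:
    """Resample a (T, D) sequence of per-frame geometric features to
    (target_len, D) by linear interpolation along the time axis."""
    T, D = frames.shape
    src = np.linspace(0.0, 1.0, T)           # original frame positions
    dst = np.linspace(0.0, 1.0, target_len)  # target frame positions
    out = np.empty((target_len, D))
    for d in range(D):
        out[:, d] = np.interp(dst, src, frames[:, d])
    return out

clip = np.random.rand(37, 6)   # e.g. 37 frames of 6 geometric distances
fixed = normalize_length(clip)
print(fixed.shape)             # (16, 6)
```

Resampling every clip to the same number of frames lets clips of different durations share one fixed-length feature vector after flattening.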
Emotion classifiers. Both separate and joint learning are studied. In separate learning, an auto-encoder first performs feature learning and dimensionality reduction on the high-dimensional features, after which the neural network is fine-tuned by supervised learning. In joint learning, the network is trained on the classifier's classification error and the auto-encoder's reconstruction error simultaneously.
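The joint objective described above, classification error plus a weighted reconstruction error, can be sketched as follows. This assumes a one-hidden-layer auto-encoder whose code layer feeds a softmax classifier; the architecture, weight shapes, and the trade-off weight `lam` are illustrative assumptions rather than the thesis's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def joint_loss(X, y, W_enc, b_enc, W_dec, b_dec, W_cls, b_cls, lam=0.5):
    """Joint objective: softmax cross-entropy on the code layer plus a
    lambda-weighted auto-encoder reconstruction error."""
    H = np.tanh(X @ W_enc + b_enc)   # encoder: low-dimensional code
    X_hat = H @ W_dec + b_dec        # decoder: reconstruction of the input
    P = softmax(H @ W_cls + b_cls)   # classifier on the code layer
    recon = 0.5 * np.mean((X_hat - X) ** 2)
    ce = -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))
    return ce + lam * recon          # both errors shape the same encoder

# toy batch: 8 samples, 20-dim features, 4 emotion classes
X = rng.standard_normal((8, 20))
y = rng.integers(0, 4, size=8)
W_enc = rng.standard_normal((20, 5)) * 0.1; b_enc = np.zeros(5)
W_dec = rng.standard_normal((5, 20)) * 0.1; b_dec = np.zeros(20)
W_cls = rng.standard_normal((5, 4)) * 0.1;  b_cls = np.zeros(4)
loss = joint_loss(X, y, W_enc, b_enc, W_dec, b_dec, W_cls, b_cls)
print(round(float(loss), 3))
```

Minimizing this single loss trains the encoder for both goals at once, in contrast to separate learning, where unsupervised pre-training and supervised fine-tuning happen in two stages.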
Bimodal fusion. Single-modality emotion recognition is first studied on speech and facial expressions separately; the two unimodal results are then fused by a new adaptive weighting method to obtain the final classification.
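The adaptive weighting scheme itself is not specified in this summary; the sketch below assumes one common choice, weighting each modality by its classifier's confidence (its maximum posterior). The function name and the example posteriors are hypothetical.

```python
import numpy as np

def adaptive_fusion(p_audio: np.ndarray, p_face: np.ndarray) -> np.ndarray:
    """Decision-level fusion of two per-class posterior vectors. Each
    modality's weight is set adaptively from its confidence (here, the
    maximum posterior), then the weighted posteriors are combined."""
    c_a, c_f = p_audio.max(), p_face.max()
    w_a = c_a / (c_a + c_f)   # adaptive weight for the audio modality
    w_f = 1.0 - w_a           # remaining weight for the visual modality
    fused = w_a * p_audio + w_f * p_face
    return fused / fused.sum()

p_audio = np.array([0.10, 0.70, 0.15, 0.05])  # confident prediction
p_face  = np.array([0.30, 0.30, 0.25, 0.15])  # uncertain prediction
fused = adaptive_fusion(p_audio, p_face)
print(fused.argmax())  # 1 -> the confident audio decision dominates
```

Because the weights track per-sample confidence, the fusion leans on whichever modality is more certain for that sample, which is how decision-level fusion can exploit the complementary strengths of the two modalities.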
Experiments are conducted on the IEMOCAP database: reliable samples are selected from the improvised data, speech and dynamic 3D geometric features are extracted, and emotions are classified. The unimodal results show that speech emotion recognition performs better on negative emotions such as sadness, while facial expression recognition performs better on positive emotions such as happiness. The bimodal results show that decision-level fusion with adaptive weights raises the recognition rate by about 10% over either single modality, so the two modalities complement each other.
Keywords: bimodal emotion recognition, speech emotion recognition, facial expression recognition, 3D dynamic geometric feature extraction, auto-encoder, decision-level fusion
Abstract
Emotion recognition is a significant part of affective computing. It enables computers to perceive human emotion through its external features, to understand and express human emotion better, and thus to achieve effective and harmonious human-computer interaction. In recent years, multimodal emotion recognition has become a research focus: it can integrate more emotional information than a single modality, and the different kinds of emotional information complement one another, representing emotional states better than a single modality can. Facial expressions and audio features are the most overt and most easily analyzed emotional cues, so this thesis focuses on emotion recognition from facial expressions and audio features. The specific contents are as follows:
- Audio and visual feature extraction
For audio feature extraction, we use the openSMILE toolbox to extract the INTERSPEECH 2010 Paralinguistic Challenge feature set, which has 1582 dimensions. For visual feature extraction, dynamic 3-dimensional geometric features are extracted; notably, we use the slow feature analysis algorithm to detect the emotional peak frame and then obtain dynamic features of fixed length.
- Classifiers for emotion recognition
In this paper, we discuss separate and joint learning. In separate learning, we first use auto-encoders for feature learning and dimensionality reduction, then fine-tune the neural network by supervised learning. In joint learning, reconstruction error and classification error are taken into account simultaneously while training the network.
- Bimodal fusion methods
First, we study single-modal emotion recognition on audio and visual features separately. Then we use an adaptive weighting method for decision-level fusion to obtain the fused results.
- Experiments on IEMOCAP
We conduct experiments on IEMOCAP: we select reliable samples from the improvised data, extract audio and visual emotional features, and perform classification. The single-modal results show that speech emotion recognition performs well on negative emotions, while facial expression recognition performs well on positive emotions. The multi-modal results show that the recognition rate of bimodal emotion recognition is about 10% higher than that of either single modality.
KEY WORDS: bimodal emotion recognition, speech emotion recognition, facial expression recognition, 3-dimensional dynamic geometric feature extraction, auto-encoder, decision-level fusion
Contents
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Background and significance
1.2 Research status at home and abroad
1.2.1 Research status of speech emotion recognition
1.2.2 Research status of facial expression recognition
1.2.3 Research status of multimodal emotion recognition
1.3 Research contents and organization of this thesis
1.3.1 Research contents
1.3.2 Organization
Chapter 2 Overview of Bimodal Emotion Recognition
2.1 Framework of the bimodal emotion recognition system
2.2 Definition and classification of emotion
2.3 Bimodal emotion databases
2.4 Feature extraction methods
2.4.1 Speech feature extraction
2.4.2 Facial expression feature extraction
2.5 Emotion classification methods
2.6 Bimodal fusion methods
2.7 Chapter summary
Chapter 3 Feature Extraction for Audio-Visual Bimodal Emotion Recognition
3.1 Speech feature extraction
3.1.1 Speech preprocessing
3.1.2 Common speech features and feature sets
3.1.3 Extracting speech features with openSMILE
3.2 Facial expression feature extraction
3.2.1 Facial expression datasets
3.2.2 Facial expression data preprocessing
3.2.3 Automatic detection of peak expressions
3.2.4 Dynamic feature extraction
3.3 Chapter summary
Chapter 4 Deep Learning Based on Auto-encoders
4.1 BP neural networks
4.2 Auto-encoders and the Softmax classifier
4.2.1 Auto-encoders
4.2.2 Softmax classification
4.3 Two-stage separate learning based on auto-encoders