
Journal of Software (Ruan Jian Xue Bao) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software, 2018,29(4):1060-1070 [doi: 10.13328/j.cnki.jos.005412] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
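Multimodal Emotion Recognition in Multi-Cultural Conditions
CHEN Shi-Zhe, WANG Shuai, JIN Qin
(School of Information, Renmin University of China, Beijing 100872, China)
Corresponding author: JIN Qin, E-mail: qjin@ruc.edu.cn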
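Abstract: Automatic emotion recognition is a highly challenging problem with broad application value. This paper investigates multimodal emotion recognition in multi-cultural scenarios. Different emotion features, including traditional hand-crafted features and deep-learning-based features, are extracted from modalities such as speech acoustics and facial expressions. The modalities are combined through multimodal fusion, and the emotion recognition performance of the individual unimodal features is compared with that of the fused multimodal features. Experiments are conducted on the CHEAVD Chinese multimodal emotion dataset and the AFEW English multimodal emotion dataset. Cross-cultural emotion recognition experiments confirm that cultural factors have an important influence on emotion recognition, and three training strategies are proposed to improve emotion recognition in multi-cultural scenarios: selecting a model per culture, multi-cultural joint training, and multi-cultural joint training based on a common emotion space. The last of these separates the cultural influence from the emotion features and achieves the best recognition results in both speech and multimodal emotion recognition.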
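Keywords: emotion recognition; multi-cultural scenarios; speech emotion features; facial expression features; multimodal fusion; deep convolutional neural networks
Chinese Library Classification: TP391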
Chinese citation format: Chen SZ, Wang S, Jin Q. Multimodal emotion recognition in multi-cultural conditions. Journal of Software, 2018,29(4):1060-1070. http://www.jos.org.cn/1000-9825/5412.htm
English citation format: Chen SZ, Wang S, Jin Q. Multimodal emotion recognition in multi-cultural conditions. Ruan Jian Xue Bao/Journal of Software, 2018,29(4):1060-1070 (in Chinese). http://www.jos.org.cn/1000-9825/5412.htm
Multimodal Emotion Recognition in Multi-Cultural Conditions
CHEN Shi-Zhe, WANG Shuai, JIN Qin
(School of Information, Renmin University of China, Beijing 100872, China)
Abstract: Automatic emotion recognition is a challenging task with a wide range of applications. This paper addresses the problem of
emotion recognition in multi-cultural conditions. Different multimodal features are extracted from the audio and visual modalities, and the
emotion recognition performance is compared between hand-crafted features and features learned automatically by deep neural
networks. Multimodal feature fusion is also explored to combine the different modalities. The CHEAVD Chinese multimodal emotion dataset
and the AFEW English multimodal emotion dataset are used to evaluate the proposed methods. Cross-culture emotion recognition experiments
demonstrate the importance of the culture factor for emotion recognition. Three strategies are then developed to improve emotion
recognition performance in the multi-cultural environment: selecting the corresponding emotion model for each culture, jointly training
on multi-cultural datasets, and embedding features from multi-cultural datasets into the same emotion space. The embedding strategy
separates the culture influence from the original features and can generate more discriminative emotion features, resulting in the best
performance for both acoustic and multimodal emotion recognition.
Key words: emotion recognition; multi-cultural condition; acoustic emotion feature; facial expression feature; multimodal fusion;
deep convolutional neural networks
Foundation item: National Key Research and Development Program of China (2016YFB1001200)
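This paper was recommended by the guest editors of the special issue on "Multimedia Big Data Processing and Analysis": Prof. ZHAO Yao, Prof. LI Bo, Researcher HUA Xian-Sheng, Prof. WEN Ji-Rong, Prof. JIANG Gang-Yi, and Assoc. Prof. CHANG Dong-Xia.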
Received: 2017-04-30; Revised: 2017-06-26; Accepted: 2017-10-13; Published online by JOS: 2017-12-01
CNKI online-first publication: 2017-12-04 06:49:15, http://kns.cnki.net/kcms/detail/11.2560.TP.20171204.0649.012.html