Multimodal emotion recognition with action features

doi:10.11996/JG.j.2095-302X.2022061159

Journal of Graphics, 2022, Vol. 43, Issue 6: 1159-1169.

Section: Image Processing and Computer Vision

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Online: 2022-12-30; Published: 2023-01-11
Supported by:
Tsinghua University Initiative Scientific Research Program (20211080093); China Postdoctoral Science Foundation (2021M701891); National Natural Science Foundation of China (62202257, 61725204)

Abstract: In recent years, computer-based emotion recognition from multimodal data has become an important research direction in natural human-computer interaction and artificial intelligence. Emotion recognition work that uses the visual modality usually focuses on facial features and rarely considers action features, or multimodal features that incorporate them. Although actions are closely related to emotions, it is difficult to extract action information from the visual modality that is useful for emotion recognition. Taking the relationship between action and emotion as our starting point, we added visual-modality action data to MELD, a classic multimodal emotion recognition dataset. Body action features were extracted with an ST-GCN model and used for single-modal emotion recognition with an LSTM model. We then added the body action features to MELD's existing text and audio features, which improved the accuracy of an LSTM-based multimodal fusion model. Combining body action features with text features also raised the accuracy of a context model with pre-trained memory compared with using text features alone. The experiments show that although body action features alone do not match the single-modal accuracy of conventional text and audio features, they play an important role in multimodal emotion recognition. The single-modal and multimodal experiments confirm that human body actions carry emotional information and that using body action features for multimodal emotion recognition has considerable potential.

Key words: action features, emotion recognition, multimodality, action and emotion, visual modality
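The abstract describes fusing text, audio and body action features and feeding them to an LSTM, but it gives no architectural details. The following is a minimal, dependency-free Python sketch under stated assumptions: simple concatenation (early) fusion of per-utterance feature vectors, followed by a single-layer unidirectional LSTM run over the utterances of one dialogue and a linear classifier per utterance. The names `fuse`, `TinyLSTM` and `classify`, the toy feature sizes and the random weights are all hypothetical, and the ST-GCN action encoder is not shown (its per-utterance output is simply assumed to be a fixed-length vector). The seven labels are MELD's emotion categories.

```python
import math
import random

# MELD's seven emotion categories.
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(text_vec, audio_vec, action_vec):
    """Assumed early fusion: concatenate one utterance's modality features."""
    return list(text_vec) + list(audio_vec) + list(action_vec)


class TinyLSTM:
    """Hypothetical single-layer LSTM over a sequence of fused utterance vectors.

    Weights are random placeholders, not trained parameters.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def run(self, sequence):
        """Return one hidden state per utterance, carrying context forward."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            z = list(x) + h  # current input joined with previous hidden state
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("c", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def classify(hidden_states, seed=1):
    """Hypothetical linear head: pick the highest-scoring emotion per utterance."""
    rng = random.Random(seed)
    size = len(hidden_states[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in EMOTIONS]
    labels = []
    for h in hidden_states:
        logits = [sum(w * v for w, v in zip(row, h)) for row in W]
        labels.append(EMOTIONS[max(range(len(logits)), key=logits.__getitem__)])
    return labels


# Toy dialogue of 3 utterances; feature sizes (text 4, audio 3, action 2)
# are illustrative only and do not reflect the paper's dimensions.
rng = random.Random(42)
dialogue = [fuse([rng.random() for _ in range(4)],
                 [rng.random() for _ in range(3)],
                 [rng.random() for _ in range(2)]) for _ in range(3)]
lstm = TinyLSTM(input_size=9, hidden_size=5)
states = lstm.run(dialogue)
print(classify(states))
```

Because the LSTM carries its hidden state from one utterance to the next, each prediction can draw on earlier turns of the dialogue. A real model would learn these weights from labeled data rather than sample them at random.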

CLC number: TP 391

SUN Ya-nan, WEN Yu-hui, SHU Ye-zhi, LIU Yong-jin. Multimodal emotion recognition with action features[J]. Journal of Graphics, 2022, 43(6): 1159-1169.
