Journal of Graphics, 2026, Vol. 47, Issue (2): 286-295. DOI: 10.11996/JG.j.2095-302X.2026020286
• Image Processing and Computer Vision •
FANG Youjiang, WANG Shihao, ZHANG Liang, DUAN Keran, LIU Yue, WEI Xiaopeng, YANG Xin
Received: 2025-06-03
Accepted: 2025-12-13
Online: 2026-04-30
Published: 2026-05-20
Contact: YANG Xin
FANG Youjiang, WANG Shihao, ZHANG Liang, DUAN Keran, LIU Yue, WEI Xiaopeng, YANG Xin. Cross-modal consistency detection via graph topological feature extraction[J]. Journal of Graphics, 2026, 47(2): 286-295.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2026020286
Fig. 2 The architecture of the proposed GCPNet, which includes the graph topology extraction and enhancement module and the hierarchical interactive attention graph module
| Samples | Train | Validation | Test |
|---|---|---|---|
| All | 19 812 | 2 410 | 2 409 |
| Positive | 9 572 | 1 042 | 1 037 |
| Negative | 10 240 | 1 368 | 1 372 |
Table 1 Dataset Statistics for MMSD2.0
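The counts in Table 1 can be sanity-checked with a few lines of Python. The sketch below only transcribes the table; the names `SPLITS`, `is_consistent`, and `positive_ratio` are illustrative and not part of the paper or of any MMSD2.0 tooling.

```python
# Sample counts transcribed from Table 1 (MMSD2.0).
SPLITS = {
    "train": {"all": 19812, "positive": 9572, "negative": 10240},
    "validation": {"all": 2410, "positive": 1042, "negative": 1368},
    "test": {"all": 2409, "positive": 1037, "negative": 1372},
}


def is_consistent(split):
    """Return True when positive + negative equals the reported total."""
    counts = SPLITS[split]
    return counts["positive"] + counts["negative"] == counts["all"]


def positive_ratio(split):
    """Fraction of samples in the split that carry the positive label."""
    counts = SPLITS[split]
    return counts["positive"] / counts["all"]


if __name__ == "__main__":
    for name in SPLITS:
        print(name, is_consistent(name), f"{100 * positive_ratio(name):.2f}% positive")
```

Under these counts the training split is close to balanced, while the validation and test splits contain somewhat fewer positive than negative samples, which is one reason to read precision, recall, and F1 alongside accuracy.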
| Modality | Model | Acc | P | R | F1 |
|---|---|---|---|---|---|
| Text | TextCNN | 71.61 | 64.62 | 75.22 | 69.52 |
| Text | Bi-LSTM | 72.48 | 68.02 | 68.08 | 68.05 |
| Text | SMSD | 73.56 | 68.45 | 71.55 | 69.97 |
| Text | RoBERTa | 79.66 | 76.74 | 75.70 | 76.21 |
| Image | ResNet | 65.50 | 61.17 | 54.39 | 57.58 |
| Image | ViT | 72.02 | 65.26 | 74.83 | 69.72 |
| Cross-modal | HFM | 70.57 | 64.84 | 69.05 | 66.88 |
| Cross-modal | Att-BERT | 80.03 | 76.28 | 77.82 | 77.04 |
| Cross-modal | CMGCN | 79.83 | 75.82 | 78.01 | 76.90 |
| Cross-modal | HKE | 76.50 | 73.48 | 71.07 | 72.25 |
| Cross-modal | DIP | 80.59 | 75.52 | 81.14 | 78.23 |
| Cross-modal | DynRT | 70.37 | 63.02 | 75.15 | 68.55 |
| Cross-modal | G2SAM | 79.43 | 72.04 | 85.20 | 78.04 |
| Cross-modal | Multi-view CLIP | 85.64 | 80.33 | 88.24 | 84.10 |
| Cross-modal | DAIE | 84.33 | 82.43 | 81.91 | 82.17 |
| This paper | GCPNet (ours) | 86.06 | 83.74 | 87.15 | 85.41 |
Table 2 Experimental results on the MMSD2.0 dataset (%)
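The F1 column in Table 2 is consistent with the standard harmonic mean of precision (P) and recall (R). The sketch below recomputes it for a few rows transcribed from the table. It assumes the usual definition F1 = 2PR / (P + R); the page itself does not state the formula, and the names `f1_from_pr` and `ROWS` are illustrative only.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) in percent, transcribed from Table 2.
ROWS = {
    "TextCNN": (64.62, 75.22, 69.52),
    "Multi-view CLIP": (80.33, 88.24, 84.10),
    "DAIE": (82.43, 81.91, 82.17),
    "GCPNet (ours)": (83.74, 87.15, 85.41),
}

if __name__ == "__main__":
    for model, (p, r, reported) in ROWS.items():
        print(f"{model}: recomputed {f1_from_pr(p, r):.2f}, reported {reported:.2f}")
```

Because the inputs are already rounded to two decimals, a recomputed value may differ from a reported one in the last digit. Such a difference would indicate rounding, not necessarily an error in the table.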
Fig. 4 Case study of graph topology feature extraction ((a) Input sarcasm sample; (b) Image patching and fully connected graph construction; (c) GCN-enhanced topological activation (darker red indicates higher-weighted key conflicting cues))
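The pipeline named in the Fig. 4 caption (image patching, fully connected graph construction, GCN enhancement) can be illustrated with a minimal pure-Python sketch. It is not the authors' GCPNet. Weighting edges by clipped cosine similarity, the symmetric normalisation, the toy image, and the untrained weight matrix are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def image_to_patches(image, patch):
    """Split an H x W grayscale image (list of rows) into flat patch vectors."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    return [[image[top + i][left + j] for i in range(patch) for j in range(patch)]
            for top in range(0, h, patch) for left in range(0, w, patch)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fully_connected_graph(nodes):
    """Adjacency linking every pair of distinct nodes (assumed cosine weights)."""
    n = len(nodes)
    return [[max(cosine(nodes[i], nodes[j]), 0.0) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def gcn_layer(nodes, adj, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(nodes)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    out = []
    for i in range(n):
        # Aggregate neighbour features with symmetric degree normalisation.
        agg = [sum(inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] * nodes[j][k]
                   for j in range(n)) for k in range(len(nodes[0]))]
        # Project with the weight matrix, then apply ReLU.
        out.append([max(sum(agg[k] * weight[k][m] for k in range(len(agg))), 0.0)
                    for m in range(len(weight[0]))])
    return out


if __name__ == "__main__":
    image = [[(r * 4 + c) % 7 for c in range(4)] for r in range(4)]  # toy 4x4 image
    nodes = image_to_patches(image, 2)  # 4 patch nodes, 4 features each
    adj = fully_connected_graph(nodes)
    weight = [[0.5, -0.25], [0.1, 0.3], [-0.2, 0.4], [0.3, 0.1]]  # arbitrary, untrained
    print(gcn_layer(nodes, adj, weight))
```

Edge weights are used here because an unweighted fully connected graph with self-loops would average all nodes identically, leaving every patch with the same output. How the paper itself weights edges or derives the highlighted activations is not stated on this page.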
| [1] | WEN C S, JIA G L, YANG J F. DIP: dual incongruity perceiving network for sarcasm detection[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 2540-2550. |
| [2] | VERMA P, SHUKLA N, SHUKLA A P. Techniques of sarcasm detection: a review[C]// 2021 International Conference on Advance Computing and Innovative Technologies in Engineering. New York: IEEE Press, 2021: 968-972. |
| [3] | GODARA J, ARON R, SHABAZ M. Sentiment analysis and sarcasm detection from social network to train health-care professionals[J]. World Journal of Engineering, 2022, 19(1): 124-133. |
| [4] | LI J N, PAN H L, LIN Z, et al. Sarcasm detection with commonsense knowledge[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3192-3201. |
| [5] | RAO M V, C S. Detection of sarcasm on amazon product reviews using machine learning algorithms under sentiment analysis[C]// The 6th International Conference on Wireless Communications, Signal Processing and Networking. New York: IEEE Press, 2021: 196-199. |
| [6] | ZHANG Y Z, WANG J L, LIU Y C, et al. A multitask learning model for multimodal sarcasm, sentiment and emotion recognition in conversations[J]. Information Fusion, 2023, 93: 282-301. |
| [7] | DUTTA P, BHATTACHARYYA C K. Multi-modal sarcasm detection in social networks: a comparative review[C]// The 6th International Conference on Computing Methodologies and Communication. New York: IEEE Press, 2022: 207-214. |
| [8] | SCHIFANELLA R, DE JUAN P, TETREAULT J, et al. Detecting sarcasm in multimodal social platforms[C]// The 24th ACM International Conference on Multimedia. New York: ACM, 2016: 1136-1145. |
| [9] | XU N, ZENG Z X, MAO W J. Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association[EB/OL]. [2025-04-03]. https://aclanthology.org/2020.acl-main.349/. |
| [10] | LIANG B, LOU C W, LI X, et al. Multi-modal sarcasm detection with interactive in-modal and cross-modal graphs[C]// The 29th ACM International Conference on Multimedia. New York: ACM, 2021: 4707-4715. |
| [11] | TIAN Y, XU N, ZHANG R K, et al. Dynamic routing transformer network for multimodal sarcasm detection[EB/OL]. [2025-04-03]. https://aclanthology.org/2023.acl-long.139/. |
| [12] | WEI Y W, YUAN S Z, ZHOU H Y, et al. G2SAM: graph-based global semantic awareness method for multimodal sarcasm detection[C]// The 38th AAAI Conference on Artificial Intelligence. Washington: AAAI Press, 2024: 9151-9159. |
| [13] | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2025-04-03]. https://proceedings.mlr.press/v139/radford21a. |
| [14] | GAN Y X, WANG Y B, XUE J X, et al. Public opinion evolution analysis of “COVID-19 epidemic” based on sentiment feature[J]. Journal of Graphics, 2021, 42(2): 222-229 (in Chinese). |
| [15] | HUANG H, SUN L J, CAO Y, et al. Multimodal sentiment analysis of short videos based on attention[J]. Journal of Graphics, 2021, 42(1): 8-14 (in Chinese). |
| [16] | ALQAHTANI A, ALHENAKI L, ALSHEDDI A. Text-based sarcasm detection on social networks: a systematic review[J]. International Journal of Advanced Computer Science and Applications, 2023, 14(3): 313-328. |
| [17] | SHRIVASTAVA M, KUMAR S. A pragmatic and intelligent model for sarcasm detection in social media text[J]. Technology in Society, 2021, 64: 101489. |
| [18] | GUPTA S, SINGH R, SINGLA V. Emoticon and text sarcasm detection in sentiment analysis[C]// The 1st International Conference on Sustainable Technologies for Computational Intelligence. Cham: Springer, 2020: 1-10. |
| [19] | LIU J, TIAN S W, YU L, et al. Image-text fusion transformer network for sarcasm detection[J]. Multimedia Tools and Applications, 2024, 83(14): 41895-41909. |
| [20] | PAN H L, LIN Z, FU P, et al. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection[EB/OL]. [2025-04-03]. https://aclanthology.org/2020.findings-emnlp.124/. |
| [21] | SANGWAN S, AKHTAR M S, BEHERA P, et al. I didn’t mean what I wrote! Exploring multimodality for sarcasm detection[C]// 2020 International Joint Conference on Neural Networks. New York: IEEE Press, 2020: 1-8. |
| [22] | CAI Y T, CAI H Y, WAN X J. Multi-modal sarcasm detection in twitter with hierarchical fusion model[EB/OL]. [2025-04-03]. https://aclanthology.org/P19-1239/. |
| [23] | LIANG B, LOU C W, LI X, et al. Multi-modal sarcasm detection via cross-modal graph convolutional network[EB/OL]. [2025-04-03]. https://aclanthology.org/2022.acl-long.124/. |
| [24] | LIU H, WANG W Y, LI H L. Towards multi-modal sarcasm detection via hierarchical congruity modeling with knowledge enhancement[EB/OL]. [2025-04-03]. https://aclanthology.org/2022.emnlp-main.333/. |
| [25] | MU D Q, LI T. Face anti-spoofing technology based on multi-modal fusion[J]. Journal of Graphics, 2020, 41(5): 750-756 (in Chinese). |
| [26] | SUN Y N, WEN Y H, SHU Y Z, et al. Multimodal emotion recognition with action features[J]. Journal of Graphics, 2022, 43(6): 1159-1169 (in Chinese). |
| [27] | YU Z, YU J, FAN J P, et al. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 1839-1848. |
| [28] | YU Z, YU J, CUI Y H, et al. Deep modular co-attention networks for visual question answering[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 6274-6283. |
| [29] | LU J S, YANG J W, BATRA D, et al. Hierarchical question-image co-attention for visual question answering[C]// The 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 289-297. |
| [30] | HAMILTON W, YING Z, LESKOVEC J. Inductive representation learning on large graphs[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017, 30: 1025-1035. |
| [31] | QIN L B, HUANG S J, CHEN Q G, et al. MMSD2.0: towards a reliable multi-modal sarcasm detection system[EB/OL]. [2025-04-03]. https://aclanthology.org/2023.findings-acl.689/. |
| [32] | KIM Y. Convolutional neural networks for sentence classification[EB/OL]. [2025-04-03]. https://aclanthology.org/D14-1181/. |
| [33] | GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J]. Neural Networks, 2005, 18(5/6): 602-610. |
| [34] | XIONG T, ZHANG P R, ZHU H B, et al. Sarcasm detection with self-matching networks and low-rank bilinear pooling[C]// The World Wide Web Conference. New York: ACM, 2019: 2115-2124. |
| [35] | LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26) [2025-04-03]. http://arxiv.org/abs/1907.11692. |
| [36] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
| [37] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2025-04-03]. https://openreview.net/pdf?id=YicbFdNTTy. |
| [38] | WU Q F, FANG W L, ZHONG W Y, et al. Dual-level adaptive incongruity-enhanced model for multimodal sarcasm detection[J]. Neurocomputing, 2025, 612: 128689. |