Journal of Graphics ›› 2025, Vol. 46 ›› Issue (1): 104-113. DOI: 10.11996/JG.j.2095-302X.2025010104
DONG Jiale1, DENG Zhengjie1,2, LI Xiyan1, WANG Shiyun1
Received: 2024-08-22
Accepted: 2024-11-21
Published: 2025-02-28
Online: 2025-02-14
Contact:
DENG Zhengjie (1980-), associate professor, Ph.D. His main research interests cover artificial intelligence, cyberspace security, virtual reality, and computer education. E-mail: jet_dunn@qq.com
First author:
DONG Jiale (1999-), master student. Her main research interest covers the security of artificial intelligence systems. E-mail: dongjiale1107@163.com
Supported by:
Abstract:
In today's society, the rapid development of facial forgery technology poses a serious challenge to public security, especially as deep learning is widely used to generate highly realistic forged videos. Such high-quality forgeries not only threaten personal privacy but can also be exploited for illegal activities. Faced with this challenge, traditional detection methods based on a single feature can no longer meet detection requirements. A deepfake detection method based on multi-feature fusion of the frequency domain and spatial domain was therefore proposed to improve the detection accuracy and generalization ability for facial forgeries. The frequency domain was dynamically partitioned into three bands to extract forgery artifacts that cannot be mined in the spatial domain. In the spatial domain, an EfficientNet_b4 network and a Transformer architecture partitioned the image into patches at multiple scales, computing the differences between patches, detecting forgeries from the consistency information between adjacent patches, and capturing finer-grained forgery features. Finally, a fusion block built on a query-key-value mechanism combined the frequency-domain and spatial-domain branches, mining the feature information of both domains more comprehensively and improving the accuracy and transferability of forgery detection. Extensive experimental results demonstrated the effectiveness of the method, whose performance is clearly superior to that of traditional deepfake detection methods.
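The abstract names two mechanisms without publishing code: a dynamic three-band frequency split and a query-key-value fusion block. The sketch below is a minimal PyTorch illustration under explicit assumptions, not the authors' implementation: the band split is realized here with an FFT and learnable radial boundaries (the paper does not specify the transform or the parameterization), and the fusion is a single-head cross-attention in which spatial tokens query frequency tokens. All identifiers (`DynamicBandSplit`, `QKVFusion`, the steepness constant) are illustrative.

```python
# Minimal sketch, NOT the authors' code: the FFT-based radial masks, the
# sigmoid parameterization of the band boundaries, and the single-head
# attention below are illustrative assumptions.
import torch
import torch.nn as nn


class DynamicBandSplit(nn.Module):
    """Split an image spectrum into low/mid/high bands with learnable
    ("dynamic") boundary radii; returns one spatial map per band."""

    def __init__(self, steepness: float = 50.0):
        super().__init__()
        # Two free logits; sorted sigmoids give boundaries 0 < r1 < r2 < 1.
        self.boundaries = nn.Parameter(torch.tensor([-1.0, 1.0]))
        self.k = steepness  # soft-mask slope, keeps the boundaries trainable

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        _, _, H, W = x.shape
        spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        # Normalized radial coordinate of every frequency bin, in [0, 1].
        yy = torch.linspace(-1.0, 1.0, H, device=x.device).view(-1, 1)
        xx = torch.linspace(-1.0, 1.0, W, device=x.device).view(1, -1)
        r = torch.sqrt(yy**2 + xx**2) / 2**0.5
        r1, r2 = torch.sort(torch.sigmoid(self.boundaries)).values
        low = torch.sigmoid(self.k * (r1 - r))   # pass low frequencies
        high = torch.sigmoid(self.k * (r - r2))  # pass high frequencies
        mid = (1.0 - low) * (1.0 - high)         # what is left in between
        bands = [
            torch.fft.ifft2(torch.fft.ifftshift(spec * m, dim=(-2, -1))).real
            for m in (low, mid, high)
        ]
        return torch.cat(bands, dim=1)           # (B, 3C, H, W)


class QKVFusion(nn.Module):
    """Fuse two token sequences with a query-key-value block: the spatial
    branch queries the frequency branch, plus a residual connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, spat: torch.Tensor, freq: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(spat), self.k(freq), self.v(freq)  # (B, N, dim) each
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return spat + self.out(attn @ v)                    # residual fusion


if __name__ == "__main__":
    bands = DynamicBandSplit()(torch.randn(2, 1, 160, 160))  # -> (2, 3, 160, 160)
    fused = QKVFusion(dim=64)(torch.randn(2, 100, 64), torch.randn(2, 100, 64))
    print(bands.shape, fused.shape)
```

In the actual model the spatial tokens would come from the EfficientNet_b4/Transformer branch described in the abstract; here random tensors stand in for both streams.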
CLC number:
DONG Jiale, DENG Zhengjie, LI Xiyan, WANG Shiyun. Deepfake detection method based on multi-feature fusion of frequency domain and spatial domain[J]. Journal of Graphics, 2025, 46(1): 104-113.
Fig. 4 Example of real vs. fake image analysis ((a) Real image; (b)~(c) Forged images, with pixel brightness increased by 0.3)
| Model | DF | FS | F2F | NT |
|---|---|---|---|---|
| Ref. [ ] | 93.40 | 92.93 | 93.51 | 79.37 |
| Ref. [ ] | 91.65 | 87.03 | 90.73 | 60.57 |
| Ref. [ ] | 94.50 | 84.50 | 80.30 | 74.00 |
| Ref. [ ] | 97.30 | 94.20 | 81.80 | 79.40 |
| Ref. [ ] | 97.80 | 93.40 | 88.70 | 84.20 |
| Ours | 95.88 | 95.82 | 95.57 | 87.00 |
Table 1 Accuracy results on the FaceForensics++ dataset/%
| Model | Insight ACC/% | Insight AUC | Text2img ACC/% | Text2img AUC | Inpainting ACC/% | Inpainting AUC |
|---|---|---|---|---|---|---|
| RECCE[ ] | 58.99 | 63.12 | 38.14 | 35.12 | 51.35 | 51.52 |
| Ours | 90.21 | 96.33 | 96.50 | 99.32 | 92.81 | 97.98 |
Table 2 Results on the DFF dataset
| Partitioning method | DF | FS | F2F | NT |
|---|---|---|---|---|
| Energy | 95.73 | 95.26 | 95.36 | 86.05 |
| Information entropy | 95.08 | 95.27 | 95.29 | 85.61 |
| Average | 95.34 | 95.36 | 95.16 | 85.27 |
| Dynamic | 95.88 | 95.82 | 95.57 | 87.00 |
Table 3 Comparison of experimental accuracy on different frequency domain partitioning methods/%
| Partitioning method | DF | FS | F2F | NT |
|---|---|---|---|---|
| Energy | 99.33 | 99.34 | 99.14 | 93.58 |
| Information entropy | 99.30 | 99.20 | 99.30 | 93.28 |
| Average | 99.37 | 99.25 | 99.23 | 93.94 |
| Dynamic | 99.39 | 99.34 | 99.33 | 94.66 |
Table 4 Comparison of experimental AUC values on different frequency domain partitioning methods/%
| Patch size | DF | FS | F2F | NT |
|---|---|---|---|---|
| 80×80 | 93.47 | 93.58 | 93.13 | 83.02 |
| 40×40 | 94.72 | 94.29 | 94.39 | 83.90 |
| 20×20 | 94.88 | 94.82 | 95.08 | 85.61 |
| 10×10 | 95.55 | 94.97 | 95.16 | 86.42 |
| Ours | 95.88 | 95.82 | 95.57 | 87.00 |
Table 5 Comparison of experimental accuracy on different scale partitioning methods in the spatial domain module/%
| Method | FaceForensics++ | Celeb_DF |
|---|---|---|
| Two-stream[20] | 70.10 | 53.83 |
| Meso4[13] | 84.70 | 54.80 |
| DSP-FWA[21] | 93.00 | 64.60 |
| Capsule[22] | 96.60 | 57.50 |
| Two Branch[23] | 93.18 | 73.41 |
| SMIL[24] | 96.80 | 56.30 |
| Ours | 97.19 | 73.81 |
Table 6 AUC results across different datasets for various methods
Fig. 7 Interpretability analysis results of the baseline model and the proposed model ((a) Original image; (b) Baseline model; (c) Ours)
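The paper does not state which attribution technique produced the heatmaps in Fig. 7; Grad-CAM is a common choice for this kind of CNN visualization. The sketch below is therefore only a plausible reconstruction of the figure's tooling: `model` and `target_layer` are hypothetical placeholders for any backbone (e.g., an EfficientNet_b4) and one of its late convolutional layers.

```python
# Hypothetical visualization helper, not taken from the paper: a generic
# Grad-CAM pass that could produce heatmaps like those in Fig. 7.
import torch
import torch.nn.functional as F


def grad_cam(model: torch.nn.Module, image: torch.Tensor,
             target_layer: torch.nn.Module) -> torch.Tensor:
    """Return an (H, W) heatmap in [0, 1] for the model's top class.

    `model` and `target_layer` are placeholders for any CNN and one of
    its late convolutional layers."""
    store = {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: store.update(act=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: store.update(grad=go[0]))
    try:
        logits = model(image)                     # image: (1, 3, H, W)
        logits[0, logits[0].argmax()].backward()  # gradient of the top class
    finally:
        h1.remove()
        h2.remove()
    weights = store["grad"].mean(dim=(2, 3), keepdim=True)  # GAP of gradients
    cam = F.relu((weights * store["act"]).sum(dim=1))       # weighted feature sum
    cam = F.interpolate(cam[None], size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```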
| Model | Spatial | Frequency | Fusion | DF | FS | F2F | NT |
|---|---|---|---|---|---|---|---|
| 1 | √ | - | - | 95.41 | 95.08 | 95.01 | 86.14 |
| 2 | - | √ | - | 95.54 | 95.76 | 95.13 | 85.35 |
| 3 | √ | √ | - | 95.50 | 95.80 | 95.35 | 86.57 |
| 4 | √ | √ | √ | 95.88 | 95.82 | 95.57 | 87.00 |
Table 7 Ablation experiment results/%
[1] WANG B, HUANG L Q, HUANG T Q, et al. Two-stream Xception structure based on feature fusion for DeepFake detection[J]. International Journal of Computational Intelligence Systems, 2023, 16(1): 134.
[2] LIU D C, CHEN T, PENG C L, et al. Attention consistency refined masked frequency forgery representation for generalizing face forgery detection[EB/OL]. [2024-06-20]. https://arxiv.org/abs/2307.11438.
[3] LUAN T, LIANG G Q, PEI P F. Interpretable DeepFake detection based on frequency spatial transformer[J]. International Journal of Emerging Technologies and Advanced Applications, 2024, 1(2): 19-25.
[4] COCCOMINI D A, MESSINA N, GENNARO C, et al. Combining EfficientNet and vision transformers for video deepfake detection[C]// The 21st International Conference on Image Analysis and Processing. Cham: Springer, 2022: 219-229.
[5] ZHU X Y, FEI H Y, ZHANG B, et al. Face forgery detection by 3D decomposition and composition search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(7): 8342-8357.
[6] DURALL R, KEUPER M, PFREUNDT F J, et al. Unmasking DeepFakes with simple features[EB/OL]. [2024-06-20]. https://arxiv.org/abs/1911.00686.
[7] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// The 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680.
[8] KINGMA D P. Auto-encoding variational Bayes[EB/OL]. [2024-06-20]. https://openreview.net/forum?id=33X9fd2-9FyZd.
[9] WANG S Y, WANG O, ZHANG R, et al. CNN-generated images are surprisingly easy to spot... for now[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 8692-8701.
[10] BAI W M, LIU Y F, ZHANG Z P, et al. AUNet: learning relations between action units for face forgery detection[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 24709-24719.
[11] TAN M X, LE Q. EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL]. [2024-06-20]. https://proceedings.mlr.press/v97/tan19a.html.
[12] LIANG W Y, WU Y F, WU J S, et al. FAClue: exploring frequency clues by adaptive frequency-attention for Deepfake detection[C]// 2023 42nd Chinese Control Conference. New York: IEEE Press, 2023: 7621-7626.
[13] AFCHAR D, NOZICK V, YAMAGISHI J, et al. MesoNet: a compact facial video forgery detection network[C]// 2018 IEEE International Workshop on Information Forensics and Security. New York: IEEE Press, 2018: 1-7.
[14] QIAN Y Y, YIN G J, SHENG L, et al. Thinking in frequency: face forgery detection by mining frequency-aware clues[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 86-103.
[15] NIRKIN Y, WOLF L, KELLER Y, et al. DeepFake detection based on discrepancies between faces and their context[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 6111-6121.
[16] LIN K H, HAN W H, LI S D, et al. IR-Capsule: two-stream network for face forgery detection[J]. Cognitive Computation, 2023, 15(1): 13-22.
[17] JIN X, WU N, JIANG Q, et al. A dual descriptor combined with frequency domain reconstruction learning for face forgery detection in deepfake videos[J]. Forensic Science International: Digital Investigation, 2024, 49: 301747.
[18] SONG H X, HUANG S Y, DONG Y P, et al. Robustness and generalizability of deepfake detection: a study with diffusion models[EB/OL]. [2024-06-20]. https://arxiv.org/abs/2309.02218.
[19] ZHAI Y J, LI J W, CHEN N H, et al. The vehicle parts detection method enhanced with Transformer integration[J]. Journal of Graphics, 2024, 45(5): 930-940 (in Chinese).
[20] ZHOU P, HAN X T, MORARIU V I, et al. Two-stream neural networks for tampered face detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2017: 1831-1839.
[21] LI Y Z, LYU S W. Exposing DeepFake videos by detecting face warp artifacts[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2019: 46-52.
[22] NGUYEN H H, YAMAGISHI J, ECHIZEN I. Capsule-forensics: using capsule networks to detect forged images and videos[C]// ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2019: 2307-2311.
[23] MASI I, KILLEKAR A, MASCARENHAS R M, et al. Two-branch recurrent network for isolating deepfakes in videos[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 667-684.
[24] LI X D, LANG Y N, CHEN Y F, et al. Sharp multiple instance learning for DeepFake video detection[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1864-1872.
[25] LI T, HU T, WU D D. Monocular depth estimation combining pyramid structure and attention mechanism[J]. Journal of Graphics, 2024, 45(3): 454-463 (in Chinese).