Journal of Graphics ›› 2025, Vol. 46 ›› Issue (6): 1337-1345. DOI: 10.11996/JG.j.2095-302X.2025061337
LIU Yuanyuan1,2, FANG Youjiang1,2, MENG Tianyu1,2, MENG Zhengyu1,2, LUO Pengwei1,2, YANG Peigen1,2, JIANG Yutong3, WEI Xiaopeng1,2, ZHANG Qiang1,2, YANG Xin1,2
Received: 2024-10-09
Accepted: 2025-04-15
Published: 2025-12-30
Online: 2025-12-27
Corresponding author: YANG Xin (1984-), male, professor, Ph.D. His main research interests cover computer graphics and computer vision. E-mail: xinyang@dlut.edu.cn
First author: LIU Yuanyuan (1999-), female, PhD candidate. Her main research interests cover computer graphics and scene semantic understanding. E-mail: Lyy990415@gmail.com
Abstract:
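In recent years, 3D scene graph generation (SGG) has attracted wide attention in computer graphics and computer vision. Although existing work has improved accuracy for coarse-grained classification and single relationship labels, performance on fine-grained classification and in multi-label settings remains inadequate for practical applications. To address this, a framework is proposed that exploits contextual information to achieve fine-grained entity classification, multiple relationship labels, and higher accuracy. The method consists of a graph feature extraction (GFE) module and a graph context inference (GCI) module. The GFE module extracts entity and interaction semantic features from the input data while preserving key information. The GCI module introduces structured features from both conventional graphs and hypergraphs: by analyzing the relationships between entities, it identifies how strongly entities within a neighborhood are associated and merges entities with similar interaction patterns, thereby learning the forms of interaction between entities. The geometric hypergraph it introduces is structured organizational information generated dynamically from the scene layout. Experiments on the 3DSSG dataset show that, by combining the ability of conventional graphs and hypergraphs to organize nodes and the associations between them, the framework effectively improves fine-grained classification and multi-relationship label recognition in 3D scene graph generation.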
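The abstract does not say how the geometric hypergraph is built from the scene layout. As a rough illustration only, the sketch below assumes one hyperedge per object, containing that object and its k nearest neighbours by centroid distance. It then applies one step of the commonly used normalized hypergraph propagation rule with unit hyperedge weights and no learned parameters. The function names, the k-nearest-neighbour rule, and the toy centroids are all assumptions, not the authors' method.

```python
import numpy as np


def knn_hypergraph_incidence(centroids: np.ndarray, k: int) -> np.ndarray:
    """Build an (n_nodes, n_nodes) incidence matrix H.

    Column j is one hyperedge holding object j and its k nearest
    neighbours by Euclidean centroid distance (an assumed construction).
    """
    n = centroids.shape[0]
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Force every object to sort first in its own row, so it always
    # belongs to its own hyperedge even if centroids coincide.
    np.fill_diagonal(dist, -1.0)
    nearest = np.argsort(dist, axis=1)[:, : k + 1]
    H = np.zeros((n, n))
    for edge, members in enumerate(nearest):
        H[members, edge] = 1.0
    return H


def hypergraph_conv(X: np.ndarray, H: np.ndarray) -> np.ndarray:
    """One propagation step: Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X."""
    dv = H.sum(axis=1)  # node degrees (number of hyperedges per node)
    de = H.sum(axis=0)  # hyperedge degrees (number of nodes per hyperedge)
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    left = dv_inv_sqrt[:, None] * H / de[None, :]
    right = H.T * dv_inv_sqrt[None, :]
    return left @ right @ X


# Toy layout: three objects clustered near the origin, two far away.
centroids = np.array(
    [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.4, 0.4, 0.0], [5.0, 5.0, 0.0], [5.2, 5.1, 0.0]]
)
H = knn_hypergraph_incidence(centroids, k=1)
features = hypergraph_conv(np.eye(5), H)  # identity features expose the propagation matrix
```

In this toy layout the two distant objects share hyperedges only with each other, so no features are mixed between the two clusters.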
LIU Yuanyuan, FANG Youjiang, MENG Tianyu, MENG Zhengyu, LUO Pengwei, YANG Peigen, JIANG Yutong, WEI Xiaopeng, ZHANG Qiang, YANG Xin. Geometry hypergraph aware 3D scene graph generation[J]. Journal of Graphics, 2025, 46(6): 1337-1345.
Fig. 2 Overview of the proposed 3D scene graph generation framework, which includes the graph feature extraction (GFE) module and the graph context inference (GCI) module
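| Model | Object class prediction R@5 | Object class prediction R@10 | Predicate prediction R@3 | Predicate prediction R@5 | Relationship prediction R@50 | Relationship prediction R@100 |
|---|---|---|---|---|---|---|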
| PointNet† | 63.39 | 74.54 | 89.07 | 96.03 | 50.05 | 55.73 |
| MSDN† | 61.07 | 72.41 | 85.99 | 93.60 | 46.55 | 53.20 |
| KERN† | 66.58 | 76.52 | 90.13 | 96.61 | 51.36 | 58.49 |
| 3DSSG | 66.41 | 77.26 | 82.58 | 94.34 | 51.16 | 56.48 |
| BGNN† | 71.19 | 81.98 | 86.98 | 93.80 | 55.20 | 60.85 |
| SGFormer | 70.66 | 80.98 | 83.98 | 91.82 | 56.20 | 60.75 |
| CSGG | 73.40 | 82.59 | 89.90 | 96.10 | 61.94 | 68.24 |
| Ours | 75.68 | 82.97 | 90.96 | 97.41 | 63.55 | 69.72 |
Table 1 Quantitative comparison of different methods on the 3DSSG dataset
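The page does not define the R@K columns. For the object and predicate columns, a common reading is top-K recall: a sample counts as a hit when its ground-truth label is among the K highest-scoring classes. The sketch below implements that reading for a single label per sample. The function name and toy scores are illustrative assumptions, and the paper's exact protocol (especially for multi-label predicates and for ranking relationship triplets) may differ.

```python
import numpy as np


def recall_at_k(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose ground-truth label is in the top-k scored classes.

    scores: (n_samples, n_classes) prediction scores.
    labels: (n_samples,) integer ground-truth class indices.
    """
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: 3 samples, 4 classes.
scores = np.array(
    [[0.10, 0.60, 0.20, 0.05], [0.50, 0.10, 0.30, 0.05], [0.25, 0.15, 0.10, 0.50]]
)
labels = np.array([1, 2, 0])
print(recall_at_k(scores, labels, k=1))
print(recall_at_k(scores, labels, k=2))
```

Only the first sample has its label ranked first, while every label falls within the top two, so the recall rises from one third at k=1 to 1.0 at k=2.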
| Model | Object class prediction mR@10 | Predicate prediction mR@5 | Relationship prediction mR@100 |
|---|---|---|---|
| MSDN† | 35.51 | 62.10 | 50.17 |
| KERN† | 35.89 | 61.97 | 49.14 |
| 3DSSG | 34.43 | 63.93 | 52.21 |
| BGNN† | 41.79 | 58.98 | 54.21 |
| SGFormer | 42.37 | 47.59 | 53.52 |
| CSGG | 45.18 | 64.16 | 61.50 |
| Ours | 45.81 | 65.22 | 63.17 |
Table 2 mR comparison on the 3DSSG dataset
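Mean recall (mR@K) is usually computed by averaging per-class recall instead of pooling all samples, so that rare classes weigh as much as frequent ones. The sketch below follows that usual definition for a single label per sample and averages only over classes that appear in the ground truth. Both choices, and the function name, are assumptions rather than the paper's stated protocol.

```python
import numpy as np


def mean_recall_at_k(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Average of per-class top-k recall over the classes present in labels."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    per_class = [hits[labels == c].mean() for c in np.unique(labels)]
    return float(np.mean(per_class))


# Imbalanced toy example: three samples of class 0, one sample of class 1.
scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]])
labels = np.array([0, 0, 0, 1])
print(mean_recall_at_k(scores, labels, k=1))
```

Here the pooled top-1 recall would be 0.75, but the single class-1 sample is missed, so the per-class average drops to 0.5. This is why mR@K is more sensitive than R@K to performance on infrequent classes.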
| Model | VP | MsP / GL / HL | Object class prediction R@5 | Object class prediction R@10 | Predicate prediction R@3 | Predicate prediction R@5 | Relationship prediction R@50 | Relationship prediction R@100 |
|---|---|---|---|---|---|---|---|---|
| M0 | Union | √ | 61.07 | 72.41 | 85.99 | 93.60 | 46.55 | 53.20 |
| M1 | InS | √ | 61.62 | 73.02 | 83.72 | 92.99 | 46.98 | 53.92 |
| M2 | I+U | √ | 69.77 | 79.05 | 91.62 | 95.79 | 61.05 | 65.44 |
| M3 | InS | √ √ | 73.40 | 82.59 | 89.90 | 96.10 | 61.94 | 68.24 |
| M4 | Union | √ √ | 71.40 | 81.98 | 87.39 | 93.64 | 60.82 | 66.43 |
| M5 | InS | √ √ √ | 73.25 | 81.67 | 90.43 | 96.97 | 62.94 | 69.24 |
| M6 | Union | √ √ √ | 73.27 | 81.82 | 87.63 | 94.92 | 60.41 | 66.76 |
| M7 | I+U | √ √ √ | 75.68 | 82.97 | 90.96 | 97.41 | 63.55 | 69.72 |
Table 3 Ablation study