Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 745-759. DOI: 10.11996/JG.j.2095-302X.2024040745
Received: 2023-12-18
Accepted: 2024-05-03
Published: 2024-08-31
Online: 2024-09-03
GONG Yongchao1,2, SHEN Xukun1,2,3
First author: GONG Yongchao (1988-), Ph.D. His main research interests cover computer vision and deep learning. E-mail: gyc_ustc@163.com
Corresponding author: SHEN Xukun (1965-), professor, Ph.D. His main research interests cover computer graphics and virtual reality. E-mail: xkshen@buaa.edu.cn
Abstract:
Object detection and instance segmentation are two important and closely related tasks in computer vision, yet the connection between them remains underexplored in most existing work. To this end, RDSNet, a deep architecture for reciprocal object detection and instance segmentation, was proposed. To achieve cooperative optimization between the two tasks, a two-stream structure was designed to jointly learn object-level and pixel-level feature representations, encoding object-level and pixel-level information respectively, and three modules were introduced between the two streams to realize their interaction, so that object information assists instance segmentation and pixel information assists object detection. The correlation module provides a means of computing the similarity between object-level and pixel-level features, driving the features belonging to the same object to be as consistent as possible and thereby improving the accuracy of instance masks. The cropping module exploits object information to introduce the concept of instances and translation variance into pixel-level perception, so that different instances can be distinguished more accurately and background noise is reduced. To further improve how tightly detection boxes fit objects, a mask-based boundary refinement module (MBRM) was proposed to fuse masks with detection boxes, exploiting the accuracy advantage of masks to correct errors in detection boxes. Extensive experimental analyses and comparisons on the COCO dataset confirm the effectiveness and efficiency of RDSNet. In addition, by introducing a mask scoring strategy into the boundary refinement module, instance segmentation assists object detection in a new way, further improving the performance of RDSNet.
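To make the two-stream interaction above concrete, here is a minimal PyTorch-style sketch of the correlation and cropping steps, under simplifying assumptions (a single detected object, plain dot-product similarity, hard rectangular cropping). All names (`correlation_module`, `cropping_module`, `obj_embed`, `pix_embed`) are illustrative and do not come from the released implementation.

```python
import torch

def correlation_module(obj_embed, pix_embed):
    """Correlate one object-level embedding with the pixel-level stream.

    obj_embed: (d,) representation of a detected object.
    pix_embed: (d, H, W) per-pixel representation of the image.
    Returns (H, W) logits that are high where pixel features are
    consistent with the object features, i.e. a coarse instance mask.
    """
    return torch.einsum("d,dhw->hw", obj_embed, pix_embed)

def cropping_module(mask_logits, box):
    """Use the detected box to reintroduce translation variance:
    suppress responses outside the box to cut background noise
    and to separate nearby instances.

    mask_logits: (H, W) logits; box: (x1, y1, x2, y2) in pixels.
    """
    h, w = mask_logits.shape
    x1, y1, x2, y2 = box
    keep = torch.zeros(h, w, dtype=torch.bool)
    keep[y1:y2, x1:x2] = True
    # Pixels outside the box are driven to a large negative logit.
    return mask_logits.masked_fill(~keep, float("-inf"))

# Toy usage: a 4-dim embedding space on an 8x8 image.
pix = torch.randn(4, 8, 8)
obj = torch.randn(4)
mask = torch.sigmoid(cropping_module(correlation_module(obj, pix), (2, 2, 6, 6)))
```

In the paper, the object stream carries one embedding per detected instance and the similarity is also used during training to pull same-object features together; the sketch keeps only the inference-time geometry of the two interactions.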
GONG Yongchao, SHEN Xukun. A deep architecture for reciprocal object detection and instance segmentation[J]. Journal of Graphics, 2024, 45(4): 745-759.
Fig. 1 Illustration of errors in instance segmentation due to localization errors in object detection ((a), (b) Boxes do not fully enclose objects; (c), (d) Boxes do not enclose objects tightly)
| Type | Method | Size | FPS | AP^m | AP^m_50 | AP^m_75 | AP^m_S | AP^m_M | AP^m_L |
|---|---|---|---|---|---|---|---|---|---|
| Two-stage | Mask R-CNN [4] | 800 | 9.5 (V) | 36.2 | 58.3 | 38.6 | 16.7 | 38.8 | 51.5 |
| | MS R-CNN [27] | 800 | 9.1 (V) | 37.4 | 57.9 | 40.4 | 17.3 | 39.5 | 53.0 |
| | RetinaMask [7] | 800 | 6.0 (V) | 34.7 | 55.4 | 36.9 | 14.3 | 36.7 | 50.5 |
| One-stage | FCIS [12] | 600 | 6.6 (P) | 29.2 | 49.5 | - | 7.1 | 31.3 | 50.0 |
| | YOLACT [10] | 550 | 33.0 (P) | 29.8 | 48.5 | 31.2 | 9.9 | 31.3 | 47.7 |
| | YOLACT++ [28] | 550 | 27.0 (P) | 34.6 | 53.8 | 36.9 | 11.9 | 36.8 | 55.1 |
| | PolarMask [29] | 550 | 23.9 (P) | 30.4 | 51.9 | 31.0 | 13.4 | 32.4 | 42.8 |
| | RDSNet | 550 | 32.0 (P) | 32.1 | 53.0 | 33.4 | 11.0 | 33.8 | 51.0 |
| | MEInst [30] | 800 | 12.8 (P) | 33.9 | 56.2 | 35.4 | 19.8 | 36.1 | 42.3 |
| | SOLO [13] | 800 | 10.4 (V) | 37.8 | 59.5 | 40.4 | 16.4 | 40.6 | 54.2 |
| | TensorMask [14] | 800 | 2.6 (V) | 37.3 | 59.5 | 39.5 | 17.5 | 39.3 | 51.6 |
| | RDSNet (baseline [2]) | 800 | 8.8 (V) | 36.4 | 57.9 | 39.0 | 16.4 | 39.5 | 51.6 |
| | RDSNet (baseline [31]) | 800 | 7.5 (P) | 37.5 | 59.3 | 40.4 | 16.9 | 40.5 | 53.0 |
| | RDSNet+ (baseline [2]) | 800 | 8.8 (V) | 37.2 | 59.1 | 40.2 | 16.8 | 41.2 | 52.8 |
| | RDSNet+ (baseline [31]) | 800 | 7.5 (P) | 38.5 | 60.4 | 41.8 | 17.3 | 41.8 | 54.3 |
| Upper bound | RDSNet (with ground-truth boxes) | 800 | - | 58.7 | 68.5 | 63.1 | 49.2 | 59.0 | 75.4 |

Table 1 Instance segmentation results on COCO test-dev. P means Titan XP or 1080Ti, and V means Tesla V100
| Type | Method | Size | Backbone | FPS | AP^bb | AP^bb_50 | AP^bb_75 | AP^bb_S | AP^bb_M | AP^bb_L |
|---|---|---|---|---|---|---|---|---|---|---|
| Two-stage | Mask R-CNN [4] | 800 | R-101 | 9.5 (V) | 39.7 | 61.6 | 43.2 | 23.0 | 43.2 | 49.7 |
| | Cascade R-CNN [32] | 800 | R-101 | 6.8 (V) | 43.1 | 61.5 | 46.9 | 24.0 | 45.9 | 55.4 |
| | HTC [33] | 800 | R-101 | 4.1 (V) | 45.1 | 64.3 | 49.0 | 25.2 | 48.0 | 58.2 |
| One-stage | YOLOv3 [34] | 608 | D-53 | 19.8 (P) | 33.0 | 57.9 | 34.3 | 18.3 | 35.4 | 41.9 |
| | RefineDet [35] | 512 | R-101 | 9.1 (P) | 36.4 | 57.5 | 39.5 | 16.6 | 39.9 | 51.4 |
| | CornerNet [36] | 512 | H-104 | 4.4 (P) | 40.5 | 57.8 | 45.3 | 20.8 | 44.8 | 56.7 |
| RDSNet | Baseline [2] | 800 | R-101 | 10.9 (V) | 38.1 | 58.5 | 40.8 | 21.2 | 41.5 | 48.2 |
| | w/o MBRM | | | 8.8 (V) | 39.4 | 60.1 | 42.5 | 22.1 | 42.6 | 49.9 |
| | with MBRM | | | 8.5 (V) | 40.3 | 60.1 | 43.0 | 22.1 | 43.5 | 51.5 |
| | Baseline [31] | 800 | R-101 | 9.1 (P) | 42.0 | 62.4 | 46.5 | 24.6 | 44.8 | 53.3 |
| | w/o MBRM | | | 7.5 (P) | 42.3 | 62.5 | 46.8 | 24.7 | 44.8 | 53.5 |
| | with MBRM | | | 7.3 (P) | 43.2 | 63.7 | 48.0 | 25.0 | 45.2 | 56.1 |
| RDSNet+ | Detector [2] | 800 | R-101 | 8.4 (V) | 41.4 | 60.9 | 44.3 | 22.5 | 44.0 | 52.4 |
| | Detector [31] | | | 7.2 (P) | 44.3 | 64.1 | 49.2 | 25.3 | 45.9 | 56.8 |

Table 3 Object detection results on COCO test-dev
Fig. 6 Visual comparisons of some results on COCO val2017 ((a) Mask R-CNN; (b) RDSNet w/o MBRM; (c) Full RDSNet)
Fig. 7 Comparisons of speed and accuracy on COCO test-dev by different methods ((a) Instance segmentation methods; (b) Object detection methods)
| No. | Method | Module | TE | OHEM | IE | FPS | AP^m |
|---|---|---|---|---|---|---|---|
| 1 | YOLACT [10] | LC | | | | 33 | 29.9 |
| 2 | RDSNet_s | Corr | | | | 32 | 31.0 (+1.1) |
| 3 | | | √ | | | | 30.0 |
| 4 | | | | √ | | | 30.7 |
| 5 | | | | | √ | | 31.2 |
| 6 | | | √ | √ | | | 30.8 |
| 7 | | | | √ | √ | | 31.6 |
| 8 | | | √ | √ | √ | | 31.8 (+1.9) |
| 9 | RDSNet_f | Corr | √ | √ | | 29 | 28.8 |
| 10 | | | √ | √ | √ | | 28.5 |

Table 2 Demonstration of the effectiveness of the cropping module on COCO val2017
Fig. 8 Instance segmentation results on COCO val2017 obtained with different ratios of positive/negative samples and OHEM strategies in the cropping module
Fig. 9 Visualization of the pixel-level representation of some results on COCO val2017 ((a) Original image; (b) Visualization of the feature representation)
Fig. 10 Visual comparisons of some results on COCO val2017 ((a), (c) RDSNet w/o MBRM; (b), (d) Full RDSNet)
| Method | AP^bb | AP^bb_S | AP^bb_M | AP^bb_L |
|---|---|---|---|---|
| RetinaNet [2] | 35.9 | 17.1 | 39.7 | 53.3 |
| BB-of-Mask | 34.2 (−1.7) | 11.8 (−5.3) | 37.7 (−2.0) | 55.1 (+1.8) |
| MBRM | 37.2 (+1.3) | 16.9 (−0.2) | 40.8 (+1.1) | 56.5 (+3.2) |

Table 4 Demonstration of the effectiveness of MBRM on COCO val2017
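To clarify what Table 4 compares: "BB-of-Mask" replaces the detected box with the tight bounding box of the predicted mask, while MBRM fuses both sources of evidence. Below is a minimal sketch of the baseline plus a deliberately naive stand-in for the fusion; `fuse_boxes` is hypothetical and is not the fusion rule used by MBRM.

```python
import torch

def bb_of_mask(mask):
    """'BB-of-Mask' baseline: the tight box around a binary mask.
    Missing mask pixels shrink the box, consistent with the large
    AP_S drop for small objects reported in Table 4.

    mask: (H, W) bool tensor. Returns (x1, y1, x2, y2) or None.
    """
    ys, xs = torch.nonzero(mask, as_tuple=True)
    if ys.numel() == 0:
        return None
    return (xs.min().item(), ys.min().item(),
            xs.max().item() + 1, ys.max().item() + 1)

def fuse_boxes(det_box, mask_box, w=0.5):
    """Hypothetical stand-in for MBRM: keep both the detector's box and
    the mask-derived box instead of discarding either, here by a plain
    weighted average of coordinates."""
    return tuple(w * d + (1 - w) * m for d, m in zip(det_box, mask_box))

# Toy usage.
m = torch.zeros(8, 8, dtype=torch.bool)
m[2:5, 3:7] = True
print(bb_of_mask(m))                            # (3, 2, 7, 5)
print(fuse_boxes((2, 1, 8, 6), bb_of_mask(m)))  # box pulled toward the mask
```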
Fig. 13 Quantitative results on COCO test-dev for verifying the impact of the performance of detectors on RDSNet ((a) Results of different base detectors; (b) Results of RDSNet object detection with different base detectors; (c) Results of RDSNet instance segmentation with different base detectors)
[1] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[2] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 42(2): 318-327.
[3] DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]// The 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 379-387.
[4] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// 2017 International Conference on Computer Vision. New York: IEEE Press, 2017: 2980-2988.
[5] DAI J F, HE K M, SUN J. Instance-aware semantic segmentation via multi-task network cascades[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 3150-3158.
[6] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8759-8768.
[7] FU C Y, SHVETS M, BERG A C. RetinaMask: learning to predict masks improves state-of-the-art single-shot detection for free[EB/OL]. [2023-10-18]. http://arxiv.org/abs/1901.03353.
[8] CHEN H, SUN K Y, TIAN Z, et al. BlendMask: top-down meets bottom-up for instance segmentation[C]// 2020 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 8570-8578.
[9] LEE Y W, PARK J. CenterMask: real-time anchor-free instance segmentation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2594-2603.
[10] BOLYA D, ZHOU C, XIAO F Y, et al. YOLACT: real-time instance segmentation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 9160-9169.
[11] TIAN Z, SHEN C H, CHEN H. Conditional convolutions for instance segmentation[C]// European Conference on Computer Vision. Cham: Springer, 2020: 282-298.
[12] LI Y, QI H Z, DAI J F, et al. Fully convolutional instance-aware semantic segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4438-4446.
[13] WANG X L, KONG T, SHEN C H, et al. SOLO: segmenting objects by locations[M]// Computer Vision - ECCV 2020. Cham: Springer, 2020: 649-665.
[14] CHEN X L, WANG P, CHENG G, et al. TensorMask: surpassing pixel-level encoding for instance segmentation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2061-2069.
[15] DE BRABANDERE B, NEVEN D, VAN GOOL L. Semantic instance segmentation with a discriminative loss function[EB/OL]. [2023-10-18]. http://arxiv.org/abs/1708.02551.
[16] FATHI A, WOJNA Z, RATHOD V, et al. Semantic instance segmentation via deep metric learning[EB/OL]. [2023-10-18]. http://arxiv.org/abs/1703.10277.
[17] WANG S R, GONG Y C, XING J L, et al. RDSNet: a new deep architecture for reciprocal object detection and instance segmentation[C]// Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12208-12215.
[18] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Computer Vision - ECCV 2014. Cham: Springer, 2014: 740-755.
[19] CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[C]// 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 3213-3223.
[20] LIN T Y, DOLLÁR P, GIRSHICK R B, et al. Feature pyramid networks for object detection[C]// 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2117-2125.
[21] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 3431-3440.
[22] SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 815-823.
[23] SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 761-769.
[24] CHEN K, WANG J Q, PANG J M, et al. MMDetection: open MMLab detection toolbox and benchmark[EB/OL]. [2023-10-18]. http://arxiv.org/abs/1906.07155.
[25] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[26] KIRILLOV A, GIRSHICK R, HE K M, et al. Panoptic feature pyramid networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 6392-6401.
[27] HUANG Z J, HUANG L C, GONG Y C, et al. Mask scoring R-CNN[C]// 2019 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 6402-6411.
[28] BOLYA D, ZHOU C, XIAO F Y, et al. YOLACT++: better real-time instance segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(2): 1108-1121.
[29] XIE E Z, SUN P Z, SONG X G, et al. PolarMask: single shot instance segmentation with polar representation[EB/OL]. [2023-10-18]. http://arxiv.org/abs/1909.13226.
[30] ZHANG R F, TIAN Z, SHEN C H, et al. Mask encoding for single shot instance segmentation[EB/OL]. [2023-10-18]. http://arxiv.org/abs/2003.11712.
[31] CHEN Y T, HAN C X, WANG N Y, et al. Revisiting feature alignment for one-stage object detection[EB/OL]. [2023-10-18]. http://arxiv.org/abs/1908.01570.
[32] CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 6154-6162.
[33] CHEN K, PANG J M, WANG J Q, et al. Hybrid task cascade for instance segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4974-4983.
[34] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2023-10-18]. http://arxiv.org/abs/1804.02767.
[35] ZHANG S F, WEN L Y, BIAN X, et al. Single-shot refinement neural network for object detection[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 4203-4212.
[36] LAW H, DENG J. CornerNet: detecting objects as paired keypoints[M]// Computer Vision - ECCV 2018. Cham: Springer, 2018: 765-781.
[37] NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation[M]// Computer Vision - ECCV 2016. Cham: Springer, 2016: 483-499.
[38] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[M]// Computer Vision - ECCV 2016. Cham: Springer, 2016: 21-37.
[39] LIANG Z X, WANG X B, HE T, et al. Research and implementation of instance segmentation and edge optimization algorithm[J]. Journal of Graphics, 2020, 41(6): 939-946 (in Chinese).
[40] CUI Z D, LI Z M, YANG S L, et al. 3D object detection based on semantic segmentation guidance[J]. Journal of Graphics, 2022, 43(6): 1134-1142 (in Chinese).