Existing models for detecting tunnel lining cracks suffer from low recognition accuracy, slow detection speed, and large parameter counts, which stem from the random and dense distribution of cracks and the low resolution of annotation boxes. To address these problems, the YOLOv8 network framework was improved with Deformable Convolutional Networks version 2 (DCNv2) and an end-to-end Transformer Decoder, yielding a lining crack detection model named DTD-YOLOv8. First, DCNv2 was integrated into the C2f module of the YOLOv8 backbone convolutional network, enabling the model to perceive crack deformation features accurately and quickly. Second, the Transformer Decoder replaced the YOLOv8 detection head so that the complete object detection process runs within an end-to-end framework, thereby eliminating the computational cost of the anchor-free processing mode. Seven detection models were compared and verified on a self-built crack dataset: SSD, Faster-RCNN, RT-DETR, YOLOv3, YOLOv5, YOLOv8, and DTD-YOLOv8. The results indicated that the F1 score and mAP@50 of DTD-YOLOv8 reached 87.05% and 89.58%, respectively. Compared with the other six models, in the order listed, the F1 score increased by 14.16%, 7.68%, 1.55%, 41.36%, 8.20%, and 7.40%, while the mAP@50 increased by 28.84%, 15.47%, 1.33%, 47.65%, 10.14%, and 10.84%. The parameter count of the new model was only one-third that of RT-DETR, the detection time for a single image was 16.01 ms, and the frame rate was 65.46 frames per second (FPS), a speed improvement over the other compared models. The DTD-YOLOv8 model therefore meets the requirements of crack detection tasks in operational tunnels efficiently.
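For readers less familiar with the reported metric, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch under stated assumptions: the function name `f1_score` and the detection counts are hypothetical and are not taken from the study, and the matching of predicted boxes to ground-truth cracks (for example at an IoU threshold of 0.5) is assumed to have been done beforehand.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall) in [0, 1]."""
    predicted = true_positives + false_positives  # all boxes the detector emitted
    actual = true_positives + false_negatives     # all ground-truth cracks
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; they are not results from the paper.
example = f1_score(true_positives=80, false_positives=10, false_negatives=20)
print(f"F1 = {example:.2%}")
```

Because the harmonic mean is dominated by the smaller of the two values, a high F1 score requires both few missed cracks and few false alarms.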