Journal of Graphics ›› 2025, Vol. 46 ›› Issue (5): 919-930.DOI: 10.11996/JG.j.2095-302X.2025050919

• Review •

The three realms of visual Turing: from seeing to imagining in the LLM era

HUANG Kaiqi1,2,3, WU Meiqi1,2, CHEN Honghao1, FENG Xiaokun1,3, ZHANG Dailing1

    1 Center for Research on Intelligent System and Engineering & Key Laboratory of Complex System Intelligent Control and Decision, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    2 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
    3 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2025-07-07 Accepted: 2025-08-20 Online: 2025-10-30 Published: 2025-09-10
  • About the first author:

    HUANG Kaiqi (1977-), professor, Ph.D. His main research interests cover computer vision and cognitive decision-making. E-mail: kaiqi.huang@nlpr.ia.ac.cn

  • Supported by:
    National Science and Technology Major Project (2022ZD0116403)

Abstract:

The Visual Turing Test evaluates computer vision models through a Turing-style assessment, offering a human-aligned benchmark for advancing visual intelligence. With the advent of large language models (LLMs), computer vision technologies have advanced rapidly, achieving remarkable performance in tasks such as image classification, object detection and segmentation, and video understanding. Despite these achievements, a significant gap remains between current algorithms and human visual cognition in terms of adaptability and generalization. This review revisits the evolution of visual intelligence through three progressive levels (Seeing the Visible, Seeing the Cognized, and Seeing the Conceived) and systematically examines the limitations and challenges of current technologies. The objective is to drive computer vision toward a more human-like capacity for perception and cognition.

Key words: three realms of visual Turing, visual Turing test, MLLMs, visual intelligence, human-like intelligence
