Journal of Graphics ›› 2024, Vol. 45 ›› Issue (6): 1231-1242. DOI: 10.11996/JG.j.2095-302X.2024061231

• Special Topic on “Large Models and Graphics Technology and Applications” •

Research on KB-VQA knowledge retrieval strategy based on implicit knowledge enhancement

ZHENG Hongyan1, WANG Hui2, LIU Hao1, ZHANG Zhiping1, YANG Xiaojuan3, SUN Tao1

  1. Department of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan Shandong 250353, China
    2. Affiliated Middle School of Shandong Normal University, Jinan Shandong 250014, China
    3. Faculty of Education, Shandong Normal University, Jinan Shandong 250014, China
  • Received:2024-06-21 Accepted:2024-08-17 Online:2024-12-31 Published:2024-12-24
  • Contact: SUN Tao
  • About author: ZHENG Hongyan (1999-), master's student. His main research interests cover visual question answering. E-mail: 1679436540@qq.com

  • Supported by:
    Pilot Project for Integrated Innovation of Science, Education, and Industry of Qilu University of Technology (Shandong Academy of Sciences) (2024ZDZX08); Shandong Provincial Natural Science Foundation General Project (ZR202211190244); Shandong Province Science and Technology SME Innovation Capacity Enhancement Project (2023TSGC0212)

Abstract:

Knowledge-based visual question answering (KB-VQA) requires not only image and question information but also relevant knowledge from external sources to answer questions accurately. Existing methods typically use a retriever to fetch external knowledge from a knowledge base, or rely on the implicit knowledge of large models. However, image and textual information alone often prove insufficient for acquiring the necessary knowledge. To address this issue, an enhanced retrieval strategy was proposed for both the query stage and the external-knowledge stage. On the query side, implicit knowledge from large models was utilized to enrich the existing image and question information, aiding the retriever in locating more accurate external knowledge in the knowledge base. On the external-knowledge side, a pre-simulation interaction module was introduced to enhance the external knowledge. This module generated a new lightweight vector for each knowledge vector, allowing the retriever to pre-simulate the interaction between the query and the knowledge passage and thus better capture their semantic relationship. Experimental results demonstrated that the improved model achieved an accuracy of 61.3% on the OK-VQA dataset while retrieving only a small amount of knowledge.
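The retrieval idea sketched in the abstract can be illustrated in miniature: each knowledge passage gets, offline, both an embedding and an additional lightweight "pre-simulated interaction" vector, and at query time the retrieval score combines direct similarity with the interaction term. This is a minimal sketch under loose assumptions, not the authors' implementation: the `embed` function is a toy deterministic stand-in for the paper's trained encoders, and the linear map `W` standing in for the pre-simulation interaction module is random here, where the paper's module would be learned.

```python
import zlib
import numpy as np

DIM = 16

def embed(text: str, dim: int = DIM) -> np.ndarray:
    """Toy stand-in encoder: deterministic pseudo-embedding, L2-normalized."""
    seed = zlib.crc32(text.encode("utf-8"))
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# Offline stage: for each knowledge passage, precompute (a) its embedding and
# (b) a lightweight pre-simulated interaction vector, modeled here as a linear
# map W applied to the passage embedding. W is random purely for illustration;
# in the paper this module would be trained.
passages = [
    "A fire hydrant supplies water for firefighting.",
    "Bananas are rich in potassium.",
    "The Eiffel Tower is located in Paris.",
]
W = np.random.default_rng(42).normal(scale=0.1, size=(DIM, DIM))
passage_embs = np.stack([embed(p) for p in passages])
interaction_vecs = passage_embs @ W.T  # one lightweight vector per passage

def retrieve(query: str, alpha: float = 0.5) -> list[tuple[float, str]]:
    """Score = direct similarity + alpha * pre-simulated interaction term."""
    q = embed(query)
    direct = passage_embs @ q          # standard dual-encoder dot product
    simulated = interaction_vecs @ q   # cheap query-passage interaction proxy
    scores = direct + alpha * simulated
    order = np.argsort(-scores)
    return [(float(scores[i]), passages[i]) for i in order]

ranked = retrieve("What is the red object on the sidewalk used for?")
for score, passage in ranked:
    print(f"{score:+.3f}  {passage}")
```

Because the interaction vectors are computed offline, query-time cost stays at two dot products per passage, which is consistent with the abstract's emphasis on retrieving well with only a small amount of knowledge.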

Key words: visual question answering, knowledge retrieval, text-image enhancement, pre-simulated interaction, multi-modal