DRec: large language model-driven data analysis recommendation system

Journal of Graphics, 2025, Vol. 46, Issue (5): 1028-1041. DOI: 10.11996/JG.j.2095-302X.2025051028

• Computer Graphics and Virtual Reality •

CHEN Zhizhang, FENG Yingchaojie, WENG Luoxuan, SHEN Jian, CHEN Wei

State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China

Received: 2024-11-19 Accepted: 2025-02-25 Published: 2025-10-30 Online: 2025-09-10
Corresponding author: CHEN Wei (1976-), male, professor, Ph.D. His main research interests cover visualization, visual analysis, etc. E-mail: chenvis@zju.edu.cn
First author: CHEN Zhizhang (2000-), male, master's student. His main research interest covers visual analysis. E-mail: chenzhiz@zju.edu.cn
Supported by:
National Natural Science Foundation of China (62132017); "Pioneer" and "Leading Goose" Research and Development Program of Zhejiang (2024C01167); Zhejiang Provincial Natural Science Foundation of China (LD24F020011)

Abstract:

Natural language interaction systems have greatly simplified how users interact with data analysis tools, allowing them to complete data analysis and chart generation through natural language. With the rise of large language models (LLMs), LLM-driven natural language data analysis systems have become a trend in recent years. Thanks to their strong logical reasoning and tool invocation capabilities, LLMs can produce more complex logical inferences and charts. However, LLM-based interactive data analysis remains challenging. Analysts must set a clear direction to drive the interactive analysis forward, which usually requires a deep understanding of the data. Furthermore, when exploring data with LLMs, analysts operate on the data less directly, so their understanding of the data may be insufficient, which in turn weakens their overall control of the analysis process. To help users clarify the analysis process and deepen their understanding of the data, this paper proposes DRec, an LLM-based data analysis system built on recommendation and association. The system uses associative information to help users build an understanding of the data and to guide the analysis process. It also provides insights along two dimensions, semantic and data, and recommends queries based on those insights to help users decide on the direction of analysis. Case studies and user experiments show that DRec can improve data analysis efficiency and guide users toward reasonable analysis results.

Key words: large language models, interactive data analysis, data exploration, natural language interface, natural language recommendation
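
The abstract describes deriving insights from the data and turning them into recommended queries. The sketch below illustrates that general idea only and is not DRec's implementation: it assumes, hypothetically, that a "data-dimension insight" can be approximated by the strength of the Pearson correlation between numeric columns, and that a recommendation is a templated natural-language question. The function names, the input shape, and the toy table are all invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def recommend_queries(table, top_k=2):
    """Rank column pairs by |correlation| and phrase each as a follow-up query.

    `table` maps a column name to a list of numbers (hypothetical input shape).
    """
    scored = [
        (abs(pearson(table[a], table[b])), a, b)
        for a, b in combinations(sorted(table), 2)
    ]
    scored.sort(reverse=True)  # strongest relationship first
    return [
        f"How does {a} change with {b}? Show a scatter plot."
        for _, a, b in scored[:top_k]
    ]


# Toy data, invented for illustration only.
sales = {
    "ad_spend": [1, 2, 3, 4, 5],
    "revenue": [2, 4, 6, 8, 10],
    "returns": [5, 1, 4, 2, 3],
}
suggestions = recommend_queries(sales, top_k=1)
print(suggestions)
```

In an LLM-driven system, the templated question would more plausibly be generated or rephrased by the model using the column semantics; the fixed template here only keeps the sketch self-contained.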

CLC Number: TP391.3; TP18

CHEN Zhizhang, FENG Yingchaojie, WENG Luoxuan, SHEN Jian, CHEN Wei. DRec: large language model-driven data analysis recommendation system[J]. Journal of Graphics, 2025, 46(5): 1028-1041.

Figures/Tables (9)

Fig. 1 System flowchart

Fig. 2 System overview ((a) Data overview; (b) Interactive data analysis; (c) Data insights; (d) Recommendation view)

Fig. 3 Case study: defining the data exploration direction

Fig. 4 Case study: investigating correlated data

Fig. 5 Case study: more comprehensive data exploration

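Table 1 Evaluation by comparison with the baseline system

No.	Survey question
Q1	The system makes data analysis convenient
Q2	The system helps me find a direction for data exploration
Q3	The system helps me review past conversations and results
Q4	The system's analysis workflow is easy to understand and learn
Q5	I would be willing to use the system in future data analysis scenarios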

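Table 2 Evaluation of system auxiliary features

No.	Survey question
Q1	The system found reasonable data insights
Q2	The column information provided by the system deepened my understanding of the data
Q3	The associated insights provided by the system helped me analyze the data more comprehensively
Q4	The system's attention information promoted data exploration
Q5	The system gave reasonable data analysis recommendations
Q6	The system helped me adapt quickly to LLM-driven data analysis
Q7	The system helped me decide on the next step of data exploration
Q8	I feel confident that I chose the right direction for data exploration
Q9	The system is easy to learn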

Fig. 6 Results of the comparison with the baseline system

Fig. 7 Evaluation of system auxiliary features

