DRec: large language model-driven data analysis recommendation system

doi:10.11996/JG.j.2095-302X.2025051028

Abstract

Natural language interaction systems have greatly simplified how users interact with data analysis tools, allowing them to analyze data and generate charts through natural language. With the rise of large language models (LLMs), LLM-driven natural language data analysis systems have become a trend in recent years. Thanks to their strong logical reasoning and tool invocation capabilities, LLMs can carry out more complex reasoning and generate more complex charts. However, LLM-based interactive data analysis also poses challenges. Data analysts must clearly define the direction of analysis to drive the interactive process, which often requires a deep understanding of the data. Furthermore, when using LLMs for data exploration, analysts are often less directly involved with the data, which may leave them with an insufficient understanding of it and weaken their overall control of the analysis process. To help users clarify the analysis process and deepen their understanding of the data, this paper proposes DRec, an LLM-based data analysis system driven by recommendation and association. The system uses associative information to help users build a comprehensive understanding of the data and to guide the data analysis process. It also provides insights along both the semantic and data dimensions and offers query recommendations to help users decide on an analysis direction. Case studies and user experiments demonstrate that DRec can improve the efficiency of data analysis interaction and guide users toward reasonable data analysis results.

Key words: large language models, interactive data analysis, data exploration, natural language interface, natural language recommendation

CLC Number:

TP391.3
TP18

CHEN Zhizhang, FENG Yingchaojie, WENG Luoxuan, SHEN Jian, CHEN Wei. DRec: large language model-driven data analysis recommendation system[J]. Journal of Graphics, 2025, 46(5): 1028-1041.

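No.	Survey question
Q1	The system makes data analysis convenient
Q2	The system helps me find directions for data exploration
Q3	The system helps me review past conversations and results
Q4	The system's analysis workflow is easy to understand and learn
Q5	I would be willing to use the system in future data analysis scenarios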

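No.	Survey question
Q1	The system found reasonable data insights
Q2	The data column information provided by the system helps deepen my understanding of the data
Q3	The associated insights provided by the system help me analyze the data more comprehensively
Q4	The system's attention information facilitates data exploration
Q5	The system gave reasonable data analysis recommendations
Q6	The system helps me quickly adapt to LLM-driven data analysis
Q7	The system helps me decide on the next step of data exploration
Q8	I feel confident that I chose the right direction for data exploration
Q9	The system is easy to learn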