A new interaction paradigm for building design driven by large language model: proof of concept with Rhino7

Journal of Graphics, 2024, 45(3): 594-600. DOI: 10.11996/JG.j.2095-302X.2024030594

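JIANG Can¹,², ZHENG Zhe², LIANG Xiong¹, LIN Jiarui²,³, MA Zhiliang², LU Xinzheng²

1. Glodon Company Limited, Beijing 100193, China
2. Department of Civil Engineering, Tsinghua University, Beijing 100084, China
3. Key Laboratory of Digital Construction and Digital Twin, Ministry of Housing and Urban-Rural Development, Beijing 100084, China

Received: 2023-09-25; Accepted: 2023-12-21; Published: 2024-06-30; Online: 2024-06-12
Corresponding author: LIN Jiarui (1987-), male, associate research fellow, Ph.D. His main research interests cover intelligent construction, digital twins, and knowledge graphs. E-mail: lin611@tsinghua.edu.cn
First author: JIANG Can (1993-), male, postdoctoral researcher, Ph.D. His main research interest covers the application of artificial intelligence in intelligent construction. E-mail: jiangc-l@glodon.com
Supported by: National Natural Science Foundation of China (52378306); Research Project of Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park (20220468132)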

Abstract:

As society places ever higher demands on the quality of building design, design software has become increasingly specialized and complex. Current design software not only incurs high learning costs but also relies on complicated interaction modes. Recent breakthroughs in large language models (LLMs) have made it feasible for computers to understand natural-language instructions precisely and to generate accurate code, which opens up new ideas for how humans interact with software. This study therefore proposes an LLM-driven interaction paradigm for building design: instead of operating the design software through many keyboard and mouse actions, the designer gives natural-language instructions, and an LLM generates and executes scripts that call the software's application programming interfaces (APIs). A methodology was proposed and its feasibility in building design was validated. The methodology comprises three steps: ① the LLM retrieves task-related APIs from the API library according to the user's instruction; ② the LLM writes a program script based on the instruction and the summaries of the candidate APIs, and runs it; ③ the LLM revises the script based on feedback from the software environment, the user, and other sources. To test whether current LLMs can perform each key step of the methodology, multiple design tasks were completed with the Rhino7 design software, GPT-4, and Code Llama. The results not only show that the LLM-driven interaction paradigm already holds initial prospects for practical use in building design, but also provide experience and suggestions for its implementation. Adopting this paradigm could lower the barrier to using design software, reduce learning costs, and improve designers' efficiency; it is expected to play an important role in future building design software.

Key words: building design software, interaction with software, large language model, application programming interface, GPT-4, Rhino7, Ladybug
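The three-step methodology (retrieve APIs, write and run a script, revise it from feedback) can be sketched as a small Python loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM is replaced by a stub callable, step ① is approximated by word overlap rather than an LLM search, and the names `ApiEntry`, `retrieve_apis`, `run_with_feedback`, `create_box` and `set_color` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ApiEntry:
    name: str
    description: str  # short summary shown to the LLM


def retrieve_apis(instruction: str, library: list[ApiEntry], k: int = 3) -> list[ApiEntry]:
    """Step 1 stand-in: rank APIs by how many words their description shares
    with the instruction and keep the top k with a non-zero score.
    (Assumption: the paper has the LLM do this search; overlap is a substitute.)"""
    words = set(instruction.lower().split())

    def score(api: ApiEntry) -> int:
        return len(words & set(api.description.lower().split()))

    ranked = sorted(library, key=score, reverse=True)
    return [api for api in ranked[:k] if score(api) > 0]


# A script generator receives the instruction, the candidate APIs and the
# latest feedback (None on the first round) and returns Python source code.
ScriptGenerator = Callable[[str, list[ApiEntry], Optional[str]], str]


def run_with_feedback(instruction: str, candidates: list[ApiEntry],
                      generate: ScriptGenerator, max_rounds: int = 3):
    """Steps 2 and 3: write a script, run it, and feed any error message back
    to the generator until the script runs or the round budget is spent."""
    feedback: Optional[str] = None
    errors: list[str] = []
    for _ in range(max_rounds):
        script = generate(instruction, candidates, feedback)
        namespace: dict = {}
        try:
            # A real system would run this inside the design software,
            # ideally sandboxed; plain exec() is used here for illustration.
            exec(script, namespace)
            return namespace, errors
        except Exception as exc:
            feedback = f"{type(exc).__name__}: {exc}"  # environment feedback
            errors.append(feedback)
    return None, errors


# Usage with a stub "LLM" that makes a typo first and fixes it after feedback.
library = [
    ApiEntry("create_box", "create a building model with a rectangular section"),
    ApiEntry("calc_sunpath", "calculate the trajectory of the sun"),
    ApiEntry("set_color", "change the color of a model"),
]


def stub_llm(instruction, candidates, feedback):
    if feedback is None:
        return "height = widht * 2"  # NameError on the first attempt
    return "width = 10\nheight = width * 2"


instruction = "create rectangular building model"
candidates = retrieve_apis(instruction, library)
result, errors = run_with_feedback(instruction, candidates, stub_llm)
print([api.name for api in candidates], result["height"], errors)
```

The error text returned by the failed run is exactly what step ③ feeds back to the model; in practice user comments would be appended to the same feedback channel.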

CLC number: TP391

Cite this article: JIANG Can, ZHENG Zhe, LIANG Xiong, LIN Jiarui, MA Zhiliang, LU Xinzheng. A new interaction paradigm for building design driven by large language model: proof of concept with Rhino7[J]. Journal of Graphics, 2024, 45(3): 594-600.



Item	Example
API name	calc_sunpath(location, hoys, ···)
API functionality	Calculate the trajectory of the sun according to location and time information
API input	Location information (latitude, longitude, etc.) of a city (location: Object); ···
API output	A list of solar altitudes (altitudes: list); A list of solar azimuths (azimuths: list); ···
API call example	altitudes, azimuths, datetimes, vectors = calc_sunpath(location, hoys)
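An API summary entry like the one above has five fields. One possible way to store such entries and render them as prompt text for the script-writing step is sketched below. The `ApiSummary` class, the prompt layout, and the `hour_of_year` helper are hypothetical; the helper assumes that `hoys` means hours of the year counted from 0 at midnight on 1 January, which the table itself does not define. The "..." elisions from the table are kept as-is.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ApiSummary:
    name: str           # signature shown to the LLM
    functionality: str  # what the API does
    inputs: str
    outputs: str
    call_example: str

    def to_prompt(self) -> str:
        """Render the entry as plain text for the script-writing prompt."""
        return "\n".join([
            f"API name: {self.name}",
            f"API functionality: {self.functionality}",
            f"API input: {self.inputs}",
            f"API output: {self.outputs}",
            f"API call example: {self.call_example}",
        ])


def hour_of_year(month: int, day: int, hour: int, year: int = 2023) -> int:
    """Hour of the year, counted from 0 at midnight on 1 January.
    Assumption: this is the meaning of `hoys`; 2023 is a non-leap year."""
    day_of_year = datetime(year, month, day).timetuple().tm_yday
    return (day_of_year - 1) * 24 + hour


sunpath_entry = ApiSummary(
    name="calc_sunpath(location, hoys, ...)",
    functionality="Calculate the trajectory of the sun according to location and time information",
    inputs="Location information (latitude, longitude, etc.) of a city (location: Object); ...",
    outputs="A list of solar altitudes (altitudes: list); A list of solar azimuths (azimuths: list); ...",
    call_example="altitudes, azimuths, datetimes, vectors = calc_sunpath(location, hoys)",
)

# Hourly values for 21 June, 08:00 to 16:00 (e.g. a summer-solstice sun path).
solstice_hoys = [hour_of_year(6, 21, h) for h in range(8, 17)]
print(sunpath_entry.to_prompt())
print(solstice_hoys[0], solstice_hoys[-1])  # 4112 4120
```

Keeping the call example in each entry gives the LLM a concrete usage pattern to copy, which is the purpose of the "API call example" row.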

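Requirement	Task	Number of APIs
Geometric modeling	Generate a building model with a rectangular cross-section	2
Geometric modeling	Generate a building model with an irregular cross-section	5
Geometric modeling	Generate a building model composed of multiple connected cubes	4
Geometric modeling	Arrange multiple building models randomly	4
Geometric modeling	Arrange multiple building models in specified positions and orientations	4
Geometric modeling	Arrange multiple building models subject to spacing constraints	4
Building performance analysis	Building sunlight analysis	6
Building performance analysis	Building irradiance analysis	6
Visualization and rendering	Sun path calculation and visualization	4
Visualization and rendering	Sky dome radiation density calculation and visualization	4
Visualization and rendering	View and rendering mode switching	2
Visualization and rendering	Model color change	1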
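The task of arranging multiple building models subject to spacing constraints is the kind of request the LLM has to turn into a working script. As a rough illustration only, the plain-Python sketch below solves a simplified version without Rhino: footprints are axis-aligned rectangles, placement uses rejection sampling, and the names `Footprint` and `place_buildings` are hypothetical. A script generated for Rhino7 would create the geometry through Rhino's own API instead.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Axis-aligned rectangular building footprint (site units, e.g. metres)."""
    x: float  # lower-left corner
    y: float
    w: float  # extent along x
    d: float  # extent along y

    def gap_to(self, other: "Footprint") -> float:
        """Shortest distance between two footprints (0 if they touch or overlap)."""
        dx = max(0.0, other.x - (self.x + self.w), self.x - (other.x + other.w))
        dy = max(0.0, other.y - (self.y + self.d), self.y - (other.y + other.d))
        return math.hypot(dx, dy)


def place_buildings(n: int, site_w: float, site_d: float, w: float, d: float,
                    min_gap: float, seed: int = 0, max_tries: int = 10_000) -> list[Footprint]:
    """Randomly place up to n w-by-d footprints inside the site so that every
    pair is at least min_gap apart; candidates violating the gap are rejected."""
    rng = random.Random(seed)
    placed: list[Footprint] = []
    for _ in range(max_tries):
        if len(placed) == n:
            break
        cand = Footprint(rng.uniform(0, site_w - w), rng.uniform(0, site_d - d), w, d)
        if all(cand.gap_to(p) >= min_gap for p in placed):
            placed.append(cand)
    return placed


layout = place_buildings(6, site_w=200.0, site_d=150.0, w=30.0, d=20.0,
                         min_gap=15.0, seed=42)
for f in layout:
    print(f"x={f.x:.1f}, y={f.y:.1f}")
```

Rejection sampling keeps the logic short and easy for an LLM to produce and debug; it may return fewer than `n` buildings on a crowded site, which the caller can check.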
