
Journal of Graphics (图学学报), 2021, Vol. 42, Issue 2: 316-324. DOI: 10.11996/JG.j.2095-302X.2021020316

• Building and City Information Modeling •


  


A model adaptive method for Chinese word segmentation using IFC-based building information model 

  1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; 2. Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 102616, China
  • Online: 2021-04-30  Published: 2021-04-30
  • Supported by:
    National Natural Science Foundation of China (71601013); Beijing Municipal Natural Science Foundation (4202017); Beijing Youth Talent Training Project (CIT&TCD201904050); Young Elite of Beijing University of Civil Engineering and Architecture; The Fundamental Research Funds for Beijing University of Civil Engineering and Architecture (X20039) 



Abstract: The building information model (BIM) has become an effective solution for information technology applications in the construction industry. As BIM data continue to grow, many studies have introduced natural language processing (NLP) into BIM applications to make effective use of these data. In a Chinese-language environment, however, Chinese word segmentation, the foundational NLP step, adapts poorly to BIM applications in the building domain because it lacks features for building-industry terminology. This study analyzes Industry Foundation Classes (IFC) files, currently the most popular BIM data format, extracts BIM model features from them, and adds these features, together with architectural terminology features, to a statistical word segmentation model to improve Chinese word segmentation in the building domain. Experimental results show that, compared with the original conditional random field (CRF) segmentation model, the F-measure on the building-domain test set increased by 1.26%, and it still increased by 0.10% when BIM model features were added alone, indicating that adding BIM model features to the segmentation model effectively improves Chinese word segmentation in the building domain. Meanwhile, on the BIM model test set, compared with adding architectural terminology features alone, adding BIM model features raised precision from 46.97% to 87.74%, recall from 67.60% to 94.77%, and the F-measure from 55.43% to 91.12% (an increase of 35.69 percentage points), effectively improving the BIM model adaptability of Chinese word segmentation in the building domain.

Key words: building information model, industry foundation classes, Chinese word segmentation, model adaptation, building information extraction 
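The abstract does not say how BIM model features are pulled out of IFC files. As a rough illustration only, the Python sketch below collects element names from an IFC file in STEP (ISO 10303-21) text form and groups them by entity type, which is one plausible way to seed a domain lexicon. It assumes each instance fits on one line, that the third attribute of the listed IfcRoot subtypes is `Name`, and that Chinese names use the `\X2\...\X0\` escape; `ROOTED_TYPES`, `extract_names` and the other helpers are hypothetical, and a real pipeline would more likely rely on a full IFC parser such as IfcOpenShell.

```python
import re

# Hypothetical whitelist of IfcRoot subtypes whose 3rd attribute is Name
# (IfcRoot attributes: GlobalId, OwnerHistory, Name, Description, ...).
ROOTED_TYPES = {
    "IFCWALL", "IFCWALLSTANDARDCASE", "IFCSLAB", "IFCDOOR", "IFCWINDOW",
    "IFCCOLUMN", "IFCBEAM", "IFCSPACE", "IFCBUILDINGSTOREY",
}

# Assumes one instance per line, e.g.  #42=IFCWALL('guid',#5,'name',$,...);
INSTANCE_RE = re.compile(r"^#(\d+)\s*=\s*(IFC[A-Z0-9_]+)\s*\((.*)\);\s*$")
# \X2\ ... \X0\ wraps 16-bit code units as 4 hex digits (BMP only here).
X2_RE = re.compile(r"\\X2\\((?:[0-9A-Fa-f]{4})+)\\X0\\")


def decode_step_string(s: str) -> str:
    """Decode \\X2\\ escapes and doubled quotes; other escapes are left as-is."""
    def repl(m):
        h = m.group(1)
        return "".join(chr(int(h[i:i + 4], 16)) for i in range(0, len(h), 4))
    return X2_RE.sub(repl, s).replace("''", "'")


def split_attributes(body: str) -> list[str]:
    """Split an attribute list at top-level commas, respecting quotes and nesting."""
    attrs, cur, depth, in_str, i = [], [], 0, False, 0
    while i < len(body):
        ch = body[i]
        if in_str:
            cur.append(ch)
            if ch == "'":
                if body[i + 1:i + 2] == "'":  # '' is an escaped quote
                    cur.append("'")
                    i += 1
                else:
                    in_str = False
        elif ch == "," and depth == 0:
            attrs.append("".join(cur).strip())
            cur = []
        else:
            if ch == "'":
                in_str = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            cur.append(ch)
        i += 1
    attrs.append("".join(cur).strip())
    return attrs


def extract_names(step_text: str) -> dict[str, set[str]]:
    """Map entity type -> set of decoded Name values for whitelisted types."""
    names: dict[str, set[str]] = {}
    for line in step_text.splitlines():
        m = INSTANCE_RE.match(line.strip())
        if not m or m.group(2) not in ROOTED_TYPES:
            continue
        attrs = split_attributes(m.group(3))
        name = attrs[2] if len(attrs) > 2 else "$"
        if len(name) >= 2 and name[0] == name[-1] == "'":  # '$' means unset
            names.setdefault(m.group(2), set()).add(decode_step_string(name[1:-1]))
    return names
```

Names such as 外墙 (exterior wall) gathered this way could then be merged with a terminology list; how the authors actually combine the two sources is not described in the abstract.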
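The abstract states that BIM model features and terminology features are added to a CRF segmentation model but does not give the feature template. A common way to inject a domain lexicon into character-based CRF segmentation is to label each character B/M/E/S (begin, middle, end, single) and add features recording whether a lexicon word begins at, ends at, or covers that character. The sketch below builds such per-character feature dictionaries, the dict-per-token shape accepted by CRF toolkits such as sklearn-crfsuite; the feature names, the one-character window and `max_len` are assumptions, not the authors' template.

```python
def to_bmes(words: list[str]) -> list[str]:
    """Character labels for a segmented sentence: B/M/E for multi-char words, S for single."""
    labels = []
    for w in words:
        if len(w) == 1:
            labels.append("S")
        else:
            labels.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return labels


def char_features(sentence: str, lexicon: set[str], max_len: int = 6) -> list[dict]:
    """Per-character feature dicts; lexicon matches act as domain features."""
    n = len(sentence)
    begin, end, inside = [0] * n, [0] * n, [False] * n
    for i in range(n):
        for length in range(2, max_len + 1):
            if i + length <= n and sentence[i:i + length] in lexicon:
                begin[i] = max(begin[i], length)  # longest lexicon word starting here
                j = i + length - 1
                end[j] = max(end[j], length)      # longest lexicon word ending here
                for k in range(i + 1, j):
                    inside[k] = True              # strictly inside some lexicon word
    feats = []
    for i, ch in enumerate(sentence):
        prev = sentence[i - 1] if i > 0 else "<BOS>"
        nxt = sentence[i + 1] if i + 1 < n else "<EOS>"
        feats.append({
            "c0": ch, "c-1": prev, "c+1": nxt,
            "c-1c0": prev + ch, "c0c+1": ch + nxt,
            "lex_begin": begin[i], "lex_end": end[i], "lex_inside": inside[i],
        })
    return feats
```

Because the lexicon enters only as soft features, the CRF can still learn to overrule a lexicon match when the surrounding characters disagree; this is a general design consideration, not something the abstract claims for this paper.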
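Segmentation precision, recall and F-measure are usually computed over word spans: a predicted word counts as correct only if both its start and end offsets match a gold word. The Python sketch below follows that common convention (the paper's exact scoring procedure is not described) and also checks that the BIM-model-test-set figures in the abstract are consistent with F = 2PR/(P + R). The function names are illustrative.

```python
def word_spans(words: list[str]) -> set[tuple[int, int]]:
    """(start, end) character offsets of each word in a segmented sentence."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def segmentation_prf(gold: list[str], pred: list[str]) -> tuple[float, float, float]:
    """Span-level precision, recall and F-measure for one sentence."""
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    return precision, recall, f_measure(precision, recall)


# "建筑信息" predicted as one word: only "模型" matches a gold span.
print(segmentation_prf(["建筑", "信息", "模型"], ["建筑信息", "模型"]))

# Figures reported for the BIM model test set (in %).
print(round(f_measure(46.97, 67.60), 2))  # 55.43, terminology features only
print(round(f_measure(87.74, 94.77), 2))  # 91.12, with BIM model features
```

Since 91.12 - 55.43 = 35.69, the reported F-measure gain on the BIM model test set is an absolute difference in percentage points rather than a relative improvement.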
