
Journal of Graphics ›› 2021, Vol. 42 ›› Issue (2): 307-315. DOI: 10.11996/JG.j.2095-302X.2021020307

• Building and City Information Modeling •

Research on named entity recognition of construction safety accident text based on pre-trained language model

  1. School of Civil Engineering and Transportation, South China University of Technology, Guangzhou, Guangdong 510640, China;  2. State Key Laboratory of Subtropical Building Science, Guangzhou, Guangdong 510640, China;  3. Sino-Singapore International Joint Research Institute, Guangzhou, Guangdong 510555, China
  • Online: 2021-04-30 Published: 2021-04-30
  • Supported by:
    Natural Science Foundation of Guangdong Province (2018A030310363, 2017A030313393); Key Project of Guangzhou Science and Technology Plan (20181003SF0059)

Abstract: Accident analysis is an important part of construction safety management, but the safety knowledge scattered across accident reports is rarely reused and therefore offers little guidance for safety management. A knowledge graph is a tool for storing and reusing knowledge in structured form; it can support rapid retrieval of accident cases, analysis of accident-related paths, and statistical analysis, and thereby improve construction safety management. Named entity recognition (NER) is a key step in the automatic construction of knowledge graphs. Existing NER research concentrates on the medical, financial, and military fields, and no NER studies were found for the construction safety field. Based on the application requirements of a knowledge graph for construction safety, five categories of concepts in this field were defined and the entity annotation specifications were clarified. An improved Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model was used to obtain dynamic character vectors, and a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model was used to obtain the optimal entity tag sequence, yielding an NER model suited to the construction safety field. To train the model and verify its recognition performance, 1,000 construction safety accident reports were collected, organized, and annotated as the experimental corpus. Experiments show that the proposed model achieves better recognition performance on construction safety accident texts than traditional models.

Key words: knowledge graph, named entity recognition, construction safety, pre-trained language model, accident report
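The abstract states that a BiLSTM-CRF model is used to obtain the optimal entity tag sequence. As a rough illustration of what that decoding step usually involves, the sketch below runs Viterbi decoding over per-token emission scores and label-to-label transition scores. It is a minimal example under stated assumptions, not the authors' implementation: the label set (`O`, `B-ENT`, `I-ENT`), the function name `viterbi_decode`, and all scores are hypothetical, and the five entity categories defined in the paper are not named in the abstract.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j at token position t
                        (in a BiLSTM-CRF these would come from the BiLSTM)
    transitions[i][j] : score of moving from label i to label j
    """
    n = len(labels)
    # best[j] = best score of any path that ends in label j at the current token
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        step_best, step_back = [], []
        for j in range(n):
            # pick the previous label that maximises the score of reaching j
            i_best = max(range(n), key=lambda i: best[i] + transitions[i][j])
            step_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            step_back.append(i_best)
        best = step_best
        backpointers.append(step_back)
    # trace back from the best final label to recover the full path
    path = [max(range(n), key=lambda j: best[j])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return [labels[j] for j in path]


# Hypothetical label set and scores, chosen only for illustration.
labels = ["O", "B-ENT", "I-ENT"]
emissions = [
    [1.0, 0.8, 0.0],  # token 1: "O" scores slightly higher than "B-ENT"
    [0.2, 0.0, 1.5],  # token 2: "I-ENT" scores highest
    [2.0, 0.0, 0.3],  # token 3: "O" scores highest
]
# All transitions are neutral except O -> I-ENT, which is heavily penalised
# because an entity cannot continue without having begun.
transitions = [
    [0.0, 0.0, -1e4],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

# Per-token argmax ignores transitions and yields an invalid sequence.
greedy = [labels[max(range(len(labels)), key=lambda j: row[j])] for row in emissions]
decoded = viterbi_decode(emissions, transitions, labels)
print(greedy)   # ['O', 'I-ENT', 'O']
print(decoded)  # ['B-ENT', 'I-ENT', 'O']
```

In this toy case, choosing the best label for each token independently produces `I-ENT` directly after `O`, whereas decoding over the whole sequence with transition scores recovers a well-formed entity span. This is the usual motivation for placing a CRF layer on top of a BiLSTM.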
