
Journal of Graphics, 2022, Vol. 43, Issue (4): 685-694. DOI: 10.11996/JG.j.2095-302X.2022040685

• Computer Graphics and Virtual Reality •

A visual analysis approach for domain literature data based on word representation model

  1. Beijing Key Laboratory of Big Data Technology for Food Safety, School of Computer Science and Engineering, Beijing Technology and Business University, Beijing 100048, China
  • Online: 2022-08-31 Published: 2022-08-15
  • Contact: CHEN Yi (1963), female, professor, Ph.D. Her main research interests include visualization, visual analysis, and machine learning.
  • Supported by:
    National Natural Science Foundation of China (61972010); National Key R&D Program of China (2018YFC1603602)

Abstract:

With the development of science and technology, the volume of scientific literature keeps growing. Quickly and accurately identifying the research topics, influential scholars, and high-quality papers of a specific domain from this mass of publications remains a major challenge. To address it, this paper proposes a visual analysis method for domain literature data based on a word representation model. First, the word embedding model word2vec is used to vectorize words and recommend domain-related keywords according to the similarity between word vectors, and domain-related papers are filtered by these keywords. Then the BERTopic model is applied to extract topics from the abstracts of the domain papers. Next, paper impact is calculated with the PageRank algorithm, and author influence is calculated with Author-Rank, an author evaluation method that takes into account authorship order, number of publications, and paper impact. Finally, a multi-view, coordinated, and interactive visualization helps researchers quickly understand and analyze a specific domain from multiple perspectives, such as topic word frequency, topic evolution, literature impact and citation relationships, and author influence. The method was applied to literature data in the field of food safety, and the application results and user tests demonstrate its effectiveness.

Key words: visualization, bibliometric analysis, word2vec, BERTopic, Author-Rank, food safety
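
The impact-scoring step described in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so this is a minimal illustration under stated assumptions, not the authors' implementation. Here `pagerank` is a plain power-iteration PageRank over a toy citation graph, and `author_scores` is a hypothetical stand-in for Author-Rank that splits each paper's rank among its authors with weights proportional to 1/position, so that authorship order, number of papers, and paper impact all contribute. The paper ids, author names, damping factor, and weighting scheme are all invented for illustration.

```python
# Illustrative sketch only: the toy data and the author weighting scheme are
# assumptions, not the formulas used in the paper.


def pagerank(citations, damping=0.85, iterations=100):
    """Power-iteration PageRank over a citation graph.

    `citations` maps each paper id to the list of paper ids it cites.
    """
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for citing, cited_list in citations.items():
            if not cited_list:
                continue
            # A citing paper passes an equal share of its rank to each paper it cites.
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank


def author_scores(authorship, paper_rank):
    """Hypothetical author score (an assumption, not the paper's Author-Rank).

    Each paper's rank is divided among its authors with weights proportional
    to 1/position, so earlier-listed authors receive a larger share.
    """
    scores = {}
    for paper, authors in authorship.items():
        weights = [1.0 / (position + 1) for position in range(len(authors))]
        total = sum(weights)
        for author, weight in zip(authors, weights):
            scores[author] = scores.get(author, 0.0) + paper_rank[paper] * weight / total
    return scores


# Toy citation graph (hypothetical): P2, P3, and P4 all cite P1.
citations = {
    "P1": [],
    "P2": ["P1"],
    "P3": ["P1", "P2"],
    "P4": ["P1", "P3"],
}
# Toy author lists in signature order (hypothetical names).
authorship = {
    "P1": ["author_a", "author_b"],
    "P2": ["author_b"],
    "P3": ["author_c", "author_a"],
    "P4": ["author_c"],
}

ranks = pagerank(citations)
scores = author_scores(authorship, ranks)

for paper, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{paper}: {value:.4f}")
for author, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{author}: {value:.4f}")
```

Because each paper's rank is fully divided among its authors, the author scores sum to the same total as the paper ranks. A different position weighting could be swapped in without changing the rest of the sketch.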

CLC number: