Journal of Graphics

• Computer Graphics and Virtual Reality •

Survey of visual analysis approaches for clustering high-dimensional data

  

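  1. Beijing Key Laboratory of Big Data Technology for Food Safety, School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
    2. School of Information Engineering, Wuhan University of Technology, Wuhan, Hubei 430070, China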
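  • Online: 2020-02-29 Published: 2020-03-11
  • Supported by:
    National Key Research and Development Program of China (2018YFC1603602); National Natural Science Foundation of China (61972010); National Science and Technology Basic Work Program of China (2015FY111200)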

Overviewing of visual analysis approaches for clustering high-dimensional data

  1. Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China;
    2. School of Information Engineering, Wuhan University of Technology, Wuhan, Hubei 430070, China
  • Online: 2020-02-29 Published: 2020-03-11

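Abstract: Visual analysis methods for data clustering use visualization and interaction
techniques to help users analyze the clustering process and its results from multiple
perspectives, thereby revealing structures and relationships hidden in the data. However,
the “curse of dimensionality” inherent in high-dimensional data poses many challenges for
cluster analysis, such as setting model parameters, capturing data features, interpreting
results, and presenting them visually. Starting from the problems encountered when
clustering high-dimensional data, this paper first summarizes the data processing methods
commonly used in that process and compares their performance; these methods can address
the “curse of dimensionality” fairly well and help users mine the clustering patterns
present in the data. Because the data processing methods applied beforehand differ,
analyzing and understanding the internal structures and regularities contained in
different clustering results calls for different exploration and analysis strategies.
This paper therefore summarizes the visual analysis methods for high-dimensional data
clustering from the past 10 years in 2 categories: methods based on dimensionality
reduction and methods based on subspace clustering. Finally, the opportunities and
challenges currently facing the field are discussed.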

Keywords: visual analysis, clustering, high-dimensional data, survey

Abstract: Visual clustering analysis uses visualization and interaction techniques to help
users analyze the clustering process and its results from multiple perspectives and to find
hidden structures and relationships within the original data. However, the “curse of
dimensionality” of high-dimensional data poses many challenges for cluster analysis, such as
setting the parameters of the clustering model, capturing data features, and interpreting
and visualizing results. Starting from the problems encountered when clustering
high-dimensional data, this paper first summarizes the data processing methods commonly
used in the clustering process and compares their performance. These methods can largely
mitigate the “curse of dimensionality” and help users explore the clustering patterns
present in the data. Because clustering results obtained with different data processing
methods call for different strategies when analyzing and understanding the internal
structures and regularities hidden in the clusters, this paper then divides the currently
available visual analysis approaches for clustering high-dimensional data into two
categories: approaches based on dimensionality reduction and approaches based on subspace
clustering. Finally, the opportunities and challenges that currently exist in this field
are discussed.

Keywords: visual analysis, clustering, high-dimensional data, overview
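
The “curse of dimensionality” named in the abstract can be illustrated by distance concentration: as the number of dimensions grows, the nearest and farthest neighbors of a point end up at almost the same distance, which weakens distance-based clustering. The following is a minimal sketch of that effect and is not taken from the paper. It assumes uniformly random data in the unit hypercube, and the function name `relative_contrast`, the sample size, and the seed are hypothetical choices made only for illustration.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^n_dims.

    Illustrative assumption: uniform synthetic data, not the paper's data.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast between farthest and nearest neighbor shrinks as the
    # dimensionality increases.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(500, dims), 3))
```

A low relative contrast means that "near" and "far" are hard to tell apart, which is one reason the surveyed approaches first reduce dimensionality or search for clusters in subspaces.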