论文总字数:30434字
摘 要
近几十年来,聚类分析逐渐成为计算机科学领域的一个热点问题。如何从大量数据中挖掘出人们需要的信息也随着“大数据”时代的到来成为了亟待解决的问题。为此,研究者们提出了多种不同的聚类算法。但是这些方法大多基于凸样本空间,不能在非凸样本空间上收敛到全局最优解。
谱聚类因能够在任意形状的样本空间上实现聚类,并收敛于全局最优解被广泛研究,并得到了广泛的关注和应用。基于图论的Normalized Cut(归一化分割)算法是一种经典的谱聚类算法。该算法首先将数据映射成带权无向图,数据视为图中节点,边权重表示数据之间相似度,然后利用最优割集准则分割子图。
由于原问题的求解比较困难,Normalized Cut算法将其转化为拉普拉斯矩阵特征值问题,将离散的聚类问题松弛为连续的特征向量求解问题,然后根据Rayleigh熵的性质得到聚类结果。
本文主要实现了Normalized Cut算法,并在多个国际标准数据集上进行了性能测试与结果分析。实验中以K-means算法作为对比算法,结果表明该算法比K-means算法有更好的聚类效果。
关键词: 谱聚类,归一化分割,图论
THE IMPLEMENT AND DESIGN OF NORMALIZED CUT
Abstract
In recent decades, clustering analysis has become a hot issue in computer science. With the arrival of the era of big data, how to mine useful information from massive data is growing urgency. To solve this problem, researchers have proposed a variety of clustering algorithms. However, these methods are almost based on convex sample space and the global optimal solution can not be converged in the non-convex sample space.
Because of the property that spectral clustering can be implemented in the sample space with any shape and converge to the global optimal solution, it has been widely studied and applied. Normalized Cut based on graph theory is a classical spectral clustering algorithm. In this algorithm, data is mapped into a weighted undirected graph, whose nodes are data points and edge weight indicates the similarity between the points. Then, the graph can be divided into some sub-graphs according to the optimal cut criterion.
Since solving the original problem is too difficult, Normalized Cut algorithm transforms the problem into a Laplacian matrix eigenvalue problem. As a result, the discrete clustering problem is relaxed to a continuous feature vector problem, and then the clustering result can be obtained in the light of Rayleigh entropy.
In this paper, we mainly realize the Normalized Cut algorithm and run the code on several real-world data-sets. The experimental result shows that Normalized Cut algorithm performse better than K-means clustering algorithm.
Key words: Spectral clustering, Normalized Cut, Graph theory
目 录
摘要…………………………………………………………………………………………..........Ⅰ
Abstract…………………………………………………………………………………………....Ⅱ
第一章 绪论…..………………………………………………………………………………...….5
1.1 聚类分析……………………………………………………………………………..…...5
1.2 谱聚类算法研究背景…………………………………...…………………………...…..5
1.3 本章小结……………………………………………………………………………..…..7
第二章 K-means算法………………………………………………………………..………...….8
2.1 K-means算法步骤…………………………………………………………………..…...8
2.2 本章小结…………………………………………………………………………….......9
第三章 Normalized Cut算法………………………………………………………….……...….10
3.1 图论基础…………………………………………………………………………..…....10
3.2 Rayleigh熵…………………………………………………………………………...….12
3.3 Normalized Cut……………………………………………………………………..…...13
3.4 Normalized Cut算法步骤…………………………………………………………..…..18
3.5 本章小结…………………………………………………………………………..…....18
第四章 实验与分析……………..………………………………………………………………..18
4.1 实验设置……………………………………………..…………………………..……...19
4.2 实验结果与分析……………………………………………………………………..….19
4.3 本章小结…………………………………………………………………………..….…19
结论………..…………………………………………………………………………………...….21
致谢…………………………………………………………………………………………..…....22
参考文献………………………………………………………………………………………......23
附录………………………………………………………………………………………………..24
- 绪 论
1.1聚类分析
剩余内容已隐藏,请支付后下载全文,论文总字数:30434字
该课题毕业论文、开题报告、外文翻译、程序设计、图纸设计等资料可联系客服协助查找;