The DBSCAN Algorithm

DBSCAN(D, eps, MinPts)
   C = 0
   for each unvisited point P in dataset D // every point that has not been visited yet
      mark P as visited
      NeighborPts = regionQuery(P, eps)   // find all neighbor points within the eps region
      if sizeof(NeighborPts) < MinPts
         mark P as NOISE
      else
         C = next cluster                     // start a new cluster
         expandCluster(P, NeighborPts, C, eps, MinPts)  // grow this new cluster
          
expandCluster(P, NeighborPts, C, eps, MinPts)
   add P to cluster C
   for each point P' in NeighborPts 
      if P' is not visited
         mark P' as visited
         NeighborPts' = regionQuery(P', eps)   // fetch all the neighbors of P'
         if sizeof(NeighborPts') >= MinPts
            NeighborPts = NeighborPts joined with NeighborPts' // the list grows while we iterate over it, so new neighbors keep being added
      if P' is not yet member of any cluster
         add P' to cluster C
          
regionQuery(P, eps)
   return all points within P's eps-neighborhood (including P)
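
The pseudocode above can be turned into a small runnable sketch. This is only one possible reading of it, written in Python rather than C++ to keep it short. The names `dbscan`, `expand_cluster`, `region_query`, and the `NOISE = -1` label are my own choices, points are assumed to be tuples of coordinates, distance is assumed to be Euclidean (`math.dist`, Python 3.8+), and `region_query` is a brute-force scan rather than an indexed lookup.

```python
import math

NOISE = -1  # assumed label for points that end up in no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def expand_cluster(points, labels, visited, i, neighbors, cluster, eps, min_pts):
    """Grow `cluster` outward from core point i."""
    labels[i] = cluster
    queue = list(neighbors)   # plays the role of NeighborPts
    seen = set(neighbors)     # avoids queueing the same index twice
    k = 0
    while k < len(queue):     # the queue may grow while we walk it
        j = queue[k]
        k += 1
        if not visited[j]:
            visited[j] = True
            more = region_query(points, j, eps)
            if len(more) >= min_pts:      # j is a core point too
                for n in more:
                    if n not in seen:
                        seen.add(n)
                        queue.append(n)
        # Unlabeled points and earlier NOISE points are in no cluster yet.
        if labels[j] is None or labels[j] == NOISE:
            labels[j] = cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)
    visited = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
        else:
            cluster += 1                  # next cluster
            expand_cluster(points, labels, visited, i, neighbors,
                           cluster, eps, min_pts)
    return labels
```

Calling `dbscan(points, 1.5, 3)` on two tight groups of three points plus one far-away point should give two clusters and one noise label. Note that a point first marked as NOISE can later be absorbed as a border point of a cluster, which matches the "not yet member of any cluster" check in the pseudocode.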

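Reading this DBSCAN code today, I was struck by how closely it matches the approach I took years ago when I wrote topic mining on top of a tag network, with only a few small differences. It really is not hard; in fact it is quite simple.
When I get back, I should digest the code in this article properly, write it out myself, and pick up some C++ along the way. http://www.cnblogs.com/weixliu/archive/2012/12/08/2808815.html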
Original post: https://www.cnblogs.com/harveyaot/p/3333150.html