rtree的使用

安装https://www.cnblogs.com/mangmangbiluo/p/10954836.html

rtree可以看做是二维或者多维空间中的b树

如果不知道b树的建议先去看看b树

rtree的原理https://blog.csdn.net/xiaofengcanyuexj/article/details/41912169

https://blog.csdn.net/houzuoxin/article/details/16113895

两篇文章相同的

Python rtree官方文档

http://toblerity.org/rtree/index.html#

rtree中的基本单位是一个个小的矩形

在这个库中是一个类: rtree.index.Item

我这里主要写一下Pythonrtree的使用

初始化对象

from rtree import index

idx = index.Index()

插入数据: insert(self, id, coordinates, obj=None), 没有返回值

其中id是一个长整数,coordinates 是形如(xmin, ymin, xmax. ymax)的元组或者列表也可以,都是double类型的浮点数,和python的float相对应(python只有float,相当于c++中的double)

obj是可选的选项,可以是任意的一个Python对象,实际原理就是先将Python对象序列化以后再存储起来

idx.insert(0,(0,0,1,1), obj={1:4})

add方法和insert方法完全一样

In [226]: ctypes.c_double(10.0/3)
Out[226]: c_double(3.3333333333333335)

In [227]: 10.0/3
Out[227]: 3.3333333333333335

大家看,c_double和python的float是相一致的,尽管放心

删除数据: delete(self, id, coordinates); 没有返回值

idx.delete(0,(0,0,1,1))

这个函数既需要id,又需要坐标值

如果没有目标点满足,不会报错,,也不会删除

如果有多个目标点满足的话,仅会删除一个(目测和插入顺序相关,但是请避免这种事情的发生)

查找区域内的元素intersection(self, coordinates, objects=False): 返回是一个生成器

In [243]: list(c.intersection((0,0,2,2), objects=False))
Out[243]: [0, 0, 1]

In [244]: list(c.intersection((0,0,2,2), objects=True))
Out[244]: 
[<rtree.index.Item at 0x7f6ffc60b8e8>,
 <rtree.index.Item at 0x7f6ffc60b3c0>,
 <rtree.index.Item at 0x7f6ffc60b628>]

In [245]: list(c.intersection((0,0,2,2), objects=False))
Out[245]: [0, 0, 1]

仅仅是边界甚至是一个顶点有重叠也算是重叠,比如(0,0,1,1)和(1,1,2,2)算重叠

objects=False返回的是id, "raw"返回的是object,附带的python对象

True返回的是item对象,有['handle', 'owned', 'id', 'object', 'bounds']五个元素,后三种分别是id,object,范围, 前面的不太清楚,owned是一个布尔类型,handle是一个对象

In [249]: a = list(c.intersection((0,0,2,2), objects=True))[0]

In [250]: a.bounds
Out[250]: [0.0, 1.0, 0.0, 1.0]

In [251]: a.id
Out[251]: 0

In [252]: a.object
Out[252]: 'b'

In [253]: a.owned
Out[253]: False

奇怪的是a.bounds是一个(xmin, xmax, ymin, ymax)的形式

查找最近的元素nearest(self, coordinates, num_results=1, objects=False): 返回的是一个python 生成器

寻找最近的点,num_result是返回的点的个数,如果有多个点的距离相等,那么这些点都返回

返回的点按照从短到长来依次排列

据我实验,num_result=0和num_result=1等同,num_result=-1,-2...的时候全部的点都会列出来

据我推测,两个矩形之间的距离是这样计算的:两个矩形区域a,b;分别取一点p,q,距离d=min(pq),

我用Python实现了一个这个方法

def getRangeDistance(range1, range2):
    """返回两个区间的最短距离,有重叠为0"""
    if min(range1) > max(range2):
        result = min(range1) - max(range2)
        return result
    if min(range2) > max(range1):
        result = min(range2) - max(range1)
        return result
    return 0

def getRectangleDistance(coordinates1, coordinates2):
    """获得两个矩形之间距离的最短值"""
    (xmin1, ymin1, xmax1, ymax1) = coordinates1
    (xmin2, ymin2, xmax2, ymax2) = coordinates2
    tmp1 = getRangeDistance((xmin1, xmax1), (xmin2, xmax2))
    tmp2 = getRangeDistance((ymin1, ymax1), (ymin2, ymax2))
    distance =  (tmp1 **2 + tmp2 **2) ** 0.5
    return distance

实验结果如下

In [283]: list(c.nearest((10,10,19,19), num_results=-2))
Out[283]: [1, 0, 0, 3]

In [284]: list(f.nearest((0,0,0.9,0.9), num_results=0))
Out[284]: []

In [285]: list(c.nearest((0,0,0.9,0.9), num_results=1))
Out[285]: [0, 0]

In [286]: list(c.nearest((0,0,0.9,0.9), num_results=2))
Out[286]: [0, 0]

In [287]: list(c.nearest((0,0,0.9,0.9), num_results=3))
Out[287]: [0, 0, 1]

In [288]: list(c.nearest((0,0,0.9,0.9), num_results=4))
Out[288]: [0, 0, 1, 3]

In [289]: list(c.nearest((0,0,0.9,0.9), num_results=5))
Out[289]: [0, 0, 1, 3]

当是空树时候无论如何都会返回一个空的生成器

查找某个范围内的对象个数 count(self, coordinates): 返回是一个长整数

In [291]: c.count((1,1,2,2))
Out[291]: 3L

获得r树的范围get_bounds(self, coordinate_interleaved=None), 获得的是整个r树的范围

:param coordinate_interleaved: If True, the coordinates are turned
in the form [xmin, ymin, ..., kmin, xmax, ymax, ..., kmax],
otherwise they are returned as
[xmin, xmax, ymin, ymax, ..., ..., kmin, kmax]. If not specified,
the :attr:`interleaved` member of the index is used, which
defaults to True.

None的时候是由self.interleaved决定的,这个是r树生成时的参数默认是True

获得节点信息levels(self), 返回是一个列表,元素是(id, child_ids, bounds)

虽然名字叫做叶子节点但是实际上并不能搞得清楚是什么意思

In [295]: c.leaves()
Out[295]: [(0, [0, 0, 1, 3], [-1.0, -1.0, 2.0, 2.0])]

太少了,看不清楚,再多加几个看一看

n [296]: for i in range(4,100):
     ...:     c.insert(i,(i,i,i+0.5,i+0.5))
     ...:     

In [297]: c.leaves()
Out[297]: 
[(0,
  [0,
   0,
   1,
   3,
   4,
   5,
   6,
   7,
   8,
   9,
   10,
   11,
   12,
   13,
   14,
   15,
   16,
   17,
   18,
   19,
   20,
   21,
   22,
   23,
   24,
   25,
   26,
   27,
   28,
   29,
   30,
   31,
   32,
   33,
   34,
   35,
   36,
   37,
   38,
   39,
   40,
   41,
   42,
   43,
   44,
   45,
   46,
   47,
   48,
   49,
   50,
   51,
   52,
   53,
   54,
   55,
   56,
   57,
   58,
   59,
   60,
   61,
   62,
   63,
   64,
   65,
   66,
   67,
   68,
   69,
   70,
   71,
   72,
   73,
   74,
   75,
   76,
   77,
   78,
   79,
   80,
   81,
   82,
   83,
   84,
   85,
   86,
   87,
   88,
   89,
   90,
   91,
   92,
   93,
   94,
   95,
   96,
   97,
   98,
   99],
  [-1.0, -1.0, 99.5, 99.5])]

什么鬼,再加几个

for i in range(4,10000):       
     ...:     c.insert(i,(i,i,i+0.5,i+0.5))

过程略,

最后的结论就是这些id是叶子节点的上一层节点的信息,一个叶子节点至少有50个元素;所以至多应该是100个,最后的四个元素的列表是这个区域的范围

dumps是序列化,但是我们不要用这个对树进行序列化,因为叶子节点会丢失掉,这个只是调用的Python的pickle的dump序列化而已;序列化是序列内存空间的,就好像你序列化一个二叉树的一个节点,但是这个节点不包括他的子子节点内容,所以你在另一个文件中载入这个序列化,就不能知道他的子节点的内容信息,要用rtree自带的存储方法,

http://toblerity.org/rtree/tutorial.html

就到这吧