Next: , Previous: Numerical algorithms, Up: GEOMLIB

5.3 R*-Tree

Source: The R*-Tree: An efficient and Robust Access Method for Points and Rectangles (1990) – Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernard Seeger.

This data structure is a popular method to localize rectangular objects in two (or generally more) dimensional space. In VRR , it is used for effective localization of geometrical objects near to a given mouse-click (see Snap), or to find all objects in a given rectangular area (view drawing, rectangular selection).

R*-Tree is one of the variants to data structure called R-Tree. Each leaf node represents one object within a given bounding box (rectangle overlapping the entire object). Internal nodes contain information about the smallest common bounding boxes of all their children. This arrangement allows us to walk the tree from root to leaves while ignoring large unimportant plane areas (subtrees). R-Tree variants differ by methods of grouping nodes to subtrees.

R*-Tree tree is based on a heuristic optimization and uses combination of several optimization criteria (described in Data insertion) to arrange objects to groups and build a balanced tree over them. The data structure is fully dynamic. Insertions, deletions, updates and queries can be mixed and no periodic global reorganization is required.

5.3.1 Structures

We have used balanced (A,B)-Tree (A=GEOM_RTREE_MIN, B=GEOM_RTREE_MAX) to store the hierarchy. The user should define one instance of geom_rtree for each existing R*-Tree and include one geom_rtree_obj in each inserted object structure. Internal nodes are allocated and freed automatically. Structures were designed to minimize space requirements:

     /* R*-Tree main structure */
     struct geom_rtree {
       struct geom_rtree_node *root;
         /* pointer to root node (NULL if tree is empty) */
       clist lquery;
         /* list of dynamic rectangular queries */
     /* R*-Tree leaf node */
     struct geom_rtree_obj {
       struct geom_rectangle bbox;
         /* rectangle enclosing entire geometrical object */
       struct geom_rtree_node *parent;
         /* pointer to parent tree node */
     /* R*-Tree internal node */
     struct geom_rtree_node {
       struct geom_rectangle bbox;
         /* smallest common bounding box of node children */
       byte count;
         /* number of children */
       byte height;
         /* tree level (0 for internal nodes containing leaves) */
       struct geom_rtree_node *parent;
         /* parent node (NULL in root node) */
       struct geom_rtree_node *child[GEOM_RTREE_MAX + 1];
         /* child-pointers (internal nodes or leaves) */

5.3.2 Data insertion

In standard (A,B)-Tree, after the node is inserted, overfull nodes (i.e. the number of children exceeds B) on the trace from the newly inserted leaf to the root are split. R*-Tree uses following algorithm to find good splits. Along each axis (X and Y), the entries are first sorted by the lower value, then sorted by the upper value of their bounding boxes. For each sort all distributions of entries into two (A,B)-tree nodes are determined.

For each distribution the goodness values are computed:

area[bbox(first group)] + area[bbox(second group)]
margin[bbox(first group)] + margin[bbox(second group)]
area[intersection of bbox(first group) and bbox(second group)]

Depending on these goodness values the final distribution of the entries is determined.

Algorithm Split (S1) Determine the axis, to which the split is performed.
(S1a) For each axis: Sort the entries by the lower then by the upper value of their rectangles and determine all distributions as described above. Compute S, the sum of all margin-values of the different distributions.
(S1b) Choose the axis with the minimum S as split axis.
(S2) Along the chosen split axis, choose the distribution with the minimum overlap-value. Resolve ties by choosing the distribution with minimum area-value.
(S3) Distribute the entries into two groups.

Because R*-Tree is nondeterministic in allocating the entries onto the nodes, it suffers from its old entries. Data rectangles inserted during the early growth of the structure may have introduced directory rectangles, which are not suitable to guarantee a good heuristic. Reorganization during splits is only local optimization.

To achieve better performance, some nodes are deleted and reinserted during the insertion routine. The Whole algorithm is described below. To insert a new entry, the following routine is called with the leaf level as a parameter. All used sub-algorithms are described below.

Algorithm Insert (I1) Invoke ChooseSubtree (CS1), with the level as a parameter, to find an appropriate node N, where to place the new entry E.
(I2) If N has less than GEOM_RTREE_MAX entries, accommodate E in N.
If N has GEOM_RTREE_MAX entries, invoke OverflowTreatment (OT) with the level of N as a parameter (for Reinsertion or split).
(I3) If OverflowTreatment was called and Split was performed, propagate OverflowTreatment upwards if necessary.
If it caused a split of the root, create a new root.
(I4) Adjust all covering rectangles in the insertion path.

Algorithm Choosesubtree (CS1) Set N to be the root.
(CS2) If N is in desired tree level, return N.
If the child-pointers in N point to leaves [determine the minimum overlap cost], choose the entry in N whose rectangle needs the least overlap enlargement to include the new data rectangle. Resolve ties primary by choosing the entry whose rectangle needs least area enlargement, secondary the entry with the rectangle of smallest area.
If the child-pointers in N do not point to leaves [determine the minimum area cost], choose the entry in N whose rectangle needs least area enlargement to include the new data rectangle. Resolve ties by choosing the entry with the rectangle of smallest area.
(CS3) Set N to be the child-node pointer of the chosen entry and repeat from (CS2).

Algorithm Overflow treatment (OT) If the level is not the root level and this is the first call of overflow treatment in the given level during the insertion of one data rectangle, then invoke Reinsert (RI1) else invoke Split (S1).

Algorithm Reinsert (RI1) For all GEOM_RTREE_MAX+1 entries of a node N, compute the distance between the centers of their rectangles and the center of the bounding rectangle of N.
(RI2) Sort the entries in decreasing order of their distances computed in (RI1).
(RI3) Remove the first GEOM_RTREE_REINSERT entries from N and adjust the bounding rectangle of N.
(RI4) In the sort, defined in (RI2), starting with the maximum distance, invoke Insert (I1) to reinsert the entries.

5.3.3 Data deletion

Deletion of a given entry uses the same routines as the previously described insertion. If the removed leaf causes, that the number of parent child-pointers would decrease below GEOM_RTREE_MIN, the parent node is recursively removed and all remaining child-pointers are reinserted to the same tree level.

5.3.4 Data updates

The modification of already inserted items is implemented by simple calls of data insertion and deletion. It would be possible to improve the performance, especially for relatively small changes.

5.3.5 Rectangular queries

R*-Tree is an especially good heuristic to find objects, that fall into a given rectangular (or point) area. The algorithm is very simple. It starts in the root node and recursively descents to lower levels while cutting off whole subtrees with improper bounding boxes.

The following macros in geomlib/rtree.h implement rectangular queries for a given R*-Tree:

     GEOM_RTREE_RECT_INTERSECT_QUERY_BEGIN(unique_prefix, rtree, rect, object) {
       /* executed for each object, whose bounding rectangle
          intersects with query rectangle */
     GEOM_RTREE_RECT_ENCLOSE_QUERY_BEGIN(unique_prefix, rtree, rect, object) {
       /* executed for each object, whose bounding rectangle
          is entirely enclosed to query rectangle */
     GEOM_RTREE_POINT_QUERY_BEGIN(unique_prefix, rtree, point, object) {
       /* executed for each object, whose bounding rectangle
          contains query point */

5.3.6 Dynamic rectangular queries

Dynamic queries can exist for an arbitrary long period of time and informs the user about changes made in a given rectangular area via callback functions.

New query may be created by geom_rtree_query_init(query, rtree, rect, show, hide, update) and destroyed by geom_rtree_query_destroy(query). The following callbacks are supported:

Informs user about every object, that appeared in the query area. This may occur during dynamic query creation, insertion of a new entry to R*-Tree or when previously invisible entry is moved to the watched area.
Informs user about every object, that disappeared from the query area. This may occur when the query is destroyed, an entry is removed from R*-Tree or previously visible entry is moved outside the query area.
This callback is executed only when previously visible object changed its bounding box, but remains in the query area.

5.3.7 Center pass algorithm

The center pass algorithm enables the user to loop items stored to R*-Tree in order of increasing distance from a given center point. If the heuristic builds an effective planar hierarchy, it offers a sub-linear time complexity to find a limited number of nearest entries. This feature is used in GUI to localize geometrical objects near a given mouse-click (see Snap for usage details).

The implementation is located in the VRR Kernel (see Kernel), but the description thematically belongs to this section of the Programmer's Manual.

At first, we briefly describe the algorithm, how to pass objects sorted by the distance of their bounding boxes. Let S be a set of disjoint subtrees with evaluated distance of their root's bounding box to the center point. Initially, the set S contains only one item representing the whole R*-Tree. The algorithm works in a cycle. At each step of the cycle, the first entry is removed from the set S. If it is a leaf node, the incident object is the nearest of all contained in the set S. If the entry is not a leaf node, then we split the entry (subtree) to set of its root's children subtrees and insert them back to the set S. This procedure is repeated until we have passed the required number of nearest objects or the set S is empty.

This algorithm can be easily extended to consider the exact distance of the objects to the center point instead only of bounding boxes. We only need to add a next level to the hierarchy. Every leaf node reached in the main cycle is reinserted back to set S with its exact object distance.

The set S is implemented by the heap data structure. Minimum in the heap can be found and removed in O(log n) time as well as a new item can be inserted.

Supported objects by the center pass code are all Kernel's geometrical objects. See the file kernel/rtree.c for implementation details.