[postgis-users] Improve 3D BB query on large set of lines

Tom Kazimiers Tue, 17 Jan 2017 14:02:48 -0800

Hi all,

I am using Postgres 9.6 with PostGIS 2.3 and want to improve theperformance of a bounding box intersection query on a large set oflines. I asked the same questions two month ago ongis.stackexchange.com, but didn't get any reply. Therefore, pleaseexcuse me for cross-posting, but I figured this list would be a goodplace to ask. This is the original post:


 https://gis.stackexchange.com/questions/215762

Problem: We store about 6 million lines/edges in a 3D space and use an axis
aligned bounding box to find all intersecting (and included) lines. This works
well, but we want to improve the query time for larger point sets.

Context: Tree-like 3D structures represent neurons, each node has a parent node
or it is the root. At the moment we deal with about 15 million nodes, grouped
into 150000 trees (many > 10000 nodes). I want to improve existing performance
bottlenecks with bigger result sets plus I plan to scale this setup to 10-100x

the nodes. Typical query bounding boxes are rather slices than boxes,i.e. they expand any range in XY, but only a short distance in Z, but itcould in principal any dimension that is "flat".


Setup: The table storing edges looks like this:

 =>\d+ treenode_edge
                   Tabelle »public.treenode_edge«

Spalte | Typ | Attribute------------+-----------------------+-----------

  id         | bigint                | not null
  project_id | integer               | not null

edge | geometry(LineStringZ) |Indexe:

          "treenode_edge_pkey" PRIMARY KEY, btree (id)
          "treenode_edge_gix" gist (edge gist_geometry_ops_nd)
          "treenode_edge_project_id_index" btree (project_id)

Note that there is a 3D index for the edge column in place (treenode_edge_gix).

Current timing: To request 2000 nodes with an typical bounding box size I use
the following query:

 SELECT te.id
 FROM treenode_edge te

-- left bottom z2, right top z1WHERE te.edge &&& 'LINESTRINGZ( -537284.0 699984.0 84770.0,

                                                                  1456444.0 
-128432.0 84735.0)'
                                   -- left top halfz, right top halfz,
                                   -- right bottom halfz, left bottom halfz,
                                   -- left top halfz; halfz (distance)
 AND ST_3DDWithin(te.edge, ST_MakePolygon(ST_GeomFromText(
          'LINESTRING( -537284.0 -128432.0 84752.5, 1456444.0 -128432.0 84752.5,
                                   1456444.0  699984.0 84752.5, -537284.0  
699984.0 84752.5,
                                   -537284.0 -128432.0 84752.5)')), 17.5)
 AND te.project_id = 1
 LIMIT 2000;

This takes about 900ms, statistics are up-to-date. This is not so bad,but I look for strategies to make this much faster still. The &&&operator filters already most of the existing edges by bounding box test(using index) so that ST_3DWithin only needs to check the edges that aremost likely part of the result.


The query plan looks like this:

 Limit  (cost=48.26..4311.24 rows=70 width=8) (actual time=856.261..864.208 
rows=2000 loops=1)
         Buffers: shared hit=20470
         ->  Bitmap Heap Scan on treenode_edge te  (cost=48.26..4311.24 rows=70 
width=8) (actual time=856.257..863.974 rows=2000 loops=1)
                   Recheck Cond: (edge &&& 
'01020000800200000000000000886520C100000000A05C25410000000020B2F440000000003C39364100000000005BFFC000000000F0AFF440'::geometry)
                   Filter: ((edge && 
'0103000080010000000500000000000000AB6520C100000000185CFFC000000000F0AFF44000000000AB6520C100000000C35C254100000000F0AFF440000000804D39364100000000C35C25410000000020B2F440000000804D39364100000000185CFFC00000000020B2F44000000000AB6520C100000000185CFFC000000000F0AFF440'::geometry)
 AND (project_id = 1) AND 
('0103000080010000000500000000000000886520C100000000005BFFC00000000008B1F440000000003C39364100000000005BFFC00000000008B1F440000000003C39364100000000A05C25410000000008B1F44000000000886520C100000000A05C25410000000008B1F44000000000886520C100000000005BFFC00000000008B1F440'::geometry
 && st_expand(edge, '17.5'::double precision)) AND _st_3ddwithin(edge, 
'0103000080010000000500000000000000886520C100000000005BFFC00000000008B1F440000000003C39364100000000005BFFC00000000008B1F440000000003C39364100000000A05C25410000000008B1F44000000000886520C100000000A05C25410000000008B1F44000000000886520C100000000005BFFC00000000008B1F440'::geometry,
 '17.5'::double precision))
                   Heap Blocks: exact=1816
                   Buffers: shared hit=20470
                   ->  Bitmap Index Scan on treenode_edge_gix  
(cost=0.00..48.25 rows=1044 width=0) (actual time=855.795..855.795 rows=2856 
loops=1)
                                 Index Cond: (edge &&& 
'01020000800200000000000000886520C100000000A05C25410000000020B2F440000000003C39364100000000005BFFC000000000F0AFF440'::geometry)
                                 Buffers: shared hit=18654
  Planning time: 3.467 ms
  Execution time: 864.614 ms

This shows clearly that the big bounding box test on all existing edges is the

problem, though it already makes use of the GiST index. Are there maybeother indices available that can discard many far-away bounding boxeswithout looking at them, i.e. that useses a hierarchy like an octree? Sofar I didn't find any, but it looked like the SP-GiST interface might beuseful if one were to create a new one.


Alternatives: There is one approach that I currently implement to improve this,
but I'd like to hear if someone has alternative, possibly better or other
suggestions?

This is what I tried: create a new table that would partition the space
into many cubes and store for each all intersecting edges. New, updated or
deleted nodes would then lead to an update of the relevant cubes, something

that would probably have to run asynchronously (didn't implement theupdate yet). Queries then pre-filtered the space by only looking atrelevant cubes instead of using the &&& operator with the bounding boxof each edge. For a range of cubse sizes this improved query time quitea bit, the above query would run in about a third of thet time above.However, I realize cell sizes will behave differently for differentdataset, but I didn't try to implement a dynamic cube size update, yet.Thinking about it, it makes me wonder if it wouldn't be best if I wouldimplement a octree---table based or as an extension.


Now I also saw the pointcloud extension and its indexing looked very

interesting (octree), but I couldn't really see how it could work in myuse case. But I would appreciate ideas if you see a way to make it work.


Best,
Tom
_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users

[postgis-users] Improve 3D BB query on large set of lines

Reply via email to