joto created an issue (osm2pgsql-dev/osm2pgsql#2368)

In general, osm2pgsql is built around the principle that it looks at one OSM
object at a time and processes it. But this is a simplified view. There are
already cases where we look at more than one object at a time, and there will
likely be more in the future with the often-requested advanced relation
processing.

For this to work, we need to store OSM objects (or at least their locations) in
the middle and retrieve them when needed. Osm2pgsql already has code for that.
Unfortunately, that code has changed many times over the years and has become
hard to reason about and verify, as the recent PRs #2365 and #2367 have shown.

Basically the problem is this:

* We need different pieces of objects at different times and for different 
reasons. Sometimes we only need the geometry (location), sometimes the tags, 
sometimes related objects (members, parents, ...).
* We cannot predict what pieces of data we will need, because that depends on
the complex logic implemented by the user in Lua scripts.
* Depending on the middle used and several options (see the
[ram middle](https://github.com/osm2pgsql-dev/osm2pgsql/blob/master/src/middle-ram.hpp#L87-L106)
and the
[pgsql middle](https://github.com/osm2pgsql-dev/osm2pgsql/blob/master/src/middle-pgsql.hpp#L35-L48)),
parts of the data can be stored in different places. Some of these places are
expensive to access (mainly the database).
* Accessing the database is more efficient if we batch requests instead of
querying every time we need something. For instance, if and when we need a node
member of a relation, it makes sense to fetch the other node members in the
same query. Chances are we will need them too, and a single query replaces n
queries for n nodes.
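
To illustrate the last point, here is a minimal C++ sketch. `FakeNodeStore`, `query_locations()`, `fetch_one_by_one()` and `fetch_batched()` are hypothetical names, not osm2pgsql code; the store simply counts every call as one database round trip, whatever the number of IDs requested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Location {
    double lon = 0.0;
    double lat = 0.0;
};

// Hypothetical stand-in for the database behind a middle. Every call to
// query_locations() counts as one round trip, however many IDs it asks for.
class FakeNodeStore {
public:
    void add(std::int64_t id, Location loc) { m_nodes[id] = loc; }

    std::unordered_map<std::int64_t, Location>
    query_locations(std::vector<std::int64_t> const &ids) {
        ++m_queries;
        std::unordered_map<std::int64_t, Location> result;
        for (auto const id : ids) {
            auto const it = m_nodes.find(id);
            if (it != m_nodes.end()) {
                result.emplace(id, it->second);
            }
        }
        return result;
    }

    int queries() const noexcept { return m_queries; }

private:
    std::unordered_map<std::int64_t, Location> m_nodes;
    int m_queries = 0;
};

// Naive approach: one round trip per node member (n queries for n nodes).
std::size_t fetch_one_by_one(FakeNodeStore &store,
                             std::vector<std::int64_t> const &members) {
    std::size_t found = 0;
    for (auto const id : members) {
        found += store.query_locations({id}).size();
    }
    return found;
}

// Batched approach: all node members of the relation in one round trip.
std::size_t fetch_batched(FakeNodeStore &store,
                          std::vector<std::int64_t> const &members) {
    return store.query_locations(members).size();
}
```

With five node members, `fetch_one_by_one()` makes five round trips while `fetch_batched()` makes one.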

Keeping track of all this "manually" in the code will lead to headaches and
bugs every time we want to add new features to osm2pgsql that need extra bits
and pieces of objects. So we should think about a better way to solve this.

We'd need some kind of "smart cache" that answers requests for objects, placed
either inside the middle implementations or between the RAM and pgsql middles
and their users. If an object is not available yet, the cache retrieves it,
possibly together with other pieces of data.
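
A minimal C++ sketch of such a cache, assuming a hypothetical `Backend` that stands in for a middle and counts every access. Callers can announce IDs they will probably need (for instance, all node members of a relation) with `hint()`; nothing is fetched until the first miss, which then loads the requested node together with all announced ones. `Backend`, `SmartLocationCache` and their methods are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct Location {
    double lon = 0.0;
    double lat = 0.0;
};

// Hypothetical backend standing in for a middle (RAM or database).
// Every call to load() counts as one, possibly expensive, access.
struct Backend {
    std::unordered_map<std::int64_t, Location> data;
    int accesses = 0;

    std::unordered_map<std::int64_t, Location>
    load(std::vector<std::int64_t> const &ids) {
        ++accesses;
        std::unordered_map<std::int64_t, Location> out;
        for (auto const id : ids) {
            if (auto const it = data.find(id); it != data.end()) {
                out.emplace(id, it->second);
            }
        }
        return out;
    }
};

// Sketch of a "smart cache" between a middle and its users. hint() only
// records IDs; the first cache miss loads the requested ID together with all
// hinted IDs in a single backend access.
class SmartLocationCache {
public:
    explicit SmartLocationCache(Backend &backend) : m_backend(&backend) {}

    void hint(std::vector<std::int64_t> const &ids) {
        for (auto const id : ids) {
            if (m_cache.count(id) == 0) {
                m_pending.insert(id);
            }
        }
    }

    // Returns nullptr if the node does not exist. Simplification: misses for
    // missing nodes are not remembered, so they reach the backend every time.
    Location const *location(std::int64_t id) {
        if (auto const it = m_cache.find(id); it != m_cache.end()) {
            return &it->second;
        }
        m_pending.insert(id);
        std::vector<std::int64_t> const batch(m_pending.begin(), m_pending.end());
        m_pending.clear();
        for (auto const &[nid, loc] : m_backend->load(batch)) {
            m_cache.emplace(nid, loc);
        }
        auto const it = m_cache.find(id);
        return it == m_cache.end() ? nullptr : &it->second;
    }

private:
    Backend *m_backend;
    std::unordered_map<std::int64_t, Location> m_cache;
    std::unordered_set<std::int64_t> m_pending;
};
```

After `hint({10, 11, 12, 13})`, asking for node 11 costs one backend access, and node 13 is then already in the cache.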

To make this work without the outside code having to understand the details,
the cache must be accessed through the objects themselves. For instance, the
outside code asks for "node 17" and gets a proxy object back. When the code
then uses the object ("give me the location of this node"), the proxy figures
out that it needs to fetch the location "just in time". It stores the location
in the proxy so that it doesn't have to fetch it again if the code needs the
location a second time. The cache probably also needs some kind of interface
to get more than one object at a time, so that it can batch database queries
as described above.
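
The proxy idea could look roughly like this in C++. `NodeProxy` and `NodeSource` are hypothetical; the two `std::function` callbacks stand in for whatever lookups the middle would really do. Location and tags are loaded independently and only on first use, and each is kept in an `std::optional` inside the proxy.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>

struct Location {
    double lon = 0.0;
    double lat = 0.0;
};
using Tags = std::map<std::string, std::string>;

// Hypothetical interface to wherever the data lives (RAM or database).
// The callbacks stand in for the real, possibly expensive, lookups.
struct NodeSource {
    std::function<Location(std::int64_t)> load_location;
    std::function<Tags(std::int64_t)> load_tags;
};

// Proxy handed out for "give me node 17". It holds only the ID until a piece
// of data is requested; each piece is loaded on first use and kept in the
// proxy, so a second request does not go back to the source.
class NodeProxy {
public:
    NodeProxy(std::int64_t id, NodeSource const *source)
    : m_id(id), m_source(source) {
        assert(m_source != nullptr);
    }

    std::int64_t id() const noexcept { return m_id; }

    Location const &location() {
        if (!m_location) {
            m_location = m_source->load_location(m_id);
        }
        return *m_location;
    }

    Tags const &tags() {
        if (!m_tags) {
            m_tags = m_source->load_tags(m_id);
        }
        return *m_tags;
    }

    bool has_location_loaded() const noexcept { return m_location.has_value(); }
    bool has_tags_loaded() const noexcept { return m_tags.has_value(); }

private:
    std::int64_t m_id;
    NodeSource const *m_source;
    std::optional<Location> m_location;
    std::optional<Tags> m_tags;
};
```

Giving each proxy a pointer to its source is just one option; a real design might route loads through the shared cache instead, so that requests from many proxies can be batched.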

Currently we use osmium::Node/Way/Relation objects in many places. They are
cumbersome because they have to live in an osmium::Buffer, and they have no
space for the extra data our proxy objects need. We have to change all the code
to work with the proxy objects instead. The only place where we really need the
Osmium objects is when interacting with the Osmium library, that is, when
reading data from the input file and when building multipolygons. We need to
take that into account, but I believe that in all other cases we can move away
from that interface.

One other thing to keep in mind: multithreading is one way to speed things up.
If an extra thread asks the database for objects we are likely to need soon,
processing could become faster. But that means the cache would have to support
multithreading in some form.
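
A hedged sketch of the multithreaded variant: a mutex-protected cache that a background task fills via `std::async` while the caller carries on. `ConcurrentLocationCache`, `query_database()` and `prefetch()` are hypothetical names, and `query_database()` only derives a location from the ID instead of querying anything.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <mutex>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

struct Location {
    double lon = 0.0;
    double lat = 0.0;
};

// Hypothetical thread-safe cache: a single mutex guards the map, so a
// background task can fill it while other threads read from it.
class ConcurrentLocationCache {
public:
    void insert(std::int64_t id, Location loc) {
        std::lock_guard<std::mutex> const lock{m_mutex};
        m_data.emplace(id, loc);
    }

    std::optional<Location> lookup(std::int64_t id) const {
        std::lock_guard<std::mutex> const lock{m_mutex};
        auto const it = m_data.find(id);
        if (it == m_data.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    mutable std::mutex m_mutex;
    std::unordered_map<std::int64_t, Location> m_data;
};

// Hypothetical placeholder for a database query; it derives a location from
// the ID instead of talking to PostgreSQL.
Location query_database(std::int64_t id) {
    return Location{static_cast<double>(id), 0.0};
}

// Loads `ids` on another thread and returns immediately. The cache must
// outlive the task. If the returned future is discarded, its destructor
// blocks until the task is done, which makes the prefetch synchronous; keep
// it and call wait() or get() once the data is actually needed.
std::future<void> prefetch(ConcurrentLocationCache &cache,
                           std::vector<std::int64_t> ids) {
    return std::async(std::launch::async, [&cache, ids = std::move(ids)] {
        for (auto const id : ids) {
            cache.insert(id, query_database(id));
        }
    });
}
```

A single mutex is the simplest way to make the cache thread-safe but serializes all accesses. Also, libpq does not allow two threads to use the same connection at the same time, so a real prefetch thread would need its own database connection.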


-- 
https://github.com/osm2pgsql-dev/osm2pgsql/issues/2368
_______________________________________________
Tile-serving mailing list
https://lists.openstreetmap.org/listinfo/tile-serving
