On Sep 28, 2012 7:27 AM, "THEVENON Julien" <julien_theve...@yahoo.fr> wrote:
> ------------------------------
> On Thu, 27 Sep 2012 20:18 CEST, Sarah Hoffmann wrote:
>
> >> This is the real problem for us.
> >
> >For the sake of completeness: planetwide there are currently
> >152 million objects. Which means 1/6th of the planet consists of
> >French buildings. Now, there is a real problem.
> >
> Hi Sarah,
>
> Concerning the disk usage of French cadastre data, do you have any
> information? In particular, do you know how it is stored in the database?
> To be allowed to use cadastre data, we have to add a source key of about
> 40 characters to each way drawn from cadastre data, due to the legal
> agreement with the French government office that provides the data.
> Do you know whether this key is duplicated for each building in the
> database, or whether there is some smarter storage? If not, it would be
> interesting to know how much of the size is for the key itself and how
> much is for the geometry. I think that for buildings composed of one way
> and 4 nodes, the space required by the key could be greater than for the
> geometry.
> If this is the case, there is perhaps a way to factorise the source key
> and dramatically reduce disk usage.

I think the biggest cost for long tags that are heavily used is really in
the planet file size. A bigger planet takes longer to generate, longer to
download, longer to parse. The sheer size of it can be a problem to some
potential users. Especially when over 10% of it is just tags from imports
that most data consumers couldn't care less about. I think I calculated
once that the tiger:upload_uuid tag here in the US is responsible for about
1% of the data in the planet file. Since it is a random string with
hundreds of thousands of possible values, it doesn't compress well either.
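To illustrate the compression point, here is a rough sketch in Python. It is not a measurement of the real planet file: the tag values, the pool size, and the number of ways are all made-up assumptions. It only shows that a constant tag value compresses to almost nothing under zlib, while values drawn from a large pool of random UUID-like strings do not.

```python
import random
import uuid
import zlib


def compressed_size(values):
    """Return the zlib-compressed size in bytes of the values joined by newlines."""
    return len(zlib.compress("\n".join(values).encode("utf-8"), 9))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_ways = 20_000          # hypothetical number of tagged ways

# Hypothetical constant import tag value (40 characters), repeated on every way.
constant_tags = ["x" * 40] * n_ways

# Hypothetical pool of random UUID-like values; the pool size is an assumption.
pool = [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(5_000)]
random_tags = [rng.choice(pool) for _ in range(n_ways)]

constant_size = compressed_size(constant_tags)
random_size = compressed_size(random_tags)
print("constant value:", constant_size, "bytes compressed")
print("random values: ", random_size, "bytes compressed")
```

The pool is deliberately larger than zlib's 32 KB window, so repeats of the same UUID are rarely close enough together to be matched.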

As for database space, it depends on the schema being used. The API
database does not store geometry information for ways. It only stores which
nodes belong to which ways. And every tag takes one row in the way tags
table. There is no consolidation of common tags. The same goes for the
planet file XML. So in these cases the tag will take up as much space as is
needed for the key+value strings and the "geometry" only takes a few bytes
per node to store the relationship between the way and the node.
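As a back-of-the-envelope sketch of that comparison (in Python): the 8 bytes per node reference is my assumption (a 64-bit node id), row and index overhead is ignored, and the 40-character value is a placeholder, not the real cadastre source string.

```python
def tag_bytes(key, value):
    """Bytes needed for the key and value strings of one tag (UTF-8, no row overhead)."""
    return len(key.encode("utf-8")) + len(value.encode("utf-8"))


def way_node_bytes(node_count, bytes_per_ref=8):
    """Bytes needed to link a way to its nodes, assuming bytes_per_ref per node reference."""
    return node_count * bytes_per_ref


# Placeholder 40-character source value, as described in the quoted message.
source_value = "x" * 40

tag_cost = tag_bytes("source", source_value)
node_cost = way_node_bytes(4)

print("source tag:", tag_cost, "bytes")
print("node refs: ", node_cost, "bytes")
```

Under these assumptions the source tag on a 4-node building costs more than the node references themselves, which is in line with the guess in the quoted message.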

The default rendering database style (osm2pgsql) explicitly drops the
source tag while importing since it is not useful for rendering but of
course it still takes CPU cycles to uncompress and parse every tag from the
XML/PBF file.

One schema where you could actually make a direct comparison is pgsnapshot.
It can store linestring geometry and it stores all tags in an hstore field.
I'm not really sure how the linestring geometry is stored on disk. When
queried at a postgres prompt, it returns a string that is 187 characters
long for some random 4 node way I picked out.
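One possible explanation for that length, as a sketch only: if the string is hex-encoded EWKB with an SRID (an assumption on my part, I have not checked how pgsnapshot stores it), the header is 13 bytes and each 2D point is 16 bytes. The helper name below is hypothetical and the coordinates are made up.

```python
import struct


def ewkb_linestring_hex(points, srid=4326):
    """Hex-encode a 2D linestring as little-endian EWKB carrying an SRID."""
    wkb_linestring = 2
    srid_flag = 0x20000000
    # Header: byte order (1 byte), geometry type (4), SRID (4), point count (4).
    buf = struct.pack("<BIII", 1, wkb_linestring | srid_flag, srid, len(points))
    for x, y in points:
        # Each point is two 8-byte doubles.
        buf += struct.pack("<dd", x, y)
    return buf.hex().upper()


# A closed way over 4 distinct nodes repeats the first node, giving 5 points.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
hex_text = ewkb_linestring_hex(square)
print(len(hex_text), "hex characters for", len(hex_text) // 2, "bytes")
```

For 5 points this gives 93 bytes, or 186 hex characters, which is close to but not exactly the 187 I counted, so treat it as a rough guide rather than a confirmed figure.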

Toby
_______________________________________________
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk
