On Sep 28, 2012 7:27 AM, "THEVENON Julien" <julien_theve...@yahoo.fr> wrote:
> On Thu, Sep 27, 2012 at 20:18 CEST, Sarah Hoffmann wrote:
> >
> >> This is the real problem for us.
> >
> > For the sake of completeness: planetwide there are currently
> > 152 million objects. Which means 1/6th of the planet consists of
> > French buildings. Now, there is a real problem.
>
> Hi Sarah,
>
> Concerning the disk usage of the French cadastre data, do you have any
> information? In particular, do you know how it is stored in the database?
>
> To be allowed to use cadastre data, we have to add a source key, about
> 40 characters long, to each way drawn from cadastre data. This is required
> by the legal agreement with the French government office that provides
> the data.
>
> Do you know whether this key is duplicated for each building in the
> database, or whether there is some smart storage? If not, it would be
> interesting to know how much of the size is due to the key itself and how
> much to the geometry. I think that for buildings composed of one way and
> 4 nodes, the space required by the key could be greater than that
> required by the geometry.
>
> If this is the case, there is perhaps a way to factor out the source key
> and dramatically reduce disk usage.

I think the biggest cost of long, heavily used tags is really in the planet file size. A bigger planet takes longer to generate, longer to download, and longer to parse. Its sheer size can be a problem for some potential users, especially when over 10% of it is just tags from imports that most data consumers couldn't care less about. I think I calculated once that the tiger:upload_uuid tag here in the US is responsible for about 1% of the data in the planet file. Since it is a random string with hundreds of thousands of possible values, it doesn't compress well either.

As for database space, it depends on the schema being used. The API database does not store geometry information for ways. It only stores which nodes belong to which ways, and every tag takes one row in the way tags table. There is no consolidation of common tags. The same goes for the planet file XML. So in these cases the tag takes up as much space as the key+value strings need, while the "geometry" takes only a few bytes per node to store the relationship between the way and the node.

The default rendering database style (osm2pgsql) explicitly drops the source tag while importing, since it is not useful for rendering, but of course it still takes CPU cycles to decompress and parse every tag from the XML/PBF file.

One schema where you could actually make a direct comparison is pgsnapshot. It can store linestring geometry, and it stores all tags in an hstore field. I'm not really sure how the linestring geometry is stored on disk. When queried at a postgres prompt, it returns a string that is 187 characters long for some random 4 node way I picked out.

Toby

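As a rough illustration of the two points in the reply above (tag size versus way-to-node references, and why random values compress poorly), here is a minimal Python sketch. Everything in it is an assumption for illustration only: the 40-character source value is a made-up placeholder rather than the real cadastre string, node references are assumed to be 8-byte ids, a closed way around 4 distinct nodes is assumed to list 5 node references, and per-row database overhead is ignored. It does not reproduce any real schema.

```python
import uuid
import zlib

# Hypothetical placeholder, padded to 40 characters. NOT the real cadastre value.
SOURCE_VALUE = "hypothetical import source".ljust(40, ".")


def tag_bytes(key: str, value: str) -> int:
    """Raw UTF-8 size of one key+value pair, ignoring per-row overhead."""
    return len(key.encode("utf-8")) + len(value.encode("utf-8"))


def way_node_bytes(node_refs: int, bytes_per_ref: int = 8) -> int:
    """Size of the way-to-node references, assuming fixed-width node ids."""
    return node_refs * bytes_per_ref


def compressed_ratio(values: list[str]) -> float:
    """Compressed size divided by raw size for newline-joined values."""
    raw = "\n".join(values).encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    # One building: a closed way around 4 distinct nodes lists 5 node refs.
    print("source tag bytes:", tag_bytes("source", SOURCE_VALUE))
    print("node ref bytes:  ", way_node_bytes(5))

    # The same value repeated on every way versus a random id per way.
    repeated = [SOURCE_VALUE] * 10_000
    random_ids = [str(uuid.uuid4()) for _ in range(10_000)]
    print("repeated value ratio:", round(compressed_ratio(repeated), 4))
    print("random uuid ratio:   ", round(compressed_ratio(random_ids), 4))
```

Under these assumptions the single tag is already slightly larger than the node references of a small building, which matches the suspicion in the quoted question. The repeated value shrinks to almost nothing under compression while the random ids do not, so a constant source string mostly costs parse time and uncompressed size, whereas a per-object random id also inflates the compressed download.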
_______________________________________________ talk mailing list talk@openstreetmap.org http://lists.openstreetmap.org/listinfo/talk