Lately I've played with some OpenStreetMap data...
The imported nodes have many properties drawn from a small set of values
(road type, point-of-interest type, colour, ...), but I don't know the
set of values in advance (sometimes a new value becomes standard,
sometimes an invalid value is present).
Other node properties are just unique text (address, URL).
To speed up the import process I've tried to apply some kind of
compression: I've seen that Neo4j encodes property names as a
sequence of integers, so I've tried to do the same for the values of
all the properties that I know contain only a small set of distinct values.
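The idea is plain dictionary encoding: each distinct string value gets a small integer code, and the integer is stored instead of the string. Here is a minimal sketch of that scheme; all class and method names are illustrative, not part of any Neo4j API.

```python
# Minimal sketch of dictionary-encoding property values before storing
# them. Names here (ValueDictionary, encode, decode) are hypothetical.

class ValueDictionary:
    """Maps each distinct string value to a small integer code."""

    def __init__(self):
        self._code_of = {}   # value -> int code
        self._value_of = []  # int code -> value

    def encode(self, value):
        # Assign the next free code the first time a value is seen.
        code = self._code_of.get(value)
        if code is None:
            code = len(self._value_of)
            self._code_of[value] = code
            self._value_of.append(value)
        return code

    def decode(self, code):
        return self._value_of[code]


d = ValueDictionary()
# Repeated values like OSM 'highway' types map to the same small int,
# so the property store holds short ints instead of long strings.
codes = [d.encode(v) for v in ["residential", "primary", "residential"]]
print(codes)        # -> [0, 1, 0]
print(d.decode(1))  # -> primary
```

The dictionary itself stays small (a few hundred KB in my case), while the savings in the string store grow with every repeated value.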

With this encoding the database is obviously much smaller.

After importing sweden.osm the database dir is 552M:
100M neostore.propertystore.db
220M neostore.propertystore.db.arrays
227M neostore.propertystore.db.strings

With 'compression' on it is 344M:
100M neostore.propertystore.db
220M neostore.propertystore.db.arrays
20M neostore.propertystore.db.strings
property value dictionary entries: 16286
property value dictionary size: 387378 bytes

I don't know if this is a common use case, but it would be cool to
have this kind of compression out of the box!

WDYT?

Regards,
-- 
Davide Savazzi
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
