Re: [OSM-talk] First drop in planet size ?

Nic Roets Sat, 13 Mar 2010 12:53:42 -0800

No. It runs on the uncompressed planet, like this:
bzcat /osm/planet-10*.osm.bz2 |   /osm/gosmore/bboxSplit \
   -85.05113   73.12500    9.44906  180.00000 gzip 0720048510241024.osm.gz \
   -25.48295  120.58594   72.91964  180.00000 gzip 0855020310240587.osm.gz \
   -85.05113   98.43750   13.23995  172.61719 gzip 0792047410031024.osm.gz \
...


I'm not too worried about further optimizations: unlike Wikipedia,
there isn't the same urgency to have up-to-date data, except for
disaster relief.


On Sat, Mar 13, 2010 at 10:42 PM, jamesmikedup...@googlemail.com
<jamesmikedup...@googlemail.com> wrote:
> Are you bunzipping inside the code? Are you scanning the bzip blocks?
> You say it is faster than bunzip, but maybe you just mean that it is very fast.
>
> I have experimented with bzip2recover to extract blocks on their own.
> I made a Perl script that extracts blocks from a Wikipedia file, so that
> many people can process the huge file in parallel.
>
> https://code.launchpad.net/~jamesmikedupont/+junk/openstreetmap-wikipedia
>
> It is a tool to extract lat/long coords from the Wikipedia articles.
>
> Processing the large files this way would allow us all to team up and help.
> We really just need an index file of all the blocks so that we can
> find the ones that we need. Imagine being able to process the bzip file
> directly!
>
> mike
>
> On Sat, Mar 13, 2010 at 9:31 PM, Nic Roets <nro...@gmail.com> wrote:
>>
>> Hello James,
>>
>> I wanted to split the planet into overlapping bboxes like this (click
>> to see actual size):
>> http://dev.openstreetmap.de/gosmore/
>>
>> On talk I described how I was dissatisfied with Osmosis's memory
>> consumption. So I came up with this observation: most entities will
>> end up in one or two extracts. And when it's two, it's in a pattern
>> that is often repeated, say the Africa bbox and the Middle East bbox,
>> never Africa and Canada. So of the 2^168 possible combinations, only
>> around 3000 are actually used.
>>
>> So bboxSplit allocates 16 bits for each entity. Those are then indexes
>> into the array of 'younions'. If a new node comes along, I check it
>> against the list of bboxes and it typically matches 1 or 2. So to find
>> out quickly if I already have that combination of bboxes, I also have
>> an STL map on the array of younions. A hashtable would have been faster.
>>
>> Ways and relations also trigger the code that merges younions.
>>
>> bboxSplit is faster than the corresponding bunzip and any program that
>> uses libxml, i.e. very fast.
>>
>> Regards,
>> Nic
>>
>> On Sat, Mar 13, 2010 at 10:03 PM, jamesmikedup...@googlemail.com
>> <jamesmikedup...@googlemail.com> wrote:
>> > That is very deep C++ code!
>> > Care to comment on how it works?
>> > I would be very interested to understand its performance! It looks
>> > very fast.
>> > mike
>> >
>> > On Sat, Mar 13, 2010 at 7:06 PM, Nic Roets <nro...@gmail.com> wrote:
>> >>
>> >> My understanding is that all XML-compliant* parsers will abort at the
>> >> file offsets that Frederik mentions.
>> >> My advice is to use the egrep filter when in doubt, because you will
>> >> lose no more than a dozen lines in a planet file of billions of
>> >> lines.
>> >>
>> >> *: (My split program is not compliant and will happily ignore these
>> >> errors:
>> >> http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp)
>> >>
>> >> On Sat, Mar 13, 2010 at 7:44 PM, John Mitchell <mitchellj...@gmail.com>
>> >> wrote:
>> >> > Will this also be a problem if you try to import via osm2pgsql into
>> >> > postgres?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > John
>> >> >
>> >> > On 3/13/10, hbogner <hbog...@gmail.com> wrote:
>> >> >> Thx for help, I'll try it.
>> >> >>
>> >> >> Now I have to follow 'dev' too :D
>> >> >>
>> >> >> Nic Roets wrote:
>> >> >>> There's a bug in the code that generated this week's planet. You
>> >> >>> should either wait until next week or filter the planet with the
>> >> >>> following command:
>> >> >>> bzcat /osm/planet-10*.osm.bz2 |egrep -v '&#[0-9]*;'|...
>> >> >>>
>> >> >>> There has been a long discussion on 'dev', mentioning other
>> >> >>> remedies.
>> >> >>>
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> talk mailing list
>> >> >> talk@openstreetmap.org
>> >> >> http://lists.openstreetmap.org/listinfo/talk
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> > John J. Mitchell
>> >> >
>> >> >
>> >>
>> >
>> >
>
>
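
For readers following the thread: the 'younion' scheme Nic describes in
the quoted message (a 16-bit index per entity into an array of bbox
combinations, with an STL map for fast lookup of existing combinations)
might look roughly like the sketch below. It is not taken from
bboxSplit.cpp. The names BBox, Younion, YounionTable, idFor, merge and
classify are made up for illustration, and the bbox field order (min lat,
min lon, max lat, max lon) is only inferred from the numbers on the
command line shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical types for illustration; not the ones used in bboxSplit.cpp.
// Assumed field order: min lat, min lon, max lat, max lon.
struct BBox {
    double minLat, minLon, maxLat, maxLon;
    bool contains(double lat, double lon) const {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }
};

// A "younion" here is the set of bbox indexes an entity belongs to.
using Younion = std::set<int>;

class YounionTable {
public:
    // Return the 16-bit id of a combination, adding it if it is new.
    std::uint16_t idFor(const Younion &y) {
        auto it = index_.find(y);
        if (it != index_.end()) return it->second;
        // Only 2^16 distinct combinations fit in a 16-bit id.
        assert(table_.size() < 65536);
        std::uint16_t id = static_cast<std::uint16_t>(table_.size());
        table_.push_back(y);
        index_.emplace(y, id);
        return id;
    }

    // Id of the union of two combinations, e.g. for a way whose nodes
    // carry different ids.
    std::uint16_t merge(std::uint16_t a, std::uint16_t b) {
        Younion y = table_[a];  // copy, so growing table_ below is safe
        y.insert(table_[b].begin(), table_[b].end());
        return idFor(y);
    }

    const Younion &get(std::uint16_t id) const { return table_[id]; }
    std::size_t size() const { return table_.size(); }

private:
    std::vector<Younion> table_;             // id -> combination
    std::map<Younion, std::uint16_t> index_; // combination -> id
};

// Test a node against every bbox and return the id of the matching set.
std::uint16_t classify(const std::vector<BBox> &boxes, double lat, double lon,
                       YounionTable &table) {
    Younion y;
    for (int i = 0; i < static_cast<int>(boxes.size()); ++i)
        if (boxes[i].contains(lat, lon)) y.insert(i);
    return table.idFor(y);
}
```

With this layout each entity costs two bytes no matter how many bboxes
exist, and the table only grows when a combination is seen for the first
time. Swapping std::map for std::unordered_map with a suitable hash would
be the "hashtable" variant mentioned in the message.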

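On the idea of an index of bzip2 blocks raised in the quoted messages:
each compressed block in a bzip2 stream starts with the 48-bit marker
0x314159265359, and blocks are not byte-aligned, so the marker has to be
searched for bit by bit. A minimal sketch of such a scan follows. The
function name findBlockStarts is made up, this is not Mike's Perl script,
and the same bit pattern can in principle also occur by chance inside
compressed data, so the offsets are candidates rather than guarantees.

```cpp
#include <cstdint>
#include <vector>

// Return the bit offset of every position where the 48-bit bzip2
// block-header marker begins. Offsets count bits from the start of the
// buffer, most significant bit of each byte first.
std::vector<std::uint64_t> findBlockStarts(const std::vector<std::uint8_t> &data) {
    const std::uint64_t kMagic = 0x314159265359ULL;
    const std::uint64_t kMask = 0xFFFFFFFFFFFFULL;  // keep the low 48 bits
    std::vector<std::uint64_t> starts;
    std::uint64_t window = 0;    // the last 48 bits read
    std::uint64_t bitsSeen = 0;  // total number of bits consumed so far
    for (std::uint8_t byte : data) {
        for (int bit = 7; bit >= 0; --bit) {
            window = ((window << 1) | ((byte >> bit) & 1u)) & kMask;
            ++bitsSeen;
            if (bitsSeen >= 48 && window == kMagic)
                starts.push_back(bitsSeen - 48);
        }
    }
    return starts;
}
```

Writing the returned offsets to a file would give the kind of block index
discussed above. Turning one block back into something bunzip can read
still requires wrapping it in its own stream header and trailer, which is
the job bzip2recover does.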
