Are you bunzipping the code? Are you scanning the bzip blocks?
You say it is faster than bunzip. But maybe you just mean that it is very fast.

I have experimented with bzip2recover to extract blocks on their own.
I made a Perl script to extract blocks from a Wikipedia file, so that
the processing of the huge file can be run by many people in parallel.

https://code.launchpad.net/~jamesmikedupont/+junk/openstreetmap-wikipedia

It is a tool to extract lat/long coordinates from Wikipedia articles.

Processing the large files this way would allow us all to team up and help.
We really need to just have an index file of all the blocks so that we can
find the ones that we need. Imagine being able to process the bzip file
directly!
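
A minimal sketch of such a block index, in Python rather than the Perl of the script above. It assumes the standard bzip2 layout, in which every compressed block begins with the 48-bit magic number 0x314159265359 at an arbitrary bit offset (the same marker bzip2recover searches for). The function name find_block_bit_offsets is made up for illustration, and the magic can in principle also appear by chance inside compressed data, so the offsets are candidates rather than guarantees.

```python
import bz2

BLOCK_MAGIC = 0x314159265359   # marks the start of each bzip2 block
WINDOW_MASK = (1 << 48) - 1    # keep only the last 48 bits seen


def find_block_bit_offsets(data):
    """Return the bit offsets at which the block magic starts in `data`.

    bzip2 packs bits most-significant-bit first and blocks are not
    byte aligned, so the scan has to advance one bit at a time.
    """
    offsets = []
    window = 0
    bits_read = 0
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & WINDOW_MASK
            bits_read += 1
            if bits_read >= 48 and window == BLOCK_MAGIC:
                offsets.append(bits_read - 48)
    return offsets


if __name__ == "__main__":
    # Hypothetical sample input: a tiny stream compressed in memory.
    sample = bz2.compress(b"lat=52.5 lon=13.4\n" * 1000)
    for offset in find_block_bit_offsets(sample):
        print("block starts at bit", offset, "(byte", offset // 8, ")")
```

Two consecutive offsets delimit one block, which could then be handed to a different person or process. A pure-Python bit loop like this is far too slow for a full dump; it is only meant to show what the index file would contain.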

mike

On Sat, Mar 13, 2010 at 9:31 PM, Nic Roets <nro...@gmail.com> wrote:

> Hello James,
>
> I wanted to split the planet into overlapping bboxes like this (click
> to see actual size):
> http://dev.openstreetmap.de/gosmore/
>
> On talk I described how I was dissatisfied with osmosis's memory
> consumption. So I came up with this observation: Most entities will
> end up in one or two extracts. And when it's two, it's in a pattern
> that is often repeated, say Africa bbox and Middle East bbox. Never
> Africa and Canada. So of the 2^168 possible combinations only around
> 3000 are actually used.
>
> So bboxSplit allocates 16 bits for each entity. Those are then indexes
> into the array of 'younions'. If a new node comes along, I check it
> against the list of bboxes and it typically matches 1 or 2. So to find out
> quickly if I already have that combination of bboxes, I also have an
> STL map on the array of younions. A hashtable would have been faster.
>
> Ways and relations also trigger the code that merges younions.
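
A rough sketch of the scheme described above, in Python (the real bboxSplit.cpp is C++ and may differ in every detail). The names YounionTable, intern, merge and matching_bboxes, and the two example boxes, are hypothetical. A dict stands in for the STL map, i.e. this is the hashtable variant mentioned above, and array("H") gives the 16 bits per entity.

```python
from array import array

MAX_YOUNIONS = 1 << 16  # a 16-bit index can address at most 65536 combinations


class YounionTable:
    """Interns sets of bbox ids so each entity only stores a small index."""

    def __init__(self):
        # Index 0 is reserved for "inside no bbox at all".
        self.combos = [frozenset()]
        self.index_of = {frozenset(): 0}

    def intern(self, bbox_ids):
        """Return the index of this bbox combination, adding it if new."""
        key = frozenset(bbox_ids)
        index = self.index_of.get(key)
        if index is None:
            index = len(self.combos)
            if index >= MAX_YOUNIONS:
                raise OverflowError("more combinations than 16 bits can index")
            self.combos.append(key)
            self.index_of[key] = index
        return index

    def merge(self, a, b):
        """Index of the union of two combinations (e.g. for a way's nodes)."""
        return self.intern(self.combos[a] | self.combos[b])


def matching_bboxes(bboxes, lat, lon):
    """Ids of all (min_lat, min_lon, max_lat, max_lon) boxes containing the point."""
    return [i for i, (lat0, lon0, lat1, lon1) in enumerate(bboxes)
            if lat0 <= lat <= lat1 and lon0 <= lon <= lon1]


if __name__ == "__main__":
    # Two made-up overlapping boxes, loosely "Africa" and "Middle East".
    bboxes = [(-40.0, -20.0, 40.0, 55.0), (10.0, 25.0, 45.0, 65.0)]
    table = YounionTable()
    node_younion = array("H")  # one unsigned 16-bit index per node
    for lat, lon in [(0.0, 0.0), (30.0, 30.0), (30.0, 31.0)]:
        node_younion.append(table.intern(matching_bboxes(bboxes, lat, lon)))
    print(list(node_younion), len(table.combos))
```

Nodes that fall in the same set of boxes share one table entry, so the per-node cost stays at two bytes no matter how many bboxes exist.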
>
> bboxSplit is faster than the corresponding bunzip and any program that
> uses libxml, i.e. very fast.
>
> Regards,
> Nic
>
> On Sat, Mar 13, 2010 at 10:03 PM, jamesmikedup...@googlemail.com
> <jamesmikedup...@googlemail.com> wrote:
> > That is very deep C++ code!
> > Care to comment on how it works?
> > I would be very interested to understand its performance! It looks very fast.
> > mike
> >
> > On Sat, Mar 13, 2010 at 7:06 PM, Nic Roets <nro...@gmail.com> wrote:
> >>
> >> My understanding is that all XML-compliant* parsers will abort at the
> >> file offsets that Frederik mentions.
> >> My advice is to use the egrep filter when in doubt, because you will
> >> lose no more than a dozen lines in a planet file of billions of
> >> lines.
> >>
> >> *: (My split program is not compliant and will happily ignore these
> >> errors:
> >>
> >> http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp
> >> )
> >>
> >> On Sat, Mar 13, 2010 at 7:44 PM, John Mitchell <mitchellj...@gmail.com>
> >> wrote:
> >> > Will this also be a problem if you try to import via osm2pgsql into
> >> > postgres?
> >> >
> >> > Thanks,
> >> >
> >> > John
> >> >
> >> > On 3/13/10, hbogner <hbog...@gmail.com> wrote:
> >> >> Thx for help, I'll try it.
> >> >>
> >> >> Now I have to follow 'dev' too :D
> >> >>
> >> >> Nic Roets wrote:
> >> >>> There's a bug in the code that generated this week's planet. You
> >> >>> should either wait until next week or filter the planet with the
> >> >>> following command:
> >> >>> bzcat /osm/planet-10*.osm.bz2 |egrep -v '&#[0-9]*;'|...
> >> >>>
> >> >>> There has been a long discussion on 'dev', mentioning other
> >> >>> remedies.
> >> >>>
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> talk mailing list
> >> >> talk@openstreetmap.org
> >> >> http://lists.openstreetmap.org/listinfo/talk
> >> >>
> >> >
> >> >
> >> > --
> >> > John J. Mitchell
> >> >
> >>
> >
> >
>
