No. It runs on the uncompressed planet, like this:

    bzcat /osm/planet-10*.osm.bz2 | /osm/gosmore/bboxSplit \
      -85.05113 73.12500 9.44906 180.00000 gzip 0720048510241024.osm.gz \
      -25.48295 120.58594 72.91964 180.00000 gzip 0855020310240587.osm.gz \
      -85.05113 98.43750 13.23995 172.61719 gzip 0792047410031024.osm.gz \
      ...

I'm not too worried about further optimizations: unlike Wikipedia, there
isn't the same urgency to have up-to-date data, except for disaster relief.

On Sat, Mar 13, 2010 at 10:42 PM, jamesmikedup...@googlemail.com
<jamesmikedup...@googlemail.com> wrote:
> you are bunzipping the code? you are scanning the bzip blocks?
> it is faster than the bunzip. But maybe you mean that it is very fast.
>
> I have experimented with bzip2recover to extract blocks on their own,
> I made a Perl script to extract blocks from a Wikipedia file that can be
> used to run the processing of the huge file by many people in parallel.
>
> https://code.launchpad.net/~jamesmikedupont/+junk/openstreetmap-wikipedia
>
> It is a tool to extract lat/long coords from the Wikipedia articles.
>
> Such processing of the large files would allow us to team up and all help.
> We really need to just have an index file of all the blocks so that we can
> find the ones that we need. Imagine being able to process the bzip file
> directly!
>
> mike
>
> On Sat, Mar 13, 2010 at 9:31 PM, Nic Roets <nro...@gmail.com> wrote:
>>
>> Hello James,
>>
>> I wanted to split the planet into overlapping bboxes like this (click
>> to see actual size):
>> http://dev.openstreetmap.de/gosmore/
>>
>> On talk I described how I was dissatisfied with osmosis's memory
>> consumption. So I came up with this observation: Most entities will
>> end up in one or two extracts. And when it's two, it's in a pattern
>> that is often repeated, say Africa bbox and Middle East bbox. Never
>> Africa and Canada. So of the 2^168 possible combinations only around
>> 3000 are actually used.
>>
>> So bboxSplit allocates 16 bits for each entity. Those are then indexes
>> into the array of 'younions'. If a new node comes along, I check it
>> against the list of bboxes and it typically matches 1 or 2. So to find
>> out quickly if I already have that combination of bboxes, I also have an
>> STL map on the array of younions. A hashtable would have been faster.
>>
>> Ways and relations also trigger the code that merges younions.
>>
>> bboxSplit is faster than the corresponding bunzip and any program that
>> uses libxml, i.e. very fast.
>>
>> Regards,
>> Nic
>>
>> On Sat, Mar 13, 2010 at 10:03 PM, jamesmikedup...@googlemail.com
>> <jamesmikedup...@googlemail.com> wrote:
>> > That is very deep C++ code!
>> > care to comment on how it works?
>> > would be very interested to understand its performance! looks very
>> > fast.
>> > mike
>> >
>> > On Sat, Mar 13, 2010 at 7:06 PM, Nic Roets <nro...@gmail.com> wrote:
>> >>
>> >> My understanding is that all XML-compliant* parsers will abort at the
>> >> file offsets that Frederik mentions.
>> >> My advice is to use the egrep filter when in doubt, because you will
>> >> lose no more than a dozen lines in a planet file of billions of
>> >> lines.
>> >>
>> >> *: (My split program is not compliant and will happily ignore these
>> >> errors:
>> >>
>> >> http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp)
>> >>
>> >> On Sat, Mar 13, 2010 at 7:44 PM, John Mitchell <mitchellj...@gmail.com>
>> >> wrote:
>> >> > Will this also be a problem if you try to import via osm2pgsql into
>> >> > postgres?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > John
>> >> >
>> >> > On 3/13/10, hbogner <hbog...@gmail.com> wrote:
>> >> >> Thx for help, I'll try it.
>> >> >>
>> >> >> Now I have to follow 'dev' too :D
>> >> >>
>> >> >> Nic Roets wrote:
>> >> >>> There's a bug in the code that generated this week's planet. You
>> >> >>> should either wait until next week or filter the planet with the
>> >> >>> following command:
>> >> >>> bzcat /osm/planet-10*.osm.bz2 |egrep -v '&#[0-9]*;'|...
>> >> >>>
>> >> >>> There has been a long discussion on 'dev', mentioning other
>> >> >>> remedies.
>> >> >>>
>> >> >>
>> >> >> _______________________________________________
>> >> >> talk mailing list
>> >> >> talk@openstreetmap.org
>> >> >> http://lists.openstreetmap.org/listinfo/talk
>> >> >
>> >> > --
>> >> > John J. Mitchell
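
For readers who want a concrete picture of the bookkeeping Nic describes in
the thread (a 16-bit index per entity into an array of bbox combinations,
plus an STL map to find an existing combination quickly), here is a minimal
sketch. It is not taken from bboxSplit.cpp. The names BBox, Combo,
ComboTable, idOf, merge and matchNode are hypothetical, and the field order
minLat, minLon, maxLat, maxLon is an assumption made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical bbox type; the field order is an assumption for this sketch.
struct BBox { double minLat, minLon, maxLat, maxLon; };

// A combination is the sorted list of bbox indexes an entity falls into.
using Combo = std::vector<int>;

// Table of the distinct combinations seen so far. Each entity only needs to
// remember the 16-bit id of its combination, not the combination itself.
class ComboTable {
 public:
  // Return the id of a combination, registering it if it is new.
  std::uint16_t idOf(const Combo &c) {
    auto it = index_.find(c);
    if (it != index_.end()) return it->second;
    if (combos_.size() > UINT16_MAX)
      throw std::overflow_error("more combinations than 16 bits can index");
    auto id = static_cast<std::uint16_t>(combos_.size());
    combos_.push_back(c);
    index_.emplace(c, id);
    return id;
  }

  // Look up the bbox indexes behind an id.
  const Combo &combo(std::uint16_t id) const {
    assert(id < combos_.size());
    return combos_[id];
  }

  // Id of the union of two combinations, e.g. when a way spans nodes that
  // belong to different combinations.
  std::uint16_t merge(std::uint16_t a, std::uint16_t b) {
    Combo u;
    std::set_union(combo(a).begin(), combo(a).end(),
                   combo(b).begin(), combo(b).end(), std::back_inserter(u));
    return idOf(u);
  }

  std::size_t size() const { return combos_.size(); }

 private:
  std::vector<Combo> combos_;             // id -> combination
  std::map<Combo, std::uint16_t> index_;  // combination -> id
};

// Indexes of all bboxes containing the point, in ascending order.
Combo matchNode(const std::vector<BBox> &boxes, double lat, double lon) {
  Combo c;
  for (std::size_t i = 0; i < boxes.size(); ++i) {
    const BBox &b = boxes[i];
    if (lat >= b.minLat && lat <= b.maxLat &&
        lon >= b.minLon && lon <= b.maxLon)
      c.push_back(static_cast<int>(i));
  }
  return c;
}
```

A caller would run matchNode for each node, store the result of idOf in a
per-node array of 16-bit values, and call merge when a way or relation pulls
together members with different ids. As Nic notes, a hash table in place of
std::map would probably be faster; std::map is used here only because it is
what the thread mentions.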