Hi,

>> There isn't a day gone past where the vast gaps in the OSM dataset, the
>> missing address nodes, missing turn restrictions, missing building
>> outlines, missing subdivisions, missing everything and whatnot don't
>> hugely degrade the usefulness of the project.
>
> OSM is not a data dumping ground, OSM is a community project. Importing all
> these things without a community to support them is worth less than nothing,
> it hurts the project rather than helping it.
>
> If you have a shape file with building outlines, configure your Mapnik
> instance to render the buildings from that.
So anybody who wants to see building outlines should either spend thousands
of hours tracing them by hand (a mind-numbingly boring, tedious task) or run
their own private Mapnik instance and render with that? What kind of
statement is that? Why don't you go configure your Mapnik not to use any
data from an import and use that instead? It's not a fair statement to make.

>> There is a huge amount of data out there that is under an acceptable
>> license to import into OSM that would be a great asset to the project.
>
> No, no, and no again. OSM is not a pool to collect the free geodata of the
> world. Because you are right - there is an *awful* lot of geodata available
> and we do _not_ want to burden our infrastructure with dead stuff that
> nobody cares about.

You really don't think that there is data out there that we could import
that would be an asset to the project? None? At all? Sure, there is data out
there that we don't need and don't want in OSM, because it's not as good as
what we've got or it's not the type of data the project is about, and we
don't need to burden our infrastructure with that. I'm not saying that we
should be a "dumping ground" of free geodata and that everything out there
should go in. I'm saying that there is a lot of great stuff out there and we
should figure out how to bring it in.

>> You can say "just go collect it manually" but if we know the data is
>> already there we're not going to put in years of work duplicating it
>> just to appease this anti-import mindset that some on this list have.
>
> Let's say it is a pro-community mindset. Prove that there's the manpower and
> the interest to maintain the imported data and you might have a point.

I've put in a lot of transit data, such as bus stops, by hand. How do you
prove that there is the manpower and interest to keep this updated? You
can't.
In fact, the city updates their GTFS feed more often and more accurately
than I can hope to keep up with by doing everything on foot. It is something
that people use and would like to see in OSM, so it certainly isn't "dead
stuff". What we need is a good toolchain to do imports and to pull in
changes from upstream sources like TIGER and GTFS feeds where appropriate.

The US has lots of free data. You seem to think that importing this data
hurts the US because people who just "look" at the map don't see open spaces
to fill in, and therefore don't contribute and create community; that if
only we didn't do imports, the community would form to gather the data by
hand and everything would be good. I don't think this is the case. A
community didn't form in the US pre-TIGER, when the map was a blank slate
here. It didn't form because we knew the data was already there and that
importing it would make a lot more sense than trying to duplicate it.

Take, for instance, the San Francisco address data that I've been working on
cleaning up so that it can be imported. Having address data in OSM makes it
a much more useful dataset, especially for routing. As far as addresses go
in San Francisco, a few shops and restaurants currently have them entered in
OSM, and there are also a couple dozen blocks with address-range ways
alongside them. Other than that, there is no address data at all in OSM for
San Francisco. We can import this dataset, which is really pretty good to
start with and will be even better once I've cleaned it up a bit more. It
will probably be about 200k nodes. At a rough estimate, given how many miles
of streets would need to be walked and how much data would have to be input,
I'd say it would take somewhere between three and six thousand man-hours to
duplicate. Why should we not do it? Just because we can't prove that we'll
be able to maintain it? It's not like the addresses jump around frequently.
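To make the "pull in changes from upstream" idea concrete, here is a minimal sketch of the first step such a toolchain would need: reading a standard GTFS stops.txt (its required columns are stop_id, stop_name, stop_lat, stop_lon) and turning each stop into a tagged node that a tool could then diff against what is already in OSM. The sample feed text and the exact tag choices are illustrative, not a finished importer.

```python
import csv
import io

# Hypothetical two-stop feed, shaped like a real GTFS stops.txt.
gtfs_text = """stop_id,stop_name,stop_lat,stop_lon
1234,Market St & 5th St,37.7837,-122.4089
5678,Mission St & 16th St,37.7649,-122.4194
"""

def gtfs_stops_to_nodes(text):
    """Parse GTFS stops into OSM-style node dicts (lat, lon, tags)."""
    nodes = []
    for row in csv.DictReader(io.StringIO(text)):
        nodes.append({
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
            "tags": {
                "highway": "bus_stop",
                "name": row["stop_name"],
                # Keeping the upstream ID is what makes later
                # re-imports from a newer feed diffable.
                "gtfs:stop_id": row["stop_id"],
            },
        })
    return nodes

nodes = gtfs_stops_to_nodes(gtfs_text)
print(len(nodes))                # 2
print(nodes[0]["tags"]["name"])  # Market St & 5th St
```

The `gtfs:stop_id` tag is the key design point: with a stable upstream ID on each node, a later run against an updated feed can tell added, moved, and removed stops apart instead of blindly re-uploading everything.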
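Since a handful of addresses already exist in OSM, an import like the San Francisco one also needs a duplicate check before anything is uploaded. Here is a naive sketch of that step, assuming a simple "anything within a few metres is probably the same object" rule; the radius, the sample coordinates, and the function names are all made up for illustration.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in metres.

    Equirectangular approximation; fine at city scale.
    """
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000 * math.hypot(x, y)

def flag_duplicates(incoming, existing, radius_m=10):
    """Split incoming nodes into fresh uploads and likely duplicates."""
    fresh, dupes = [], []
    for node in incoming:
        near = any(
            distance_m(node["lat"], node["lon"], e["lat"], e["lon"]) < radius_m
            for e in existing
        )
        (dupes if near else fresh).append(node)
    return fresh, dupes

# One node already mapped; one incoming node sits ~2 m from it.
existing = [{"lat": 37.78370, "lon": -122.40890}]
incoming = [
    {"lat": 37.78371, "lon": -122.40892},  # near the existing node
    {"lat": 37.76490, "lon": -122.41940},  # kilometres away
]
fresh, dupes = flag_duplicates(incoming, existing)
print(len(fresh), len(dupes))  # 1 1
```

A real conflation pass would also compare tags (a bus stop 5 m from an address node is not a duplicate) and queue near-matches for human review rather than silently dropping them, but the proximity test above is the core of it.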
I know that in Europe, especially Germany, the whole army-of-mappers-with-
boots-on-the-ground thing is working really well, and that's great. Over
here in the US we don't have that. It would be nice if we did, but we don't.
What we do have is a lot of PD government data, much of which is constantly
being maintained and updated by the government. Lots of us would like to
work with what we have and make good use of those government datasets, some
of which are really good.

I guess I'm just frustrated that anytime someone even thinks the word
'import' they suffer an onslaught of condescending 'imports are bad' and
'community, community, community' diatribes. This thread is a great example.
Someone wondered about making a tool that could help make imports easier to
perform, and nobody talked about the technical details. What's the best way
to do it? What techniques can we use to help prevent duplicates if there is
an overlap between the datasets? Are there already existing conflation tools
we could integrate? Are there ways we could flag large uploads automatically
so that we can check that the data is coming from a legal source? Could we
set up a test server so that people can work through the process without
using the live servers, and without the high barrier to entry of setting up
their own? None of this gets discussed.

It really feels like every time making an import tool comes up, the
discussion gets hijacked by this whole anti-import debate. I don't think
that is a proper way to go about things.

Cheers,
Greg
_______________________________________________
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk