Ok, the first version of my result parser is here:
https://code.launchpad.net/~jamesmikedupont/+junk/EPANatReg

Right now it just extracts the tables and produces some statistics.
I will create a spreadsheet of the data once it is ready.

Here are the statistics so far, showing how often each attribute occurs:

FILES   : 1636
Files with NULL bytes:  216

Of those, all have the following attributes:
    REGISTRY ID ,    LONGITUDE  , PROGRAM , COUNTY NAME,
HORIZONTAL DATUM, LOCATION ADDRESS, LATITUDE

HORIZONTAL DATUM has always been WGS84 so far, so it is not interesting. I
assume that for some islands it will be different.

The collection method is not always filled out, but it might give
clues as to how accurate the data is. We could use it to display only
the types of data that are deemed fit and leave the others invisible,
for example.
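
A rough sketch of that kind of filter, in Python. It assumes each parsed
record is a dict keyed by attribute name; that layout and the
TRUSTED_METHODS set are hypothetical choices for illustration, not what
the parser currently does:

```python
# Hypothetical set of collection methods considered accurate enough to display.
TRUSTED_METHODS = {
    "GPS CARRIER PHASE STATIC RELATIVE POSITION",
    "GPS CODE (PSEUDO RANGE) DIFFERENTIAL",
    "CLASSICAL SURVEYING TECHNIQUES",
}


def visible_records(records, trusted=TRUSTED_METHODS):
    """Return only the records whose collection method is in the trusted set.

    Records with no COLLECTION METHOD at all are treated as untrusted.
    """
    return [r for r in records if r.get("COLLECTION METHOD") in trusted]


# Made-up sample records (assumed layout: one dict per facility).
sample = [
    {"REGISTRY ID": "A", "COLLECTION METHOD": "GPS CODE (PSEUDO RANGE) DIFFERENTIAL"},
    {"REGISTRY ID": "B", "COLLECTION METHOD": "ADDRESS MATCHING-HOUSE NUMBER"},
    {"REGISTRY ID": "C"},  # collection method not filled out
]
print([r["REGISTRY ID"] for r in visible_records(sample)])
```

Which methods belong in the trusted set is a judgment call that would
have to be agreed on first.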

The COLLECTION METHOD attribute tells how the position was collected; the breakdown is:

COLLECTION METHOD       1357
        INTERPOLATION-OTHER:    1
        GPS, WITH CANADIAN ACTIVE CONTROL SYSTEM:       1
        GPS CARRIER PHASE KINEMATIC RELATIVE POSITION:  1
        ADDRESS MATCHING-PRIMARY NAME:  1
        GPS CODE (PSEUDO RANGE) STANDARD POSITION (SA ON):      1
        GPS CODE (PSEUDO RANGE) PRECISE POSITION:       1
        ADDRESS MATCHING-STREET CENTERLINE:     2
        ADDRESS MATCHING-DIGITIZED:     2
        GPS CODE (PSEUDO RANGE) STANDARD POSITION (SA OFF):     2
        CLASSICAL SURVEYING TECHNIQUES: 4
        ADDRESS MATCHING-BLOCK FACE:    4
        CENSUS BLOCK/GROUP-1990-CENTROID:       4
        PUBLIC LAND SURVEY - EIGHTH SECTION:    7
        GPS CODE (PSEUDO RANGE) DIFFERENTIAL:   12
        INTERPOLATION - DIGITAL MAP SRCE (TIGER):       16
        UNKNOWN:        20
        INTERPOLATION-PHOTO:    21
        INTERPOLATION-MAP:      48
        GPS CARRIER PHASE STATIC RELATIVE POSITION:     97
        ADDRESS MATCHING-OTHER: 128
        GPS - UNSPECIFIED:      133
        ADDRESS MATCHING-HOUSE NUMBER:  851
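
A breakdown like the one above can be produced with a simple tally. This
is only a sketch in Python using collections.Counter; the record layout
(one dict per facility) and the function names are assumptions for
illustration:

```python
from collections import Counter


def tally(records, attribute):
    """Count how often each value of `attribute` occurs.

    Records that do not have the attribute are skipped.
    """
    return Counter(r[attribute] for r in records if attribute in r)


def print_breakdown(records, attribute):
    """Print the attribute total, then each value with its count."""
    counts = tally(records, attribute)
    print(f"{attribute}\t{sum(counts.values())}")
    # Least common first, matching the listing above.
    for value, n in sorted(counts.items(), key=lambda item: item[1]):
        print(f"\t{value}:\t{n}")


# Made-up sample records (assumed layout).
sample = [
    {"COLLECTION METHOD": "UNKNOWN"},
    {"COLLECTION METHOD": "GPS - UNSPECIFIED"},
    {"COLLECTION METHOD": "GPS - UNSPECIFIED"},
    {"PROGRAM": "NULL"},  # no collection method recorded
]
print_breakdown(sample, "COLLECTION METHOD")
```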


Here are the different programs that occur, with their counts:

<A HREF="http://www.epa.gov/superfund/action/law/cercla.htm"
target=_blank>COMPREHENSIVE ENVIRONMENTAL RESPONSE, COMPENSATION AND
INFORMATION SYSTEM</A>  15
<A HREF="http://www.epa.gov/brownfields/" target=_blank>Office of
Brownfields and Land Revitalization</A> 52
<A HREF="http://www.epa.gov/osw/hazard/correctiveaction/index.htm"
target=_blank>RESOURCE CONSERVATION AND RECOVERY ACT INFORMATION
SYSTEM</A>      64
<A HREF="http://www.epa.gov/osw/hazard/tsd/index.htm"
target=_blank>RESOURCE CONSERVATION AND RECOVERY ACT INFORMATION
SYSTEM</A>      65
<A HREF="http://cfpub.epa.gov/npdes/" target=_blank>PERMIT COMPLIANCE
SYSTEM</A>      171
<A HREF="http://cfpub.epa.gov/npdes/" target=_blank>INTEGRATED
COMPLIANCE INFORMATION SYSTEM</A>       171
NULL    218
<A HREF="http://www.epa.gov/osw/hazard/generation/lqg.htm"
target=_blank>RESOURCE CONSERVATION AND RECOVERY ACT INFORMATION
SYSTEM</A>      252
<A HREF="http://www.epa.gov/air/oaqps/permits/obtain.html"
target=_blank>AEROMETRIC INFORMATION RETRIEVAL SYSTEM / AIRS FACILITY
SYSTEM</A>      398
<A HREF="http://www.epa.gov/tri/" target=_blank>TOXIC CHEMICAL RELEASE
INVENTORY SYSTEM</A>    1009


This is a good start. I think the download will be finished in a few
days, and then we will have a table of data to work with.

The next step will be to extract the changesets back out of OSM and
get the nodes, to see whether they have been updated; then we can
process the ones that have not been touched yet.
That can also run in parallel; JOSM can be used to update the dataset
once it is in memory.
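
One possible way to separate touched from untouched nodes, sketched in
Python. It relies on the assumption that each imported node was uploaded
exactly once, so an untouched node is still at version 1 (OSM increments
an element's version on every edit). The dict layout and function name
are hypothetical, and nodes deleted since the import would need separate
handling:

```python
def split_by_touched(nodes):
    """Partition nodes into (untouched, touched) using the OSM version number.

    Assumes a node still at version 1 has not been edited since the import.
    """
    untouched, touched = [], []
    for node in nodes:
        (untouched if node["version"] == 1 else touched).append(node)
    return untouched, touched


# Made-up sample nodes (assumed layout: id and version taken from the OSM data).
sample_nodes = [
    {"id": 1, "version": 1},
    {"id": 2, "version": 3},
    {"id": 3, "version": 1},
]
untouched, touched = split_by_touched(sample_nodes)
print([n["id"] for n in untouched], [n["id"] for n in touched])
```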

mike

On Wed, Dec 16, 2009 at 8:42 AM, jamesmikedup...@googlemail.com
<jamesmikedup...@googlemail.com> wrote:
> I have started the process to resolve the URLS that were listed (in
> the kml file)
>
> The program will wait between requests so as not to abuse the server.
> Some of them are already returning 0. I am going to start to parse the
> ones I got already.
>
> I will then parse the data and start to update the records. At
> least we can filter out a lot of junk this way.
>
> Again, I am committed to repair any damages I caused. I am a
> professional programmer and can code many different solutions. I will
> continue to work on this until we have a good solution.
>
> mike
>
> On Wed, Dec 16, 2009 at 8:22 AM, jamesmikedup...@googlemail.com
> <jamesmikedup...@googlemail.com> wrote:
>> Very good!
>> That means we have a fast criterion to remove a lot of junk:
>> whether the URL returns 0 bytes.
>> I will start on this asap.
>> mike
>>
>> On Wed, Dec 16, 2009 at 1:55 AM, Anthony <o...@inbox.org> wrote:
>>> On Tue, Dec 15, 2009 at 1:06 AM, jamesmikedup...@googlemail.com
>>> <jamesmikedup...@googlemail.com> wrote:
>>>>
>>>> What urls dont work?
>>>
>>> Every one I've tried so far.  Here's one:
>>> http://iaspub.epa.gov/enviro/national_kml.registry_html?p_registry_id=110038277664
>>>
>>> See http://www.openstreetmap.org/browse/node/586922112
>>>
>>> Which is a Pinch-A-Penny (a pool supplies store), that is listed as
>>> man_made=envionmental_hazard (presumably because it has pool supplies) and
>>> landuse=industrial (which is just plain wrong).  The node is in the middle
>>> of the highway, and I'm not sure exactly where to move it to (I looked up
>>> the address by doing a few google searches, but that didn't help too much).
>>> I'll leave the node around in this state for at least a few days so you can
>>> look at it.
>>>
>>
>

_______________________________________________
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us
