A further update on this work:

 * I found yet more bizarre phone-related tags ("phone:1", "telephone"
   and the like).  These have all been tidied.  My osmfilter now looks
   like this:

     --keep="contact:*=* or phone*=* or Phone*=* or alt_phone=* or fax*=* or tty*=*"

   Additional suggestions for keys to search on are welcome, so that I
   catch all phone numbers.
 * I found that some formats are used very regionally, e.g. Edmonton
   schools used one format consistently and Ottawa schools used a
   different format consistently.
 * The canada.poly filter I have been using includes Saint Pierre and
   Miquelon (which does not use the North American Numbering Plan), as
   well as a few US entries (especially relations which go near the
   border).  If anyone knows of a tighter canada.poly, please point me
   in the right direction.  I am generally leaving non-Canadian
   entries alone, but they do count in the stats below.
 * There are now 67 unique tag/phone number format combinations (down
   from 400+ originally) when using:

     egrep -i 'k="[a-z:]*(phone|fax|tty)[a-z:]*" ' $OSMFILENAME \
       | cut -d\" -f2,4 \
       | sed -e 's/[0-9]/#/g' | sed -e 's/[A-Z]/A/g' \
       | sed -e 's/([a-zA-Z -]*)/(...)/g' \
       | sort | uniq -c | sort -nr | wc -l
 * The bulk of the remaining work is to reformat the big groups of
   numbers that do not begin with "+1".  I will make changes by area
   code to limit the number of Canada-wide changesets.
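
To illustrate the remaining "+1" reformatting, here is a minimal sketch
in shell.  It is not the script actually used for these edits; the
function names normalize_nanp and area_code are hypothetical, and the
target format +1-###-###-#### is assumed only because it is the most
common one in the stats below.  Anything that does not look like a
plain 10-digit North American number (extensions, 7-digit numbers,
vanity letters, Saint Pierre and Miquelon numbers) is passed through
unchanged for manual review.

```shell
#!/bin/sh
# Hypothetical helper: rewrite common 10-digit NANP spellings as +1-NPA-NXX-XXXX.
# Reads one phone value per line on stdin; unrecognized values pass through unchanged.
# Assumption: area code and exchange both start with 2-9, as in the NANP.
normalize_nanp() {
  sed -E 's/^(\+?1)?[ -]?\(?([2-9][0-9]{2})\)?[ -]?([2-9][0-9]{2})[ -]?([0-9]{4})$/+1-\2-\3-\4/'
}

# Hypothetical helper: print the area code of each already-normalized number,
# so that edits can be grouped into one changeset per area code.
area_code() {
  sed -n -E 's/^\+1-([0-9]{3})-.*/\1/p'
}

# Small demonstration on made-up values (not taken from the OSM data).
printf '%s\n' '613-555-1234' '(780) 555-1234' '+1 416 555 1234' '555-1234' \
  | normalize_nanp
```

Piping the normalized values through "area_code | sort | uniq -c" would
then give a rough count of numbers per area code.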


As always, comments welcome.

Here is the new "top 20" as of ~10am ET today:

  12555 phone"+#-###-###-####
   4453 phone"+# ###-###-####
   4060 phone"###-###-####
   3749 phone"+# ### ### ####
   2624 phone"+# ### ###-####
   2239 phone"(###) ###-####
   1292 fax"+#-###-###-####
   1032 phone"##########
    941 contact:phone"+#-###-###-####
    323 phone"+###########
    322 phone"+# ### #######
    158 contact:fax"+#-###-###-####
    117 phone:tollfree"+#-###-###-####
    109 phone"###-####
     39 phone"+#-###-###-####;+#-###-###-####
     25 phone"+#-###-###-AAAA
     23 phone"+#-###-###-####x###
     17 phone"+# (###) ###-####
     14 phone"+#-###-###-####x####
      9 phone"+#-###-###-####x#



On 2018-02-04 11:49 PM, OSM Volunteer stevea wrote:
On Feb 4, 2018, at 8:37 PM, Matthew Darwin <matt...@mdarwin.ca> wrote:
Just an update on this activity.
Again, nice work!

Here are the top 20 tags as of ~4pm ET Sunday:

   10669 phone"+#-###-###-####
    4392 phone"+# ###-###-####
    4206 phone"###-###-####
    2970 phone"+# ### ### ####
    2540 phone"+# ### ###-####
    2451 phone"(###) ###-####
    1076 phone"##########
     659 phone"+# ### #######
     547 fax"+#-###-###-####
     522 contact:phone"+#-###-###-####
     516 phone"+###########
     456 phone"#-###-###-####
     446 phone"### ### ####
     378 fax"+# ###-###-####
     283 contact:phone"### ###-####
     260 phone"+# (###) ###-####
     200 fax"+###########
     186 phone"### ###-####
     170 phone"(###)###-####
     162 fax"+# ### ###-####
I'd appreciate it if others chimed in on this, but it seems that where formats 
differ only in dashes versus space characters, they can be conflated together.  
I'm not sure whether dash or space ends up as "the winner," but this should 
reduce the number of categories.
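
As a rough sketch of what such a conflation could look like on the
counted output, the shell function below merges patterns that differ
only by space versus dash and sums their counts.  The function name
conflate_separators is hypothetical, and dash is picked as the
canonical separator purely for illustration, not as a proposal for
"the winner."

```shell
#!/bin/sh
# Hypothetical helper: merge format patterns that differ only by space vs. dash.
# Input: lines shaped like the `uniq -c` output above (COUNT, then key"pattern).
# Output: summed counts per merged pattern, largest first.
conflate_separators() {
  awk '{
    count = $1
    sub(/^[ \t]*[0-9]+[ \t]+/, "")   # drop the leading count, keep the rest verbatim
    gsub(/ /, "-")                   # treat space and dash as the same separator
    total[$0] += count
  }
  END { for (p in total) print total[p], p }' | sort -nr
}

# Demonstration on a few rows copied from the newer "top 20" list.
printf '%s\n' \
  '  12555 phone"+#-###-###-####' \
  '   4453 phone"+# ###-###-####' \
  '   3749 phone"+# ### ### ####' \
  '   2624 phone"+# ### ###-####' \
  '   4060 phone"###-###-####' \
  | conflate_separators
```

On those five rows, the four "+#" variants collapse into a single
pattern with a combined count of 23381, while "###-###-####" stays
separate because it lacks the country code.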

As you consider additional conflations, you may be able to do this again and 
again, getting it down to a fairly small number of formats.  I urge additional 
feedback (here would be good) before additional conflations, but (I keep saying 
it):  nice work.

SteveA

_______________________________________________
Talk-ca mailing list
Talk-ca@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-ca
