Just an update on this activity.

 * I see people have been quite creative with the tags used for phone
   numbers, so it is taking me a bit longer to clean up than I
   originally thought.   Good to find all these weird tags: Phone,
   alt_phone, phone_1, phone_2, phone:tollfree, phone:toll-free, etc.
 * Values that carry an explanation alongside the number (radio
   stations and a few others) now look like this:
   +#-###-###-#### (office);+#-###-###-#### (on-air studio)
 * When a location had multiple phone numbers, and one was toll-free,
   I put it in phone:tollfree as that seemed to be used a bit (now
   ~100 times in Canada).  Alternatively, I could consolidate all the
   toll-free phone numbers into the regular phone field.
   Suggestions welcome
   (https://taginfo.openstreetmap.org/search?q=tollfree).
 * For phone numbers with the wrong number of digits: if I could
   figure out what was wrong (e.g. there was a web site listed) then I
   fixed it.  In a dozen cases I couldn't make sense of the number and
   deleted it.  I also deleted phone numbers that were just "+1-"
   (no real number).  Where no area code was listed, I left the
   number as 7 digits only (someone local can probably fix it
   easily).  Phone numbers of "911" were also removed.
 * We are now down to ~140 unique formats.  Comparing that with the
   ~400 formats I mentioned initially is a bit misleading, though:
   the initial count didn't include all the other tags I found and
   fixed along the way.  I also forgot to include relations in my
   initial query; they're in there now.
 * Using JOSM for editing: regular expression search and the to-do
   list work quite well for this task, although eliminating
   non-printable characters from the values took a bit to figure out
   (there were also values with trailing spaces).
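
For anyone scripting the same cleanup outside JOSM, here is a minimal
Python sketch of the non-printable-character and trailing-space fix
from the last bullet.  The function name clean_tag_value is made up
for illustration, and treating every Unicode "C" category character
as unwanted is an assumption, not necessarily what was done in JOSM.

```python
import unicodedata


def clean_tag_value(value: str) -> str:
    """Drop non-printable characters, then trim surrounding whitespace.

    Assumption: anything in a Unicode "C" category (control, format,
    private use, unassigned) does not belong in a phone tag value.
    """
    kept = []
    for ch in value:
        # Cc = control characters (tab, newline), Cf = invisible format
        # characters such as U+200B or the directional marks U+202A/U+202C.
        if unicodedata.category(ch).startswith("C"):
            continue
        kept.append(ch)
    # str.strip() with no argument also removes Unicode whitespace.
    return "".join(kept).strip()


if __name__ == "__main__":
    # A value pasted with invisible directional marks and trailing spaces.
    print(repr(clean_tag_value("\u202a+1-613-555-0100\u202c  ")))
```

The 555-01xx number is a fictional placeholder, not real data.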

Here are the top 20 tag/format combinations as of ~4pm ET Sunday
(count, then key"format, where each # stands for a digit):

  10669 phone"+#-###-###-####
   4392 phone"+# ###-###-####
   4206 phone"###-###-####
   2970 phone"+# ### ### ####
   2540 phone"+# ### ###-####
   2451 phone"(###) ###-####
   1076 phone"##########
    659 phone"+# ### #######
    547 fax"+#-###-###-####
    522 contact:phone"+#-###-###-####
    516 phone"+###########
    456 phone"#-###-###-####
    446 phone"### ### ####
    378 fax"+# ###-###-####
    283 contact:phone"### ###-####
    260 phone"+# (###) ###-####
    200 fax"+###########
    186 phone"### ###-####
    170 phone"(###)###-####
    162 fax"+# ### ###-####
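
A tally like the one above can be reproduced with a few lines of
Python: replace every digit in a value with "#" and count the
resulting signatures per key.  This is only a sketch; the function
names and the (key, value) input shape are assumptions, and the query
actually used to produce the list is not shown here.

```python
import re
from collections import Counter


def format_signature(value: str) -> str:
    """Replace each ASCII digit with '#', keeping all other characters."""
    return re.sub(r"[0-9]", "#", value)


def tally_formats(tags):
    """Count (key, signature) pairs from an iterable of (key, value) tags."""
    return Counter((key, format_signature(value)) for key, value in tags)


if __name__ == "__main__":
    # Fictional sample data; real input would come from an OSM extract.
    sample = [
        ("phone", "+1-613-555-0100"),
        ("phone", "+1-416-555-0199"),
        ("fax", "(613) 555-0123"),
    ]
    for (key, signature), count in tally_formats(sample).most_common():
        print(f"{count:7d} {key} {signature}")
```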


On 2018-01-31 11:09 PM, OSM Volunteer stevea wrote:
        • There are additionally ~45 phone numbers that use letters instead of
          digits (e.g. 1-555-GOT-BEER).
        • ";" is used occasionally as a separator to indicate multiple phone
          numbers.  " ", "," and "/" are also used.
        • There are random comments in the phone number field (not sure where
          these really should be?).
        • Extensions are generally represented by "x", "ext" or "ext.".
        • There are fewer than 1000 phone numbers using contact:phone instead
          of phone, using ~40 unique formats.
        • I did not analyze phone_1 or fax or any other tags.
I will continue to clean up phone numbers across the country which are missing
the leading +1 and/or are not in one of the 4 common formats listed above.  My
thought is that
        • I will leave the phone numbers of the 1-555-GOT-BEER type.
        • I will use ";" as the multiple-number separator.
        • I will use "x" for extensions.
        • And I will be happy to clean up the wonky ones with lots of text in
          them if there is a direction for where this text should move to.
          Example for a radio station:
          "office (###) ###-####; on-air studio (###) ###-####"

Feedback welcome.
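
The rules proposed above (add the leading +1, use ";" between
multiple numbers, use "x" for extensions, leave vanity numbers alone)
could be sketched in Python roughly as follows.  The target layout
+1-###-###-#### is an assumption taken from the most common format in
the tally, and normalize_number / normalize_value are hypothetical
names.  Anything the sketch cannot interpret is passed through
unchanged rather than guessed at.

```python
import re
from typing import Optional

# Trailing extension written as "x", "ext" or "ext." followed by digits.
EXT_RE = re.compile(r"\s*(?:ext\.?|x)\s*([0-9]+)\s*$", re.IGNORECASE)


def normalize_number(raw: str) -> Optional[str]:
    """Return '+1-###-###-####' (plus ' x<ext>' if present), or None if unsure."""
    text = raw.strip()
    ext = ""
    match = EXT_RE.search(text)
    if match:
        ext = " x" + match.group(1)
        text = text[:match.start()]
    if re.search(r"[A-Za-z]", text):
        return None  # vanity numbers and free-text comments are left alone
    digits = re.sub(r"[^0-9]", "", text)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the NANP country code
    if len(digits) != 10:
        return None  # e.g. 7-digit numbers with no area code
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}{ext}"


def normalize_value(value: str) -> str:
    """Split on ';', ',' or '/', normalize each part, and rejoin with ';'."""
    parts = [p.strip() for p in re.split(r"[;,/]", value) if p.strip()]
    return ";".join(normalize_number(p) or p for p in parts)


if __name__ == "__main__":
    print(normalize_value("613 555 0100 / 1-800-555-0199"))
    print(normalize_value("(613) 555-0100 ext. 23"))
    print(normalize_value("1-555-GOT-BEER"))
```

A space is deliberately not treated as a separator here, since it
also appears inside single numbers.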
Those sound largely sane and well thought out to me.  (And I wrote phone number parsers
for the NANP about 30 years ago in, wait for it, HyperTalk!)  The GOT-BEER style
are best left alone (imo) as smarter parsers eventually figure those out.  Yes, ;
(semicolon) is a frequent separator in key:value pair value lists in OSM data.  Yes, x
(choose a case, lower seems better and more common than upper) for extensions.  For the
radio station/on-air studio stuff I'd make the first part of each of these "compound
data" values the phone number in one of the acceptable formats, then add the extra
descriptive text after it, even if in a semicolon-separated list.  That's a pretty
regular set of alphanumerics, and with maybe eight or ten rules (reasonable for a
parser extracting machine-dialable phone numbers, if necessary), you're either done
or at or above 99%, I'd be willing to wager (and I'm not a betting type, though I do
play poker with friends and online).
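
To make the "handful of rules" idea concrete, here is a rough Python
sketch of a parser that pulls machine-dialable numbers out of compound
values laid out as suggested above: number first, optional description
in parentheses, entries separated by semicolons.  It is not the
HyperTalk parser mentioned, just an illustration; dialable_numbers and
the regular expressions are assumptions, and entries that do not
reduce to ten NANP digits are skipped.

```python
import re
from typing import List, Tuple

# Standard telephone keypad letters, used to translate vanity numbers.
KEYPAD = {
    letter: digit
    for digit, letters in {
        "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
        "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
    }.items()
    for letter in letters
}

# Trailing "(description)" that contains at least one letter, so that an
# area code such as "(613)" is not mistaken for a label.
LABEL_RE = re.compile(r"\s*\(([^()]*[A-Za-z][^()]*)\)\s*$")
# Trailing extension; the lookbehind avoids eating the end of a vanity word.
EXT_RE = re.compile(r"(?<![A-Za-z])\s*(?:ext\.?|x)\s*([0-9]+)\s*$", re.IGNORECASE)


def dialable_numbers(value: str) -> List[Tuple[str, str]]:
    """Return ('+1' followed by ten digits, label) for each parsable entry."""
    results = []
    for entry in value.split(";"):
        entry = entry.strip()
        label = ""
        match = LABEL_RE.search(entry)
        if match:
            label = match.group(1).strip()
            entry = entry[:match.start()]
        match = EXT_RE.search(entry)
        if match:
            entry = entry[:match.start()]  # extensions are not dialled directly
        # Translate vanity letters, then keep digits only.
        translated = "".join(KEYPAD.get(ch.upper(), ch) for ch in entry)
        digits = re.sub(r"[^0-9]", "", translated)
        if len(digits) == 11 and digits[0] == "1":
            digits = digits[1:]
        if len(digits) == 10:
            results.append(("+1" + digits, label))
    return results


if __name__ == "__main__":
    value = "+1-613-555-0100 (office);+1-613-555-0101 (on-air studio)"
    for number, label in dialable_numbers(value):
        print(number, label)
```

Values that put the description before the number (such as
"office (###) ###-####") are not handled by this sketch, which is one
reason to move the number to the front as suggested.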

Nice job.

SteveA

_______________________________________________
Talk-ca mailing list
Talk-ca@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-ca
