On 01/04/2014 08:26 PM, Alex Kleider wrote:
Any suggestions as to a better way to handle the problem of encoding in the
following context would be appreciated.  The problem arose because 'Bogota' is
spelt with an acute accent on the 'a'.

$ cat IP_info.py3
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# file: 'IP_info.py3'  a module.

import urllib.request

url_format_str = \
     'http://api.hostip.info/get_html.php?ip=%s&position=true'

def ip_info(ip_address):
     """
Returns a dictionary keyed by Country, City, Lat, Long and IP.

Depends on http://api.hostip.info (which returns the following:
'Country: UNITED STATES (US)\nCity: Santa Rosa, CA\n\nLatitude:
38.4486\nLongitude: -122.701\nIP: 76.191.204.54\n'.)
THIS COULD BREAK IF THE WEB SITE GOES AWAY!!!
"""
     response =  urllib.request.urlopen(url_format_str %\
                                    (ip_address, )).read()
     sp = response.splitlines()
     country = city = lat = lon = ip = ''
     for item in sp:
         if item.startswith(b"Country:"):
             try:
                 country = item[9:].decode('utf-8')
             except:
                 print("Exception raised.")
                 country = item[9:]
         elif item.startswith(b"City:"):
             try:
                 city = item[6:].decode('utf-8')
             except:
                 print("Exception raised.")
                 city = item[6:]
         elif item.startswith(b"Latitude:"):
             try:
                 lat = item[10:].decode('utf-8')
             except:
                 print("Exception raised.")
                 lat = item[10:]
         elif item.startswith(b"Longitude:"):
             try:
                 lon = item[11:].decode('utf-8')
             except:
                 print("Exception raised.")
                 lon = item[11:]
         elif item.startswith(b"IP:"):
             try:
                 ip = item[4:].decode('utf-8')
             except:
                 print("Exception raised.")
                 ip = item[4:]
     return {"Country" : country,
             "City" : city,
             "Lat" : lat,
             "Long" : lon,
             "IP" : ip            }

if __name__ == "__main__":
     addr =  "201.234.178.62"
     print ("""    IP address is %(IP)s:
         Country: %(Country)s;  City: %(City)s.
         Lat/Long: %(Lat)s/%(Long)s""" % ip_info(addr))

The output I get on an Ubuntu 12.04 LTS system is as follows:
alex@x301:~/Python/Parse$ ./IP_info.py3
Exception raised.
     IP address is 201.234.178.62:
         Country: COLOMBIA (CO);  City: b'Bogot\xe1'.
         Lat/Long: 10.4/-75.2833


I would have thought that utf-8 could handle the 'a-acute'.

Thanks,
alex

'á' does not encode to 0xe1 in UTF-8 (there it is the two bytes 0xc3 0xa1); what you read is probably a (legacy) file in latin-1 (or another latin-* encoding), where 'á' is the single byte 0xe1.
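
To make that concrete, here is a minimal sketch, assuming the service really does send latin-1 (the output above suggests it, but does not prove it). The helper name decode_field and its fallback order are my own invention for illustration, not anything from the original module:

```python
# 'á' is two bytes in UTF-8 but one byte (0xe1) in latin-1.
assert 'á'.encode('utf-8') == b'\xc3\xa1'
assert 'á'.encode('latin-1') == b'\xe1'


def decode_field(raw, encodings=('utf-8', 'latin-1')):
    """Hypothetical helper: return raw decoded with the first encoding that works.

    The default order is an assumption: strict UTF-8 first, then latin-1.
    latin-1 maps every byte to a character, so it never fails; that makes it
    a safe last candidate but also means it can silently mis-decode data
    that is really in some other 8-bit encoding.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached if the caller passed a list without a catch-all encoding:
    # decode anyway, marking undecodable bytes with U+FFFD.
    return raw.decode('utf-8', errors='replace')


# The bytes from the output above: invalid as UTF-8, fine as latin-1.
city = decode_field(b'Bogot\xe1')
assert city == 'Bogot\u00e1'
```

In ip_info() each try/except pair could then shrink to something like city = decode_field(item[6:]). If the server declares a charset in its Content-Type header, the value of response.headers.get_content_charset() would probably be a better first guess than a hard-coded list.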

Denis