On 01/04/2014 08:26 PM, Alex Kleider wrote:
Any suggestions as to a better way to handle the problem of encoding in the following context would be appreciated. The problem arose because 'Bogota' is spelt with an acute accent on the 'a'.

    $ cat IP_info.py3
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    # file: 'IP_info.py3'  a module.

    import urllib.request

    url_format_str = \
        'http://api.hostip.info/get_html.php?ip=%s&position=true'

    def ip_info(ip_address):
        """
        Returns a dictionary keyed by Country, City, Lat, Long and IP.

        Depends on http://api.hostip.info (which returns the following:
        'Country: UNITED STATES (US)\nCity: Santa Rosa, CA\n\nLatitude: 38.4486\nLongitude: -122.701\nIP: 76.191.204.54\n'.)
        THIS COULD BREAK IF THE WEB SITE GOES AWAY!!!
        """
        response = urllib.request.urlopen(url_format_str %\
                                          (ip_address, )).read()
        sp = response.splitlines()
        country = city = lat = lon = ip = ''
        for item in sp:
            if item.startswith(b"Country:"):
                try:
                    country = item[9:].decode('utf-8')
                except:
                    print("Exception raised.")
                    country = item[9:]
            elif item.startswith(b"City:"):
                try:
                    city = item[6:].decode('utf-8')
                except:
                    print("Exception raised.")
                    city = item[6:]
            elif item.startswith(b"Latitude:"):
                try:
                    lat = item[10:].decode('utf-8')
                except:
                    print("Exception raised.")
                    lat = item[10:]
            elif item.startswith(b"Longitude:"):
                try:
                    lon = item[11:].decode('utf-8')
                except:
                    print("Exception raised.")
                    lon = item[11:]
            elif item.startswith(b"IP:"):
                try:
                    ip = item[4:].decode('utf-8')
                except:
                    print("Exception raised.")
                    ip = item[4:]
        return {"Country" : country,
                "City" : city,
                "Lat" : lat,
                "Long" : lon,
                "IP" : ip }

    if __name__ == "__main__":
        addr = "201.234.178.62"
        print("""
    IP address is %(IP)s:
        Country: %(Country)s; City: %(City)s.
        Lat/Long: %(Lat)s/%(Long)s""" % ip_info(addr))

The output I get on an Ubuntu 12.04 LTS system is as follows:

    alex@x301:~/Python/Parse$ ./IP_info.py3
    Exception raised.

    IP address is 201.234.178.62:
        Country: COLOMBIA (CO); City: b'Bogot\xe1'.
        Lat/Long: 10.4/-75.2833

I would have thought that UTF-8 could handle the 'a-acute'.

Thanks,
alex
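A minimal sketch of a tidier parser, assuming the service's 'Label: value' line format shown in the docstring above: decode the whole response once, then split each line on its first colon instead of repeating a try/except per field. The names `parse_ip_info`, `FIELDS` and `URL_FORMAT` are hypothetical, and the Latin-1 fallback is an assumption inferred from the b'Bogot\xe1' output, not something the service documents.

```python
import urllib.request

# Same service URL as in the post; it may no longer be online.
URL_FORMAT = 'http://api.hostip.info/get_html.php?ip=%s&position=true'

# Map the service's line labels to the dictionary keys used in the post.
FIELDS = {
    "Country": "Country",
    "City": "City",
    "Latitude": "Lat",
    "Longitude": "Long",
    "IP": "IP",
}


def parse_ip_info(raw, encoding="latin-1"):
    """Parse 'Label: value' lines from raw bytes into a dict of str."""
    # Assumption: default to Latin-1 because the post shows b'Bogot\xe1'.
    text = raw.decode(encoding)            # decode the whole payload once
    result = dict.fromkeys(FIELDS.values(), "")
    for line in text.splitlines():
        label, sep, value = line.partition(":")   # split on the first colon
        if sep and label in FIELDS:                # skip blank/unknown lines
            result[FIELDS[label]] = value.strip()
    return result


def ip_info(ip_address):
    """Fetch and parse, preferring the charset the server declares."""
    with urllib.request.urlopen(URL_FORMAT % ip_address) as response:
        # get_content_charset() returns None if no charset is declared.
        charset = response.headers.get_content_charset() or "latin-1"
        return parse_ip_info(response.read(), charset)
```

Keeping the parsing separate from the fetching means it can be tried on the bytes from the output above without touching the network: passing the lines `City: Bogot\xe1` and so on through `parse_ip_info` yields a plain `str` value 'Bogotá' for the city.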
'á' does not encode to 0xe1 in UTF-8 (there it is the two bytes 0xc3 0xa1); what you are reading is probably (legacy) text in Latin-1 (or another latin-* encoding), where 'á' is the single byte 0xe1.
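The byte values can be checked directly in the interpreter. This sketch (the helper `decode_fallback` is a hypothetical name) shows 'á' encoded both ways, why b'Bogot\xe1' fails as UTF-8, and one common heuristic: try UTF-8 first, then fall back to Latin-1. Latin-1 maps every byte value to a character, so it never raises, but it can silently give the wrong characters if the data is really in some other encoding.

```python
# 'á' is U+00E1. Latin-1 stores it as the single byte 0xE1,
# while UTF-8 needs two bytes, 0xC3 0xA1.
print("á".encode("latin-1"))   # b'\xe1'
print("á".encode("utf-8"))     # b'\xc3\xa1'

raw = b"Bogot\xe1"             # the bytes shown in the output above

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xE1 is the lead byte of a 3-byte UTF-8 sequence,
    # but the data ends right after it.
    print("utf-8 failed:", exc.reason)

print(raw.decode("latin-1"))   # Bogotá


def decode_fallback(data, encodings=("utf-8", "latin-1")):
    """Try each encoding in order and return the first successful decode."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reachable when 'latin-1' is not among the encodings tried.
    raise ValueError("no encoding in %r could decode the data" % (encodings,))
```

Trying UTF-8 first is reasonable because non-ASCII Latin-1 text is rarely valid UTF-8 by accident, so a successful UTF-8 decode is a fairly strong signal.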
Denis

_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
