New submission from Silverback Networks <silverback...@gmail.com>:

I've searched high and low for a way to make Python accept Apple's iOS 
characters, but it looks like Python does not correctly support characters 
wider than 16 bits. The leading byte of each group is \xf0, which indicates a 
4-byte UTF-8 sequence and therefore a character wider than 16 bits. I've tried 
all three "errors" arguments to decode (ignore, replace, and strict) and still 
get this error each time:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 140: 
character maps to <undefined>

So I have no way to proceed short of rolling my own corrected unicode decoder. 
My assumption is that Python should convert a character regardless of whether 
it's found in the internal lookup database, or at a minimum there should be a 
way to signal Python to do so.
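
A minimal sketch that may help narrow this down. It decodes one of the 
\xf0 groups on its own and then tries to encode the result with a 
charmap-based codec. The use of cp1252 is an assumption (the output encoding 
is not stated anywhere in this report), and the idea that the error comes from 
an encode step rather than from decode is only one possible reading of the 
"UnicodeEncodeError ... 'charmap'" message, not something confirmed here.

```python
# One group from the sample: the lead byte 0xF0 opens a 4-byte UTF-8 sequence.
raw = b"\xf0\x9f\x8e\x84"

# Decoding succeeds and yields a code point above U+FFFF.
text = raw.decode("utf-8")
code_point = ord(text)  # a single character on Python 3.3+

# ASSUMPTION: cp1252 stands in for whatever charmap-based encoding the
# output stream uses; the report does not name it.
try:
    text.encode("cp1252")
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

print(hex(code_point))
print(failure.encoding if failure else "encoded without error")
```

If the second line printed names the same codec as the error above, the 
failure would be happening when the decoded text is encoded for output, not 
inside bytes.decode().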

Below is a sample bytes string that will reproduce the problem:

b'<?xml version=\'1.0\' encoding=\'utf-8\'?>\n  <dict>\n   <key>\n    
average-user-rating\n   </key>\n   <real>\n    1\n   </real>\n   <key>\n    
text\n   </key>\n   <string>\n    
\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81
  if you haven&#39;t checked this out yet please do. download APP TRAILERS and 
go to videos use promo code FREE4U and enjoy free apps courtesy of apple MERRY 
CHRISTMAS 
\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\xf0\x9f\x8e\x84\xf0\x9f\x8e\x85\xf0\x9f\x8e\x81\n
   </string>\n   <key>\n    title\n   </key>\n   <string>\n    4. IF YOU LOVE 
FREE STUFF (v1.5)\n   </string>\n   <key>\n    type\n   </key>\n   <string>\n   
 review\n   </string>\n   <key>\n    user-name\n   </key>\n   <string>\n    
Freenesss on Dec 16, 2011\n   </string>\n  </dict>\n  <dict>\n   <key>\n    
average-user-rating\n   </key>\n   <real>\n    0.8\n   </real>\n   <key>\n    
text\n   </key>\n   <string>\n    This application is very cool .. I
  hope only be added to the dictionary other languages 
\xe2\x80\x8b\xe2\x80\x8b..\n   </string>\n   <key>\n    title\n   </key>\n   
<string>\n    8. the dictionary (v1.5)\n   </string>\n   <key>\n    type\n   
</key>\n   <string>\n    review\n   </string>\n   <key>\n    user-name\n   
</key>\n   <string>\n    Rnaa on Dec 16, 2011\n   </string>\n  </dict>\n  
<dict>\n   <key>\n    average-user-rating\n   </key>\n   <real>\n    1\n   
</real>\n   <key>\n    text\n   </key>\n   <string>\n    Hey I&#39;m 13 trying 
to b discovered plz check my 1st video out on you tube its called speak now 
cover  by Bekka burton thnx and I luv luv luv this app\n   </string>\n   
<key>\n    title\n   </key>\n   <string>\n    9. Love this app+check me out on 
you tube (v1.5)\n   </string>\n   <key>\n    type\n   </key>\n   <string>\n    
review\n   </string>\n   <key>\n    user-name\n   </key>\n   <string>\n    
Lol\xee\x84\x86 on Dec 16, 2011\n   </string>\n  </dict>\n'

(Obviously, stripped down to not-well-formed XML, but for conversion purposes 
that's irrelevant.)
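
For inspecting the decoded result without depending on what the output 
stream can represent, something like the sketch below might be used. The 
helper name safe_repr is hypothetical, and the input is a shortened 
combination of byte sequences taken from the sample above, not the full 
string.

```python
def safe_repr(data: bytes) -> str:
    """Decode UTF-8 bytes and return an ASCII-only rendering.

    Hypothetical helper for illustration: characters outside ASCII are
    shown as backslash escapes instead of being written out directly.
    """
    text = data.decode("utf-8")
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


# Shortened fragment built from byte sequences that appear in the sample.
fragment = b"Lol\xee\x84\x86 on Dec 16, 2011 \xf0\x9f\x8e\x84 \xe2\x80\x8b"
rendered = safe_repr(fragment)
print(rendered)
```

The built-in ascii() gives a similar escaped view of a str, with quotes 
added around it.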

----------
components: Unicode
messages: 149659
nosy: ezio.melotti, silverbacknet
priority: normal
severity: normal
status: open
title: bytes.decode() UnicodeEncodeError on Apple iOS characters
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13618>
_______________________________________