[Tutor] Re: How to read unicode strings from a binary file and display them as plain ascii?

Javier Ruere Wed, 02 Mar 2005 20:41:43 -0800

R. Alan Monroe wrote:

I started writing a program to parse the headers of truetype fonts to
examine their family info. But I can't manage to print out the strings
without the zero bytes in between each character (they display as a
black block labeled 'NUL' in Scite's output pane)

I tried:
    stuff = f.read(nlength)
    stuff = unicode(stuff, 'utf-8')

If there are embedded 0's in the string, it won't be utf8; it could be utf16 or utf32. Try: unicode(stuff, 'utf-16') or stuff.decode('utf-16')

    print type(stuff), 'stuff', stuff.encode()
This prints:
<type 'unicode'> stuff [NUL]C[NUL]o[NUL]p[NUL]y[NUL]r[NUL]i[NUL]g[NUL]

I don't understand what you tried to accomplish here.

That's evidence of what I failed to accomplish. My expected result
was to print the word "Copyright" and whatever other strings are
present in the font, with no intervening NUL characters.


  Oh but why print type(stuff) or 'stuff'?

Aha, after some trial and error I see that I'm running into an endian
problem. It's "\x00C" in the file, which needs to be swapped to
"C\x00". I cheated temporarily by just adding 1 to the file pointer
:^)

Ah! Endianness! I completely overlooked this issue! I have lost several hours of my life to endian problems. Glad to see (in another post) that there is an encoding which explicitly handles the endianness of the encoded string.
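
For the record, here is a minimal sketch of what that looks like, assuming the codec in question is 'utf-16-be' (big-endian, which matches the "\x00C" byte order seen in the file). The sample bytes are made up for illustration rather than read from a real font, and the bytes.decode() spelling is used instead of unicode() so the same lines should also run on newer Pythons:

```python
# Bytes as they would appear in the file: each character is preceded by
# a zero byte, i.e. big-endian UTF-16. This sample value is hypothetical.
raw = b"\x00C\x00o\x00p\x00y\x00r\x00i\x00g\x00h\x00t"

# Decoding as UTF-8 does not fail, but every zero byte survives as a
# NUL character, which is what showed up in the output pane.
as_utf8 = raw.decode("utf-8")

# Naming the byte order explicitly yields the intended text, with no
# need to shift the file pointer by one.
text = raw.decode("utf-16-be")

# For plain ASCII display, replace anything outside ASCII with '?'.
ascii_bytes = text.encode("ascii", "replace")

print(repr(as_utf8))
print(text)
print(ascii_bytes)
```

A plain 'utf-16' decode with no byte order mark in the data falls back to a default byte order, so spelling out '-be' or '-le' is probably the safer choice when the file format fixes the order.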

Javier

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
