Re: [Tutor] how to struct.pack a unicode string?

2013-01-03 Thread eryksun
On Tue, Jan 1, 2013 at 1:29 AM, Steven D'Aprano st...@pearwood.info wrote: 2 Since wide builds use so much extra memory for the average ASCII string, hardly anyone uses them. On Windows (and I think OS X, too) a narrow build has been practical since the wchar_t type is 16-bit. As to Linux

Re: [Tutor] how to struct.pack a unicode string?

2013-01-03 Thread Steven D'Aprano
On 03/01/13 23:52, eryksun wrote: On Tue, Jan 1, 2013 at 1:29 AM, Steven D'Aprano st...@pearwood.info wrote: 2 Since wide builds use so much extra memory for the average ASCII string, hardly anyone uses them. On Windows (and I think OS X, too) a narrow build has been practical since the

Re: [Tutor] how to struct.pack a unicode string?

2012-12-31 Thread Steven D'Aprano
I'm digging out an old email which I saved as a draft almost a month ago but never got around to sending, because I think the new Unicode implementation in Python 3.3 is one of the coolest things ever. On 03/12/12 16:56, eryksun wrote: CPython 3.3 has a new implementation that angles for the

Re: [Tutor] how to struct.pack a unicode string?

2012-12-02 Thread Albert-Jan Roskam
How can I pack a unicode string using the struct module? If I simply use packed = struct.pack(fmt, hello) in the code below (and 'hello' is a unicode string), I get this: error: argument for 's' must be a string. I keep reading that I have to encode it to a utf-8 bytestring, but this does

Re: [Tutor] how to struct.pack a unicode string?

2012-12-02 Thread Albert-Jan Roskam
snip * some encodings are more compact than others (e.g. Latin-1 uses one byte per character, while UTF-32 uses four bytes per character). I read that performance of UTF32 is better (UTF-32 advantage: you don't need to decode stored data to the 32-bit Unicode code point for e.g.
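
To make the size difference concrete, here is a small sketch, assuming Python 3 (the thread itself is about Python 2) and a made-up sample string. It only compares encoded lengths and says nothing about performance.

```python
text = "hello"  # hypothetical sample string, ASCII only

latin1 = text.encode("latin-1")    # one byte per character
utf8 = text.encode("utf-8")        # one byte per ASCII character
utf32 = text.encode("utf-32-le")   # four bytes per character, no BOM

print(len(latin1), len(utf8), len(utf32))
```

The explicit "-le" suffix is used so that no byte order mark is added to the UTF-32 result.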

Re: [Tutor] how to struct.pack a unicode string?

2012-12-02 Thread Albert-Jan Roskam
snip to make is that the transform formats are multibyte encodings (except ASCII in UTF-8), which means the expression str(len(hello)) is using the wrong length; it needs to use the length of the encoded string. Also, UTF-16 and UTF-32 typically have very many null bytes. Together,

Re: [Tutor] how to struct.pack a unicode string?

2012-12-02 Thread Dave Angel
On 12/02/2012 08:34 AM, Albert-Jan Roskam wrote: snip Hi Eryksun, Observation #1: Yes, makes perfect sense. I should have thought about that. Observation #2: As I emailed earlier today to Peter Otten, I thought unicode_internal means UCS-2 or UCS-4, depending on the size of

Re: [Tutor] how to struct.pack a unicode string?

2012-12-02 Thread eryksun
On Sun, Dec 2, 2012 at 8:34 AM, Albert-Jan Roskam fo...@yahoo.com wrote: As I emailed earlier today to Peter Otten, I thought unicode_internal means UCS-2 or UCS-4, depending on the size of sys.maxunicode? How is this related to UTF-16 and UTF-32? UCS is the universal character set. Some

Re: [Tutor] how to struct.pack a unicode string?

2012-12-01 Thread eryksun
On Sat, Dec 1, 2012 at 2:30 AM, Steven D'Aprano st...@pearwood.info wrote: The length and order of the optional byte order mark (BOM) distinguishes UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. That's not quite right. The UTF-16BE and UTF-16LE character sets do not take BOMs, because the
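
The BOM point can be checked directly. This is only a sketch, assuming Python 3 and its standard codec names: the explicit-endian codecs emit no BOM, while the generic "utf-16" and "utf-32" codecs prepend one in native byte order.

```python
import codecs

ch = "A"  # hypothetical sample character

utf16_le = ch.encode("utf-16-le")  # explicit byte order, no BOM
utf16_be = ch.encode("utf-16-be")  # explicit byte order, no BOM
utf16 = ch.encode("utf-16")        # generic codec, BOM comes first
utf32 = ch.encode("utf-32")        # generic codec, BOM comes first

# codecs.BOM_UTF16 and codecs.BOM_UTF32 are the native-order BOMs.
has_bom_16 = utf16.startswith(codecs.BOM_UTF16)
has_bom_32 = utf32.startswith(codecs.BOM_UTF32)

print(utf16_le, utf16_be, has_bom_16, has_bom_32)
```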

[Tutor] how to struct.pack a unicode string?

2012-11-30 Thread Albert-Jan Roskam
Hi, How can I pack a unicode string using the struct module? If I simply use packed = struct.pack(fmt, hello) in the code below (and 'hello' is a unicode string), I get this: error: argument for 's' must be a string. I keep reading that I have to encode it to a utf-8 bytestring, but this does

Re: [Tutor] how to struct.pack a unicode string?

2012-11-30 Thread Peter Otten
Albert-Jan Roskam wrote: How can I pack a unicode string using the struct module? If I simply use packed = struct.pack(fmt, hello) in the code below (and 'hello' is a unicode string), I get this: error: argument for 's' must be a string. I keep reading that I have to encode it to a utf-8

Re: [Tutor] how to struct.pack a unicode string?

2012-11-30 Thread Steven D'Aprano
On 01/12/12 03:43, Albert-Jan Roskam wrote: Hi, How can I pack a unicode string using the struct module? If I simply use packed = struct.pack(fmt, hello) in the code below (and 'hello' is a unicode string), I get this: error: argument for 's' must be a string. To be precise, it must be a

Re: [Tutor] how to struct.pack a unicode string?

2012-11-30 Thread eryksun
On Fri, Nov 30, 2012 at 11:43 AM, Albert-Jan Roskam fo...@yahoo.com wrote: How can I pack a unicode string using the struct module? struct.pack is for packing an arbitrary sequence of data into a C-like struct. You have to manually add pad bytes. Alternatively you can use a ctypes.Structure.

Re: [Tutor] how to struct.pack a unicode string?

2012-11-30 Thread eryksun
A clarification: in the default mode ('@'), struct uses native alignment padding, but not if you override this with <, >, =, or !, as you did. fmt = endianness + str(len(hello)) + 's' That's the wrong length. Use the length of the encoded string. Generally, however, you'd use a fixed size set by
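
A minimal sketch of the length fix, assuming Python 3, UTF-8 as the chosen encoding, and a made-up sample string. The format string is built from the length of the encoded bytes, not the number of characters. The fixed width of 10 in the last line is an arbitrary value picked for illustration.

```python
import struct

hello = "h\u00e9llo"             # hypothetical sample: 5 characters
encoded = hello.encode("utf-8")  # 6 bytes, since the accented e takes two

# Build the format from the encoded length, with explicit little-endian mode.
fmt = "<" + str(len(encoded)) + "s"
packed = struct.pack(fmt, encoded)

# Round trip: unpack the bytes, then decode back to text.
(unpacked,) = struct.unpack(fmt, packed)
roundtrip = unpacked.decode("utf-8")

# With a fixed-size field, 's' pads short values with null bytes.
fixed = struct.pack("<10s", encoded)

print(len(hello), len(encoded), roundtrip == hello, len(fixed))
```

With a fixed-size field the reader has to strip the trailing null padding before decoding, and a value longer than the field is silently truncated.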

Re: [Tutor] how to struct.pack a unicode string?

2012-11-30 Thread Steven D'Aprano
On 01/12/12 12:28, eryksun wrote: UTF-8 was designed to encode all of Unicode in a way that can seamlessly pass through libraries that process C strings (i.e. an array of non-null bytes terminated by a null byte). Byte values less than 128 are ASCII; beyond ASCII, UTF-8 uses 2-4 bytes, and all
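
The UTF-8 properties described here can be seen in a short sketch, assuming Python 3 and a few made-up sample characters: ASCII stays one byte, other code points take two to four bytes, and text without a NUL character encodes to bytes without a null byte, unlike UTF-16.

```python
# Hypothetical samples: ASCII, Latin-1 range, BMP, and beyond the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [len(ch.encode("utf-8")) for ch in samples]

text = "hello"
utf8_has_null = b"\x00" in text.encode("utf-8")
utf16_has_null = b"\x00" in text.encode("utf-16-le")

print(lengths, utf8_has_null, utf16_has_null)
```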