On Tue, Jan 1, 2013 at 1:29 AM, Steven D'Aprano st...@pearwood.info wrote:
2 Since wide builds use so much extra memory for the average ASCII
string, hardly anyone uses them.
On Windows (and I think OS X, too) a narrow build has been practical
since the wchar_t type is 16-bit. As to Linux
On 03/01/13 23:52, eryksun wrote:
On Tue, Jan 1, 2013 at 1:29 AM, Steven D'Aprano st...@pearwood.info wrote:
2 Since wide builds use so much extra memory for the average ASCII
string, hardly anyone uses them.
On Windows (and I think OS X, too) a narrow build has been practical
since the
I'm digging out an old email which I saved as a draft almost a month ago
but never got around to sending, because I think the new Unicode
implementation in Python 3.3 is one of the coolest things ever.
On 03/12/12 16:56, eryksun wrote:
CPython 3.3 has a new implementation that angles for the
How can I pack a unicode string using the struct module? If I simply use
packed = struct.pack(fmt, hello) in the code below (and 'hello' is a
unicode string), I get this: error: argument for 's' must be a string. I
keep reading that I have to encode it to a utf-8 bytestring, but this does
<snip>
* some encodings are more compact than others (e.g. Latin-1 uses
one byte per character, while UTF-32 uses four bytes per
character).
I read that performance of UTF-32 is better (UTF-32 advantage: you don't need
to decode stored data to the 32-bit Unicode code point for e.g.
<snip>
to make is that the transform formats are multibyte encodings (except
ASCII in UTF-8), which means the expression str(len(hello)) is using
the wrong length; it needs to use the length of the encoded string.
Also, UTF-16 and UTF-32 typically have very many null bytes. Together,
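The point about encoded length and null bytes can be checked with a small sketch. It assumes Python 3, where str.encode returns bytes; the sample string and the stats name are illustrative and not taken from the thread.

```python
hello = "hello"  # hypothetical sample; pure ASCII on purpose

# Map each codec name to (encoded length in bytes, number of null bytes).
stats = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = hello.encode(codec)
    stats[codec] = (len(data), data.count(b"\x00"))

for codec, (size, nulls) in stats.items():
    print(codec, size, nulls)
```

For ASCII-only text, UTF-8 yields one byte per character with no nulls, while the 16- and 32-bit forms pad every character with zero bytes, so len(hello) only matches the UTF-8 case here.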
On 12/02/2012 08:34 AM, Albert-Jan Roskam wrote:
<snip>
Hi Eryksun,
Observation #1: Yes, makes perfect sense. I should have thought about that.
Observation #2:
As I emailed earlier today to Peter Otten, I thought unicode_internal means
UCS-2 or UCS-4, depending on the size of
On Sun, Dec 2, 2012 at 8:34 AM, Albert-Jan Roskam fo...@yahoo.com wrote:
As I emailed earlier today to Peter Otten, I thought unicode_internal means
UCS-2 or UCS-4, depending on the size of sys.maxunicode? How is this related
to UTF-16 and UTF-32?
UCS is the universal character set. Some
On Sat, Dec 1, 2012 at 2:30 AM, Steven D'Aprano st...@pearwood.info wrote:
The length and order of the optional byte order mark (BOM)
distinguishes UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE.
That's not quite right. The UTF-16BE and UTF-16LE character sets do
not take BOMs, because the
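As a rough illustration of the BOM distinction, the sketch below compares Python's "utf-16" codec with the explicit "utf-16-le" and "utf-16-be" codecs. It assumes Python 3; the variable names are illustrative only.

```python
import codecs

# The generic codec writes a BOM first, then uses the platform's byte order.
with_bom = "a".encode("utf-16")

# The explicit-endianness codecs write no BOM at all.
le = "a".encode("utf-16-le")
be = "a".encode("utf-16-be")

print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
print(le, be)
```

Decoding with the generic "utf-16" codec consumes the BOM to pick the byte order, whereas the -le and -be variants assume the order is already known.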
Hi,
How can I pack a unicode string using the struct module? If I simply use packed
= struct.pack(fmt, hello) in the code below (and 'hello' is a unicode string),
I get this: error: argument for 's' must be a string. I keep reading that I
have to encode it to a utf-8 bytestring, but this does
Albert-Jan Roskam wrote:
How can I pack a unicode string using the struct module? If I simply use
packed = struct.pack(fmt, hello) in the code below (and 'hello' is a
unicode string), I get this: error: argument for 's' must be a string. I
keep reading that I have to encode it to a utf-8
On 01/12/12 03:43, Albert-Jan Roskam wrote:
Hi,
How can I pack a unicode string using the struct module? If I
simply use packed = struct.pack(fmt, hello) in the code below
(and 'hello' is a unicode string), I get this:
error: argument for 's' must be a string.
To be precise, it must be a
On Fri, Nov 30, 2012 at 11:43 AM, Albert-Jan Roskam fo...@yahoo.com wrote:
How can I pack a unicode string using the struct module?
struct.pack is for packing an arbitrary sequence of data into a C-like
struct. You have to manually add pad bytes. Alternatively you can use
a ctypes.Structure.
A clarification: in the default mode ('@'), struct uses native
alignment padding, but not if you override this with <, >, =, or !, as
you did.
fmt = endianness + str(len(hello)) + "s"
That's the wrong length. Use the length of the encoded string.
Generally, however, you'd use a fixed size set by
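A minimal sketch of the length fix described above, assuming Python 3 (where the 's' format takes a bytes object) and a little-endian prefix chosen only for illustration. The sample text and variable names are hypothetical.

```python
import struct

hello = "h\u00e9llo w\u00f6rld"    # hypothetical sample with two non-ASCII letters
encoded = hello.encode("utf-8")    # the 's' format needs bytes, not text

# Size the 's' field from the encoded byte count, not from len(hello).
fmt = "<" + str(len(encoded)) + "s"
packed = struct.pack(fmt, encoded)

# Round trip: unpack the raw bytes and decode them back to text.
(raw,) = struct.unpack(fmt, packed)
restored = raw.decode("utf-8")

print(len(hello), len(encoded), restored == hello)
```

Using len(hello) here would give an 11-byte field for 13 bytes of data, and struct would silently truncate the encoded string.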
On 01/12/12 12:28, eryksun wrote:
UTF-8 was
designed to encode all of Unicode in a way that can seamlessly pass
through libraries that process C strings (i.e. an array of non-null
bytes terminated by a null byte). Byte values less than 128 are ASCII;
beyond ASCII, UTF-8 uses 2-4 bytes, and all
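The byte counts mentioned above can be seen in a short sketch. It assumes Python 3; the four sample characters are arbitrary picks, one for each UTF-8 length.

```python
# One character per UTF-8 length: ASCII, Latin-1 range, BMP, beyond the BMP.
chars = ["a", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
lengths = [len(c.encode("utf-8")) for c in chars]

# None of these encodings contains a zero byte, so C string routines
# that stop at a null terminator pass them through intact.
has_null = any(0 in c.encode("utf-8") for c in chars)

print(lengths, has_null)
```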