Terry Carroll wrote:
> I'm just saying that UTF-8 encodes ascii characters to themselves; but
> UTF-8 is not the same as ascii.
>
> I think we're ultimately saying the same thing; to merge both our ways of
> putting it, I think, is that ascii will map to UTF-8 identically; but
> UTF-8 may map bac
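
The one-way relationship described here is easy to check. A minimal sketch, written for Python 3 (the thread predates it, so the types differ from the Python 2 ones under discussion); the sample strings are made up for illustration:

```python
# Pure-ASCII text encodes to the same bytes under both codecs.
text = "abc"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_for_ascii = ascii_bytes == utf8_bytes   # both are hex 616263

# A non-ASCII character (here U+00E9, e-acute) takes two bytes in UTF-8.
accented = "\u00e9".encode("utf-8")

# Those bytes are not valid ASCII, so going back through the ascii codec fails.
try:
    accented.decode("ascii")
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False
```

So every ASCII byte string is also valid UTF-8, but a UTF-8 byte string is only valid ASCII when it happens to contain nothing outside the 7-bit range.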
On Wed, 4 Jul 2007, Kent Johnson wrote:
> Terry Carroll wrote:
> > Now, superficially, s and e8 are equal, because for plain old ascii
> > characters (which is all I've used in this example), UTF-8 is equivalent
> > to ascii. And they compare the same:
> >
> > >>> s == e8
> > True
>
> They are
William O'Higgins Witteman wrote:
> On Wed, Jul 04, 2007 at 02:47:45PM -0400, Kent Johnson wrote:
>
>> encode() really wants a unicode string not a byte string. If you call
>> encode() on a byte string, the string is first converted to unicode
>> using the default encoding (usually ascii), then
On Wed, Jul 04, 2007 at 02:47:45PM -0400, Kent Johnson wrote:
>encode() really wants a unicode string not a byte string. If you call
>encode() on a byte string, the string is first converted to unicode
>using the default encoding (usually ascii), then converted with the
>given encoding.
Aha!
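
The two-step behaviour Kent describes can be imitated explicitly. This is only a sketch in Python 3, where bytes objects have no encode() method at all; `encode_like_python2` is a hypothetical helper written for illustration, and the sample bytes assume a Latin-1 filename:

```python
# "T\xe9st.txt" is the Latin-1 byte form of a name with an e-acute in it
# (an assumed example, not data from the thread).
raw = b"T\xe9st.txt"

def encode_like_python2(byte_string, encoding, default="ascii"):
    """Hypothetical helper: decode with the default codec, then encode."""
    return byte_string.decode(default).encode(encoding)

try:
    encode_like_python2(raw, "utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # The hidden decode step is what fails, not the encode that was asked for.
    error_message = str(exc)

# Decoding explicitly with the codec the bytes are really in avoids the problem.
fixed = raw.decode("latin-1").encode("utf-8")
```

That is why a call to encode() can end in a UnicodeDecodeError: the failure comes from the implicit ascii decode that runs first.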
Terry Carroll wrote:
> I'm pretty iffy on this stuff myself, but as I see it, you basically have
> three kinds of things here.
>
> First, an ascii string:
>
> s = 'abc'
>
> In hex, this is 616263; 61 for 'a'; 62 for 'b', 63 for 'c'.
>
> Second, a unicode string:
>
> u = u'abc'
>
> I can
William O'Higgins Witteman wrote:
> The problem is that the Windows filesystem uses UTF-8 as the encoding
> for filenames,
That's not what I get. For example, I made a file called "Tést.txt" and
looked at what os.listdir() gives me. (os.listdir() is what os.walk()
uses to get the file and direc
On Wed, 4 Jul 2007, William O'Higgins Witteman wrote:
> >It is nonsense to talk about 'recasting' an ascii string as UTF-8; an
> >ascii string is *already* UTF-8 because the representation of the
> >characters is identical. OTOH it makes sense to talk about converting an
> >ascii string to a un
On Wed, 2007-07-04 at 12:00 -0400, William O'Higgins Witteman wrote:
> On Wed, Jul 04, 2007 at 11:28:53AM -0400, Kent Johnson wrote:
>
> >FWIW, I'm pretty sure you are confusing Unicode strings and UTF-8
> >strings, they are not the same thing. A Unicode string uses 16 bits to
> >represent each ch
On Wed, Jul 04, 2007 at 11:28:53AM -0400, Kent Johnson wrote:
>FWIW, I'm pretty sure you are confusing Unicode strings and UTF-8
>strings, they are not the same thing. A Unicode string uses 16 bits to
>represent each character. It is a distinct data type from a 'regular'
>string. Regular Python st
William O'Higgins Witteman wrote:
>> for thing in os.walk(u'.'):
>>
>> instead of:
>>
>> for thing in os.walk('.'):
>
> This is a good thought, and the crux of the problem. I pull the
> starting directories from an XML file which is UTF-8, but by the time it
> hits my program, because there are
On Tue, Jul 03, 2007 at 06:04:16PM -0700, Terry Carroll wrote:
>
>> Has anyone found a silver bullet for ensuring that all the filenames
>> encountered by os.walk are treated as UTF-8? Thanks.
>
>What happens if you specify the starting directory as a Unicode string,
>rather than an ascii string,
On Tue, 3 Jul 2007, William O'Higgins Witteman wrote:
> Has anyone found a silver bullet for ensuring that all the filenames
> encountered by os.walk are treated as UTF-8? Thanks.
What happens if you specify the starting directory as a Unicode string,
rather than an ascii string, e.g., if you'r
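
The type of the starting directory decides the type of the names that come back. In the Python 2 of this thread that meant unicode in, unicode out; the sketch below shows the analogous split in Python 3 (str versus bytes). The temporary directory and file name are made up for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create one file with a non-ASCII name (hypothetical example name).
    with open(os.path.join(root, "T\u00e9st.txt"), "w"):
        pass

    # Text starting directory: every name comes back as text.
    text_names = [name for _, _, files in os.walk(root) for name in files]

    # Bytes starting directory: every name comes back as raw bytes.
    byte_root = os.fsencode(root)
    byte_names = [name for _, _, files in os.walk(byte_root) for name in files]

text_types = {type(name) for name in text_names}
byte_types = {type(name) for name in byte_names}
```

Starting from a text path keeps the decoding in one place (the os module) rather than leaving each filename to be decoded by hand later.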
"Kent Johnson" <[EMAIL PROTECTED]> wrote
>> I suspect you need to set the Locale at the top of your file.
>
> Do you mean the
> # -*- coding: -*-
> comment? That only affects the encoding of the source file itself.
No, I meant the Locale but I got it mixed up with the encoding
in how it is set
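
Kent's point that the coding comment only governs the source file can be checked directly. A minimal sketch in Python 3, using compile() on in-memory source bytes instead of a real file; the variable names and contents are assumptions for illustration:

```python
# Source bytes saved as Latin-1, with a matching coding declaration.
source = "# -*- coding: latin-1 -*-\nname = 'T\u00e9st'\n".encode("latin-1")

namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
declared_name = namespace["name"]   # the literal was decoded as Latin-1

# Data that arrives at run time is not touched by the declaration:
# these bytes still have to be decoded with an explicitly chosen codec.
runtime_bytes = b"T\xe9st"
try:
    runtime_bytes.decode("ascii")
    runtime_ok = True
except UnicodeDecodeError:
    runtime_ok = False
```

The declaration tells the compiler how to read string literals in the file; it says nothing about filenames or other bytes read while the program runs.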
William O'Higgins Witteman wrote:
> I have several programs which traverse a Windows filesystem with French
> characters in the filenames.
>
> I am having trouble dealing with these filenames when outputting these
> paths to an XML file - I get UnicodeDecodeError: 'ascii' codec can't
> decode by
Alan Gauld wrote:
> "William O'Higgins Witteman" <[EMAIL PROTECTED]> wrote
>
>> I have several programs which traverse a Windows filesystem with
>> French
>> characters in the filenames.
>
> I suspect you need to set the Locale at the top of your file.
Do you mean the
# -*- coding: -*-
comment
"William O'Higgins Witteman" <[EMAIL PROTECTED]> wrote
>I have several programs which traverse a Windows filesystem with
>French
> characters in the filenames.
I suspect you need to set the Locale at the top of your file.
Do a search for locale in this lists archive where we had a
thread on th
I have several programs which traverse a Windows filesystem with French
characters in the filenames.
I am having trouble dealing with these filenames when outputting these
paths to an XML file - I get UnicodeDecodeError: 'ascii' codec can't
decode byte 0xe9 ... etc. That happens when I try to c
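
One way to avoid this class of error is to keep paths as text inside the program and encode exactly once, when the XML is serialized. The sketch below is Python 3 and is not necessarily how the original programs were built; the paths and element names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical paths containing French characters, held as text.
paths = ["C:\\docs\\T\u00e9st.txt", "C:\\docs\\r\u00e9sum\u00e9.doc"]

root = ET.Element("files")
for path in paths:
    ET.SubElement(root, "file").text = path

# Encoding happens here, once, for the whole document.
xml_bytes = ET.tostring(root, encoding="utf-8")

# Parsing the bytes back recovers the same text.
round_trip = [element.text for element in ET.fromstring(xml_bytes).iter("file")]
```

With the names held as text throughout, there is no point at which a byte string gets implicitly decoded with the ascii codec.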