Bengt Richter wrote:
At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
Bengt Richter wrote:
Please bear with me for a few paragraphs ;-)
Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell
Fredrik Lundh wrote:
M.-A. Lemburg wrote:
I don't follow you here. The source code encoding
is only applied to Unicode literals (you are using string
literals in your example). String literals are passed
through as-is.
however, for Python 3000, it would be nice if the source-code
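The rule quoted above can be sketched in a few lines: under Python 2 semantics, the declared source encoding is applied when the parser turns the bytes of a u"..." literal into a Unicode object, while plain string literals keep their raw bytes (an illustrative sketch, assuming a latin-1 coding declaration):

```python
# Sketch of the rule quoted above (Python 2 semantics): the declared
# source encoding is used to decode Unicode literals; plain string
# literals are passed through as raw bytes.
source_bytes = b"\xe4"                   # the byte for "ä" in a latin-1 file

as_str_literal = source_bytes            # "..."  -> bytes, unchanged
as_unicode_literal = source_bytes.decode("latin-1")  # u"..." -> decoded

assert as_str_literal == b"\xe4"
assert as_unicode_literal == "\u00e4"
```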
M.-A. Lemburg wrote:
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I had to create three custom mapping files for cp1140, koi8-u
and tis-620.
Can you please publish the files you have used somewhere? They
best go into the Python CVS.
Sure; I'll check in the whole build machinery I'm
Josiah Carlson wrote:
Martin v. Löwis [EMAIL PROTECTED] wrote:
Fredrik Lundh wrote:
however, for Python 3000, it would be nice if the source-code encoding
applied
to the *entire* file (XML-style), rather than just unicode string literals
and (hopefully) comments and docstrings.
As MAL
Neil Hodgson wrote:
Guido van Rossum:
Folks, please focus on what Python 3000 should do.
I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array data type. But
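Python 3 eventually adopted exactly this split: str is always Unicode (with flexible internal representations, per PEP 393, much like NSString), and bytearray is the separate mutable bytes type. A minimal illustration:

```python
# str is Unicode-only; bytearray is the separate mutable byte container.
s = "学校"                      # always a Unicode string in Python 3
b = bytearray(s, "utf-8")       # a mutable array of bytes
b.extend(b"!")                  # can be modified in place

assert isinstance(s, str)
assert bytes(b) == "学校".encode("utf-8") + b"!"
```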
Walter Dörwald wrote:
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I've checked in a whole bunch of newly generated codecs
which now make use of the faster charmap decoding variant added
by Walter a short while ago.
Please let me know if you find any problems.
I think we should work
Bengt Richter wrote:
Please bear with me for a few paragraphs ;-)
Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the parser how to
convert string literals (currently only the Unicode ones)
into
Walter Dörwald wrote:
Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
a complete decoding_table into koi8_u.py?
KOI8-U is not available as mapping on ftp.unicode.org and
I only recreated codecs from the mapping files available
there.
OK, so we'd need something
M.-A. Lemburg wrote:
Walter Dörwald wrote:
Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
a complete decoding_table into koi8_u.py?
KOI8-U is not available as mapping on ftp.unicode.org and
I only recreated codecs from the mapping files available
there.
OK, so
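The difference between the two tables is small, which is why defining one codec in terms of the other is tempting: KOI8-U replaces eight KOI8-R box-drawing positions with Ukrainian letters (0xA4, 0xA6, 0xA7, 0xAD and their uppercase counterparts at 0xB4, 0xB6, 0xB7, 0xBD). The divergence is easy to verify with the installed codecs:

```python
# The eight byte values where KOI8-U diverges from KOI8-R.
diffs = [0xA4, 0xA6, 0xA7, 0xAD, 0xB4, 0xB6, 0xB7, 0xBD]
for byte in diffs:
    raw = bytes([byte])
    assert raw.decode("koi8_r") != raw.decode("koi8_u")

# e.g. 0xA4 is CYRILLIC SMALL LETTER UKRAINIAN IE in KOI8-U
assert bytes([0xA4]).decode("koi8_u") == "\u0454"
```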
Walter Dörwald wrote:
I'd like to suggest a small cosmetic change: gencodec.py should output
byte values with two hexdigits instead of four. This makes it easier to
see what is a byte value and what is a code point. And it would make
grepping for stuff simpler.
True.
I'll rerun the creation with
Guido van Rossum wrote:
On 10/24/05, Phil Thompson [EMAIL PROTECTED] wrote:
I'm implementing a string-like object in an extension module and trying to
make it as interoperable with the standard string object as possible. To do
this I'm implementing the relevant slots and the buffer interface.
Neal Norwitz wrote:
Jeremy,
There are a bunch of mods from the AST branch that got integrated into
head. Hopefully, by doing this on python-dev more people will get
involved. I'll describe high level things first, but there will be a
ton of details later on. If people don't want to see
Walter Dörwald wrote:
We've already taken care of decoding. What we still need is a new
gencodec.py and regenerated codecs.
I'll take care of that; just haven't gotten around to it yet.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Oct 14 2005)
Hye-Shik Chang wrote:
On 10/6/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Hye-Shik, could you please provide some timeit figures for
the fastmap encoding ?
Thanks for the timings.
(before applying Walter's patch, charmap decoder)
% ./python Lib/timeit.py -s s='a'*53*1024; e='iso8859_10
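The same measurement can be made from Python directly; a sketch comparing a charmap decode against utf-8 on the 53 KB payload mentioned above (absolute numbers are machine-dependent, only the ratio is interesting):

```python
import timeit

# 53 KB of pure-ASCII data, as in the command line quoted above.
payload = ("a" * 53 * 1024).encode("ascii")

charmap = timeit.timeit(lambda: payload.decode("iso8859_10"), number=100)
utf8 = timeit.timeit(lambda: payload.decode("utf-8"), number=100)

print("charmap: %.4fs  utf-8: %.4fs" % (charmap, utf8))
```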
Martin v. Löwis wrote:
Another option would be to generate a big switch statement in C
and let the compiler decide about the best data structure.
I would try to avoid generating C code at all costs. Maintaining the
build processes will just be a nightmare.
We could automate this using
Hye-Shik Chang wrote:
On 10/5/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Of course, a C version could use the same approach as
the unicodedatabase module: that of compressed lookup
tables...
http://aggregate.org/TechPub/lcpc2002.pdf
genccodec.py anyone ?
I had written a test
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I would try to avoid generating C code at all costs. Maintaining the
build processes will just be a nightmare.
We could automate this using distutils; however I'm not sure
whether this would then also work on Windows.
It wouldn't.
Could
Martin v. Löwis wrote:
Walter Dörwald wrote:
OK, here's a patch that implements this enhancement to
PyUnicode_DecodeCharmap(): http://www.python.org/sf/1313939
Looks nice!
Indeed (except for the choice of the "map this character
to undefined" code point).
Hye-Shik, could you please provide
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
It wouldn't.
Could you elaborate why not ? Using distutils on Windows is really
easy...
The current build process for Windows simply doesn't provide it.
You expect to select Build/All from the menu (or some such),
and expect all code
Walter Dörwald wrote:
On 2005-10-04 at 04:25, [EMAIL PROTECTED] wrote:
As the OP suggests, decoding with a codec like mac-roman or
iso8859-1 is very
slow compared to encoding or decoding with utf-8. Here I'm working
with 53k of
data instead of 53 megs. (Note: this is a laptop, so it's
Reinhold Birkenfeld wrote:
Martin v. Löwis wrote:
Whether we think it should be supported depends
on who we is, as with all these minor features: some think it is
a waste of time, some think it should be supported if reasonably
possible, and some think this a conditio sine qua non. It certainly
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
Is the added complexity needed to support not having Unicode support
compiled into Python really worth it ?
If there are volunteers willing to maintain it, and the other volunteers
are not affected: certainly.
No objections there. I only see
Martin Blais wrote:
On 10/3/05, Antoine Pitrou [EMAIL PROTECTED] wrote:
If that's how things were designed, then Python's entire standard
library (not to mention third-party libraries) is not unicode safe -
to quote your own words - since many functions may return 8-bit strings
containing
Bob Ippolito wrote:
/usr/bin/sw_vers technically calls a private (at least undocumented)
CoreFoundation API, it doesn't parse that plist directly :)
On further inspection, it looks like parsing the plist directly is
supported API these days (see the bottom of http://
Steven Bethard wrote:
On 9/29/05, Robey Pointer [EMAIL PROTECTED] wrote:
Yesterday I ran into a bug in the C API docs. The top of this page:
http://docs.python.org/api/unicodeObjects.html
says:
Py_UNICODE
This type represents a 16-bit unsigned storage type which is
used by Python
Bob Ippolito wrote:
On Sep 29, 2005, at 3:53 PM, M.-A. Lemburg wrote:
Perhaps a flag that fires up Python and runs platform.py
would help too.
python -mplatform
Cool :-)
Now we only need to add some more information to it (like e.g.
the Unicode variant).
--
Marc-Andre Lemburg
Fredrik Lundh wrote:
M.-A. Lemburg wrote:
* Unicode variant (UCS2, UCS4)
don't forget the Py_UNICODE is wchar_t subvariant.
True, but that's not relevant for binary compatibility of
Python package (at least not AFAIK).
UCS2 vs. UCS4 matters because the two versions use and expose
Ronald Oussoren wrote:
On 22-sep-2005, at 5:26, Guido van Rossum wrote:
The platform module has a way to map system names such as returned by
uname() to marketing names. It maps SunOS to Solaris, for example. But
it doesn't map Darwin to Mac OS X. I think I know how to map Darwin
version
Raymond Hettinger wrote:
[Guido]
Another observation: despite the derogatory remarks about regular
expressions, they have one thing going for them: they provide a higher
level of abstraction for string parsing, which this is all about.
(They are higher level in that you don't have to be
I must have missed this one:
Style for raising exceptions
Guido explained that these days exceptions should always be raised as::
raise SomeException(some argument)
instead of::
raise SomeException, some argument
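The recommended spelling is the one that survived into Python 3, where the comma form became a syntax error; a quick check:

```python
# raise SomeException(argument) -- the instance form recommended above.
try:
    raise ValueError("bad value")
except ValueError as exc:
    caught = exc.args

assert caught == ("bad value",)
# The old form, raise ValueError, "bad value", is a SyntaxError in Python 3.
```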
Walter Dörwald wrote:
I wonder if we should switch back to a simple readline() implementation
for those codecs that don't require the current implementation
(basically every charmap codec).
That would be my preference as well. The 2.4 .readline() approach
is really only needed for codecs
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I think it's worthwhile reconsidering this approach for
character type queries that do not involve a huge number
of code points.
I would advise against that. I measure both versions
(your version called PyUnicode_IsLinebreak2) with the
following
Thomas Heller wrote:
Neil Schemenauer [EMAIL PROTECTED] writes:
[Please mail followups to [EMAIL PROTECTED]
The PEP has been rewritten based on a suggestion by Guido to change
str() rather than adding a new built-in function. Based on my
testing, I believe the idea is feasible. It would be
James Y Knight wrote:
On Aug 17, 2005, at 2:55 PM, Timothy Fitz wrote:
On 8/16/05, Raymond Hettinger [EMAIL PROTECTED] wrote:
-0 The behavior of dir() already a bit magical. Python is much
simpler
to comprehend if we have direct relationships like dir() and vars()
corresponding as
Guido van Rossum wrote:
[Guido]
My first response to the PEP, however, is that instead of a new
built-in function, I'd rather relax the requirement that str() return
an 8-bit string -- after all, int() is allowed to return a long, so
why couldn't str() be allowed to return a Unicode string?
Michael Hudson wrote:
M.-A. Lemburg [EMAIL PROTECTED] writes:
Set the external encoding for stdin, stdout, stderr:
(also an example for adding encoding support to an
existing file object):
def set_sys_std_encoding(encoding):
# Load
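The helper is cut off above; the underlying idea of wrapping a byte stream with an encoding layer can be sketched with io.TextIOWrapper (the function name and structure below are illustrative, not the original code):

```python
import io

def wrap_stream(byte_stream, encoding):
    """Illustrative stand-in for the truncated helper: return a text
    layer over *byte_stream* that encodes writes with *encoding*."""
    return io.TextIOWrapper(byte_stream, encoding=encoding)

buf = io.BytesIO()
out = wrap_stream(buf, "utf-8")
out.write("\u20ac")              # the euro sign
out.flush()
assert buf.getvalue() == b"\xe2\x82\xac"
```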
Guido van Rossum wrote:
My first response to the PEP, however, is that instead of a new
built-in function, I'd rather relax the requirement that str() return
an 8-bit string -- after all, int() is allowed to return a long, so
why couldn't str() be allowed to return a Unicode string?
The
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
BTW, in one of your replies I read that you had a problem with
how cvs2svn handles trunk, branches and tags. In reality, this
is no problem at all, since Subversion is very good at handling
moves within the repository: you can easily change
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I haven't received any offers to make a qualified statement. I only
know that I would oppose an approach to ask somebody but our
volunteers to do it for free, and I also know that I don't want to
spend my time researching commercial alternatives
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
True, but if we never ask, we'll never know :-)
My question was: Would asking a professional hosting company
be a reasonable approach ?
It would be an option, yes, of course. It's not an approach that
*I* would be willing to implement, though
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
The PSF does have a reasonable budget, so why not use it to
maintain the infrastructure needed for Python development and
let a company do the administration of the needed servers and
the importing of the CSV and tracker items
Martin v. Löwis wrote:
I'd like to see the Python source be stored in Subversion instead
of CVS,
+1
and on python.org instead of sf.net. To facilitate discussion,
I have drafted a PEP describing the rationale for doing so, and
the technical procedure to be performed.
Not sure about the
Martin v. Löwis wrote:
Guido van Rossum wrote:
Ah, sigh. I didn't know that os.listdir() behaves differently when the
argument is Unicode. Does os.listdir('.') really behave differently
than os.listdir(u'.')? Bah! I don't think that's a very good design
(although I see where it comes from).
Reinhold Birkenfeld wrote:
Hi,
would anyone care to comment about this patch of mine --
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1214889&group_id=5470
It makes file.encoding read-write and lets the write() and writelines()
methods
obey it.
Done. Please see SF.
PS:
Hi Neil,
With the proposed modification, sys.argv[1] u'\u20ac.txt' is
converted through cp1251
Actually, it is not: if you pass in a Unicode argument to
one of the file I/O functions and the OS supports Unicode
directly or at least provides the notion of a file system
encoding, then the file
Nick Coghlan wrote:
M.-A. Lemburg wrote:
May I suggest that you use a different name than context for
this ?!
The term context is way too broad for the application scopes
that you have in mind here (like e.g. managing a resource
in a multi-threaded application).
It's actually the broadness
Hi Neil,
2) Return unicode when the text can not be represented in ASCII. This
will cause a change of behaviour for existing code which deals with
non-ASCII data.
+1 on this one (s/ASCII/Python's default encoding).
I assume you mean the result of sys.getdefaultencoding() here.
Yes.
The
Bob Ippolito wrote:
A better proposal would probably be another string prefix that means
dedent, but I'm still not sold. doc processing software is clearly
going to have to know how to dedent anyway in order to support
existing code.
Agreed.
It is easy enough for any doc-string
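The dedenting that doc-processing tools already perform is essentially textwrap.dedent; a string prefix would only move this step into the compiler:

```python
import textwrap

# A typical indented docstring-style block...
doc = """
    first line
    second line
"""
# ...and what a hypothetical dedent prefix would produce:
assert textwrap.dedent(doc) == "\nfirst line\nsecond line\n"
```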
Neil Hodgson wrote:
On unicode versions of Windows, for attributes like os.listdir,
os.getcwd, sys.argv, and os.environ, which can usefully return unicode
strings, there are 4 options I see:
1) Always return unicode. This is the option I'd be happiest to use,
myself, but expect this
Neil Hodgson wrote:
Thomas Heller:
But adding u'\u5b66\u6821\u30c7\u30fc' to sys.path won't allow to import
this file as module. Internally Python\import.c converts everything to
strings. I started to refactor import.c to work with PyStringObjects
instead of char buffers as a first step -
Neil Hodgson wrote:
M.-A. Lemburg:
I don't really buy this trick: what if you happen to have
a home directory with Unicode characters in it ?
Most people choose account names and thus home directory names that
are compatible with their preferred locale settings: German users
Nick Coghlan wrote:
OK, here's some draft documentation using Phillip's context
terminology. I think it works very well.
With Statements and Context Management
A frequent need in programming is to ensure a particular action is
taken after a specific section of code has been executed
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I'm not breaking anything, I'm just correcting the
way things have to be configured in an effort to
bring back the cross-platform configure default.
Your proposed change will break the build of Python
on Redhat/Fedora systems.
You know
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
I think we should remove the defaulting to whatever
TCL uses and instead warn the user about a possible
problem in case TCL is found and uses a Unicode
width which is incompatible with Python's choice.
-1.
Martin, please reconsider... the choice
Shane Hathaway wrote:
Martin v. Löwis wrote:
Shane Hathaway wrote:
I agree that UCS4 is needed. There is a balancing act here; UTF-16 is
widely used and takes less space, while UCS4 is easier to treat as an
array of characters. Maybe we can have both: unicode objects start with
an internal
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
Hmm, looking at the configure.in script, it seems you're right.
I wonder why this weird dependency on TCL was added.
If Python is configured for UCS-2, and Tcl for UCS-4, then
Tkinter would not work out of the box. Hence the weird dependency.
I
Nicholas Bastin wrote:
On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote:
With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
supporting the full Unicode ccs the same way it supports UCS-2.
Individual surrogate values remain accessible, and supporting
non-BMP characters is left to
Nicholas Bastin wrote:
On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:
However, I don't understand all the excitement
about Py_UNICODE: if you don't like the way this Python
typedef works, you are free to interface to Python using
any of the supported encodings using PyUnicode_Encode
Nicholas Bastin wrote:
On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
Nicholas Bastin wrote:
This type represents the storage type which is used by Python
internally as the basis for holding Unicode ordinals. Extension
module
developers should make no assumptions about the size of this
Fredrik Lundh wrote:
Thomas Heller wrote:
AFAIK, you can configure Python to use 16-bit or 32-bit Unicode chars,
independent of the size of wchar_t. The HAVE_USABLE_WCHAR_T macro
can be used by extension writers to determine if Py_UNICODE is the same as
wchar_t.
note that "usable" is
Nicholas Bastin wrote:
On May 4, 2005, at 6:03 PM, Martin v. Löwis wrote:
Nicholas Bastin wrote:
This type represents the storage type which is used by Python
internally as the basis for holding Unicode ordinals. Extension
module
developers should make no assumptions about the size of this
Nicholas Bastin wrote:
The documentation for Py_UNICODE states the following:
This type represents a 16-bit unsigned storage type which is used by
Python internally as basis for holding Unicode ordinals. On platforms
where wchar_t is available and also has 16-bits, Py_UNICODE is a
Shannon -jj Behrens wrote:
On 4/20/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
Fredrik Lundh wrote:
PS. a side effect of the for-in pattern is that I'm beginning to feel
that Python
might need a nice switch statement based on dictionary lookups, so I can
replace multiple callbacks with a single
Fredrik Lundh wrote:
PS. a side effect of the for-in pattern is that I'm beginning to feel
that Python
might need a nice switch statement based on dictionary lookups, so I can
replace multiple callbacks with a single loop body, without writing too
many
if/elif clauses.
PEP 275 anyone ?
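The dictionary-lookup switch Fredrik alludes to (and PEP 275 formalizes) is already idiomatic; a minimal sketch replacing multiple callbacks with one dispatch table and a single loop-friendly entry point:

```python
def handle_start(tag):
    return "start:" + tag

def handle_end(tag):
    return "end:" + tag

# One dict replaces a chain of if/elif clauses.
dispatch = {"start": handle_start, "end": handle_end}

def process(event, tag):
    try:
        handler = dispatch[event]
    except KeyError:
        raise ValueError("unknown event: %r" % event)
    return handler(tag)

assert process("start", "p") == "start:p"
assert process("end", "p") == "end:p"
```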
Eyal Lotem wrote:
I would like to experiment with security based on Python references as
security capabilities.
Unfortunately, there are several problems that make Python references
invalid as capabilities:
* There is no way to create secure proxies because there are no
private attributes.
* Lots
Nicholas Bastin wrote:
On Apr 7, 2005, at 5:07 AM, M.-A. Lemburg wrote:
The current implementation of the utf-16 codecs makes for some
irritating gymnastics to write the BOM into the file before reading it
if it contains no BOM, which seems quite like a bug in the codec.
The codec
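The behavior under discussion: the generic utf-16 codec emits a BOM when encoding and expects one when decoding, while the endian-specific variants never touch a BOM:

```python
data = "hi".encode("utf-16")          # generic codec: BOM comes first
assert data[:2] in (b"\xff\xfe", b"\xfe\xff")
assert data.decode("utf-16") == "hi"  # ...and is consumed on decode

# The -le/-be variants write no BOM at all.
assert "hi".encode("utf-16-le") == b"h\x00i\x00"
```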
Martin v. Löwis wrote:
Stephen J. Turnbull wrote:
So there is a standard for the UTF-8 signature, and I know of
applications which produce it. While I agree with you that Python's
codecs shouldn't produce it (by default), providing an option to strip
is a good idea.
I would personally
Stephen J. Turnbull wrote:
MAL == M [EMAIL PROTECTED] writes:
MAL The BOM (byte order mark) was a non-standard Microsoft
MAL invention to detect Unicode text data as such (MS always uses
MAL UTF-16-LE for Unicode text files).
The Japanese memopado (Notepad) uses UTF-8
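Python later grew exactly the requested option as the utf-8-sig codec, which writes the three-byte signature on encode and strips it on decode:

```python
# U+FEFF encoded in UTF-8 is the three-byte signature EF BB BF.
data = "hello".encode("utf-8-sig")
assert data[:3] == b"\xef\xbb\xbf"
assert data.decode("utf-8-sig") == "hello"

# Plain utf-8 leaves the signature in place:
assert data.decode("utf-8") == "\ufeffhello"
```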
Evan Jones wrote:
I recently rediscovered this strange behaviour in Python's Unicode
handling. I *think* it is a bug, but before I go and try to hack
together a patch, I figure I should run it by the experts here on
Python-Dev. If you understand Unicode, please let me know if there are
Martin v. Löwis wrote:
Skip Montanaro wrote:
I say backport. If people were trying to call os.access with unicode
filenames it would have been failing and they were either avoiding
unicode
filenames as a result or working around it some other way. I can't
see how
making os.access work with
Martin v. Löwis wrote:
M.-A. Lemburg wrote:
The question is whether it would encourage conditional work-arounds.
-1. That only makes the code more complicated.
You misunderstand. I'm not proposing that the work-around is added
to Python. I'm saying that Python *users* might introduce such
work
Raymond Hettinger wrote:
BTW I definitely expect having to defend removing
map/filter/reduce/lambda with a PEP; that's much more controversial
because it's *removing* something and hence by definition breaking
code.
+1 on the PEP
-1 on removing those tools - breaks too much code.
I suspect that
Guido van Rossum wrote:
Here's my take on the key issues brought up:
Alternative names anytrue(), alltrue(): before I posted to my blog I
played with these names (actually anyTrue(), allTrue(), anyFalse(),
allFalse()). But I realized (1) any() and all() read much better in
their natural context
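The names read naturally in their intended context, which is how they landed as builtins in Python 2.5:

```python
values = [0, 1, 2]

assert any(v > 1 for v in values)       # at least one element matches
assert not all(v > 1 for v in values)   # but not every one
assert all(v >= 0 for v in values)      # reads like the English claim
```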
Nick Coghlan wrote:
Guido van Rossum wrote:
No, the reason is that if we did this with exceptions, it would be
liable to mask errors; an exception does not necessarily originate
immediately with the code you invoked, it could have been raised by
something else that was invoked by that code. The
Neil Schemenauer wrote:
On Wed, Mar 09, 2005 at 11:10:59AM +0100, M.-A. Lemburg wrote:
The patch implements the PyObject_Text() idea (an API that
returns a basestring instance, ie. string or unicode) and
then uses this in '%s' (the string version) to properly propagate
to u'%s' (the unicode
Brett C. wrote:
Martin v. Löwis wrote:
Apparently, os.access was forgotten when the file system encoding
was introduced in Python 2.2, and then it was again forgotten in
PEP 277.
I've now fixed it in the trunk (posixmodule.c:2.334), and I wonder
whether this is a backport candidate. People who try
Raymond Hettinger wrote:
Based on some ideas from Skip, I had tried transforming the likes of
"x in (1,2,3)" into "x in frozenset([1,2,3])". When applicable, it
substantially simplified the generated code and converted the O(n)
lookup into an O(1) step. There were substantial savings even if the
set
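The transformation being described rewrites a literal-tuple membership test into a lookup in a hashed constant; modern CPython does equivalent constant-folding in the peephole optimizer, and the semantics can be sketched by hand:

```python
def slow(x):
    return x in (1, 2, 3)        # O(n) scan of a tuple

_members = frozenset([1, 2, 3])  # hashable literals -> O(1) lookup

def fast(x):
    return x in _members

# Both spellings agree on every probe value.
for probe in (0, 1, 2, 3, 4):
    assert slow(probe) == fast(probe)
```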
Walter Dörwald wrote:
Raymond Hettinger wrote:
The most recent test_codecs check-in (1.19) is failing on a MSCV6.0
compilation running on WinMe:
--
Ran 35 tests in 1.430s
FAILED (failures=1)
Traceback (most recent call last):
Walter Dörwald wrote:
M.-A. Lemburg wrote:
[...]
__str__ and __unicode__ as well as the other hooks were
specifically added for the type constructors to use.
However, these were added at a time where sub-classing
of types was not possible, so it's time now to reconsider
whether
Guido van Rossum wrote:
[me]
I'm not sure I understand how basemethod is supposed to work; I can't
find docs for it using Google (only three hits for the query "mxTools
basemethod"). How does it depend on im_class?
[Marc-Andre]
It uses im_class to find the class defining the (unbound) method:
def
Walter Dörwald wrote:
M.-A. Lemburg wrote:
So the question is whether conversion of a Unicode sub-type
to a true Unicode object should honor __unicode__ or not.
The same question can be asked for many other types, e.g.
floats (and __float__), integers (and __int__), etc.
class float2(float
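The example is truncated above; it presumably defined a float subclass overriding __float__ to test whether conversion back to the base type honors the hook. In current Python it does (a reconstruction for illustration, not the original code):

```python
class float2(float):
    def __float__(self):
        return 42.0

x = float2(1.5)
assert x == 1.5            # the subclass instance keeps its value
assert float(x) == 42.0    # ...but float() honors the __float__ hook
```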
Guido van Rossum wrote:
[Guido]
Apart from the tests that were testing the behavior of im_class, I
found only a single piece of code in the standard library that used
im_class of an unbound method object (the clever test in the pyclbr
test). Uses of im_self and im_func were more widespread. Given
Nick Coghlan wrote:
Guido van Rossum wrote:
What do people think? (My main motivation for this, as stated before,
is that it adds complexity without much benefit.)
I'm in favour, since it removes the "an unbound method is almost like a
bare function, only not quite as useful" distinction. It would
Guido van Rossum wrote:
Apart from the tests that were testing the behavior of im_class, I
found only a single piece of code in the standard library that used
im_class of an unbound method object (the clever test in the pyclbr
test). Uses of im_self and im_func were more widespread. Given the
Martin v. Löwis wrote:
Andrew McNamara wrote:
There's a bunch of jobs we (CSV module maintainers) have been putting
off - attached is a list (in no particular order):
* unicode support (this will probably uglify the code considerably).
Can you please elaborate on that? What needs to be done, and
using pre-processor macros). Quite
a large job. Suggestions gratefully received.
M.-A. Lemburg wrote:
Indeed. The trick is to convert to Unicode early and to use Unicode
literals instead of string literals in the code.
Yes, although it would be nice to also retain the 8-bit versions as well.
You can
Andrew McNamara wrote:
Yes, although it would be nice to also retain the 8-bit versions as well.
You can do so by using latin-1 as default encoding. Works great !
Yep, although that means we wear the cost of decoding and encoding for
all 8 bit input.
Right, but it makes the code very clean and
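"Convert to Unicode early" means decoding once at the I/O boundary and working with text throughout, which is exactly how the csv module ended up working in Python 3; a sketch:

```python
import csv
import io

raw = "name,città\nRome,Roma\n".encode("utf-8")  # bytes at the boundary

# Decode once at the edge; the csv layer then sees only text.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.reader(text))

assert rows == [["name", "città"], ["Rome", "Roma"]]
```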
Martin v. Löwis wrote:
As for PEP 4: I don't know whether it needs to be listed there. It
appears that the PEP is largely unmaintained (I, personally, do not
really maintain it). So one option would be to just stop using PEP 4
for recording deprecations, since we now have the warnings module.
If we
I would like to remove the support for using libc wctype functions
(e.g. towupper(), towlower(), etc.) from the code base.
The reason is that compiling Python using this switch not only
breaks the test suite, it also causes the functions .lower() and
.upper() to become locale aware and creates