On Thu, Jan 9, 2014 at 5:50 PM, Lennart Regebro rege...@gmail.com wrote:
To be honest, you can define text as "A stream of bytes that are split
up in lines separated by a linefeed", and do some basic text
processing like that. Just very *basic*, but still. Replacing
characters. Extracting
On Thu, Jan 9, 2014 at 8:16 AM, Ben Finney ben+pyt...@benfinney.id.au wrote:
Nick Coghlan ncogh...@gmail.com writes:
Set the mode to 'rb', process it as binary. Done.
Which entails abandoning the stated goal of “just want to parse text
files” :-)
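A minimal sketch of the "set the mode to 'rb'" approach: basic line-by-line replacement done purely on bytes, never decoding. It assumes the file uses an ASCII-compatible encoding (UTF-8, latin-1, cp1252, ...), where replacing an ASCII-only byte pattern cannot corrupt neighbouring non-ASCII bytes. The function name `replace_in_lines` and the sample data are hypothetical.

```python
import os
import tempfile


def replace_in_lines(path, old, new):
    """Read *path* in binary mode and replace *old* with *new* on every line.

    Nothing is decoded, so no UnicodeDecodeError can occur. Both patterns
    must be bytes; for ASCII-compatible encodings an ASCII-only pattern
    cannot match inside a multi-byte character.
    """
    with open(path, 'rb') as f:
        # Iterating over a binary file yields chunks that end after b'\n'.
        return [line.replace(old, new) for line in f]


# Hypothetical sample: UTF-8 text plus a stray 0xff byte that is not valid UTF-8.
sample = 'caf\u00e9 name=old\n'.encode('utf-8') + b'\xff name=old\n'
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(sample)
try:
    result = replace_in_lines(path, b'name=old', b'name=new')
finally:
    os.remove(path)
print(result)
```

The trade-off is the one Ben Finney points at: this works on bytes, not text, so anything beyond ASCII-pattern substitution (case folding, character counts) is out of reach without choosing an encoding.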
Only if your definition of text files means
On 09/01/14 00:07, Ben Finney wrote:
Kristján Valur Jónsson krist...@ccpgames.com writes:
Believe it or not, sometimes you really don't care about encodings.
Sometimes you just want to parse text files.
Files don't contain text, they contain bytes. Bytes only become text
when filtered
-Original Message-
From: Python-Dev [mailto:python-dev-
bounces+kristjan=ccpgames@python.org] On Behalf Of Ben Finney
Sent: 9 January 2014 00:50
To: python-dev@python.org
Subject: Re: [Python-Dev] Python3 complexity
Kristján Valur Jónsson krist...@ccpgames.com writes:
I
On 9 January 2014 09:01, Mark Shannon m...@hotpy.org wrote:
On 09/01/14 00:07, Ben Finney wrote:
Kristján Valur Jónsson krist...@ccpgames.com writes:
Believe it or not, sometimes you really don't care about encodings.
Sometimes you just want to parse text files.
Files don't contain text,
just became harder to use for that purpose.
The entire discussion reminds me very much of the situation with file
names in OS X. Whenever I want to look at an old zip file or tarball
which happens to have been lying around on my hard drive for a decade
or more, I can't because OS X insists that
On 08.01.14 16:03, Nick Coghlan wrote:
On 9 January 2014 00:43, Bob Hanson d2mp...@newsguy.com wrote:
When I read this comment of yours, Guido, I immediately started
wondering about this. You may well be right -- indeed, I have a
very old install (c.2007) which has not been updated (other
Paul Moore writes:
So I think that if this discussion is to be of any real benefit, a
specific example is needed. I honestly don't think I've ever
encountered a case where "Sometimes [I] just want to parse text
files" and code that uses the default encoding (i.e., looks pretty
much
-Original Message-
From: Python-Dev [mailto:python-dev-
bounces+kristjan=ccpgames@python.org] On Behalf Of Stefan Ring
Sent: 9 January 2014 09:32
To: python-dev@python.org
Subject: Re: [Python-Dev] Python3 complexity
just became harder to use for that purpose.
The entire
On 9 Jan 2014 11:29, INADA Naoki songofaca...@gmail.com wrote:
And I think everyone was well intentioned - and python3 covers most of
the bases, but working with binary data is not only a wire-protocol
programmer's problem.
If you're working with binary data, use the binary API offered by
On 06.01.14 17:26, Michael Urman wrote:
Here's some more guesswork. Does it seem possible that msiexec is
trying to verify the revocation status of the certificate used to sign
the python .msi file? Per
http://blogs.technet.com/b/pki/archive/2006/11/30/basic-crl-checking-with-certutil.aspx
On 9 January 2014 10:15, Kristján Valur Jónsson krist...@ccpgames.com wrote:
Also, the problem I'm describing has to do with real world stuff.
This is the Python 2 program:
    with open(fn1) as f1:
        with open(fn2, 'w') as f2:
            f2.write(process_text(f1.read()))
Moving to Python 3, I
On 07.01.14 22:51, Ethan Furman wrote:
On 01/07/2014 12:39 PM, Serhiy Storchaka wrote:
* It clutters up hg log and hg blame results. Every time you
change clinic.py to generate different output, it
touches multiple lines in all files which use Argument Clinic and
clutters up their
On Thu, Jan 09, 2014 at 05:11:06PM +1000, Nick Coghlan wrote:
On 9 January 2014 10:07, Ben Finney ben+pyt...@benfinney.id.au wrote:
So, if what you want is to parse text and not get gibberish, you need to
*tell* Python what the encoding is. That's a brute fact of the world of
text in
On Thu, 9 Jan 2014 10:15:08 +
Kristján Valur Jónsson krist...@ccpgames.com wrote:
Moving to python 3, I found that this quickly caused problems. So, I
explicitly added an encoding. Better guess an encoding, something that is
likely, e.g. cp1252
with open(fn1, encoding='cp1252') as
On Thu, 09 Jan 2014 03:54:13 +
MRAB pyt...@mrabarnett.plus.com wrote:
I'm thinking that the 'i' format could be used for signed integers and
the 'u' for unsigned integers. The width would be the number of bytes.
You would also need to have a way of specifying the endianness.
For example:
On Thu, 9 Jan 2014 17:09:10 +1000
Nick Coghlan ncogh...@gmail.com wrote:
There's also the fact that POSIX folks are used to 'r' and 'rb' being
the same thing.
Which fails immediately under Windows :-)
Regards
Antoine.
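A small sketch of why 'r' and 'rb' are not interchangeable in Python 3. In Python 2 on POSIX the two modes behaved identically; in Python 3 text mode both decodes and, with the default newline=None, translates '\r\n' to '\n' when reading on every platform. When writing, '\n' is translated to os.linesep, which only differs on Windows, so the example passes newline='\r\n' explicitly to show that direction on any OS. In-memory streams stand in for real files here.

```python
import io

raw = b'first line\r\nsecond line\r\n'

# Binary mode: bytes come back exactly as stored, CRLF included.
binary_lines = io.BytesIO(raw).readlines()

# Text mode with the default newline=None ("universal newlines"):
# '\r\n' and '\r' are turned into '\n' on reading.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding='ascii')
text_lines = reader.readlines()

# Writing: '\n' is translated to the newline string ('\r\n' here, which is
# what the default os.linesep gives on Windows).
out = io.BytesIO()
writer = io.TextIOWrapper(out, encoding='ascii', newline='\r\n')
writer.write('a\nb\n')
writer.flush()
written = out.getvalue()
print(binary_lines, text_lines, written)
```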
___
Python-Dev mailing list
-Original Message-
From: Paul Moore [mailto:p.f.mo...@gmail.com]
Sent: 9 January 2014 10:53
To: Kristján Valur Jónsson
Cc: Stefan Ring; python-dev@python.org
Moving to python 3, I found that this quickly caused problems.
You don't say what problems, but I assume
Right. But even latin-1, or better, cp1252 (on Windows), does not solve it
because these have undefined code points.
That's not true. latin-1 does not have undefined code points.
Regards,
Martin
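A quick check of Martin's point: Python's latin-1 codec maps every byte value 0x00-0xFF to the code point with the same number, so decoding can never fail and encoding restores the input exactly. Python's cp1252 codec, by contrast, leaves five byte values undefined, and a strict decode of them raises UnicodeDecodeError.

```python
all_bytes = bytes(range(256))

# latin-1: byte 0xNN -> U+00NN for every N, and back again.
text = all_bytes.decode('latin-1')
latin1_roundtrip = text.encode('latin-1') == all_bytes

# cp1252: collect the byte values a strict decode rejects.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode('cp1252')
    except UnicodeDecodeError:
        undefined.append(value)

print(latin1_roundtrip, [hex(v) for v in undefined])
```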
On Thu, 9 Jan 2014 12:55:35 +
Kristján Valur Jónsson krist...@ccpgames.com wrote:
If you don't care about the encoding, why don't you use latin1?
Things will roundtrip fine and work as well as under Python 2.
Because latin1 does not define all code points, giving you errors there.
b =
On 9 January 2014 13:00, Kristján Valur Jónsson krist...@ccpgames.com wrote:
You don't say what problems, but I assume encoding/decoding errors. So the
files apparently weren't in the system encoding. OK, at that point I'd
probably say to heck with it and use latin-1. Assuming I was sure that
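A sketch of the Python 2 program quoted in this thread, ported to Python 3 with latin-1 as suggested here. `process_text` is a hypothetical stand-in (the thread never shows its body), and `convert` is an illustrative name. latin-1 maps each byte to one code point and back, so every byte the function does not touch is written out unchanged; newline='' additionally switches off newline translation so '\r\n' survives too.

```python
import os
import shutil
import tempfile


def process_text(text):
    # Hypothetical stand-in for the thread's process_text(); it only
    # touches ASCII characters.
    return text.replace('colour', 'color')


def convert(fn1, fn2, encoding='latin-1'):
    # Decode and re-encode with the same byte-transparent codec;
    # newline='' disables newline translation in both directions.
    with open(fn1, encoding=encoding, newline='') as f1:
        with open(fn2, 'w', encoding=encoding, newline='') as f2:
            f2.write(process_text(f1.read()))


# Usage: input mixing UTF-8 ('é'), a byte cp1252 cannot decode (0x81) and CRLF.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'in.txt')
dst = os.path.join(workdir, 'out.txt')
with open(src, 'wb') as f:
    f.write(b'colour \xc3\xa9 \x81\r\n')
convert(src, dst)
with open(dst, 'rb') as f:
    output = f.read()
shutil.rmtree(workdir)
print(output)  # b'color \xc3\xa9 \x81\r\n'
```

The caveat from the thread still applies: the non-ASCII characters are only "correct" as opaque bytes. Something like .upper() would mangle the UTF-8 'é' here, because latin-1 sees it as the two characters 'Ã©'.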
-Original Message-
From: Python-Dev [mailto:python-dev-
bounces+kristjan=ccpgames@python.org] On Behalf Of Antoine Pitrou
Sent: 9 January 2014 13:18
To: python-dev@python.org
Subject: Re: [Python-Dev] Python3 complexity
On Thu, 9 Jan 2014 12:55:35 +
Kristján Valur
2014/1/9 Kristján Valur Jónsson krist...@ccpgames.com:
This definition is funny, because according to Wikipedia, it is a superset
of 8859-1 (latin1)
Bytes 0x80..0x9f are unassigned in ISO/CEI 8859-1... but are assigned
in (IANA's) ISO-8859-1.
Python implements the latter, ISO-8859-1.
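To see what "Python implements ISO-8859-1" means for bytes 0x80-0x9F: they decode to U+0080-U+009F, which Unicode classifies as C1 control characters (category 'Cc'), so they are defined and round-trip. cp1252 instead assigns printable characters to most of that range, for example 0x80 is the euro sign.

```python
import unicodedata

c1_bytes = bytes(range(0x80, 0xA0))
as_latin1 = c1_bytes.decode('latin-1')

# Byte 0xNN maps to U+00NN in this range too...
codepoints_match = [ord(ch) for ch in as_latin1] == list(range(0x80, 0xA0))
# ...and every one of those code points is a C1 control character.
categories = {unicodedata.category(ch) for ch in as_latin1}

# cp1252 puts printable characters in most of the same range.
euro = b'\x80'.decode('cp1252')
print(codepoints_match, categories, euro)
```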
-Original Message-
From: Python-Dev [mailto:python-dev-
bounces+kristjan=ccpgames@python.org] On Behalf Of Kristján Valur
Jónsson
Sent: 9 January 2014 13:37
To: Antoine Pitrou; python-dev@python.org
Subject: Re: [Python-Dev] Python3 complexity
This definition is funny,
So the customer you're looking for is the person who cares a lot about
encodings, knows how to do Unicode correctly, and has noticed that
certain valid cases not limited to imperialist simpletons (dealing
with specific common things invented before 1996, dealing with mixed
encodings, doing what
On Thu, 9 Jan 2014 09:03:40 -0500
Daniel Holth dho...@gmail.com wrote:
They emphatically do not want the Python 2
model especially not implicit coercion. They only want additional
tools for text or string processing in Python 3.
That's a good point. Now it's up to people who need those
On Thu, Jan 09, 2014 at 01:00:59PM +, Kristján Valur Jónsson wrote:
Which reminds me, can Python3 read text files with BOM automatically yet?
I'm not sure what you mean by that. If you mean, can Python3 distinguish
between UTF-16BE and UTF-16LE on the basis of a BOM, then it's been able
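A sketch of how BOMs are handled by the standard codecs: Python does not guess, but naming the right codec makes the BOM handling automatic. 'utf-8-sig' strips a leading UTF-8 BOM if present (and also decodes BOM-less input), while 'utf-16' uses the BOM to pick the byte order and then drops it. The same codec names work with open(); an in-memory stream stands in for a file here.

```python
import io

utf8_with_bom = b'\xef\xbb\xbfhello\n'

# Plain 'utf-8' keeps the BOM as U+FEFF at the start of the text...
plain = utf8_with_bom.decode('utf-8')
# ...while 'utf-8-sig' strips it, and also works when there is no BOM.
stripped = utf8_with_bom.decode('utf-8-sig')
no_bom = b'hello\n'.decode('utf-8-sig')

# 'utf-16' reads the BOM to choose little- or big-endian, then drops it.
le = b'\xff\xfeh\x00i\x00'.decode('utf-16')
be = b'\xfe\xff\x00h\x00i'.decode('utf-16')

# Text-mode streams accept the same codec names.
with io.TextIOWrapper(io.BytesIO(utf8_with_bom), encoding='utf-8-sig') as f:
    first_line = f.readline()
print(repr(plain), repr(stripped), le, be, repr(first_line))
```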
-Original Message-
From: Victor Stinner [mailto:victor.stin...@gmail.com]
Sent: 9 January 2014 13:51
To: Kristján Valur Jónsson
Cc: Antoine Pitrou; python-dev@python.org
Subject: Re: [Python-Dev] Python3 complexity
2014/1/9 Kristján Valur Jónsson krist...@ccpgames.com:
This
On 01/09/2014 03:39 AM, Serhiy Storchaka wrote:
On 07.01.14 22:51, Ethan Furman wrote:
AFAIK you don't write much C code. So perhaps C source maintainability is not
too valuable for you.
I don't write much C code yet, no, but C source maintainability is even more important to me because
Steven D'Aprano writes:
If it were, we wouldn't need text strings :-)
Speak for yourself, Kemosabe. Red man need Unicode, full meal not
just a few bytes.
(Resending with an adjusted Subject and not through Gmane. Apologies for
duplicates.)
On Jan 08, 2014, at 01:51 PM, Stephen J. Turnbull wrote:
Benjamin Peterson writes:
I agree. This is a very important, much-requested feature for low-level
networking code.
I hear it's much-requested, but is there any description of typical
use cases?
The two unported libraries that are
On 9 Jan 2014 22:08, Antoine Pitrou solip...@pitrou.net wrote:
On Thu, 9 Jan 2014 09:03:40 -0500
Daniel Holth dho...@gmail.com wrote:
They emphatically do not want the Python 2
model especially not implicit coercion. They only want additional
tools for text or string processing in Python
On 9 Jan 2014 22:25, Kristján Valur Jónsson krist...@ccpgames.com wrote:
-Original Message-
From: Victor Stinner [mailto:victor.stin...@gmail.com]
Sent: 9 January 2014 13:51
To: Kristján Valur Jónsson
Cc: Antoine Pitrou; python-dev@python.org
Subject: Re: [Python-Dev]
On 9 Jan 2014 06:43, Antoine Pitrou solip...@pitrou.net wrote:
Hi,
With Victor's consent, I overhauled PEP 460 and made the feature set
more restricted and consistent with the bytes/str separation.
+1
I was initially dubious about the idea, but the proposed semantics look
good to me.
We
On Fri, 10 Jan 2014 05:26:04 +1000
Nick Coghlan ncogh...@gmail.com wrote:
We should probably include format_map for consistency with the str API.
Yes, you're right.
However, I
also added bytearray into the mix, as bytearray objects should
generally support the same operations as bytes
Thanks Nick. This does seem to cover it all. Perhaps it is worth mentioning
cp1252 as the Windows version of latin1, which _does_not_ cover all code points
and hence requires surrogateescapes for best effort solution.
K
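A minimal sketch of the best-effort cp1252 decode mentioned here, using the surrogateescape error handler (defined in PEP 383). Bytes cp1252 defines become real characters; each undefined byte 0xNN becomes a lone surrogate U+DCNN, and encoding with the same handler turns it back into the original byte. The sample bytes are hypothetical.

```python
data = b'price: \x80 5, odd byte: \x81'

# A strict cp1252 decode fails on 0x81, which cp1252 leaves undefined.
try:
    data.decode('cp1252')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape: 0x80 -> '\u20ac' (euro sign), 0x81 -> lone surrogate '\udc81'.
text = data.decode('cp1252', errors='surrogateescape')

# Encoding with the same codec and handler restores the original bytes.
restored = text.encode('cp1252', errors='surrogateescape')
print(strict_ok, ascii(text), restored == data)
```

One design consequence worth knowing: the lone surrogates cannot be encoded by a strict codec, so printing or writing such text without the same error handler raises UnicodeEncodeError (the print above uses ascii() for that reason).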
From: Nick Coghlan
This has all gotten a bit complicated because everyone has been thinking in
terms of actual encodings and actual text files. But I think the use-case
here is something different:
A file with a bunch of bytes in it, _some_ of which are ascii, and the rest
are other bytes (maybe binary data, maybe
I'm not sure how format_map helps in porting from 2 to 3, since it
doesn't exist in any version of 2.
Although that said, it's no doubt a useful feature, just not useful in
code that supports both 2 and 3 with a single code base or when porting
to 3.
Eric.
On 1/9/2014 4:02 PM, antoine.pitrou
On Thu, 9 Jan 2014 13:36:05 -0800
Chris Barker chris.bar...@noaa.gov wrote:
Some folks have suggested using latin-1 (or other 8-bit encoding) -- is
that guaranteed to work with any binary data, and round-trip accurately?
Yes, it is.
and will surrogateescape work for arbitrary binary data?
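A quick check of both claims in this exchange: latin-1 round-trips any bytes on its own, and ascii or utf-8 round-trip any bytes when combined with surrogateescape. The test data is deterministic but deliberately hostile: every byte value, a UTF-8-encoded surrogate (which the UTF-8 codec rejects), a truncated multi-byte sequence, and some valid UTF-8.

```python
# Every byte value, an encoded surrogate (invalid in UTF-8), a truncated
# 3-byte sequence, then valid UTF-8 for 'hé'.
blob = bytes(range(256)) + b'\xed\xb2\x80' + b'\xe2\x82' + 'h\u00e9'.encode('utf-8')

roundtrips = {}
for codec in ('latin-1', 'ascii', 'utf-8'):
    # latin-1 never triggers the handler; for ascii and utf-8 every
    # undecodable byte 0xNN becomes the lone surrogate U+DCNN.
    text = blob.decode(codec, errors='surrogateescape')
    roundtrips[codec] = text.encode(codec, errors='surrogateescape') == blob

print(roundtrips)
```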
On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
latin-1 guaranteed to work with any binary data, and round-trip
accurately?
Yes, it is.
and will surrogateescape work for arbitrary binary data?
Yes, it will.
Then maybe this is really a documentation issue,
On Thu, Jan 9, 2014 at 5:00 PM, Chris Barker chris.bar...@noaa.gov wrote:
On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
latin-1 guaranteed to work with any binary data, and round-trip
accurately?
Yes, it is.
and will surrogateescape work for arbitrary binary
On 9 January 2014 22:00, Chris Barker chris.bar...@noaa.gov wrote:
On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
latin-1 guaranteed to work with any binary data, and round-trip
accurately?
Yes, it is.
and will surrogateescape work for arbitrary binary data?
On 01/09/2014 02:00 PM, Chris Barker wrote:
On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote:
Chris Barker wrote:
latin-1 guaranteed to work with any binary data, and round-trip accurately?
Yes, it is.
and will surrogateescape work for arbitrary binary data?
Yes, it will.
Then
On 9 January 2014 22:08, Ethan Furman et...@stoneleaf.us wrote:
For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
If that were decoded using latin1 how would I then get the first two bytes
to the integer 256 and the last six bytes to their Cyrillic meaning?
(Apologies for not testing myself,
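A sketch answering Ethan's question, assuming the '\83' in his example was meant as '\x83' and that the two-byte integer is big-endian (both are assumptions). After a latin-1 decode, each character stands for exactly one byte, so a slice can be turned back into bytes with the same codec and then interpreted. Working on the bytes directly gives the same results with less ceremony.

```python
# Assumes '\83' in the original example was a typo for '\x83'.
data = b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'

text = data.decode('latin-1')   # one character per byte, nothing lost

# Go back to bytes with the same codec before interpreting a slice.
number = int.from_bytes(text[:2].encode('latin-1'), 'big')
cyrillic = text[2:].encode('latin-1').decode('utf-8')

# The same results, taken straight from the bytes.
direct_number = int.from_bytes(data[:2], 'big')
direct_cyrillic = data[2:].decode('utf-8')
print(number, cyrillic)
```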
On 01/09/2014 02:54 PM, Paul Moore wrote:
On 9 January 2014 22:08, Ethan Furman wrote:
For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
If that were decoded using latin1 how would I then get the first two bytes
to the integer 256 and the last six bytes to their Cyrillic meaning?
(Apologies
On Thu, Jan 9, 2014 at 2:54 PM, Paul Moore
For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
If that were decoded using latin1 how would I then get the first two
bytes
to the integer 256 and the last six bytes to their Cyrillic meaning?
(Apologies for not testing myself, short on time.)
On 01/09/2014 02:54 PM, Paul Moore wrote:
On 9 January 2014 22:08, Ethan Furman et...@stoneleaf.us wrote:
For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
If that were decoded using latin1 how would I then get the first two bytes
to the integer 256 and the last six bytes to their Cyrillic
On Thu, Jan 9, 2014 at 3:14 PM, Ethan Furman et...@stoneleaf.us wrote:
Sorry, I was too short with my example. My use case is binary files, with
ASCII metadata and binary metadata, as well as ASCII-encoded numeric
values, binary-coded numeric values, ASCII-encoded boolean values, and
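For files that mix ASCII fields with binary-coded values, one option that avoids text decoding for the binary parts entirely is the standard struct module, decoding only the ASCII fields. The record layout below is invented purely for illustration; it is not Ethan's actual format, and `parse_record` is a hypothetical helper.

```python
import struct

# Hypothetical record layout ('<' = little-endian, no padding):
#   4s  ASCII tag
#   H   unsigned 16-bit binary count
#   8s  ASCII-encoded decimal number, space padded
#   c   ASCII boolean flag, b'T' or b'F'
RECORD = struct.Struct('<4sH8sc')


def parse_record(raw):
    tag, count, number, flag = RECORD.unpack(raw)
    return {
        'tag': tag.decode('ascii'),                 # ASCII metadata
        'count': count,                             # binary-coded integer
        'number': float(number.decode('ascii')),    # ASCII-encoded number
        'flag': flag == b'T',                       # ASCII-encoded boolean
    }


raw = RECORD.pack(b'HEAD', 256, b'  3.1400', b'T')
record = parse_record(raw)
print(RECORD.size, record)
```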
latin1 is OK but is it Pythonic?
I've posted a suggestion about adding 'bytes' as an alias for 'latin1'.
http://comments.gmane.org/gmane.comp.python.ideas/10315
I want one Pythonic way to handle binary data containing ASCII (or latin1,
UTF-8, or another ASCII-compatible encoding).
On Fri, Jan 10, 2014 at 8:53 AM,
On Thu, Jan 9, 2014 at 10:00 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:
On 09/01/2014 06:50, Lennart Regebro wrote:
On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney ben+pyt...@benfinney.id.au
wrote:
Kristján Valur Jónsson krist...@ccpgames.com writes:
Believe it or not, sometimes you
On 9 January 2014 04:50, Lennart Regebro rege...@gmail.com wrote:
To be honest, you can define text as "A stream of bytes that are split
up in lines separated by a linefeed", and do some basic text
processing like that. Just very *basic*, but still. Replacing
characters. Extracting certain lines
On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik techto...@gmail.com wrote:
2. introduce autodetect mode to open functions
1. read and transform on the fly, maintaining a buffer that
stores original bytes
and their mapping to letters. The mapping is updated as bytes
On 10 Jan 2014 03:32, Antoine Pitrou solip...@pitrou.net wrote:
On Fri, 10 Jan 2014 05:26:04 +1000
Nick Coghlan ncogh...@gmail.com wrote:
We should probably include format_map for consistency with the str API.
Yes, you're right.
However, I
also added bytearray into the mix, as
On Thu, Jan 09, 2014 at 02:08:57PM -0800, Ethan Furman wrote:
If latin1 is used to convert binary to text, how convoluted is it to then
take chunks of that text and convert to int, or some other variety of
unicode?
For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
If that were decoded
On 1/9/2014 6:25 PM, Chris Barker wrote:
as so -- I want to replace a bit of ascii text surrounded by arbitrary
binary:
(apologies for the py2...)
In [24]: b
Out[24]: '\x01\x00\xd1\x80\xd1a name\xd0\x80'
In [25]: u = b.decode('latin-1')
In [26]: u2 = u.replace('a name', 'a different name')
In
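The quoted IPython session is Python 2; a sketch of the same steps in Python 3 follows. It also shows that for a pure ASCII substitution, bytes.replace gives the same result without decoding at all.

```python
b = b'\x01\x00\xd1\x80\xd1a name\xd0\x80'

# The latin-1 round trip from the quoted session.
u = b.decode('latin-1')
u2 = u.replace('a name', 'a different name')
b2 = u2.encode('latin-1')

# The same substitution done directly on bytes.
b3 = b.replace(b'a name', b'a different name')
print(b2)
```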
On Fri, Jan 10, 2014 at 12:22:02PM +1100, Chris Angelico wrote:
On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik techto...@gmail.com
wrote:
2. introduce autodetect mode to open functions
1. read and transform on the fly, maintaining a buffer that
stores original bytes
On Thu, Jan 9, 2014 at 10:06 AM, Kristján Valur Jónsson
krist...@ccpgames.com wrote:
Do I speak Chinese to my grocer because China is a growing force in the
world? Or start every discussion with my children with a negotiation on what
language to use?
No, because your environment has a
On Fri, Jan 10, 2014 at 2:03 AM, Joao S. O. Bueno jsbu...@python.org.br wrote:
On 9 January 2014 04:50, Lennart Regebro rege...@gmail.com wrote:
To be honest, you can define text as "A stream of bytes that are split
up in lines separated by a linefeed", and do some basic text
processing like
On Fri, Jan 10, 2014 at 1:39 PM, Steven D'Aprano st...@pearwood.info wrote:
On Fri, Jan 10, 2014 at 12:22:02PM +1100, Chris Angelico wrote:
On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik techto...@gmail.com
wrote:
2. introduce autodetect mode to open functions
1. read and
Steven D'Aprano st...@pearwood.info writes:
I think that heuristics to guess the encoding have their role to play,
if the caller understands the risks.
I think, for a language whose developers espouse a principle “In the
face of ambiguity, refuse the temptation to guess”, heuristics have no
INADA Naoki writes:
latin1 is OK but is it Pythonic?
Yes. EIBTI, including being explicit that you're doing something that
has semantics that you are ignoring but may come back to bite you or
somebody who naively uses your module.
There's nothing un-Pythonic about using potentially dangerous
Chris Angelico writes:
I'm not saying that chardet is bad, but I *am* saying, and I stand
by this, that an auto-detect option on file open is a bad idea.
I have used it by default in Emacs and XEmacs since 1990, and I
certainly haven't experienced it as a bad idea at *any* time in more
than