Nick Coghlan added the comment:
The core problem with the idea of adding bytes.format to Python 3 is that the
real power of str.format actually lies in the extensible __format__ protocol
and the associated format() builtin, as those rely heavily on text-specific
assumptions.
I interpreted
Derek Wilson added the comment:
Gregory - I'm glad that you're willing to consider this again. It still is a
constant issue for me, and .format with variable width fields in binary
protocols is so the right tool for the job. If there is anything I can do to
help get this added to 3.6 let me
Gregory P. Smith added the comment:
This came up in the language summit today when discussing twisted. .format()
is still not supported on bytes though % is in 3.5.
realistically it sounded like twisted needs to support python 3.4 for many
years so they can't rely on bytes having a .format()
Terry J. Reedy added the comment:
http://legacy.python.org/dev/peps/pep-0461/
adds % formatting for bytes and bytearray.
Nick, I have the impression that there was a decision to not add bytes.format.
Correct? If so, this issue should be closed. If not, what, if anything, has
been decided?
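A minimal sketch of the %-formatting that PEP 461 adds, assuming Python 3.5 or later. The header name and the numbers are illustrative values, not taken from the thread.

```python
# Requires Python 3.5+ (PEP 461). All values here are illustrative only.
header = b'%s %d\r\n' % (b'Content-Length:', 42)   # bytes and int interpolation
padded = b'%05d' % 7                               # width and zero padding work as for str
hexed = bytearray(b'%x') % 255                     # bytearray supports % too

print(header, padded, hexed)
```

Note that %s on bytes accepts bytes-like objects rather than str, so text has to be encoded explicitly before it is interpolated.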
Nick Coghlan added the comment:
Right, bytes.format was considered as part of the PEP 461 discussions, and
rejected as an operation that only made sense in the text domain:
http://www.python.org/dev/peps/pep-0461/#proposed-variations
With PEP 461 accepted, and PEP 460 withdrawn, that means we
Derek Wilson added the comment:
First off, +1 for this feature. It's not just for twisted, but anyone doing
anything with binary data (storage, compression, encryption and networking for
me) with python since 2.6 will very likely have been using .format for building
messages. I know I have
Derek Wilson added the comment:
sorry, terry's patch does handle padding - just with the caveats i listed
later. i should have removed that bullet.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3982
Changes by Tshepang Lekhonkhobe tshep...@gmail.com:
--
nosy: +tshepang
Changes by Brett Cannon br...@python.org:
--
nosy: +brett.cannon
Changes by Brett Cannon br...@python.org:
--
versions: +Python 3.5 -Python 3.4
Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:
--
nosy: +Arfrever
Changes by Guido van Rossum gu...@python.org:
--
nosy: -gvanrossum
Ezio Melotti added the comment:
You can use sys.stdout.buffer.write.
Note that there's no guarantee that sys.stdout.buffer exists, e.g. if
sys.stdout has been replaced with a StringIO.
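A minimal sketch of the caveat above: use the binary buffer when the stream has one, and fall back otherwise. The helper name `write_bytes` and the fallback policy (ASCII with backslash escapes) are assumptions made for illustration.

```python
import io
import sys

def write_bytes(data, stream=None):
    """Write bytes to a text stream, using its binary buffer when it has one."""
    stream = sys.stdout if stream is None else stream
    buffer = getattr(stream, 'buffer', None)
    if buffer is not None:
        stream.flush()          # keep ordering with text written earlier
        buffer.write(data)
        buffer.flush()
    else:
        # Replacements such as io.StringIO have no .buffer attribute.
        # Hypothetical fallback: show non-ASCII bytes as backslash escapes.
        stream.write(data.decode('ascii', 'backslashreplace'))

fake_stdout = io.StringIO()
write_bytes(b'ab\xff', fake_stdout)
print(repr(fake_stdout.getvalue()))
```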
Glyph Lefkowitz added the comment:
Tempting as it is to reply to the comment about 'buffer' not existing, we're
way off topic here. Let's please keep further comments on this bug to issues
about a 'format' method on the 'bytes' object.
Antoine Pitrou added the comment:
I'd like to put a nudge towards supporting the __mod__ interface on bytes -
for Mercurial this is the single biggest impediment to even getting our
testrunner working, much less starting the porting process.
Given a spec hasn't been written (bytes.__mod__
Augie Fackler added the comment:
Is there any chance we could just have it work for bytes, ints, and floats?
That'd solve the immediate need, and it'd be obviously correct how to have
those behave.
Punting this to 3.5 basically means we'll have to either wait for 3.5, or do
something awful
Eric V. Smith added the comment:
If you could write up a concrete proposal, including which format specifiers
would be supported, that would be helpful.
Would it be extensible with something like __bformat__?
There's really quite a bit of work to be done to specify how this would work.
Eric V. Smith added the comment:
Also, with the PEP 393 changes, the implementation will be much more difficult.
Sharing code with str (unicode) will likely be impossible, or require much
refactoring of the existing code.
Antoine Pitrou added the comment:
Is there any chance we could just have it work for bytes, ints, and
floats? That'd solve the immediate need, and it'd be obviously
correct how to have those behave.
You mean %s and %d?
Punting this to 3.5 basically means we'll have to either wait for
Augie Fackler added the comment:
On Tue, Oct 8, 2013 at 11:08 AM, Antoine Pitrou rep...@bugs.python.org wrote:
Is there any chance we could just have it work for bytes, ints, and
floats? That'd solve the immediate need, and it'd be obviously
correct how to have those behave.
You mean %s
Glyph Lefkowitz added the comment:
On Oct 8, 2013, at 8:10 AM, Augie Fackler rep...@bugs.python.org wrote:
Hah. Probably too slow for anything beyond a proof of concept, no?
It should perform acceptably on PyPy ;-).
Antoine Pitrou added the comment:
Punting this to 3.5 basically means we'll have to either wait for
3.5, or do something awful like use cffi to grab sprintf to port
Mercurial.
Or write a pure Python implementation.
Hah. Probably too slow for anything beyond a proof of concept,
Augie Fackler added the comment:
On Tue, Oct 8, 2013 at 5:11 PM, Antoine Pitrou rep...@bugs.python.org wrote:
Antoine Pitrou added the comment:
Punting this to 3.5 basically means we'll have to either wait for
3.5, or do something awful like use cffi to grab sprintf to port
STINNER Victor added the comment:
2013/10/8 Augie Fackler rep...@bugs.python.org:
sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path':
'some/filesystem/path'})
except we don't know the encoding of the filesystem path (Hi unix!) so we
have to treat the whole thing as opaque
Eric V. Smith added the comment:
I've lost track what we were talking about. I thought we were trying to support
b'something'.format() in 3.4, for a restricted set of arguments.
I don't see how a third-party package is going to help, if the goal is to allow
3.4 to be source compatible with
Glyph Lefkowitz added the comment:
On Oct 8, 2013, at 2:35 PM, Eric V. Smith wrote:
What proposal is actually on the table here?
Sorry Eric, you're right, there is too much discussion here. This issue ought
to be about .format, like the title says. There should be a separate ticket
for
Augie Fackler added the comment:
On Oct 8, 2013, at 5:24 PM, STINNER Victor rep...@bugs.python.org wrote:
STINNER Victor added the comment:
2013/10/8 Augie Fackler rep...@bugs.python.org:
sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path':
'some/filesystem/path'})
except
Augie Fackler added the comment:
On Oct 8, 2013, at 6:19 PM, Glyph Lefkowitz rep...@bugs.python.org wrote:
Glyph Lefkowitz added the comment:
On Oct 8, 2013, at 2:35 PM, Eric V. Smith wrote:
What proposal is actually on the table here?
Sorry Eric, you're right, there is too much
Augie Fackler added the comment:
On Oct 8, 2013, at 6:28 PM, Terry J. Reedy rep...@bugs.python.org wrote:
http://www.python.org/dev/peps/pep-0383/
One point of the pep is round-trip filenames without loss on all systems,
which is just what you say you need.
At a quick skim, likely not good
Terry J. Reedy added the comment:
Augie, to understand what Victor meant, I suggest reading
http://www.python.org/dev/peps/pep-0383/
One point of the pep is round-trip filenames without loss on all systems, which
is just what you say you need.
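A minimal sketch of the PEP 383 round trip, using the `surrogateescape` error handler with an explicitly chosen encoding. The sample path bytes are made up; the format string is the one quoted earlier in the thread. The round trip only holds when the same encoding is used in both directions.

```python
raw = b'some/\xff\xfe/path'                        # not valid UTF-8 (illustrative)

# Undecodable bytes become lone surrogate code points instead of raising.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same codec and handler restores the exact bytes.
restored = text.encode('utf-8', 'surrogateescape')

# The str form can go through ordinary text formatting before encoding.
line = '%(state)s %(path)s\n' % {'state': 'M', 'path': text}
payload = line.encode('utf-8', 'surrogateescape')

print(restored == raw, payload)
```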
Glyph Lefkowitz added the comment:
On Oct 8, 2013, at 3:19 PM, Augie Fackler wrote:
No, I'm not. In Mercurial, all end-user data is OPAQUE BYTES, and must remain
that way.
The PEP 383 technique for handling file names is completely capable of
round-tripping exact bytes, given one encoding
Changes by Terry J. Reedy tjre...@udel.edu:
Added file: http://bugs.python.org/file32008/byte_format.py
Changes by Terry J. Reedy tjre...@udel.edu:
Removed file: http://bugs.python.org/file32008/byte_format.py
Terry J. Reedy added the comment:
Here is a proof of concept Python function, with a minimal test. It is similar
to how str.format could be coded in Python, with re.split and ''.join, except
that it does not allow anything before : in the format specification. By
default (no format spec
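The attached byte_format.py is not reproduced here. The following is an independent sketch of the general approach described (re.split plus b''.join, with nothing allowed before the colon). The default conversions are assumptions: bytes-like values are inserted unchanged, and ints and floats are formatted as ASCII text.

```python
import re

# '{}' or '{:spec}' only; no field name or index before ':' (assumption
# based on the description above, not on the attached file).
_FIELD = re.compile(br'\{(?::([^{}]*))?\}')

def byte_format(template, *args):
    """Hypothetical bytes analogue of str.format for positional fields."""
    parts = _FIELD.split(template)
    literals, specs = parts[0::2], parts[1::2]   # split alternates text, spec
    if len(specs) != len(args):
        raise ValueError('expected %d arguments, got %d' % (len(specs), len(args)))
    out = [literals[0]]
    for spec, arg, literal in zip(specs, args, literals[1:]):
        if isinstance(arg, (bytes, bytearray)):
            if spec:
                raise ValueError('format specs are not supported for bytes here')
            out.append(bytes(arg))               # inserted unchanged
        elif isinstance(arg, (int, float)):
            text_spec = (spec or b'').decode('ascii')
            out.append(format(arg, text_spec).encode('ascii'))
        else:
            raise TypeError('unsupported type: %s' % type(arg).__name__)
        out.append(literal)
    return b''.join(out)

print(byte_format(b'{:x}\r\n{}\r\n', 5, b'hello'))
```

Brace escaping ('{{' and '}}') is deliberately left out to keep the sketch short.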
Changes by Stendec m...@stendec.me:
--
nosy: +stendec
Changes by nlev...@gmail.com nlev...@gmail.com:
--
nosy: +nlev...@gmail.com
Augie Fackler added the comment:
I'd like to put a nudge towards supporting the __mod__ interface on bytes - for
Mercurial this is the single biggest impediment to even getting our testrunner
working, much less starting the porting process.
--
nosy: +durin42
Changes by Ecir Hana ecir.h...@gmail.com:
--
nosy: +ecir.hana
Changes by Barry A. Warsaw ba...@python.org:
--
nosy: +barry
Changes by Gregory P. Smith g...@krypto.org:
--
nosy: +gregory.p.smith
Guido van Rossum added the comment:
I don't believe it either. I find join consistently faster than format:
python2.7 -m timeit -s "x = [b'x'*1000]*10" "b''.join(x)"
100 loops, best of 3: 0.686 usec per loop
python2.7 -m timeit -s "x = b'x'*1000"
"(b'{}{}{}{}{}{}{}{}{}{}').format(x, x, x, x, x, x,
Eric V. Smith added the comment:
I think ''.join() will always be faster than ''.format(), for a number of
reasons (some already stated):
- it doesn't have to pass the format string
- it doesn't have to do the __format__ lookup and call the resulting function
(although I believe there's an
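A rough way to repeat this comparison on Python 3.5 or later, sketched with the timeit module. Since bytes has no .format() in Python 3, %-formatting stands in for it here, which is a substitution and not the measurement quoted above. No timings are claimed; results depend on the machine and interpreter.

```python
import timeit

x = b'x' * 1000
parts = [x] * 10
template = b'%s' * 10          # ten interpolated fields, no literal text

# Time 10000 runs of each approach; both build the same 10000-byte result.
join_time = timeit.timeit(lambda: b''.join(parts), number=10000)
percent_time = timeit.timeit(lambda: template % tuple(parts), number=10000)

print('join:    %.6f s' % join_time)
print('percent: %.6f s' % percent_time)
```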
Antoine Pitrou added the comment:
Whether b''.format() would have to lookup and call __format__ remains
to be seen. From what I've read, maybe baking in knowledge of bytes,
float, and int would be good enough. I suspect there might be some
need for datetimes, but I could be wrong.
The
Eric V. Smith added the comment:
I retract the datetime comment. Given what we're trying to accomplish, I think
we only need to support types that are supported by 2.7's %-formatting.
Guido van Rossum added the comment:
Remember, the only reason to add this would be to enable writing code
that works in both 2.7 and 3.4. So it has to be called .format() and
it has to format numbers as decimal strings by default.
Changes by Florent Xicluna florent.xicl...@gmail.com:
--
nosy: +flox
Glyph Lefkowitz added the comment:
On Jan 22, 2013, at 11:27 PM, Antoine Pitrou rep...@bugs.python.org wrote:
Antoine Pitrou added the comment:
The ASCII superset commands part is clearly separated from the binary
data part. Your own LineReceiver is able to switch between raw mode
and
Glyph Lefkowitz added the comment:
On Jan 22, 2013, at 11:31 PM, Martin v. Löwis rep...@bugs.python.org wrote:
I admit that it is puzzling that string interpolation is apparently the
fastest way to assemble byte strings. It involves parsing the format string,
so it ought to be slower than
Glyph Lefkowitz added the comment:
On Jan 23, 2013, at 1:58 AM, Antoine Pitrou rep...@bugs.python.org wrote:
Numbers currently don't have a __bytes__ method:
>>> (5).__bytes__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute
Antoine Pitrou added the comment:
They do have some rather odd behavior when passed to the builtin
though:
bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
It would be much more convenient for me if bytes(int) returned the
ASCIIfication of that int; but honestly, even an error
Glyph Lefkowitz added the comment:
On Jan 23, 2013, at 11:02 AM, Antoine Pitrou rep...@bugs.python.org wrote:
I would agree with you, but it's probably too late to change...
Understandable, and, in any case, out of scope for this ticket.
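For reference, a short sketch of the bytes(int) behaviour discussed above, next to two explicit spellings that are sometimes wanted instead. The variable names are illustrative.

```python
zeros = bytes(3)                          # an int argument means "this many zero bytes"
ascii_digits = str(10).encode('ascii')    # the decimal digits of the int as ASCII
single_byte = bytes([10])                 # one byte with value 10

print(zeros, ascii_digits, single_byte)
```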
Eric V. Smith added the comment:
So it sounds like the use case is (as Glyph said in msg180432):
- Provide a transition for users of 2.7's str %-formatting into a style
that's compatible with both str in 2.7 and bytes in 3.4.
In that case the only options I see are to implement __mod__ or
Guido van Rossum added the comment:
Twisted still would like to see this.
--
nosy: +gvanrossum
Benjamin Peterson added the comment:
Implementing this certainly hasn't gotten any easier as 3.x str.format has
evolved. The kind of format codes and modifiers wanted for formatting byte
strings might be different than those for text strings. I think it probably
needs a PEP.
Guido van Rossum added the comment:
Would it be easier if the only format codes/types supported were
bytes, int and float?
Christian Heimes added the comment:
IMHO a useful API has to provide a more low level functionality like format
number as 32 bit unsigned integer in network endian. A bytes.format() function
should support all format chars from
http://docs.python.org/3/library/struct.html#format-characters
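The low-level operation mentioned here already exists in the struct module. A minimal sketch, with an arbitrary sample value:

```python
import struct

# '!' selects network (big-endian) byte order, 'I' a 4-byte unsigned int.
packed = struct.pack('!I', 258)
(value,) = struct.unpack('!I', packed)

print(packed, value)
```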
Benjamin Peterson added the comment:
The problem is not so much the types allowed as the code for dealing with the
format string. The parsing code for format specifiers is pretty unicode
specific now. If that was to be made generic again, it's worth considering
exactly what features belong in a
Guido van Rossum added the comment:
Honestly, what Twisted is mostly after is a way to write code that
works both with Python 2 and Python 3. They need the types I mentioned
only (bytes, int, float) and not too many advanced features of
.format() -- but if it's not called .format() or if the
Antoine Pitrou added the comment:
Given the issues which have been brought here, I agree that it's PEP material.
--
nosy: +pitrou
Ezio Melotti added the comment:
Serhiy did a nice summary in msg171804, and I think this is PEP material too.
What he wrote could be used as a starting point; the next step would be
collecting use cases (the Twisted guys seem to have some). Once we have
defined what we want we can figure
Guido van Rossum added the comment:
Well, msg171804 makes it a much bigger project than the feature that Twisted
actually needs. Quoting:
* The default formatting should not use str(), but buffer protocol.
Fine.
* There is no place for floating point.
Actually they do need it -- and it's
Changes by Glyph Lefkowitz gl...@twistedmatrix.com:
--
nosy: +glyph
Antoine Pitrou added the comment:
Right, but we're not writing builtin type methods specifically for Twisted. I
agree with the idea that the feature set should be very limited, actually perhaps
more limited than what you just said. For example, I think any kind of implicit
str-bytes conversion
Glyph Lefkowitz added the comment:
On Jan 22, 2013, at 11:39 AM, Antoine Pitrou rep...@bugs.python.org wrote:
Antoine Pitrou added the comment:
I agree with the idea that the feature set should be very limited, actually
perhaps more limited than what you just said. For example, I think any
Antoine Pitrou added the comment:
there are plenty of other Python applications that don't use Twisted
which nevertheless need to emit formatted sequences of bytes.
The fact that there are plenty of other Python applications that don't
use Twisted which nevertheless need to emit formatted
STINNER Victor added the comment:
2013/1/22 Guido van Rossum rep...@bugs.python.org:
Twisted still would like to see this.
Sorry, but this argument doesn't convince me. A better argument is
that bytes+bytes+...+bytes is inefficient: it creates a lot of
temporary objects instead of computing
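A minimal sketch of two ways to avoid the intermediate objects that chained + creates. The function names are illustrative and no performance figures are claimed.

```python
def build_with_join(pieces):
    # One pass to size the result, one pass to copy; no intermediates.
    return b''.join(pieces)

def build_with_bytearray(pieces):
    # Grow a single mutable buffer in place, then freeze it.
    out = bytearray()
    for piece in pieces:
        out += piece
    return bytes(out)

pieces = [b'HELO ', b'example', b'\r\n']
print(build_with_join(pieces), build_with_bytearray(pieces))
```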
Glyph Lefkowitz added the comment:
On Jan 22, 2013, at 1:46 PM, STINNER Victor rep...@bugs.python.org wrote:
2013/1/22 Guido van Rossum rep...@bugs.python.org:
Twisted still would like to see this.
Sorry, but this argument doesn't convince me. A better argument is
that
Terry J. Reedy added the comment:
it would probably be reasonable to make these protocols use str objects at the
heart, and only convert to bytes after the formatting is done.
I presume this would mean adding 'if py3: out = out.encode()' after the
formatting. As I said before, this works much
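A minimal sketch of the format-as-str-then-encode approach described here. The helper name `format_command` and the sample command are assumptions. It only applies when every interpolated value is ASCII text; it cannot carry arbitrary binary payloads.

```python
def format_command(verb, argument):
    """Build a protocol line as str, then convert to bytes once at the end."""
    line = '%s %s\r\n' % (verb, argument)
    # Raises UnicodeEncodeError if anything outside ASCII slipped in.
    return line.encode('ascii')

print(format_command('RETR', 'file.txt'))
```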
Glyph Lefkowitz added the comment:
Antoine Pitrou added the comment:
The fact that there are plenty of other Python applications that don't
use Twisted which nevertheless need to emit formatted sequences of
bytes is *precisely* a good reason for this to be discussed more
visibly.
I don't
Glyph Lefkowitz added the comment:
On Jan 22, 2013, at 3:34 PM, Terry J. Reedy rep...@bugs.python.org wrote:
I presume this would mean adding 'if py3: out = out.encode()' after the
formatting. As I said before, this works much better in 3.3+ than in 3.2-.
Some actual numbers:
I'm glad
Antoine Pitrou added the comment:
On Tuesday 22 January 2013 at 23:34 +, Terry J. Reedy wrote:
Terry J. Reedy added the comment:
it would probably be reasonable to make these protocols use str objects at
the heart, and only convert to bytes after the formatting is done.
I presume
Terry J. Reedy added the comment:
After re-reading everything, I have somewhat changed my mind on this proposal.
Perhaps 3.0 threw out too much, making it overly difficult to do some things
that were too easy in 2.x and to write cross-version code.
String formatting converts all arguments to
Antoine Pitrou added the comment:
What I know from Twisted is there are many specific cases where, indeed,
binary protocol strings are formed by string formatting, e.g. in the FTP
implementation (and for good reason since those protocols are either ASCII
or an ASCII superset).
These
Martin v. Löwis added the comment:
I admit that it is puzzling that string interpolation is apparently the fastest
way to assemble byte strings. It involves parsing the format string, so it
ought to be slower than anything that merely concatenates (such as cStringIO).
(I do understand why +
Jean-Paul Calderone added the comment:
Since Benjamin originally requested this feature, and then decided that he
could accomplish his desired goal (ftplib porting, as far as I can tell)
without it, I think that the rejected status is actually incorrect. I think
that Benjamin just wanted to
Christian Heimes added the comment:
The proposal sounds like a good idea to me.
Benjamin, what needs to be done to implement the feature?
--
nosy: +christian.heimes
versions: +Python 3.4 -Python 3.1
Serhiy Storchaka added the comment:
Formatting is a very complicated part of Python (especially after Victor's
optimizations). I think no one wants to maintain this code for a long time. The
price of maintaining exceeds the potential very limited benefits from the use.
--
nosy:
Eric V. Smith added the comment:
I was just logging in to make this point, but Serhiy beat me to it. When I
wrote several years ago that this was easy, it was before the (awesome) PEP
393 work. I suspect, but have not verified, that having a bytes version of this
code would now require an
Jean-Paul Calderone added the comment:
The price of maintaining exceeds the potential very limited benefits from the
use.
The very limited benefits of being able to write I/O code without roughly 3
times code bloat? Perhaps for people who don't write code that does
non-trivial I/O, but
Eric V. Smith added the comment:
The implementation may be difficult, therefore no one should attempt it?
The development cost and maintenance cost is surely part of the evaluation when
deciding whether to implement a feature, no?
Jean-Paul Calderone added the comment:
The development cost and maintenance cost is surely part of the evaluation
when deciding whether to implement a feature, no?
Sure, but in an open source project where almost all contributions are done by
volunteers (ie, donated), what is the
Serhiy Storchaka added the comment:
I suspect, but have not verified, that having a bytes version of this code
would now require an implementation that shared very little with the str
version.
This is not all. The usage model will be completely different too.
* The default formatting
Benjamin Peterson added the comment:
As Serhiy suggests, it would be best to collect the use cases for a format-like
method for bytes and design something which can meet them. It's definitely a
PEP.
Terry J. Reedy added the comment:
In 3.3+, somestring.encode('ascii') is a small constant-time operation. So for
pure ascii *text* bytes, that seems the appropriate 3.x approach.
I agree that something else should be used for binary formatting. Perhaps
struct.pack could be extended to work
Benjamin Peterson added the comment:
It's not constant time.
Terry J. Reedy added the comment:
Sorry, I was thinking of something else. Encoding ascii-only text is merely
much faster (3x?) than in 3.2- because it directly copies without using the
codec.
Serhiy Storchaka added the comment:
Sorry, I was thinking of something else. Encoding ascii-only text is merely
much faster (3x?) than in 3.2- because it directly copies without using
the codec.
In 3.3 encoding to ascii or latin1 is as fast as memcpy. 12-15x on my computer.
Uoti Urpala uoti.urp...@pp1.inet.fi added the comment:
I've hit this limitation a couple more times, and none of the proposed
workarounds are adequate. Working with protocols and file formats that use
human-readable markup is significantly clumsier than it was with Python 2
(using either the
Terry J. Reedy tjre...@udel.edu added the comment:
If you want to discuss this issue further, I think you should post to the
python-ideas list with concrete examples.
Changes by Martin Panter vadmium...@gmail.com:
--
nosy: +vadmium
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
nosy: +ezio.melotti
Arjen Nienhuis a.g.nienh...@gmail.com added the comment:
struct.pack does not work with variable length data. Something like:
b'{0:x}\r\n{1}\r\n'.format(len(block), block)
or
b'%x\r\n%s\r\n' % (len(block), block)
is not possible with struct.pack
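For comparison, a minimal sketch of how the second spelling behaves on Python 3.5 or later, where PEP 461 makes % available on bytes. The function name `chunk` follows the thread; the sample payload is illustrative.

```python
def chunk(block):
    # %x renders the length as ASCII hex digits; %s inserts the bytes unchanged.
    return b'%x\r\n%s\r\n' % (len(block), block)

print(chunk(b'\xff' * 10))
```

The payload may contain any byte values, since nothing is decoded or encoded along the way.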
Terry J. Reedy tjre...@udel.edu added the comment:
You are right, I misinterpreted the meaning of 's' without a count (and opened
#11436 to clarify). However, for the fairly common case where a variable-length
binary block is preceded by a 4 byte *binary* count, one can do something which
is
Terry J. Reedy tjre...@udel.edu added the comment:
For future reference, struct.pack, not mentioned here, is a binary bytes
formatting function. It can mix ascii bytes with binary octets. It works the
same in Python 2 and 3.
Str.bytes does two things: convert objects to strings according to
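A minimal sketch of the length-prefixed case mentioned above, where a variable-length block follows a 4-byte binary count. The helper names `frame` and `unframe` and the choice of network byte order are assumptions; the thread does not show the exact code intended.

```python
import struct

def frame(block):
    # 4-byte unsigned big-endian count, then the raw block.
    return struct.pack('!I', len(block)) + block

def unframe(data):
    # Read the count back and slice out exactly that many payload bytes.
    (length,) = struct.unpack_from('!I', data)
    return data[4:4 + length]

framed = frame(b'\x00\xffpayload')
print(framed, unframe(framed))
```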
Uoti Urpala uoti.urp...@pp1.inet.fi added the comment:
This kind of formatting is needed quite often when working on network protocols
or file formats, and I think the replies here fail to address important issues.
In general you can't encode after formatting, as that doesn't work with binary
Arjen Nienhuis a.g.nienh...@gmail.com added the comment:
There are many binary formats that use ASCII numbers.
'HTTP chunking' uses ASCII mixed with binary (octets).
With 2.6 you could write:
def chunk(block):
    return b'{0:x}\r\n{1}\r\n'.format(len(block), block)
With 3.0 you'd have to
Martin v. Löwis mar...@v.loewis.de added the comment:
def chunk(block):
    return format(len(block), 'x').encode('ascii') + b'\r\n' + block + b'\r\n'
You cannot convert to ascii at the end of the pipeline as there are
bytes 127 in the data blocks.
I wouldn't write it in such a
Arjen Nienhuis a.g.nienh...@gmail.com added the comment:
def chunk(block):
    return hex(len(block)).encode('ascii') + b'\r\n' + block + b'\r\n'
hex(10) returns '0xa' instead of 'a'.
This doesn't need any format call, and describes adequately how the
protocol works: send an ASCII-encoded hex
Martin v. Löwis mar...@v.loewis.de added the comment:
hex(10) returns '0xa' instead of 'a'.
Ah, right. So I would still use
'{0:x}'.format(100).encode('ascii')
rather than the builtin format function. Actually, I would
probably use
('%x' % len(bytes)).encode('ascii')
The point is
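A short sketch comparing the spellings mentioned in this exchange, to make the hex() prefix difference concrete. The variable names are illustrative.

```python
n = 10

with_prefix = hex(n)               # includes the '0x' prefix
via_builtin = format(n, 'x')       # bare hex digits
via_method = '{0:x}'.format(n)     # bare hex digits
via_percent = '%x' % n             # bare hex digits

encoded = via_percent.encode('ascii')
print(with_prefix, via_builtin, via_method, via_percent, encoded)
```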
STINNER Victor victor.stin...@haypocalc.com added the comment:
loewis That's indeed exactly what I had proposed
loewis - only that you shouldn't repeat the .encode('ascii')
loewis all over the place, (...)
If you can only use bytes 0..127, it cannot be used for binary protocols
and so I don't