Hi,
On 02/05/2014 09:55, Malthe Borch wrote:
It blows up – as expected, because ascii is a limited encoding.
It's not just ascii. That said, blowing up at encoding time is terrible because
you don't know where the error comes from. This is especially a huge problem on
Python 3 right now
Hi,
On 02/05/2014 00:03, John Downey wrote:
I have actually always been a fan of how .NET did this. The System.String type
is opinionated in how it is stored internally and does not allow anyone to
change that (unlike Ruby). The conversion from String to byte[] is done using
explicit conversion
On Wed, May 14, 2014 at 2:25 PM, Armin Ronacher armin.ronac...@active-4.com
wrote:
Hi,
On 02/05/2014 00:03, John Downey wrote:
I have actually always been a fan of how .NET did this. The System.String
type
is opinionated in how it is stored internally and does not allow anyone to
change
The encoding / glob. code in .NET works well, but the string's use of
code-points is a poor choice, and both C# and Java suffer heavily for it
when doing IO.
Ropes / chords / chains etc. belong at a higher level, not in the lowest
level type.
Ben
On Fri, May 2, 2014 at 8:03 AM, John Downey
On 2 May 2014 00:06, Tony Arcieri basc...@gmail.com wrote:
This sounds like the exact same painful failure mode as Ruby (transcoding
blowing up at completely unexpected times) with even more complexity, making
it even harder to debug.
Here is a concrete example of when this would blow up:
1.
On 5/2/14, Malthe Borch mbo...@gmail.com wrote:
On 2 May 2014 00:06, Tony Arcieri basc...@gmail.com wrote:
This sounds like the exact same painful failure mode as Ruby (transcoding
blowing up at completely unexpected times) with even more complexity,
making
it even harder to debug.
Here is
Hi,
2014-05-02 3:52 GMT+03:00 Nathan Myers n...@cantrip.org:
There's a string type because it *enforces* the guarantee of containing
valid UTF-8, meaning it can always be converted to code points. This
also means all of the Unicode algorithms can assume that they're dealing
with a valid
Hi,
2014-05-01 16:53 GMT+03:00 Malthe Borch mbo...@gmail.com:
In Rust, the built-in std::str type is a sequence of unicode
codepoints encoded as a stream of UTF-8 bytes.
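To illustrate the UTF-8 guarantee described above, here is a minimal sketch, assuming only the standard library's std::str::from_utf8. The byte values are made up for the example; the point is that validation happens once, at the boundary where raw bytes become a string.

```rust
fn main() {
    // "héllo" as UTF-8: the 'é' is encoded as the two bytes 0xC3 0xA9.
    let valid: &[u8] = "héllo".as_bytes();
    // 0xE9 is 'é' in ISO 8859-1, but a lone 0xE9 followed by an ASCII
    // byte is not a well-formed UTF-8 sequence.
    let invalid: &[u8] = &[0x68, 0xE9, 0x6C];

    // Conversion from bytes is explicit and fallible; a value of type
    // &str can only be obtained here if the bytes are valid UTF-8.
    match std::str::from_utf8(valid) {
        Ok(s) => println!("valid: {} ({} chars)", s, s.chars().count()),
        Err(e) => println!("unexpected error: {}", e),
    }
    match std::str::from_utf8(invalid) {
        Ok(s) => println!("unexpected success: {}", s),
        Err(e) => println!("rejected at the boundary: {}", e),
    }
}
```

Once the check has passed, code that receives the &str does not need to handle malformed input again.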
Meanwhile, building on experience with Python 2 and 3, I think it's
worth considering a more flexible design.
A string
It would be a mistake for a byte sequence container, stream, or string type
to know anything about particular encodings. An encoding is an
interpretation imposed on a byte sequence. Users of a sequence need to be
able to choose what interpretation to apply without interference from some
On Thursday, May 1, 2014, Nathan Myers n...@cantrip.org wrote:
It would be a mistake for a byte sequence container, stream, or string
type to know anything about particular encodings. An encoding is an
interpretation imposed on a byte sequence. Users of a sequence need to be
able to choose
On Thu, May 1, 2014 at 6:53 AM, Malthe Borch mbo...@gmail.com wrote:
A string would be essentially a rope where each leaf specifies an
encoding, e.g. UTF-8 or ISO8859-1 (ideally expressed as one or two
bytes).
That is, a string may be comprised of segments of different encodings.
Oh god
On 1 May 2014 21:03, Tony Arcieri basc...@gmail.com wrote:
Oh god no! Please no. This is what Ruby does and it's a complete nightmare.
This creates an entire new class of bug when operations are performed on
strings with incompatible encodings. It's an entire class of bug that simply
doesn't
On 1 May 2014 18:54, Mikhail Zabaluev mikhail.zabal...@gmail.com wrote:
I don't think that so much hidden complexity would be justified in the
built-in string type. Encoded text is typically dealt with in protocol
libraries or similar I/O barriers where it should be passed through a
validating
On Thu, May 1, 2014 at 1:06 PM, Malthe Borch mbo...@gmail.com wrote:
This is not the case in the proposed design.
You're wrong.
All string operations would behave exactly as if there was only a
single encoding. The only requirement is that the strings are properly
declared with an
On 01/05/14 09:53 AM, Malthe Borch wrote:
In Rust, the built-in std::str type is a sequence of unicode
codepoints encoded as a stream of UTF-8 bytes.
Meanwhile, building on experience with Python 2 and 3, I think it's
worth considering a more flexible design.
A string would be essentially
Yes, this is what Ruby does, and yes, it causes a lot of tears. It's
one of the biggest things that made the 1.8 to 1.9 transition
difficult.
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
On 1 May 2014 22:42, Tony Arcieri basc...@gmail.com wrote:
No, when you combine strings with different encodings, you need to transcode
one of the strings. When this happens, the transcoding process may encounter
some characters which are valid in one encoding, but not another, in which
case
On 5/1/14 6:53 AM, Malthe Borch wrote:
In Rust, the built-in std::str type is a sequence of unicode
codepoints encoded as a stream of UTF-8 bytes.
Meanwhile, building on experience with Python 2 and 3, I think it's
worth considering a more flexible design.
A string would be essentially a rope
I have actually always been a fan of how .NET did this. The System.String
type is opinionated in how it is stored internally and does not allow
anyone to change that (unlike Ruby). The conversion from String to byte[]
is done using explicit conversion methods like:
-
On Thu, May 1, 2014 at 2:45 PM, Malthe Borch mbo...@gmail.com wrote:
The transcoding needs to happen only at the time when you flatten
the rope into a single encoding. And yes, it may then fail if you
attempt to encode into a non-unicode encoding.
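The deferred failure described here can be sketched as follows. This is a hypothetical illustration, not code from the proposal: the Encoding, Leaf and flatten_to_latin1 names are invented for the example, and the "rope" is simplified to a flat list of leaves.

```rust
// Hypothetical types for illustration only; nothing here is a std API
// or part of the proposal in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Utf8,
    Latin1,
}

struct Leaf {
    encoding: Encoding,
    bytes: Vec<u8>,
}

/// Decode one leaf into a Rust String (always UTF-8 internally).
fn decode(leaf: &Leaf) -> Result<String, String> {
    match leaf.encoding {
        Encoding::Utf8 => String::from_utf8(leaf.bytes.clone())
            .map_err(|e| format!("invalid UTF-8 leaf: {}", e)),
        // Each ISO 8859-1 byte maps to the code point with the same value.
        Encoding::Latin1 => Ok(leaf.bytes.iter().map(|&b| b as char).collect()),
    }
}

/// Flatten all leaves into ISO 8859-1. Fails on the first character
/// above U+00FF, which that encoding cannot represent.
fn flatten_to_latin1(rope: &[Leaf]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    for leaf in rope {
        for c in decode(leaf)?.chars() {
            let cp = c as u32;
            if cp > 0xFF {
                return Err(format!("U+{:04X} is not representable in ISO 8859-1", cp));
            }
            out.push(cp as u8);
        }
    }
    Ok(out)
}

fn main() {
    let rope = vec![
        // "café" in ISO 8859-1
        Leaf { encoding: Encoding::Latin1, bytes: vec![0x63, 0x61, 0x66, 0xE9] },
        // The euro sign, U+20AC, in UTF-8
        Leaf { encoding: Encoding::Utf8, bytes: "€".as_bytes().to_vec() },
    ];
    // Building the mixed rope succeeded; the error surfaces only now,
    // at flatten time, far from where the second leaf was appended.
    println!("{:?}", flatten_to_latin1(&rope));
}
```

In this sketch the concatenation itself never fails, so the error is reported at the flatten call rather than at the place where the incompatible leaf entered the string.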
This sounds like the exact same painful
Agreed with Patrick. This proposal should not be in std::str ... it can
live somewhere else...but not there.
--
-Thad
+ThadGuidry https://www.google.com/+ThadGuidry
Thad on LinkedIn http://www.linkedin.com/in/thadguidry/
On Thu, May 1, 2014 at 4:52 PM, Patrick Walton pcwal...@mozilla.com
On 05/01/2014 02:52 PM, Patrick Walton wrote:
On 5/1/14 6:53 AM, Malthe Borch wrote:
In Rust, the built-in std::str type is a sequence of unicode
codepoints encoded as a stream of UTF-8 bytes.
...
A string would be essentially a rope where each leaf specifies an
encoding, e.g. UTF-8 or
On 01/05/14 07:49 PM, Nathan Myers wrote:
On 05/01/2014 02:52 PM, Patrick Walton wrote:
On 5/1/14 6:53 AM, Malthe Borch wrote:
In Rust, the built-in std::str type is a sequence of unicode
codepoints encoded as a stream of UTF-8 bytes.
...
A string would be essentially a rope where each leaf
On Thu, May 1, 2014 at 4:49 PM, Nathan Myers n...@cantrip.org wrote:
The history of programming languages is littered with mistakes
around string types. There's no reason why Rust must repeat
them all.
FWIW, I've worked in systems that work the way you describe, and I disagree
and think
On 05/01/2014 04:57 PM, Daniel Micay wrote:
On 01/05/14 07:49 PM, Nathan Myers wrote:
In defining a library string we always grapple over how it
should differ from a raw (variable or fixed) array of bytes.
Ease of appending and of assigning into substrings always
comes up. In the old days,