Christopher Lenz wrote:
On 06.04.2006 at 10:27, Christian Boos wrote:
Of course, that would not be applied to every text used in the system,
only for file content. The above `to_unicode` is also used for that,
so I think I'll rename it `data_to_unicode` (preserve content),
to contrast it with `text_to_unicode` (which might be "lossy").
I still fail to see how any text decoding can *not* be lossy if you
don't know the encoding. Decoding using ISO-8859-15 is only going to
be non-lossy if that *happens to be* the encoding of the text.
It depends: if we're talking about encoding a unicode object
to an ISO-8859-15 str object, then you're right: most of the
unicode code points can't be mapped, of course.
But this is not the situation I'm talking about.
I'm talking about the reverse situation, that is, building a
unicode object from a str object. Here, using ISO-8859-15 or
any other fixed 1-byte size encoding will always succeed
(i.e. any byte value x from the input can be associated
with a unicode code point y (*)).
Later, converting this unicode object back to a str object using
the same encoding will also succeed.
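
To illustrate the round trip described above, here is a minimal sketch.
It is written for Python 3, which is an assumption on my side: there,
`bytes` plays the role of the `str` object discussed in this thread and
`str` plays the role of `unicode`. The variable names are made up for
the example.

```python
# Python 3 sketch: `bytes` stands in for the Python 2 `str` of this thread.
raw = bytes(range(256))  # every possible byte value, i.e. arbitrary file content

# Decoding with ISO-8859-15 never raises: each byte maps to one code point.
text = raw.decode("iso-8859-15")

# Encoding back with the same codec restores the original bytes exactly.
restored = text.encode("iso-8859-15")
round_trip_ok = (restored == raw)

# By contrast, a multi-byte encoding such as UTF-8 rejects arbitrary bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(round_trip_ok, utf8_ok)
```

Note that the round trip preserves the bytes, not the meaning: if the
data was not really ISO-8859-15, the intermediate text shows the wrong
characters. Also, not every single-byte codec shipped with Python maps
all 256 byte values (cp1252, for instance, leaves a few undefined), so
the guarantee depends on the codec chosen.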
And why would we ever want to decode non-textual data to unicode? Such
attempts should be considered a bug.
Yes, here I agree that we can perhaps improve this situation.
This mainly concerns file content, as read from the filesystem or
from the repository.
My motivation was to provide unicode objects to the IMimeViewRenders
plugins. Some of those plugins may want to have access to the
unmodified raw data, like an ImageThumbnailRenderer.
But then, we could make a special exception, that is, document
very clearly that the `content` argument for those renderers
(as well as for the IMimeTypeDetectors of my previous mail)
is a raw `str` object.
This was an alternative I've already considered, and I'm OK
to do it that way, if you think it's better.
-- Christian
(*) This was explained in the old to_utf8 method,
but I just verified and actually it's for iso-8859-1
and not for iso-8859-15 that you'd have y == x.
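
The footnote can be checked with a short sketch (again Python 3, with
`bytes` standing in for `str`; the variable names are only illustrative):

```python
raw = bytes(range(256))

latin1 = raw.decode("iso-8859-1")
latin9 = raw.decode("iso-8859-15")

# For ISO-8859-1, byte value x always maps to code point y == x.
identity_latin1 = all(ord(ch) == b for b, ch in zip(raw, latin1))

# For ISO-8859-15, a handful of positions map elsewhere,
# e.g. byte 0xA4 decodes to the euro sign U+20AC.
differing = [b for b, ch in zip(raw, latin9) if ord(ch) != b]

print(identity_latin1, [hex(b) for b in differing])
```

Both codecs still decode every byte and round-trip; they only differ in
which code points a few byte values are assigned to.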