On 06.04.2006 at 10:27, Christian Boos wrote:
Of course, that would not be applied to every text used in the system,
only for file content. The above `to_unicode` is also used for that,
so I think I'll rename it `data_to_unicode` (preserve content),
to contrast it with `text_to_unicode` (which might be "lossy").

I still fail to see how any text decoding can *not* be lossy if you don't know the encoding. Decoding using ISO-8859-15 is only going to be non-lossy if that *happens to be* the encoding of the text.
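
To make the point concrete, here is a minimal sketch in Python 3 (not code from Trac; the sample string is made up for illustration). It assumes Python's `iso-8859-15` codec, which maps every byte value to some character: decoding with it never raises and the original bytes can be recovered, but the resulting text is only correct if the data really was ISO-8859-15.

```python
# Sample text, encoded as UTF-8 (the "real" encoding, unknown to the reader).
original = "Käse 5 €"
raw = original.encode("utf-8")

# Guessing ISO-8859-15 never fails, because every byte maps to a character...
guessed = raw.decode("iso-8859-15")

# ...but the decoded text is not what the author wrote.
text_is_correct = (guessed == original)

# The bytes themselves survive a round trip, so the *data* is preserved
# even though the *text* is wrong.
bytes_preserved = (guessed.encode("iso-8859-15") == raw)

# A strict decode with the wrong guess can also fail outright:
latin_bytes = "Käse".encode("iso-8859-15")
try:
    latin_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

print(text_is_correct, bytes_preserved, strict_decode_failed)
```

So "non-lossy" here can only mean that the bytes are recoverable, not that the characters shown to the user are the right ones.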

And why would we ever want to decode non-textual data to unicode? Such attempts should be considered a bug.

An alternative to having two `*_to_unicode` versions would be to add
a third optional argument: `to_unicode(text, charset=None, lossy=False)`.

That would be better, but as explained above, I fail to see the point of `lossy=False`.
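
For reference, one possible reading of the proposed signature, sketched in Python 3. This is a guess at the intended semantics, not Trac's actual implementation: the fallback order (given charset, then UTF-8) and the meaning of `lossy` (replace undecodable bytes with U+FFFD instead of falling back to ISO-8859-15) are assumptions made for illustration.

```python
def to_unicode(text, charset=None, lossy=False):
    """Hypothetical sketch of the proposed helper; semantics are assumed."""
    if isinstance(text, str):
        return text  # already decoded, nothing to do

    # Try the caller's charset first (if any), then UTF-8.
    candidates = [charset] if charset else []
    candidates.append("utf-8")
    for name in candidates:
        try:
            return text.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset, try the next one

    if lossy:
        # Assumed meaning of lossy=True: keep what decodes as UTF-8 and
        # substitute U+FFFD for the rest; the original bytes are lost.
        return text.decode("utf-8", errors="replace")

    # Assumed meaning of lossy=False: ISO-8859-15 maps every byte, so the
    # bytes can be recovered later, even if the characters are wrong.
    return text.decode("iso-8859-15")


raw = b"K\xe4se"  # "Käse" encoded as ISO-8859-15, not valid UTF-8
preserved = to_unicode(raw)
replaced = to_unicode(raw, lossy=True)
```

Under these assumptions, `lossy=False` only guarantees that `preserved.encode("iso-8859-15")` gives back the input bytes. It says nothing about whether the decoded characters are the ones the author intended.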

PS: Hm, I just realized I begin to wiki format my e-mails... Damn :)

And I thought you'd been doing that since, like, forever :-P

Cheers,
Chris
--
Christopher Lenz
  cmlenz at gmx.de
  http://www.cmlenz.net/

_______________________________________________
Trac-dev mailing list
[email protected]
http://lists.edgewall.com/mailman/listinfo/trac-dev