Yves, we are thinking about a general API for encoding detection that could initially
just check for BOM/Unicode signatures. I believe we have a feature request for this
already. Mark and I just brainstormed about what we might want such an API to look like.
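A rough sketch of what a signature-only first version might check. It is in Python purely for illustration; the name detect_signature and the SIGNATURES table are made up here and are not an existing ICU API:

```python
# Hypothetical sketch: look for a Unicode signature (BOM) at the start of a
# byte buffer. Not an ICU API; names are illustrative only.

# Ordered longest first so that UTF-32LE (FF FE 00 00) is not mistaken
# for UTF-16LE (FF FE).
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def detect_signature(data: bytes):
    """Return (encoding_name, signature_length), or (None, 0) if none found."""
    for sig, name in SIGNATURES:
        if data.startswith(sig):
            return name, len(sig)
    return None, 0


if __name__ == "__main__":
    print(detect_signature(b"\xef\xbb\xbfabc"))  # ('UTF-8', 3)
    print(detect_signature(b"\xff\xfeA\x00"))    # ('UTF-16LE', 2)
    print(detect_signature(b"plain ASCII"))      # (None, 0)
```

Note that FF FE 00 00 is inherently ambiguous: it could also be a UTF-16LE signature followed by U+0000. Preferring UTF-32LE is a common heuristic, not a certainty.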
The reason ICU currently behaves the way it does is simple pragmatism. None of our
converters auto-detects anything, and they write only what you tell them to write.
When you deal with serialized data structures and fields in files or databases, that
is exactly what you want.
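As an analogy only (this is Python's codec library, not ICU), the explicit-endianness codecs behave the same way: they neither add a signature when encoding nor strip one when decoding.

```python
# Python's explicit-endianness codecs write only what they are given:
# no BOM is added on output, and a BOM on input is passed through as U+FEFF.
encoded = "Hi".encode("utf-16-be")
print(encoded)        # b'\x00H\x00i'  (no signature written)

decoded = b"\xfe\xff\x00H\x00i".decode("utf-16-be")
print(repr(decoded))  # '\ufeffHi'  (signature kept as a character)
```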
With signature-carrying files and transmission protocols, more work is necessary.
It seems to me that a converter API, which may be fed one byte at a time and offers no
other way to pass additional information ("I know the language of the text..."), is
not the best place to implement this.
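To illustrate the one-byte-at-a-time problem, here is a hypothetical Python sketch (SignatureSniffer and its methods are invented for illustration, not an ICU API). Whoever does the detection must hold back up to three bytes while a longer signature is still possible, and needs an explicit end-of-input call to decide on a short stream:

```python
SIGNATURES = [  # longest first
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


class SignatureSniffer:
    """Hypothetical byte-at-a-time signature detector (not an ICU API)."""

    def __init__(self):
        self.buf = b""      # bytes held back while undecided
        self.result = None  # (name, length) once decided

    def _decide(self):
        # SIGNATURES is ordered longest first, so the first match wins.
        for sig, name in SIGNATURES:
            if self.buf.startswith(sig):
                self.result = (name, len(sig))
                return self.result
        self.result = (None, 0)
        return self.result

    def feed(self, byte):
        """Add one byte; return (name, length) once decided, else None."""
        if self.result is not None:
            return self.result
        self.buf += bytes([byte])
        # If the buffer is still a proper prefix of some signature, a longer
        # match is possible, so no decision can be made yet.
        if any(len(sig) > len(self.buf) and sig.startswith(self.buf)
               for sig, _ in SIGNATURES):
            return None
        return self._decide()

    def finish(self):
        """End of input: decide using whatever bytes are buffered."""
        if self.result is None:
            self._decide()
        return self.result


if __name__ == "__main__":
    s = SignatureSniffer()
    print([s.feed(b) for b in b"\xff\xfeA\x00"])
    # [None, None, ('UTF-16LE', 2), ('UTF-16LE', 2)]
```

After FF FE the sniffer cannot yet tell UTF-16LE from UTF-32LE; only the third byte settles it.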
On output, writing a BOM/signature is easy: if you know you need one, just call the
converter once with U+FEFF before converting the text.
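The same idea expressed with a Python incremental encoder as a stand-in for a streaming converter (encode_with_signature is a hypothetical helper). One extra call with U+FEFF produces the signature in whatever byte order the codec uses. Python's generic "utf-16" and "utf-8-sig" codecs already emit a BOM themselves, so this only makes sense with the explicit ones:

```python
import codecs


def encode_with_signature(chunks, encoding):
    """Hypothetical helper: write U+FEFF once, then the text chunks."""
    enc = codecs.getincrementalencoder(encoding)()
    out = [enc.encode("\ufeff")]          # the one extra call for the signature
    out.extend(enc.encode(c) for c in chunks)
    out.append(enc.encode("", final=True))
    return b"".join(out)


if __name__ == "__main__":
    print(encode_with_signature(["H", "i"], "utf-16-le"))  # b'\xff\xfeH\x00i\x00'
    print(encode_with_signature(["Hi"], "utf-8"))          # b'\xef\xbb\xbfHi'
```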
That said, for this one feature I could imagine an API that says "emit a Unicode
signature if you are a converter for a Unicode encoding".
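A minimal sketch of what such a call might return, again in Python and with an invented name (signature_for); which encodings count as "Unicode" here is an assumption limited to UTF-8 and the explicit-endianness UTF-16/UTF-32 forms:

```python
import codecs

# Assumed set of Unicode encodings whose Python encoders never add a BOM
# on their own; "utf-16", "utf-32" and "utf-8-sig" are left out on purpose
# because they already write one.
_UNICODE_CODECS = {"utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"}


def signature_for(encoding: str) -> bytes:
    """Hypothetical API: signature bytes for a Unicode encoding, else b""."""
    name = codecs.lookup(encoding).name  # normalizes aliases like "UTF-16BE"
    if name in _UNICODE_CODECS:
        return "\ufeff".encode(name)
    return b""


if __name__ == "__main__":
    print(signature_for("UTF-16BE"))  # b'\xfe\xff'
    print(signature_for("latin-1"))   # b''
```

For a non-Unicode encoding the call is simply a no-op, so a caller can invoke it unconditionally before writing text.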
markus