Re: [sqlite] [OT] UTF8-BOM and text encoding detection (was: UTF8-BOM not disregarded in CSV import)

Simon Slavin Tue, 27 Jun 2017 04:54:14 -0700


On 27 Jun 2017, at 7:12am, Rowan Worth <row...@dug.com> wrote:


> In fact using this assumption we could dispense with the BOM entirely for
> UTF-8 and drop case 5 from the list.

If you do that, you will end up processing the BOM at the beginning of a UTF-8 
stream as if it were ordinary character data.
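
To illustrate what goes wrong, here is a minimal Python sketch.  The function 
name strip_utf8_bom and the sample CSV bytes are made up for this example and 
are not taken from SQLite's importer.  It shows the BOM leaking into the first 
column name when it is not removed before decoding.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical CSV input that starts with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b"id,name\n1,Ann\n"

# Without BOM handling, U+FEFF becomes part of the first header field.
naive_header = raw.decode("utf-8").split(",")[0]

# With the BOM removed first, the header field is what the user expects.
clean_header = strip_utf8_bom(raw).decode("utf-8").split(",")[0]
```

Here naive_header is "\ufeffid" while clean_header is "id", so a lookup of the 
column by name would fail in the first case.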

> So my question is, what advantage does
> a BOM offer for UTF-8? What other cases can we identify with the
> information it provides?

Suppose your software processes only UTF-8 files, but someone feeds it a file 
which begins with FE FF, the UTF-16 big-endian BOM.  Your software should 
recognise this and reject the file, telling the user/programmer that it can’t 
process it because it’s in the wrong encoding.
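
A rough sketch of that check in Python follows.  The names decode_utf8_only and 
WrongEncodingError are invented for the example, and it assumes the whole input 
is already in memory as bytes.  It only looks at the first few bytes, so it is 
a heuristic rather than a full encoding detector.

```python
import codecs

# Longer BOMs are checked first, because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_FOREIGN_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
]


class WrongEncodingError(ValueError):
    """Raised when the input carries a BOM for an unsupported encoding."""


def decode_utf8_only(data: bytes) -> str:
    """Decode UTF-8 input, rejecting data that starts with a UTF-16/32 BOM."""
    for bom, name in _FOREIGN_BOMS:
        if data.startswith(bom):
            raise WrongEncodingError(
                f"input appears to be {name}; only UTF-8 is supported"
            )
    # "utf-8-sig" drops a leading UTF-8 BOM if present and is otherwise UTF-8.
    return data.decode("utf-8-sig")
```

With this, input starting with FE FF raises WrongEncodingError with a message 
naming UTF-16 BE, while UTF-8 input decodes the same with or without its BOM.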

Processing BOMs is part of the work you have to do to make your software 
Unicode-aware.  Without it, your documentation should state that your software 
handles only the one Unicode encoding it supports, not Unicode in general.  
There’s nothing wrong with this, if it’s all the programmer/user needs, as long 
as it’s correctly documented.

Simon.