Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-07 Torsten Bögershausen
On Tue, Mar 06, 2018 at 03:37:16PM -0800, Junio C Hamano wrote:
> Lars Schneider writes:
>
> > After thinking about it I wonder if we should barf on "utf16" without
> > dash. Your Linux iconv would handle this correctly. My macOS iconv would
> > not.
> > That means the repo would checkout correc
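The quoted text notes that the platform's iconv decides whether a dashless spelling such as "utf16" is accepted. Below is a minimal sketch of probing that at run time with the POSIX iconv_open()/iconv_close() calls. encoding_name_is_supported() is a hypothetical helper, not a git function. On macOS the code may need to be linked with -liconv.

```c
#include <iconv.h>

/*
 * Hypothetical helper (not part of git): returns 1 if the local iconv
 * can set up a conversion from `name` to UTF-8, 0 otherwise.
 * The answer is platform dependent: one iconv implementation may accept
 * a spelling such as "utf16" that another rejects.
 */
int encoding_name_is_supported(const char *name)
{
	iconv_t cd = iconv_open("UTF-8", name);

	if (cd == (iconv_t)-1)
		return 0;	/* POSIX: errno is EINVAL for an unsupported conversion */
	iconv_close(cd);
	return 1;
}
```

Calling this for "UTF-16" and then for "utf16" on Linux and on macOS is one way to see the portability problem discussed in the thread: the same .gitattributes value can be valid on one machine and rejected on another.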

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Junio C Hamano
Lars Schneider writes:
> After thinking about it I wonder if we should barf on "utf16" without
> dash. Your Linux iconv would handle this correctly. My macOS iconv would not.
> That means the repo would checkout correctly on your machine but not on mine.
>
> What do you think?
To be bluntly hone

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Lars Schneider
> On 07 Mar 2018, at 00:07, Junio C Hamano wrote:
>
> Junio C Hamano writes:
>
>> Lars Schneider writes:
>> Also "UTF16" or other spelling the platform may support but this code fails to recognise will go unchecked.
>>>
>>> That is true. However, I would assume all iconv impl

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Junio C Hamano
Junio C Hamano writes:
> Lars Schneider writes:
>
>>> Also "UTF16" or other spelling
>>> the platform may support but this code fails to recognise will go
>>> unchecked.
>>
>> That is true. However, I would assume all iconv implementations use the
>> same encoding names for UTF encodings, no? T

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Lars Schneider
> On 06 Mar 2018, at 23:53, Junio C Hamano wrote:
>
> Lars Schneider writes:
>
>>> Also "UTF16" or other spelling
>>> the platform may support but this code fails to recognise will go
>>> unchecked.
>>
>> That is true. However, I would assume all iconv implementations use the
>> same encoding

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Junio C Hamano
Lars Schneider writes:
>> Also "UTF16" or other spelling
>> the platform may support but this code fails to recognise will go
>> unchecked.
>
> That is true. However, I would assume all iconv implementations use the
> same encoding names for UTF encodings, no? That means UTF16 would never be
> v
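Junio's concern is that an exact strcmp() against "UTF-16" lets other spellings slip past the check. One possible remedy is to compare names case-insensitively and ignore an optional dash after "UTF". This is sketched below as an assumption, not as what the patch series does. skip_utf_prefix() and same_utf_spelling() are hypothetical names, and strncasecmp()/strcasecmp() come from POSIX <strings.h>.

```c
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical helper: skip a case-insensitive "utf" prefix and an
 * optional '-' after it. Returns NULL if `name` does not start with "utf".
 */
static const char *skip_utf_prefix(const char *name)
{
	if (strncasecmp(name, "utf", 3) != 0)
		return NULL;
	name += 3;
	if (*name == '-')
		name++;
	return name;
}

/*
 * Hypothetical helper: returns 1 if `a` and `b` name the same UTF
 * encoding when case and the dash after "UTF" are ignored,
 * e.g. "utf16", "UTF16" and "Utf-16" all match "UTF-16".
 */
int same_utf_spelling(const char *a, const char *b)
{
	const char *ra = skip_utf_prefix(a);
	const char *rb = skip_utf_prefix(b);

	return ra && rb && !strcasecmp(ra, rb);
}
```

With a helper like this, a test written as same_utf_spelling(enc, "UTF-16") would also catch "utf16". It would still treat "UTF-16LE" as a different encoding, because there the byte order is part of the name.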

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Lars Schneider
> On 06 Mar 2018, at 21:50, Junio C Hamano wrote:
>
> lars.schnei...@autodesk.com writes:
>
>> +int is_missing_required_utf_bom(const char *enc, const char *data, size_t
>> len)
>> +{
>> +return (
>> + !strcmp(enc, "UTF-16") &&
>> + !(has_bom_prefix(data, len, utf16_be_bom, siz

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Junio C Hamano
lars.schnei...@autodesk.com writes:
> +int is_missing_required_utf_bom(const char *enc, const char *data, size_t
> len)
> +{
> + return (
> +!strcmp(enc, "UTF-16") &&
> +!(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
> + has_bom_prefix(data, len, u
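The quoted hunk stops after the first BOM test. Below is a self-contained sketch of how the complete check could look. It assumes has_bom_prefix() is a plain length check plus memcmp(), and that the other BOM arrays are named utf16_le_bom, utf32_be_bom and utf32_le_bom (only utf16_be_bom appears in the quote). Like the patch, it uses exact strcmp() matching on the encoding name.

```c
#include <stddef.h>
#include <string.h>

/* Byte order marks; every name except utf16_be_bom is an assumption. */
static const char utf16_be_bom[] = { '\xFE', '\xFF' };
static const char utf16_le_bom[] = { '\xFF', '\xFE' };
static const char utf32_be_bom[] = { '\0', '\0', '\xFE', '\xFF' };
static const char utf32_le_bom[] = { '\xFF', '\xFE', '\0', '\0' };

/* Assumed implementation: does data[0..len) start with the given BOM? */
static int has_bom_prefix(const char *data, size_t len,
			  const char *bom, size_t bom_len)
{
	return data && len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Sketch: returns 1 when the encoding name leaves the byte order open
 * ("UTF-16" or "UTF-32") and the data starts with neither the big- nor
 * the little-endian BOM of that width; 0 otherwise.
 */
int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16"))
		return !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
			 has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)));
	if (!strcmp(enc, "UTF-32"))
		return !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
			 has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)));
	return 0;	/* e.g. "UTF-16LE": the byte order is in the name */
}
```

The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so the BOM alone cannot tell the two encodings apart. The width has to come from the encoding name, which is why the check is keyed on `enc` first.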

[PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-04 lars.schneider
From: Lars Schneider

If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if there i
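For contrast, a lenient decoder that follows the Unicode default instead of being strict would choose the byte order as follows. A BOM decides if one is present. Without a BOM, plain "UTF-16" is read as big-endian. utf16_byte_order() and enum byte_order are hypothetical names used only for illustration. The patch instead flags the missing BOM rather than guessing.

```c
#include <stddef.h>

enum byte_order { ORDER_BE, ORDER_LE };

/*
 * Hypothetical helper: choose the byte order for data labelled plain
 * "UTF-16". A leading BOM decides; without one, fall back to the
 * Unicode default of big-endian. *has_bom is set to 1 only when the
 * order came from a BOM.
 */
enum byte_order utf16_byte_order(const unsigned char *data, size_t len,
				 int *has_bom)
{
	if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
		*has_bom = 1;
		return ORDER_BE;
	}
	if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
		*has_bom = 1;
		return ORDER_LE;
	}
	*has_bom = 0;
	return ORDER_BE;	/* no BOM: the standard's big-endian default */
}
```

The `*has_bom == 0` case is where the strict and lenient approaches differ. A lenient reader silently decodes the data as big-endian. The strict check from the patch reports the content as missing its required BOM.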