Re: [fossil-users] Whether or not this function examines the entire contents of the blob is officially unspecified.

2013-03-20 Thread Jan Nijtmans
2013/3/20 Martijn Coppoolse:
> Hm... NUL as a *character* might be invalid, NUL as a *byte* is perfectly
> valid, in UTF-16 at the least.
>
> Not certain which one you meant there.

I mean "character", not "byte". All UTF-16 functions in fossil operate on (2-byte) characters, not bytes.

Thanks!
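The character-versus-byte distinction above can be illustrated with a minimal sketch. This is not Fossil's code: the function names has_nul_byte() and has_nul_utf16_char() are hypothetical, and the sketch assumes plain 2-byte code units with no BOM handling. A UTF-16 string such as "AB" contains NUL bytes (0x41 0x00 0x42 0x00) but no NUL character (0x0000).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from Fossil): returns 1 if any single
** byte in the buffer is 0x00, otherwise 0. */
int has_nul_byte(const unsigned char *z, size_t n){
  size_t i;
  assert( z!=NULL || n==0 );
  for(i=0; i<n; i++){
    if( z[i]==0 ) return 1;
  }
  return 0;
}

/* Hypothetical helper (not from Fossil): treats the buffer as a
** sequence of 2-byte code units and returns 1 if any unit is 0x0000.
** Byte order does not matter for this test. A trailing odd byte is
** ignored. */
int has_nul_utf16_char(const unsigned char *z, size_t n){
  size_t i;
  assert( z!=NULL || n==0 );
  for(i=0; i+1<n; i+=2){
    if( z[i]==0 && z[i+1]==0 ) return 1;
  }
  return 0;
}
```

For the four bytes 0x41 0x00 0x42 0x00, has_nul_byte() reports a NUL while has_nul_utf16_char() does not, which is the case Martijn was asking about.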

Re: [fossil-users] Whether or not this function examines the entire contents of the blob is officially unspecified.

2013-03-20 Thread Martijn Coppoolse
On 20-3-2013 9:33, Jan Nijtmans wrote:
> 2013/3/20 Joe Mistachkin:
>> Actually, given the variety of possible text encodings, we know
>> very little with absolute certainty.
>
> That's true. The fact that NUL is no valid character in any text encoding is an absolute certainty, but other assumptions canno

Re: [fossil-users] Whether or not this function examines the entire contents of the blob is officially unspecified.

2013-03-20 Thread Eduardo Morras
On Tue, 19 Mar 2013 16:28:08 +0100 Jan Nijtmans wrote:
> This line can be found twice in the fossil source code,
> and it refers to the functions looks_like_utf8() and
> looks_like_utf16() (src/diff.c lines 233 and 336).
>
> In Fossil 1.25 and earlier, looks_like_utf8/16 bailed out as soon
> as

Re: [fossil-users] Whether or not this function examines the entire contents of the blob is officially unspecified.

2013-03-20 Thread Jan Nijtmans
2013/3/20 Joe Mistachkin:
> Actually, given the variety of possible text encodings, we know
> very little with absolute certainty.

That's true. The fact that NUL is no valid character in any text encoding is an absolute certainty, but other assumptions cannot be made.

> If I'm understanding you

Re: [fossil-users] Whether or not this function examines the entire contents of the blob is officially unspecified.

2013-03-19 Thread Joe Mistachkin
Jan Nijtmans wrote:
>
> Both NUL-bytes and long lines no longer
> abort the function. This has the adverse effect that even
> binary files containing NUL bytes are always scanned
> completely, even though we know already that the file
> is binary.
>

Actually, given the variety of possible text

[fossil-users] Whether or not this function examines the entire contents of the blob is officially unspecified.

2013-03-19 Thread Jan Nijtmans
This line can be found twice in the fossil source code, and it refers to the functions looks_like_utf8() and looks_like_utf16() (src/diff.c lines 233 and 336). In Fossil 1.25 and earlier, looks_like_utf8/16 bailed out as soon as either a NUL byte or a long line was encountered. Stopping at long l
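The early bail-out described for Fossil 1.25 and earlier (stop as soon as a NUL byte or a long line is seen) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual looks_like_utf8() from src/diff.c: the name sketch_looks_like_text(), the pnSeen out-parameter, and the SKETCH_MAX_LINE limit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed line-length limit for this sketch only; Fossil's real
** limit is not stated in this thread. */
#define SKETCH_MAX_LINE 8192

/* Hypothetical sketch (not Fossil's implementation): scan a buffer
** and return 0 as soon as a NUL byte or an overlong line is found,
** otherwise return 1. If pnSeen is not NULL, it receives the number
** of bytes examined, which shows whether the scan stopped early. */
int sketch_looks_like_text(const unsigned char *z, size_t n, size_t *pnSeen){
  size_t i;
  size_t lineLen = 0;   /* bytes seen since the last newline */
  assert( z!=NULL || n==0 );
  for(i=0; i<n; i++){
    if( z[i]==0 ){
      /* NUL byte: bail out without reading the rest of the buffer */
      if( pnSeen ) *pnSeen = i+1;
      return 0;
    }
    if( z[i]=='\n' ){
      lineLen = 0;
    }else if( ++lineLen>SKETCH_MAX_LINE ){
      /* Overlong line: bail out early as well */
      if( pnSeen ) *pnSeen = i+1;
      return 0;
    }
  }
  if( pnSeen ) *pnSeen = n;
  return 1;
}
```

With this shape, a binary file whose second byte is NUL costs two byte reads instead of a full scan. Removing the two early returns and recording the result in a flag instead would give the scan-everything behaviour that the thread is discussing.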