Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Gary R. Schmidt
On 27/01/2018 05:32, Peter Da Silva wrote:
> On 1/26/18, 12:31 PM, "sqlite-users on behalf of J Decker" wrote:
> > ctrl-z was end of file text character in DOS (wrote char 26; not FF)
> DOS wasn't an operating system.
That will come as a surprise to the people who used DOS/360 and DOS/VSE and th

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread petern
tyle-Wide-String is defined as a >> > > "bunch-a-non-zero-words-terminated-by-a-zero-word", then how is it >> > > possible to have a zero/null word "embedded" within a >> > C-Style-Wide-String? >> > > >> > > Given that SQLite3 i

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
e String? > > > > > > > > Similarly, if a C-Style-Wide-String is defined as a > > > > "bunch-a-non-zero-words-terminated-by-a-zero-word", then how is it > > > > possible to have a zero/null word "embedded" within a > > > C-Style-Wide-String? > > &

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread petern
es C-Strings or > > > C-Style-Wide-Strings, then you cannot have zero/null bytes embedded in > > > those strings. > > > > > > You may of course argue that perhaps SQLite3 should use something other > > > than C-Style-Strings, however, this is not what seems

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
posed. It > > seems to be proposing the use of some magical C-Style-String that is not > > actually a C-Style-String, without explicitly stating this. > > > > SQLite3 does handle non-C-Style-Strings. They are called "blobs". > > > > --- > > The fact that th

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread petern
to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > > >-Original Message- > >From: sqlite-users [mailto:sqlite-users- > >boun...@mailinglists.sqlite.org] On Behalf Of J Decker > >Sent: Friday, 26 January, 2018 17:18 > >

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
r a decade or more as char. > --- > The fact that there's a Highway to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > > >-Original Message- > >From: sqlite-users [mailto:sqlite-users- > >boun...@mailinglists.sqlite.org] On Be

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Keith Medcalf
On Behalf Of J Decker >Sent: Friday, 26 January, 2018 17:18 >To: SQLite mailing list >Subject: Re: [sqlite] UTF8 and NUL > >On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva < >peter.dasi...@flightaware.com> wrote: > >> On 2018-01-26, at 17:05, J Decker wrote:

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 2018-01-26, at 17:05, J Decker wrote: > > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva < > > peter.dasi...@flightaware.com> wrote: > >> Sqlite uses NUL as the string terminator internally, the publishe

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 2018-01-26, at 17:05, J Decker wrote: > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva < > peter.dasi...@flightaware.com> wrote: >> Sqlite uses NUL as the string terminator internally, the published API >> specification has stuff like this all over the place: >>> In those routines that have a fou

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > Sqlite uses NUL as the string terminator internally, the published API > specification has stuff like this all over the place: > > > In those routines that have a fourth argument, its value is the number > of byt

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Simon Slavin
On 26 Jan 2018, at 9:04pm, J Decker wrote: > I bet windows command line tools still use it because copy has /B and /A on > windows 10. Windows is indeed a problem. I don't know enough about it to know whether the above statement outlines the problem but Windows in general is terrifically diff

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
SQLite uses NUL as the string terminator internally; the published API specification has stuff like this all over the place:
> In those routines that have a fourth argument, its value is the number of
> bytes in the parameter. To be clear: the value is the number of bytes in the
> value, not the nu
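To make the "number of bytes" point concrete, here is a minimal sketch using Python's standard `sqlite3` module rather than the C API discussed in the thread. It assumes the module binds `str` parameters as TEXT with an explicit byte length and `bytes` parameters as BLOB; the table name `t` and column `v` are made up for illustration. SQLite's documentation says `length()` on a string counts characters before the first NUL, while on a blob it counts bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

text = "a\x00b"               # three characters, with a NUL in the middle
blob = text.encode("utf-8")   # the same three bytes, bound as a BLOB

# length() on TEXT is documented to stop at the first NUL;
# length() on a BLOB reports the byte count.
text_len, blob_len = conn.execute(
    "SELECT length(?), length(?)", (text, blob)
).fetchone()

# Hypothetical table, only to show the blob surviving a round trip intact.
conn.execute("CREATE TABLE t (v BLOB)")
conn.execute("INSERT INTO t (v) VALUES (?)", (blob,))
stored = conn.execute("SELECT v FROM t").fetchone()[0]

conn.close()
```

If embedded zero bytes matter, storing the value as a blob sidesteps the terminator question entirely, which is the position taken elsewhere in this thread.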

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 11:41 AM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" < > sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com> > wrote: > >doesn't get 26 either. 0x1a > > 26 isn't EOF, it's SU

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 2:34 PM, "sqlite-users on behalf of J. King" wrote:
> Do you have a point in making either statement? If you do, I'm really not
> seeing it.
The point is that apart from CP/M and derivatives like DOS, this kind of behavior is strictly a leftover from the '60s. And CP/M only had th

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J. King
On 2018-01-26 15:13:46, "Peter Da Silva" wrote:
> On 1/26/18, 2:11 PM, "sqlite-users on behalf of John McKown" john.archie.mck...@gmail.com> wrote:
> > In the distant past (CP/M-80), the filesystem meta data did not include the actual _length_ of the data for a text data file.
> Since DOS wasn't a

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 2:11 PM, "sqlite-users on behalf of John McKown" wrote:
> In the distant past (CP/M-80), the filesystem meta data did not include the
> actual _length_ of the data for a text data file.
Since DOS wasn't an OS, then CP/M certainly wasn't.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread John McKown
On Fri, Jan 26, 2018 at 1:41 PM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" < > sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com> > wrote: > >doesn't get 26 either. 0x1a > > 26 isn't EOF, it's SUB (su

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" wrote:
> doesn't get 26 either. 0x1a
26 isn't EOF, it's SUB (substitute). It was used to represent untranslatable characters when converting (for example) EBCDIC to ASCII.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 10:44 AM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 1/26/18, 12:40 PM, "sqlite-users on behalf of J Decker" < > sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com> > wrote: > > reads the bytes and does things with them. the EOF wo

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 12:40 PM, "sqlite-users on behalf of J Decker" wrote:
> reads the bytes and does things with them. the EOF would get returned with
> fgetc() but not the character.
fgetc() returns an int, not a byte. That EOF is -1, not 0xFF.
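The same out-of-band idea can be sketched in Python, as an analogue rather than the C `fgetc()` call itself: `read(1)` on a binary stream returns an empty `bytes` object at end of input, a value no data byte can produce, just as C's `EOF` is an `int` outside the 0..255 range. The helper name `read_all_bytes` and the sample data are made up for illustration.

```python
import io

def read_all_bytes(stream):
    """Collect every byte value until the stream signals end of input."""
    values = []
    while True:
        chunk = stream.read(1)
        if chunk == b"":          # end of input: signalled out of band
            break
        values.append(chunk[0])   # an ordinary data byte, 0..255
    return values

# 0xFF and 0x1A (Ctrl-Z / SUB) are plain data here, not end-of-file markers.
data = bytes([0x41, 0xFF, 0x1A, 0x42])
values = read_all_bytes(io.BytesIO(data))
```

Both 0xFF and 0x1A pass through as data because the end-of-input signal never shares the byte value space.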

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 10:35 AM, Tim Streater wrote: > On 26 Jan 2018, at 18:12, Keith Medcalf wrote: > > > Actually, EOF (0xFF) *is* part of a text file, and is the byte in an > ASCII > > byte-stream that indicates end-of-file. > > First I've heard of that. Which systems did that then? EOF is

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Tim Streater
On 26 Jan 2018, at 18:12, Keith Medcalf wrote:
> Actually, EOF (0xFF) *is* part of a text file, and is the byte in an ASCII
> byte-stream that indicates end-of-file.
First I've heard of that. Which systems did that then? EOF is normally indicated by the file system, not by file data. -- Chee

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 12:31 PM, "sqlite-users on behalf of J Decker" wrote:
> ctrl-z was end of file text character in DOS (wrote char 26; not FF)
DOS wasn't an operating system.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 10:22 AM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 1/26/18, 12:12 PM, "sqlite-users on behalf of Keith Medcalf" < > sqlite-users-boun...@mailinglists.sqlite.org on behalf of > kmedc...@dessus.com> wrote: > > Actually, EOF (0xFF) *is* part of a text file,

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 12:12 PM, "sqlite-users on behalf of Keith Medcalf" wrote: > Actually, EOF (0xFF) *is* part of a text file, and is the byte in an ASCII > byte-stream that indicates end-of-file. In the "old days" the bytes > following the last-byte in a stream and the end of a storage block > (se

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Keith Medcalf
:) ). --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Peter Da Silva >Sent: Friday, 26 Janua

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 5:55 AM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > What is the goal of this discussion? Changing the string terminator SQLite > uses? I think it's almost 50 years too late for that, but I'm sure that if > Unicode and UTF8 had been a thing in 1970 then C would

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 8:24 AM, "sqlite-users on behalf of Gary R. Schmidt" wrote:
> But how would you differentiate EOF??? (Let me guess, 0. :-) )
End of file is not part of the contents of the file or a string. It's metadata.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Gary R. Schmidt
On 27/01/2018 00:55, Peter Da Silva wrote:
> What is the goal of this discussion? Changing the string terminator SQLite uses? I think it's almost 50 years too late for that, but I'm sure that if Unicode and UTF8 had been a thing in 1970 then C would have selected FF as the string terminator.
But

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
What is the goal of this discussion? Changing the string terminator SQLite uses? I think it's almost 50 years too late for that, but I'm sure that if Unicode and UTF8 had been a thing in 1970 then C would have selected FF as the string terminator.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Clemens Ladisch
J Decker wrote:
> U+009C 156 String Terminator ST
"ST is used as the closing delimiter of a control string opened by APPLICATION PROGRAM COMMAND (APC), DEVICE CONTROL STRING (DCS), OPERATING SYSTEM COMMAND (OSC), PRIVACY MESSAGE (PM), or START OF STRING (SOS)."
Regards, Clemens

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
https://en.wikipedia.org/wiki/List_of_Unicode_characters#Control_codes Even the Control codes within unicode aren't FF. U+009C 156 String Terminator ST literal bytes \xC2\x9c are string terminator ... Was thinking that like APC and ST were higher than that... more in the range of 0xF8-0xFF On
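A quick way to check the "literal bytes \xC2\x9c" claim is to encode U+009C and look at the result. This is a small sketch using only the Python standard library; the variable names are illustrative.

```python
import unicodedata

st = "\u009c"                            # STRING TERMINATOR, a C1 control code
st_utf8 = st.encode("utf-8")             # its UTF-8 encoding, two bytes
st_category = unicodedata.category(st)   # Unicode general category

# Reassemble the code point by hand from the two bytes:
# the lead byte carries 5 payload bits, the continuation byte carries 6.
rebuilt = ((st_utf8[0] & 0x1F) << 6) | (st_utf8[1] & 0x3F)
```

So the C1 control codes sit in the two-byte range of UTF-8, well below the 0xF8-0xFF bytes mentioned above.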

[sqlite] UTF8 and NUL

2018-01-25 Thread J Decker
NUL is a valid UTF-8 character but FF is never valid (it would be like a 36 bit length specification), and practically anything more than F8 is an invalid UTF-8 character. Other than BOM https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8 EF BB BF 239 187 191 // EF - 80 | 3b - 80 | 3f ( 0xfeff ) Many W
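These byte-level claims are easy to check with a strict UTF-8 decoder. The sketch below uses Python's built-in codec; the helper `is_valid_utf8` is a made-up name, and the decoder follows the current Unicode definition of UTF-8, which is stricter than the original 1990s scheme (it also rejects F5 through F7 as lead bytes).

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# NUL (U+0000) is an ordinary one-byte code point.
nul_ok = is_valid_utf8(b"a\x00b")

# 0xFF never appears in well-formed UTF-8.
ff_ok = is_valid_utf8(b"a\xffb")

# Try every byte from 0xF8 to 0xFF as a lead byte followed by continuation bytes.
accepted_high = [
    b for b in range(0xF8, 0x100)
    if is_valid_utf8(bytes([b, 0x80, 0x80, 0x80, 0x80]))
]

# The BOM, U+FEFF, encodes as EF BB BF; rebuild the code point by hand
# the same way the "EF - 80 | 3b - 80 | 3f" note does.
bom = "\ufeff".encode("utf-8")
rebuilt = ((bom[0] & 0x0F) << 12) | ((bom[1] & 0x3F) << 6) | (bom[2] & 0x3F)
```

Because FF can never occur in valid UTF-8 text, it could in principle act as an in-band terminator, which is the idea the rest of this thread argues over.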