On Fri, Jan 26, 2018 at 11:41 AM, Peter Da Silva <peter.dasi...@flightaware.com> wrote:
> On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" <sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com> wrote:
>
>> doesn't get 26 either. 0x1a
>
> 26 isn't EOF, it's SUB (substitute). It was used to represent
> untranslatable characters when converting (for example) EBCDIC to ASCII.

I gave up on ever using "rt" or "wt" because it IS EOF, depending on the system. I bet Windows command-line tools still honor it, since copy still has /B and /A on Windows 10.

(Interjecting in an edit: "The effect of /a depends on its position in the command-line string. When /a follows Source, copy treats the file as an ASCII file and copies data that precedes the first end-of-file character."

https://en.wikipedia.org/wiki/End-of-file
"In Microsoft's DOS and Windows (and in CP/M and many DEC operating systems), reading from the terminal will never produce an EOF. Instead, programs recognize that the source is a terminal (or other "character device") and interpret a given reserved character or sequence as an end-of-file indicator; most commonly this is an ASCII Control-Z, code 26. Some MS-DOS programs, including parts of the Microsoft MS-DOS shell (COMMAND.COM) and operating-system utility programs (such as EDLIN), treat a Control-Z in a text file as marking the end of meaningful data, and/or append a Control-Z to the end when writing a text file. This was done for two reasons:" ...
)

I understand that 0xFF on punch cards was probably handy because you could just knock out all the holes to make a correction, and that it could be an EOF on other systems, unless something like O_BINARY was used. So now we just treat files as binary, get the length from the system, and don't expect any transformations of our data.

------

More to my point, though: SQLite returns values through sqlite3_column_text(stmt,n) and sqlite3_column_bytes(stmt,n), so any data, including NUL bytes from bound (or otherwise supplied) values, is returned. strcmp() would have a problem with that, and so would strncmp(); you really need a comparison that takes the lengths of both strings into account.

strlen() is used constantly to find the lengths of column, table, and function names that should already be known. It's not as if those get copied a lot, so the net effect of not recomputing them is more speed, especially since it isn't even plain strlen(), which the compiler could inline as an intrinsic, but a fancier function that sanitizes the length (sqlite3Strlen30()).

Then there are the SQL functions that deal with strings: LENGTH, RTRIM, LTRIM, QUOTE, ... MySQL's LENGTH() returns bytes; SQLite's returns characters, and all of its string functions work on characters, which means SQLite has to understand UTF-8 characters. I wouldn't use any of those functions except in a one-off script, because they are not portable. And they are non-conformant in that they only implement a basic way of skipping over UTF-8 characters.

A lone 0x9X byte is also not a valid UTF-8 character (it's a continuation byte with no lead byte). So even byte values in the 0x9X range are available to encode things as raw bytes, kind of out of band with the data.
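For the "just treat files as binary and get the length from the system" approach, here is a minimal C sketch using only standard stdio; roundtrip_binary() is a hypothetical helper, not anything from SQLite. It writes bytes that include 0x1A, NUL and CR/LF in "wb" mode, reads them back in "rb" mode, and takes the length from fseek/ftell instead of looking for an end-of-file character.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `len` bytes to `path` in binary mode ("wb"),
 * then read them back in binary mode ("rb"). In binary mode there is no
 * newline translation and a 0x1A (Ctrl-Z, SUB) byte is ordinary data, not
 * an end-of-file marker. The length comes from the system (fseek/ftell),
 * not from scanning for a terminator.
 * Returns the number of bytes read back, or -1 on error. */
static long roundtrip_binary(const char *path,
                             const unsigned char *data, size_t len,
                             unsigned char *out, size_t outcap)
{
    assert(path != NULL && data != NULL && out != NULL);

    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len) return -1;

    f = fopen(path, "rb");
    if (f == NULL) return -1;
    /* SEEK_END on a binary stream works on common platforms (POSIX, Windows),
     * although ISO C does not strictly guarantee it. */
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
    if (size < 0 || (size_t)size > outcap || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    size_t got = fread(out, 1, (size_t)size, f);
    fclose(f);
    return (long)got;
}
```

Called with a buffer like { 'a', 0x1A, 'b', 0x00, '\r', '\n', 'c' }, it should return 7 and hand back the same bytes; in text mode on Windows, the CR/LF pair and the Ctrl-Z could instead be translated or treated as the end of the data.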
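To illustrate the comparison problem: strcmp() stops at the first NUL, so two values that differ only after an embedded NUL compare equal. Below is a rough sketch of a length-aware comparison; text_compare_n() is a hypothetical name, and the lengths are assumed to come from sqlite3_column_bytes() (SQLite's documentation recommends calling sqlite3_column_text() first and then sqlite3_column_bytes()).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: compare two byte strings using both lengths, so that
 * embedded NUL bytes are compared like any other byte. Typical inputs:
 *   const unsigned char *a = sqlite3_column_text(stmt, 0);
 *   int na = sqlite3_column_bytes(stmt, 0);
 * Returns <0, 0 or >0 like memcmp(); when one string is a prefix of the
 * other, the shorter one sorts first. */
static int text_compare_n(const unsigned char *a, int na,
                          const unsigned char *b, int nb)
{
    assert(na >= 0 && nb >= 0);
    int common = na < nb ? na : nb;
    /* Skip memcmp() for zero lengths: a NULL column yields a NULL pointer. */
    int c = common > 0 ? memcmp(a, b, (size_t)common) : 0;
    if (c != 0) return c;
    return (na > nb) - (na < nb);
}
```

With a = "ab\0c" and b = "ab\0d" (4 bytes each), strcmp() reports them equal, while text_compare_n(a, 4, b, 4) is negative.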
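Character counting of the kind a character-based LENGTH() needs can be sketched by counting one character per starting byte and then skipping the continuation bytes (10xxxxxx) that follow a multi-byte lead byte. This is an illustration with a made-up function name, not SQLite's source; it works on an explicit byte count rather than stopping at NUL, and it does no validation, so a stray 0x9X byte simply counts as one character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count UTF-8 characters in s[0..nbytes). Each byte at
 * which we start counts as one character; if it is a multi-byte lead byte
 * (>= 0xC0), the continuation bytes (0x80..0xBF) after it are skipped.
 * No validation: a lone continuation byte such as 0x93 counts as one. */
static size_t utf8_char_count(const unsigned char *s, size_t nbytes)
{
    assert(s != NULL || nbytes == 0);
    size_t chars = 0, i = 0;
    while (i < nbytes) {
        unsigned char lead = s[i++];
        chars++;
        if (lead >= 0xC0) {
            while (i < nbytes && (s[i] & 0xC0) == 0x80)
                i++;
        }
    }
    return chars;
}
```

For "Как дела" (15 bytes of UTF-8: D0 9A D0 B0 D0 BA 20 D0 B4 D0 B5 D0 BB D0 B0) this returns 8, while a byte-counting LENGTH() like MySQL's would give 15.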
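Since bytes such as a lone 0x9X (and 0xC0, 0xC1, 0xF5 through 0xFF) can never begin a valid UTF-8 character, one way to deal with them on input is to replace them with U+FFFD (EF BF BD). The sketch below uses hypothetical helper names and is not SQLite code; it checks each sequence against the Unicode table of well-formed UTF-8 byte sequences (rejecting overlongs, surrogates and anything above U+10FFFF) and emits one U+FFFD per offending byte, which is simpler than Unicode's recommended "maximal subpart" practice.

```c
#include <assert.h>
#include <stddef.h>

/* Length (1..4) of the well-formed UTF-8 sequence at s[0], or 0 if the
 * bytes there are ill-formed. Follows the Unicode well-formed byte table. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    assert(s != NULL && n > 0);
    unsigned char b = s[0];
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range for the 2nd byte */
    size_t need;

    if (b <= 0x7F) return 1;
    else if (b >= 0xC2 && b <= 0xDF) need = 2;
    else if (b == 0xE0) { need = 3; lo = 0xA0; }   /* no overlongs */
    else if (b == 0xED) { need = 3; hi = 0x9F; }   /* no surrogates */
    else if (b >= 0xE1 && b <= 0xEF) need = 3;
    else if (b == 0xF0) { need = 4; lo = 0x90; }   /* no overlongs */
    else if (b == 0xF4) { need = 4; hi = 0x8F; }   /* nothing above U+10FFFF */
    else if (b >= 0xF1 && b <= 0xF3) need = 4;
    else return 0;   /* 0x80..0xC1 and 0xF5..0xFF never start a character */

    if (n < need) return 0;                 /* truncated sequence */
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < need; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;
    return need;
}

/* Copy in[0..n) to out, replacing each byte that does not begin a
 * well-formed sequence with U+FFFD (EF BF BD). out must have room for
 * 3 * n bytes. Returns the number of bytes written. */
static size_t utf8_sanitize(const unsigned char *in, size_t n, unsigned char *out)
{
    assert((in != NULL || n == 0) && out != NULL);
    size_t i = 0, o = 0;
    while (i < n) {
        size_t len = utf8_seq_len(in + i, n - i);
        if (len == 0) {
            out[o++] = 0xEF; out[o++] = 0xBF; out[o++] = 0xBD;
            i++;                            /* skip one offending byte */
        } else {
            for (size_t k = 0; k < len; k++) out[o++] = in[i + k];
            i += len;
        }
    }
    return o;
}
```

With this policy, { 'a', 0x93, 'b' } becomes "a", EF BF BD, "b", and the overlong pair C0 AF becomes two U+FFFD.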
Invalid characters (overlong and otherwise) should be replaced with U+FFFD:
http://unicode.org/pipermail/unicode/2017-May/005522.html
(from this thread, sort of; it was about ill-formed UTF-8, really an earlier part of this thread, but I didn't find it)

https://www.fileformat.info/info/unicode/char/fffd/index.htm
Comments: "used to replace an incoming character whose value is unknown or unrepresentable in Unicode; compare the use of U+001A as a control character to indicate the substitute function"

(I would have said "0xFEFF, ZWNBSP, zero width no-break space(?), EF BB BF", but I went and searched and found it was different than I thought.)

A quick note about UTF-8: every byte has at least one bit off, so 0xFF never appears in valid UTF-8.

My initial impression was that SQLite shouldn't care, being basically a smart storage engine: what I put in, I can get back out. Having patched the input side to escape ' and NUL in string values, I don't need my larger patch. But having looked through so much of the string handling, I still think its overall effect is positive.

Then there's internal logging and analysis, which should also escape strings on output, since there IS an SQL way to include char(0). sqlite3_column_text() can't really be changed at this point, which means that no matter how much is enforced to avoid counting 0 as a character, it doesn't matter, because it still will be counted.

(How are you? Как дела?)

sqlite3 test.db
create table test(a);
insert into test (a) values ('Как дела');
select length(a),a
8|??? ????
(bytes in db) 02 1D 3F 3F 3F 20 3F 3F 3F 3F ◙♂☻↔??? ????

Or done with
sqlite3 test.db < test.sql
where test.sql was the above...
(on terminal)
8|╨Ü╨░╨║ ╨┤╨╡╨╗╨░

sqlite3 test.db < test.sql > test.out
8|Как дела
(bytes in db) 11 0E 02 2B D0 9A D0 B0 D0 BA 20 │ D0 B4 D0 B5 D0 BB D0 B0 ◄♫☻+Как дела

> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users