> On Dec 11, 2016, at 7:57 PM, Richard Hipp <d...@sqlite.org> wrote:
>
> On 12/11/16, Bradford Larsen <brad.lar...@gmail.com> wrote:
>
>> #endif
>> #ifdef SQLITE_EBCDIC
>> /*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
>> /* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
>> /* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
>> /* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
>> /* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  6,
>> /* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
>> /* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Ax */   27, 25,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27,  9, 27, 27,
>> /* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
>> #endif
>> };
>>
>> These changes fixed the SQL tokenizer problems I was seeing my EBCDIC-based
>> system, and resulted in a functioning SQLite there.
>>
>> Please let me know if more is needed to fix this bug.
>
> Thanks for the suggested fix.  As you probably infer, we do not have
> access to an EBCDIC system for testing.
>
> I agree with most of your changes.  But I wonder about moving the
> QUOTE2 (the '[' character) value from code 0xba over to 0xad.
> According to EBCDIC chart at https://en.wikipedia.org/wiki/EBCDIC the
> '[' character should be at 0xba.  Is wikipedia wrong?  Is there
> something else about the '[' character in EBCDIC that we should know
> about?

There are many flavors of EBCDIC; the details depend on which code page
is used. The mainframes I’ve used have all been configured for code page
1047 (the “Latin 1/Open System” code page). The table in the Wikipedia
page for EBCDIC (https://en.wikipedia.org/wiki/EBCDIC) is for code page
37. You’re right that the '[' character has value 0xba in code page 37,
but in code page 1047 the '[' character has value 0xad
(https://en.wikipedia.org/wiki/EBCDIC_1047). See also
https://en.wikipedia.org/wiki/Code_page#EBCDIC-based_code_pages.

I’m afraid the entire EBCDIC situation is complicated. There isn’t a
single definition of EBCDIC as there is for ASCII: the same character
can have different values in different EBCDIC code pages. Code page 1047
may be the most common native code page these days; it’s the code page
used by all the mainframes I’ve touched. (It’s the default code page for
C source code as understood by the IBM XL C/C++ compiler, though this
does not preclude *running* with a different code page than was used for
building.)

Either way, it looks like (a) the code page used by SQLite on EBCDIC
systems is ambiguous, and (b) whichever code page you pick, there seem
to be typos in the relevant tables in the SQLite source. The result is a
broken SQL tokenizer on EBCDIC-based systems.

I think a particular code page should be chosen, and the SQLite sources
and docs should be clear about which code page is used.

Maybe there's someone else out there who has expertise with EBCDIC who
will comment.

Best,
Brad Larsen
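
To make the code page difference above concrete, here is a minimal C sketch that classifies a byte value for '[' using only the three values mentioned in this thread (0xad for code page 1047, 0xba for code page 37) plus the ASCII value 0x5b. The enum and function names are hypothetical and are not part of SQLite, and a match is only a hint: other EBCDIC code pages may place '[' at the same position.

```c
#include <assert.h>

/* Hypothetical result codes; these names are illustrative only. */
enum {
    CP_UNKNOWN     = -1,
    CP_ASCII       = 0,
    CP_EBCDIC_37   = 37,
    CP_EBCDIC_1047 = 1047
};

/* Classify the byte value that encodes '['.
 * 0xAD and 0xBA are the values cited in this thread for code pages
 * 1047 and 37; 0x5B is the ASCII value. Anything else is reported
 * as unknown rather than guessed. */
int guess_code_page_from_bracket(unsigned char left_bracket)
{
    switch (left_bracket) {
    case 0xAD: return CP_EBCDIC_1047;
    case 0xBA: return CP_EBCDIC_37;
    case 0x5B: return CP_ASCII;
    default:   return CP_UNKNOWN;
    }
}

/* Apply the check to the execution character set this file was
 * compiled for, by looking at the value of the literal '['. */
int guess_native_code_page(void)
{
    return guess_code_page_from_bracket((unsigned char)'[');
}

/* Small self-check over the values quoted in the thread. */
void check_bracket_examples(void)
{
    assert(guess_code_page_from_bracket(0xAD) == CP_EBCDIC_1047);
    assert(guess_code_page_from_bracket(0xBA) == CP_EBCDIC_37);
    assert(guess_code_page_from_bracket(0x5B) == CP_ASCII);
}
```

Calling guess_native_code_page() reports what the compiler used for character literals, which, as noted above, need not match the code page in effect at run time.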