SQLite has built-in support for EBCDIC-based systems, but I discovered that 
it’s been broken since 3.11.0.

In particular, in February 2016 several changes were made to the SQL tokenizer 
to improve performance by replacing a switch statement on character literals in 
the C source with a character lookup table; see 
http://www.sqlite.org/src/info/9115baa1919584dc and 
http://www.sqlite.org/src/info/04f7da77c13925c1.  However, the character 
lookup table for EBCDIC appears to contain several typos, causing ubiquitous 
characters in SQL input (such as ‘.’) to be classified as invalid 
characters.  As a result, internal low-level queries like

select sql from "main".sqlite_master

fail to parse, with a non-obvious error message (I guess such low-level 
queries are always expected to succeed!).  With this broken lookup table, 
pretty much any non-trivial SQL query fails to parse on EBCDIC systems; for 
example, the ‘.schema’ meta command fails.  In short, SQLite is totally broken 
out of the box on EBCDIC systems.

The problem is in the `aiClass` character properties table when `SQLITE_EBCDIC` 
is defined.  This table is defined as follows:

#ifdef SQLITE_EBCDIC
/*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
/* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
/* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 12, 17, 20, 10,
/* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
/* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  7,
/* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
/* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   25,  1,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27,  9, 27, 27, 27, 27, 27,
/* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
#endif

While it’s conceivable that this table was written for a different codepage 
than the one on the mainframe I was using, it seems more likely that the table 
contains typos.  For example:

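- There are two “9x” rows in the table above, and no “Ax” row.
- Assuming codepage 1047 (the most commonly used?), there are no entries in 
  this table for the CC_DOT or CC_VARNUM #defines (26 and 6, respectively): 
  ‘.’ (0x4B) is classified as illegal (27), and ‘?’ (0x6F) gets 7, the class 
  the table gives to space characters.
- Assuming codepage 1047, the entry for the CC_TILDA #define (25) is in the 
  wrong place: it sits at 0xA0, while ‘~’ is 0xA1.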

(It would be helpful if the SQLite documentation indicated which EBCDIC 
codepage(s) it provided support for.)

To fix this problem, I patched the SQLite sources to change the `aiClass` 
character properties table to this:

#ifdef SQLITE_EBCDIC
/*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
/* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
/* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
/* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
/* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  6,
/* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
/* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ax */   27, 25,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27,  9, 27, 27,
/* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
#endif

These changes fixed the SQL tokenizer problems I was seeing on the mainframe, 
and resulted in a functioning SQLite there.

Please let me know if you need more information to fix this problem.

Hope this helps,
Brad Larsen

https://en.wikipedia.org/wiki/EBCDIC_1047
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
