SQLite has built-in support for EBCDIC-based systems, but I discovered that 
it’s been broken since 3.11.0.  If you have an EBCDIC-based system, you can see 
the brokenness by firing up `sqlite` and trying the `.schema` metacommand – 
you’ll get an obscure error.

In detail: in February 2016, the SQL tokenizer was changed, for performance, 
to classify characters with a lookup table instead of a switch statement on 
character literals in the C source; see 
http://www.sqlite.org/src/info/9115baa1919584dc and 
http://www.sqlite.org/src/info/04f7da77c13925c1.  However, the character 
lookup table for EBCDIC appears to have several typos in it, causing some 
ubiquitous characters in SQL input (such as ‘.’) to be classified as invalid 
characters.  This causes internal low-level queries like

select sql from "main".sqlite_master

to fail to parse, with a non-obvious error message (I guess such low-level 
queries are always expected to succeed!).  On EBCDIC systems, this broken 
character lookup table causes pretty much any non-trivial SQL query to fail 
to parse, and causes, for example, the `.schema` metacommand to fail, which 
makes SQLite totally broken out of the box on EBCDIC systems.
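
To illustrate the failure mode, here is a minimal sketch of table-driven character classification. It is not SQLite's tokenizer: the `DEMO_*` classes, `demoClass`, and the `demo_*` functions are hypothetical names made up for this example, and the table covers only a handful of characters.

```c
#include <string.h>

/* Hypothetical, simplified classes for illustration only; SQLite's real
** CC_* values and aiClass table differ. */
enum { DEMO_ILLEGAL = 0, DEMO_WORD, DEMO_SPACE, DEMO_DOT, DEMO_QUOTE };

static unsigned char demoClass[256];

/* Mark every character in zChars as belonging to class cls. */
static void demo_set(const char *zChars, unsigned char cls){
  for(; *zChars; zChars++) demoClass[(unsigned char)*zChars] = cls;
}

/* Fill the table for a few characters.  Using character literals keeps
** the sketch independent of the execution character set. */
static void demo_init(void){
  memset(demoClass, DEMO_ILLEGAL, sizeof(demoClass));
  demo_set("abcdefghijklmnopqrstuvwxyz_", DEMO_WORD);
  demo_set(" ", DEMO_SPACE);
  demo_set(".", DEMO_DOT);
  demo_set("\"", DEMO_QUOTE);
}

/* Return the offset of the first character classified as illegal,
** or -1 if every character has a usable class. */
static int demo_first_illegal(const char *zSql){
  int i;
  for(i=0; zSql[i]; i++){
    if( demoClass[(unsigned char)zSql[i]]==DEMO_ILLEGAL ) return i;
  }
  return -1;
}
```

After `demo_init()`, `demo_first_illegal()` accepts the query shown above. If the table entry for '.' is then set to `DEMO_ILLEGAL`, which is roughly what a typo in the table amounts to, the same query is rejected at the '.' character.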

The problem is in the `aiClass` character properties table when `SQLITE_EBCDIC` 
is defined.  This table is defined as follows:

static const unsigned char aiClass[] = {
#ifdef SQLITE_ASCII
…
#endif
#ifdef SQLITE_EBCDIC
/*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
/* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
/* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 12, 17, 20, 10,
/* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
/* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  7,
/* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
/* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   25,  1,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27,  9, 27, 27, 27, 27, 27,
/* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
#endif
};

While it’s conceivable that this table was written for a different codepage 
than used by the mainframe I was using, it looks more likely that there are 
typos in this table.  For example:

- There are two “9x” rows in the table above; there is no “Ax” row.
- There are no entries in this table for the CC_DOT or CC_VARNUM #defines (26 
  and 6 respectively).
- Assuming codepage 1047 (the most commonly used code page?), the entry for 
  the CC_TILDA #define (25) is in the wrong place.
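
The "no entries" observation can be checked mechanically. The following is a minimal sketch; `unused_classes` is a hypothetical helper written for this message and is not part of SQLite.

```c
#include <stddef.h>

/* Record which class values in the range 0..nClass-1 never occur in
** aTable[0..nEntry-1].  The missing values are written to aMissing, which
** must have room for nClass ints.  Returns the number of missing values. */
static int unused_classes(const unsigned char *aTable, size_t nEntry,
                          int nClass, int *aMissing){
  int nMissing = 0;
  int cls;
  for(cls=0; cls<nClass; cls++){
    size_t i;
    int seen = 0;
    for(i=0; i<nEntry; i++){
      if( aTable[i]==cls ){ seen = 1; break; }
    }
    if( !seen ) aMissing[nMissing++] = cls;
  }
  return nMissing;
}
```

Run over the 256 entries of the EBCDIC table quoted above with `nClass` set to 28, a check like this should list the class values that never appear, 6 and 26 among them.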

To fix this problem, I patched the SQLite sources to change the `aiClass` 
character properties table to this:

static const unsigned char aiClass[] = {
#ifdef SQLITE_ASCII
…
#endif
#ifdef SQLITE_EBCDIC
/*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
/* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
/* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* 4x */    7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
/* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
/* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  6,
/* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
/* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ax */   27, 25,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27,  9, 27, 27,
/* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
/* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
/* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
/* Fx */    3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
#endif
};

These changes fixed the SQL tokenizer problems I was seeing on my EBCDIC-based 
system, and resulted in a functioning SQLite there.
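
A spot check along the following lines could guard against the table regressing. It is a sketch under stated assumptions: the byte values are the code page 1047 code points for '.', '?' and '~' as I read them from the chart, pairing '?' with CC_VARNUM is my assumption about what that class is for, and `ClassCheck`, `aCp1047Check` and `count_mismatches` are hypothetical names rather than anything in SQLite.

```c
#include <stddef.h>

/* One expectation: the byte that encodes a character, and the class the
** tokenizer table should assign to it. */
struct ClassCheck {
  unsigned char byte;   /* assumed code page 1047 code point */
  unsigned char cls;    /* expected class number */
  const char *zChar;    /* the character, for readability only */
};

/* Class numbers are the ones quoted in this message:
** CC_DOT=26, CC_VARNUM=6, CC_TILDA=25. */
static const struct ClassCheck aCp1047Check[] = {
  { 0x4B, 26, "." },
  { 0x6F,  6, "?" },
  { 0xA1, 25, "~" },
};

/* Return how many expectations the 256-entry table aTable fails. */
static int count_mismatches(const unsigned char *aTable){
  size_t i;
  int nBad = 0;
  for(i=0; i<sizeof(aCp1047Check)/sizeof(aCp1047Check[0]); i++){
    if( aTable[aCp1047Check[i].byte]!=aCp1047Check[i].cls ) nBad++;
  }
  return nBad;
}
```

Under those assumptions, the original table should fail all three checks and the patched table none.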

Please let me know if more is needed to fix this bug.

Hope this helps,
Brad Larsen

https://en.wikipedia.org/wiki/EBCDIC_1047

P.S.  It would be helpful if the SQLite documentation indicated which EBCDIC 
codepage(s) were supported.