On Wed, Jun 03, 2015 at 03:55:04PM -0700, Darko Volaric wrote:
> I've tackled this problem from a couple of different angles. My goal was to
> allow arbitrary user defined types, based on the builtin types (essentially
> subtypes of the existing types), with a minimum of work and minimum
> disruption of the normal/existing use of the database and API.
> 
> The approaches I considered were:

A UDT can't be first-class without modifying SQLite3.  Without
first-class UDT support the application has to be responsible to some
degree for adhering to whatever encoding conventions it chooses to use.

User-defined functions, collations, and virtual tables can be used to
move some of the burden from the application to the UDFs and VTs, but
you can't move all of it (e.g., because whatever SQLite3 type you use to
encode UDTs will often have other uses, leading to an aliasing problem
that requires the application to avoid it).

> - encoding the user type codes for each data column in a separate column
> dedicated to the purpose. This is a low impact but cumbersome, for instance
> using a function that interprets the user type would have to have the user
> type passed in for each argument, along with the actual data.

It's cumbersome because it creates a referential integrity problem.

> - modifying the data file format to carry user type information. There is
> space in the record/row header where you can encode this information in a
> backwards compatible way, but the source code for data record access is not
> friendly, basically a dense blob of code with a lot of integer literals
> which are all very important, but it's hard to be sure what they entail and
> that you haven't introduced a subtle bug and ultimately data corruption.
> Additionally the user type would have to be passed around internally - for
> example in the sqlite3_value object - and tracking down all of those
> reliably is a bit of work.

Right, you'd basically be talking about adding new first-class types to
SQLite3.  That's quite an understaking and not for the faint of heart.
Even if you tackle this, chances are it'd be very difficult to get the
SQLite3 dev team to accept the changes -- one would be forking SQLite3,
and that requires serious (read: lots of experienced software engineer
time) effort to develop and maintain.

> - using blobs. Although using text representation is friendly when looking
> at the data with standard tools, it's slower and takes up more memory in
> various places. I found that encoding some user types as blobs with a type
> marker at their start (a single byte with extensions) and interpreting them
> was a simple and low impact approach. [...]

Encoding as text or blobs is about your only realistic option.  Enums
can be encoded as numbers too, as can small bitsets.

> The first option isn't very practical. The second option is the fastest and
> most robust solution and my long term approach which I will be going back
> to after development has progressed a bit more. Currently I'm using the
> third approach as an interim measure. I'm supporting arbitrary prec ints
> and reals, arrays and tuples and other types this way.

At that point why not just switch to Postgres?

Nico
-- 

Reply via email to