On Thu, Apr 30, 2015 at 2:52 AM, James K. Lowden <jklowden at schemamania.org>
wrote:

> On Tue, 28 Apr 2015 09:24:56 +0200 Dominique Devienne <ddevienne at gmail.com>
> wrote:
> > On Tue, Apr 28, 2015 at 4:16 AM, James K. Lowden <
> jklowden at schemamania.org> wrote:
> > > A major hurdle is the memory model: because array-programming
> > > libraries normally mandate the data be in contiguous memory,
> > > there's a cost to converting to/from the DBMS's B+ tree.  The more
> > > array-like the physical storage of the DBMS, the more it cedes
> > > transactional and update capability.
> >
> > Well, just look at how Oracle solved that problem. The row data
> > doesn't hold the blob itself, like in SQLite, but an index to
> > separate blob pages. This proverbial level of indirection brings
> > tremendous benefits, because you can then update a blob w/o having to
> > rewrite the whole blob (you copy the "blob" page(s) being updated
> > only, and copy the row into a new "data" page with an updated "blob
> > index" with just a few entries changed to point to the updated pages.
>
> Does Oracle provide access to the BLOB as a contiguous memory image in
> the Oracle server's virtual address space?
>

Of course not. But the same blob APIs any OCI client uses can also be used by
a server extension that runs out-of-process (if native) or in-process (if
pure Java inside Oracle's server-side JVM), and the implementation of those
APIs does take advantage of the locality. No one in their right mind would
allow arbitrary 3rd-party code to run inside an "enterprisy" DB server
(unless it's "safe" like Java, or sandboxed if native; Oracle doesn't do the
latter).

But processing a page of values is still much more efficient than processing
individual values one at a time (coming from rows), and if pages happen to be
contiguous, all the better. Think of it in terms of how aggregate functions
work, if you will. Perhaps you can't use your BLAS library as-is if the data
isn't contiguous, but again given the aggregate-function analogy, you can
allocate a contiguous buffer in the begin/init block, do efficient page-sized
copies in the step block, and do all the computation in the end/finalize
block on that temporary contiguous buffer. Having such blob-specific pages
would enable this more efficient scenario.
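The begin/step/finalize pattern above can be sketched with Python's stdlib
sqlite3 module (the class and function names, and the PAGE size, are my own
for illustration; note that Python hands step() the blob already fully
materialized, so the chunked copy here only mirrors the structure a C
extension could exploit for real page-sized copies):

```python
import sqlite3
import struct

PAGE = 4096  # illustrative chunk size (SQLite's default page size)

class BlobSum:
    """Sums float64 values stored in a blob column."""
    def __init__(self):
        # begin/init block: set up the contiguous temp buffer
        self.buf = bytearray()

    def step(self, blob):
        # step block: append page-sized copies of each row's blob
        for off in range(0, len(blob), PAGE):
            self.buf += blob[off:off + PAGE]

    def finalize(self):
        # end/finalize block: all computation on the contiguous buffer
        n = len(self.buf) // 8
        return sum(struct.unpack(f'<{n}d', bytes(self.buf[:n * 8])))

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE t(data BLOB)')
con.execute('INSERT INTO t VALUES (?)', (struct.pack('<3d', 1.0, 2.0, 3.0),))
con.create_aggregate('blobsum', 1, BlobSum)
result = con.execute('SELECT blobsum(data) FROM t').fetchone()[0]
print(result)  # 6.0
```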


> The benefit you're describing reduces the I/O required to update the
> BLOB and to traverse the table without referencing it.


Don't forget it also makes it possible to grow blobs without rewriting them
in full, and to update them in place transactionally, neither of which SQLite
supports right now; that is my main issue with SQLite. Blobs are basically
second-class citizens in SQLite, and I wish that would change in future
versions.


> That's a *very* old technique; it was available in Sybase in 1980s.


So? I never made any claim about the originality or age of the technique. I
merely stated that this (well-known, thank you for the precision) technique
is not used by SQLite, and that using it would solve the two main issues
SQLite has with blobs.


> To support use of
> arbitrary linear algebra functions on those BLOBs, they would have to
> supplied to user-defined functions in linear memory with known
> addresses.


See my aggregate-function analogy above. My point is that some functions will
work just fine operating incrementally on the blob data in page-sized chunks
(or multiples thereof). You can use the blob APIs to load the blob into a
user-allocated buffer, i.e. a scalar user-defined function can drive the loop
over the blob; but if instead you do some kind of "inversion of control",
where it's SQLite that drives the loop over the blob's pages, passing control
to an aggregate-like user-defined function that operates specifically on
blobs and their pages, then SQLite can, for example, pass in a read-only
pointer into its internal page cache and avoid copies. Just thinking aloud.

My main argument is that I'd like the two current limitations on blobs
(can't extend one without rewriting it in full; can't update one
incrementally without either losing transactionality or, again, being forced
to rewrite it in full) gone from SQLite. Because then SQLite could model a
proper transactional random-access file on top of its blobs, and I could use
SQLite blobs as the back end to a 3rd-party VFS (for HDF5, for example). I
can do that (and did) with Oracle, and I can't do that with SQLite. And I'd
really like to be able to. And then blobs would no longer be second-class
citizens in SQLite. --DD
