Re: [sqlalchemy] postgresql large objects

Jason Newton Wed, 25 Jun 2014 11:53:46 -0700

Hi,

I've replied inline below.

On Wed, Jun 25, 2014 at 6:46 AM, Mike Bayer <mike...@zzzcomputing.com>
wrote:

>
>  well we just added the OID type in 0.9.5, so you at least have that.
>

I came across the entry on the issue tracker a little bit after
submitting.  As usual for me, it's support wasn't added very long ago.

> The PG LOB feature is very sparsely documented - on PG's docs, they only
> seem to document the C API (
> www.postgresql.org/docs/9.2/static/lo-interfaces.html), feel free to
> point me to better documentation on this.
>

There's also server side functions, they operate almost exactly the same as
client side api: http://www.postgresql.org/docs/9.3/static/lo-funcs.html .
There is no better documentation than those two that I know of, but they
were sufficient for me.

> As far as transparently embedding this into INSERT/UPDATE, my first
> thought is that this might be unwise considering how far removed these
> functions are from any kind of mainstream usage in such statements -
> particularly if separate statements have to be called per value to get at
> OIDs or similar.    That PG's docs barely mention this whole feature much
> less any kind of regular SQL integration is a red flag.   PG's BYTEA type
> is already arbitrarily large so there is probably not much interest in a
> type like this.    If it's the "streaming" feature you're looking for,
> SQLA's usual approach such as that of Oracle's LOB is to "pre-stream" it on
> the result set side (necessary, because fetching a batch of rows requires
> it), and cx_Oracle doesn't provide too much option to stream on the write
> side.   I've dealt a lot with "streaming" datatypes back in the day but
> sometime in the early 2000's everyone just stopped using them.
>

Seems to fly in the face at the point of SQLA although integration
difficulties are appreciated.  Most advanced postgresql drivers in any
language bindings have added support for this type although none of them
can hide that it's file like.  PG's BYTEA is NOT arbitrarily large, it has
quite real practical limits and I've hit them regularly in storing
compressed HDF5 documents in the database as part of a bench testing
framework.  The theoretical limit is 1GB but this limit is far less in
practice (
http://www.postgresql.org/message-id/cafj8pracfkoinp2uxeizod5krx29n2ofsoldh0w6ej7rxko...@mail.gmail.com
) . http://www.postgresql.org/docs/9.3/static/storage-toast.html  I'm not
looking for streaming, retrieve/set binary buffers.  It'd be nice to
translate it transparently to HDF5 python in-memory objects (ie h5py).

>
> As the feature involves SQL functions I don't think you'd be dealing only
> with bind_processor(), the SQL functions themselves would probably be via
> SQL-level processing, see
> http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#applying-sql-level-bind-result-processing.
> It's true these functions aren't given access to the compiler context where
> you'd be able to discern INSERT from UPDATE, so I'd probably use two
> variants of datatype, an InsertLOB and UpdateLOB datatype, then I'd apply
> these to an INSERT or UPDATE statement at runtime probably using a
> @compiles decorator -
> http://docs.sqlalchemy.org/en/rel_0_9/core/compiler.html, that is, take
> the existing LOB type and just swap it out for InsertLOB/UpdateLOB.
> Adding compiler context to TypeEngine is something we could look into as a
> feature as well so that bind_expression() has this available somehow and
> switching around types wouldn't be needed.
>

Separate datatypes is clearly not a good approach to this.

> building this would involve stringing together hooks that are not quite
> set up for this, however for me to really assist here I'd need to see exact
> examples of what INSERT, UPDATE and anything else looks like in conjunction
> with these functions.
>

Everything starts with a transaction block.  Hand crafted inserts use
server side lo_create(-1) (in sql) which allocates a new large object and
returning clause to get the oid in one go back to the client side.  Then I
start using the lobject api on the returned oid.   In patches I've made to
cppdb, in the bind of std::istream I call client side lo_create, lo_open, a
sequence of lo_writes, and finally lo_close.  In retrospect this doesn't
handle updates particularly well and maybe this could be used in sqlalchemy
as is too - an update just allocates a new large object and unreferences
the old one.  This would leave it up to vacuumlo to GC those "updated"
lobs.  In my work load however, my lobs so far are immutable - they are
results of simulations and this is the way I've worked to date.  It
probably won't stay that way forever.

>
> I would note that psycopg2 itself also provides for extension types,
> including custom Connection and Cursor subclasses.   If a lot of things
> have to happen when these types are in play it might be something that can
> occur at that level, PG's type API is obviously a lot more PG specific.
>
>
-Jason

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] postgresql large objects

Reply via email to