Re: [sqlalchemy] postgresql large objects

Mike Bayer Wed, 25 Jun 2014 06:47:36 -0700

On 6/25/14, 2:35 AM, Jason Newton wrote:
> Hi,
>
> I'm wondering how I might get at postgresql's large object type
> (lob).  It's not to be confused with the TOASTED bytea, which are
> limited currently to 1 GiB yet in practice is much lower (for me
> <400MiB)  - it's a special table + api designed to handle very large
> binary objects, like a few hundred MiBs to more recently TiBs. I don't
> see appropriate definitions anywhere and can't find any mention of it
> really with sqlalchemy. psycopg2 has support for it and calls it
> lobject, it provides a file like interface to the lob which is a good
> mapping since with libpgsql you use lo_creat, lo_seek, lo_write,
> lo_read to work with these beasts
>
> I took a look at UserDefinedType but on the bind_processor, this
> doesn't distinguish between inserts and updates.  With inserts, you'd
> use an oid allocated from lo_creat in the transaction.  On updates,
> you'd use lo_trunc/lo_write.  As one more constraint, you must be in a
> transaction before any of these functions are usable.  To reference
> large objects, as they are explicitly an out of table storage, the
> postgresql specific oid is used (which allows garbage collection,
> referential integrity checks etc).
>
> I'll also mention that other tables reference these large objects via
> oids, something like smart pointers in postgres.
>
> It'd be great to plug large objects into sqlalchemy properly - but can
> it be done?
well we just added the OID type in 0.9.5, so you at least have that.


The PG LOB feature is very sparsely documented - on PG's docs, they only
seem to document the C API
(www.postgresql.org/docs/9.2/static/lo-interfaces.html), feel free to
point me to better documentation on this.   

As far as transparently embedding this into INSERT/UPDATE, my first
thought is that this might be unwise considering how far removed these
functions are from any kind of mainstream usage in such statements -
particularly if separate statements have to be called per value to get
at OIDs or similar.    That PG's docs barely mention this whole feature
much less any kind of regular SQL integration is a red flag.   PG's
BYTEA type is already arbitrarily large so there is probably not much
interest in a type like this.    If it's the "streaming" feature you're
looking for, SQLA's usual approach such as that of Oracle's LOB is to
"pre-stream" it on the result set side (necessary, because fetching a
batch of rows requires it), and cx_Oracle doesn't provide too much
option to stream on the write side.   I've dealt a lot with "streaming"
datatypes back in the day but sometime in the early 2000's everyone just
stopped using them.

As the feature involves SQL functions I don't think you'd be dealing
only with bind_processor(), the SQL functions themselves would probably
be via SQL-level processing, see
http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#applying-sql-level-bind-result-processing.
   
It's true these functions aren't given access to the compiler context
where you'd be able to discern INSERT from UPDATE, so I'd probably use
two variants of datatype, an InsertLOB and UpdateLOB datatype, then I'd
apply these to an INSERT or UPDATE statement at runtime probably using a
@compiles decorator -
http://docs.sqlalchemy.org/en/rel_0_9/core/compiler.html, that is, take
the existing LOB type and just swap it out for InsertLOB/UpdateLOB.   
Adding compiler context to TypeEngine is something we could look into as
a feature as well so that bind_expression() has this available somehow
and switching around types wouldn't be needed.

building this would involve stringing together hooks that are not quite
set up for this, however for me to really assist here I'd need to see
exact examples of what INSERT, UPDATE and anything else looks like in
conjunction with these functions.

I would note that psycopg2 itself also provides for extension types,
including custom Connection and Cursor subclasses.   If a lot of things
have to happen when these types are in play it might be something that
can occur at that level, PG's type API is obviously a lot more PG specific.



>
> -Jason
> -- 
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+unsubscr...@googlegroups.com
> <mailto:sqlalchemy+unsubscr...@googlegroups.com>.
> To post to this group, send email to sqlalchemy@googlegroups.com
> <mailto:sqlalchemy@googlegroups.com>.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] postgresql large objects

Reply via email to