Re: [sqlalchemy] postgresql large objects

Mike Bayer Wed, 25 Jun 2014 12:40:24 -0700

On 6/25/14, 2:53 PM, Jason Newton wrote:
> Seems to fly in the face at the point of SQLA although integration
> difficulties are appreciated.  Most advanced postgresql drivers in any
> language bindings have added support for this type although none of
> them can hide that it's file like. 
SQLA doesn't go out of its way for unusual, very database-specific
features that up til now have demonstrated little to no real-world use
(nobody's ever asked for this feature before and googling about
Postgresql LOBs turns up very little).  There are tons of examples of
features like this across many different database backends.  If they are
easy to add, we add them, or if they are easy to address via a recipe,
we add the recipe up to the wiki.


But if the behavior requires substantial changes to the core and
dialect, and the ratio of complexity of impact to sparseness of need is
really high, it's not worth it and actually kind of damaging to most
users to complicate the library for use cases that are extremely rare
and can just as well be addressed by dropping down to raw DBAPI code. 
Complications/destabiliziations/performance degradations that are
hoisted onto the whole userbase for the benefit of a single feature that
is virtually never needed is the wrong choice to make; I'm presented
with this choice all the time and there's nearly always work to be done
in extricating ill-conceived features and behaviors that went in too
quickly.  I'm pretty confident that this feature won't require any of
that, but that remains to be seen.

Just in case this wasn't apparent, you certainly *can* use psycopg2's
bindings when you're in an otherwise SQLAlchemy app.  Worst case you can
retrieve a raw psycopg2 connection using connection.raw_connection and
do whatever you need.  If you truly have some use for LOBs, SQLAlchemy
isn't preventing you from using it, it's just not providing any niceties
around it.   The fact that these unusual use cases are not ever
prohibited by SQLA further raises the bar to adding first class support
for them.


> PG's BYTEA is NOT arbitrarily large, it has quite real practical
> limits and I've hit them regularly in storing compressed HDF5
> documents in the database as part of a bench testing framework.  The
> theoretical limit is 1GB but this limit is far less in practice
> (http://www.postgresql.org/message-id/cafj8pracfkoinp2uxeizod5krx29n2ofsoldh0w6ej7rxko...@mail.gmail.com
> ) . http://www.postgresql.org/docs/9.3/static/storage-toast.html  I'm
> not looking for streaming, retrieve/set binary buffers.  It'd be nice
> to translate it transparently to HDF5 python in-memory objects (ie h5py).
Python translation is very easy in SQLA, its just if you have special
needs for SQL syntaxes, that's where special behaviors may be needed. 
So far it sounds like the only blocking factor is that bind_sql needs to
distinguish between INSERT and UPDATE.  that's not a terribly tall order
though it is inconvenient in that the API would need a
backwards-compatibility layer.


>  
>
>
>     As the feature involves SQL functions I don't think you'd be
>     dealing only with bind_processor(), the SQL functions themselves
>     would probably be via SQL-level processing, see
>     
> http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#applying-sql-level-bind-result-processing.
>    
>     It's true these functions aren't given access to the compiler
>     context where you'd be able to discern INSERT from UPDATE, so I'd
>     probably use two variants of datatype, an InsertLOB and UpdateLOB
>     datatype, then I'd apply these to an INSERT or UPDATE statement at
>     runtime probably using a @compiles decorator -
>     http://docs.sqlalchemy.org/en/rel_0_9/core/compiler.html, that is,
>     take the existing LOB type and just swap it out for
>     InsertLOB/UpdateLOB.    Adding compiler context to TypeEngine is
>     something we could look into as a feature as well so that
>     bind_expression() has this available somehow and switching around
>     types wouldn't be needed.
>
>
> Separate datatypes is clearly not a good approach to this.

Let me clarify that these separate datatypes would be totally invisible
to the user.   The user would work with a single LOB type.  Translation
to Insert/Update versions would not be explicit and would occur at the
point at which the insert/update construct is compiled.   At the moment
this may be the best approach short of modifying the library (but then
again I can't say much about the approach because i have little idea
what the SQL we're talking about looks like).
 
> Everything starts with a transaction block. 
the psycopg2 DBAPI is implicitly in a transaction block at all times
unless "autocommit" mode is set up, so there shouldn't be any problem here.

> Hand crafted inserts use server side lo_create(-1) (in sql) which
> allocates a new large object and returning clause to get the oid in
> one go back to the client side.  Then I start using the lobject api on
> the returned oid. 
I'd like to see explicit SQL, preferably in the form of a psycopg2
script that illustrates all the operations you wish to support and
specifically how they must interact with the database.


> In patches I've made to cppdb, in the bind of std::istream I call
> client side lo_create, lo_open, a sequence of lo_writes, and finally
> lo_close.  In retrospect this doesn't handle updates particularly well
> and maybe this could be used in sqlalchemy as is too - an update just
> allocates a new large object and unreferences the old one. 
I wonder why the mechanics of these functions can't be more cleanly
wrapped into server-side stored procedures?    If they can be distilled
into simple insert_lob() update_lob() functions that would potentially
be a better separation of duties.


-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] postgresql large objects

Reply via email to