On Feb 12, 2013, at 3:22 PM, Andre Charbonneau 
<andre.charbonn...@nrc-cnrc.gc.ca> wrote:

> Greetings everyone,
> I have a piece of code in a web app where I need to store a large binary
> file (uploaded file stored on disk by Apache server), into an object's
> LargeBinary attribute.
> 
> That's pretty easy to do with a syntax like:
> 
>    myobject.uploaded_file = xyz.file.read()
> 
> 
> The problem is that I don't want to load the entire file into memory
> when I set the LargeBinary attribute.
> 
> 
> 
> If my understanding is correct, the above call will first cause the
> entire content of the uploaded file to be loaded into memory and then
> that is assigned to the myobject.uploaded_file LargeBinary attribute. 
> Correct?
> 
> (Then when sqlalchemy eventually issues the INSERT statement to store
> the data in the database... But then I don't really know how the data
> transfer is done...)
> 
> 
> I have tried to find another way of passing the data to the LargeBinary
> object that would not have to load the entire file into memory at once,
> but stream it in chunks during the INSERT statement, but I was not able
> to find anything. :-(

In the old days these streaming binary interfaces were common, but as memory 
has become plentiful you don't see them used much anymore.  Even with systems 
like Oracle, the client libraries rely on you setting the allowed memory size 
to be larger than the biggest value you need to store.

psycopg2 does work with "buffer()" and "memoryview()" objects as the input to a 
"bytea" column, and you could send these in as arguments, where SQLAlchemy 
should pass them through (or if not, it's easy to make a custom type that 
passes them through).  These objects don't appear to work around having to load 
the data into memory, though; they just make memory usage more efficient by 
removing the need for the data to be copied internally.  I'm not familiar 
enough with them to know whether they support some way to "stream" from a file 
handle or not.
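As a rough, untested sketch of that pass-through approach, using a hypothetical 
"documents" table ("memoryview()" is the Python 3 spelling, "buffer()" on 
Python 2):

    from sqlalchemy import Column, Integer, MetaData, Table, create_engine
    from sqlalchemy.dialects.postgresql import BYTEA

    metadata = MetaData()
    documents = Table(
        "documents", metadata,
        Column("id", Integer, primary_key=True),
        Column("uploaded_file", BYTEA),
    )

    engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")
    metadata.create_all(engine)

    with open("/tmp/upload.bin", "rb") as f:
        # the file is still read fully into memory, but the memoryview
        # hands psycopg2 a buffer rather than forcing another bytes copy
        data = memoryview(f.read())
        with engine.begin() as conn:
            conn.execute(documents.insert().values(uploaded_file=data))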

There's also a facility I've not previously heard of in Postgresql and psycopg2 
called the "large object" system, which appears to store the data in an 
entirely separate table, "pg_largeobject".  Dropping into psycopg2, you can 
store and retrieve these objects using the lobject interface:

http://initd.org/psycopg/docs/connection.html#connection.lobject
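For example, a minimal sketch of streaming a file into a large object in 
chunks; the path and chunk size here are arbitrary (large objects have to be 
manipulated inside a transaction, which psycopg2 opens implicitly):

    import psycopg2

    conn = psycopg2.connect("dbname=test")

    lobj = conn.lobject(0, "wb")       # oid=0: the server assigns a new OID
    with open("/tmp/upload.bin", "rb") as f:
        while True:
            chunk = f.read(64 * 1024)  # stream in 64kB pieces
            if not chunk:
                break
            lobj.write(chunk)
    oid = lobj.oid                     # keep the OID to link it to a row
    lobj.close()
    conn.commit()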

As far as how to get that data into your table, it seems you'd need to link 
your row to the OID of the large object, rather than using bytea: 
http://www.postgresql.org/docs/current/static/lo-funcs.html .  So you'd need to 
forgo the usage of bytea.   Again, SQLAlchemy types could be created which 
transparently perform these tasks against the OID.
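Continuing the sketch above (reusing "conn" and "oid" from it), linking the row 
might look like the following; the "documents" table and its "file_oid" column 
are hypothetical names, and postgresql.OID provides the matching column type:

    from sqlalchemy import Column, Integer, MetaData, Table, create_engine
    from sqlalchemy.dialects.postgresql import OID

    metadata = MetaData()
    documents = Table(
        "documents", metadata,
        Column("id", Integer, primary_key=True),
        Column("file_oid", OID),
    )

    engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")
    metadata.create_all(engine)

    with engine.begin() as sa_conn:
        sa_conn.execute(documents.insert().values(id=1, file_oid=oid))

    # reading back streams through the same lobject interface
    lobj = conn.lobject(oid, "rb")
    while True:
        chunk = lobj.read(64 * 1024)
        if not chunk:
            break
        # ... hand each chunk to the response, a file, etc. ...
    lobj.close()
    conn.commit()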

I'd ask on the psycopg2 list which feature they recommend, and I'm betting 
they'll say that memory is very cheap and plentiful these days and you should 
just assume the data will fit into memory.

