On Feb 12, 2013, at 3:22 PM, Andre Charbonneau <andre.charbonn...@nrc-cnrc.gc.ca> wrote:
> Greetings everyone,
> I have a piece of code in a web app where I need to store a large binary
> file (uploaded file stored on disk by Apache server), into an object's
> LargeBinary attribute.
>
> That's pretty easy to do with a syntax like:
>
> myobject.uploaded_file = xyz.file.read()
>
> The problem is that I don't want to load the entire file into memory
> when I set the LargeBinary attribute.
>
> If my understanding is correct, the above call will first cause the
> entire content of the uploaded file to be loaded into memory and then
> that is assigned to the myobject.uploaded_file LargeBinary attribute.
> Correct?
>
> (Then when sqlalchemy eventually issues the INSERT statement to store
> the data in the database... But then I don't really know how the data
> transfer is done...)
>
> I have tried to find another way of passing the data to the LargeBinary
> object that would not have to load the entire file into memory at once,
> but stream it in chunks during the INSERT statement, but I was not able
> to find anything. :-(

In the old days these streaming binary interfaces were common, but as memory has become plentiful you don't see them used much anymore. Even with systems like Oracle, client libraries typically rely on setting the allowed memory size to be larger than the largest value you need to store.

psycopg2 does accept "buffer()" and "memoryview()" objects as input for a "bytea" column, and you could send these in as bind parameters, where SQLAlchemy should pass them through (or if not, it's easy to make a custom type that does). These objects don't appear to work around having to load the data into memory, though; they just make memory usage more efficient by removing the need for the data to be copied internally. I'm not familiar enough with them to know whether they support some way to "stream" from a file handle.
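As a rough sketch of the "custom type that passes it through" idea, assuming SQLAlchemy's TypeDecorator API (the class name and column names here are mine, not anything from your app):

```python
# Sketch: a custom SQLAlchemy type that hands a memoryview straight
# to the DBAPI, so psycopg2 can adapt it to bytea without SQLAlchemy
# making another copy of the bytes. "PassThroughBinary" is a
# hypothetical name for illustration.
from sqlalchemy.types import LargeBinary, TypeDecorator


class PassThroughBinary(TypeDecorator):
    """Send buffer/memoryview bind values to the DBAPI unchanged."""
    impl = LargeBinary
    cache_ok = True

    def bind_processor(self, dialect):
        # Returning None disables bind-parameter conversion, so the
        # memoryview reaches psycopg2 as-is.
        return None


# Hypothetical usage in a mapped class (the whole file is still read
# into memory once, but not copied again on the way to the driver):
#
#     uploaded_file = Column(PassThroughBinary)
#     ...
#     with open("upload.bin", "rb") as f:
#         myobject.uploaded_file = memoryview(f.read())
```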
There's also a facility I've not previously worked with in PostgreSQL and psycopg2 called the "large object" system, which appears to be an entirely separate table, "pg_largeobject", that stores the data. Dropping down to psycopg2, you can store and retrieve these objects using its large object interface: http://initd.org/psycopg/docs/connection.html#connection.lobject

As far as how to get that data into your own table, it seems you'd need to link to the OID of your large object rather than using bytea: http://www.postgresql.org/docs/current/static/lo-funcs.html . So you'd need to forego the usage of bytea. Again, SQLAlchemy types could be created which transparently perform these tasks against the OID.

I'd ask on the psycopg2 list which feature they recommend, and I'm betting they will likely say that memory is very cheap and plentiful these days and you should just assume the data will fit into memory.