Re: [sqlalchemy] Handling big Python objects

Andrea Gavana Wed, 03 Dec 2014 15:41:59 -0800

Hi,

On Thursday, December 4, 2014 12:02:42 AM UTC+1, Ams Fwd wrote:
>
> On 12/3/14 2:23 PM, Andrea Gavana wrote: 
> > 
> > 
> > On Wednesday, December 3, 2014 10:42:27 PM UTC+1, Jonathan Vanasco wrote: 
> > 
> > 
> > 
> >     On Wednesday, December 3, 2014 4:23:31 PM UTC-5, Ams Fwd wrote: 
> > 
> >         I would recommend just storing them on disk and letting the OS 
> >         VMM deal with caching for speed. If you are not constrained for 
> >         space, I would recommend not zlib-ing them either. 
> > 
> > 
> >     I'll second storing them to disk.  Large object support in all the 
> >     databases is a pain and not very optimal.  Just pickle/unpickle a 
> >     file and use the db to manage that file. 
> > 
> > 
> > 
> > Thanks to all of you who replied. A couple of issues that I'm sure I 
> > will encounter by leaving the files on disk: 
> > 
> > 1. Other users can easily delete/overwrite/rename the files on disk, 
> > which is something we really, really do not want; 
>
> If this is Windows, group policies are your friends :). If this is Linux, 
> permissions with a secondary service to access the files are a decent 
> choice. 
>



Yes, this is Windows, but no, I can't go around and tell the users that the 
simulation they just saved is not accessible anymore. The database is part 
of a much larger user interface application, where users can generate 
simulations and then decide whether or not they are relevant enough to be 
stored in the database. At a rate of 300 MB per simulation (or more), this 
quickly becomes a size issue.



> > 2. The whole point of a database was to have everything centralized in 
> > one place, not leaving the simulation files scattered around like a 
> > mess in the whole network drive; 
>
> The last time I did this, a post-processing step in my data pipeline 
> organized the files into a multi-level folder structure based on the 
> first x characters of their SHA-1. 
>


Again, I am dealing with non-Python people, and in general with people who 
are extremely good at what they do but don't care about the overall IT 
architecture, as long as it works and is recognizable from a Windows 
Explorer point of view. A file-based approach is unfortunately not a good 
option in the current setup.
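
For reference, the SHA-1 folder layout described in the quoted reply could 
look roughly like the sketch below. The function name `sharded_path`, the 
root folder name and the two-level, two-character split are assumptions 
made for illustration; the quoted message only says "the first x characters".

```python
import hashlib
from pathlib import Path


def sharded_path(root, payload, levels=2, width=2):
    """Return a nested path for `payload` derived from its SHA-1 digest.

    Hypothetical helper: `levels` folders are created, each named after
    the next `width` hex characters of the digest, and the file itself
    is named after the full digest.
    """
    digest = hashlib.sha1(payload).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, digest)


# Example: where the bytes of one pickled simulation would be stored.
example = sharded_path("simulations", b"pickled simulation bytes")
```

Spreading files over digest-derived folders keeps any single directory from 
growing very large, at the cost of names that mean nothing in Windows 
Explorer, which is the objection raised above.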

 

>
> > 3. As an aside, not zlib-ing the files saves about 5 
> > seconds/simulation (out of a 20-second save) but increases the database 
> > size by a factor of 4. I'll have to check if this is OK. 
> > 
> To use compression or not depends on your needs. If the difference in 
> time consumed is so stark, I would highly recommend compression. 
>


I will probably go that way; 5 seconds more or less does not make that much 
of a difference overall. I just wish the backends didn't complain when I 
pass them cPickled objects (bytes instead of strings...).
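
On the bytes-versus-strings point, here is a minimal sketch of the 
pickle-plus-zlib route, assuming Python 3 and the standard-library sqlite3 
module as a stand-in backend. The table, the column names and the helpers 
`dump_blob` / `load_blob` are hypothetical. The idea is only that the 
compressed pickle is handed over as binary data to a BLOB column and never 
as text; with SQLAlchemy the matching column would presumably be a binary 
type such as LargeBinary rather than a string type.

```python
import pickle
import sqlite3
import zlib


def dump_blob(obj, level=6):
    """Pickle `obj` and zlib-compress the result; returns bytes."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw, level)


def load_blob(blob):
    """Inverse of dump_blob: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(bytes(blob)))


# In-memory database with a BLOB column for the payload (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulation (id INTEGER PRIMARY KEY, name TEXT, payload BLOB)"
)

# Small stand-in for a real simulation object.
simulation = {"name": "run-001", "pressure": [float(i) for i in range(1000)]}

# sqlite3.Binary marks the value as binary data, not text.
conn.execute(
    "INSERT INTO simulation (name, payload) VALUES (?, ?)",
    (simulation["name"], sqlite3.Binary(dump_blob(simulation))),
)

row = conn.execute(
    "SELECT payload FROM simulation WHERE name = ?", ("run-001",)
).fetchone()
restored = load_blob(row[0])
```

Only unpickle payloads that the application itself wrote, since 
`pickle.loads` can execute arbitrary code from untrusted data.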


Andrea.
 
