[web2py] Re: Help understanding memory use and selects

Ian W. Scott Wed, 04 Sep 2019 10:54:19 -0700

After some experimenting and refactoring, I'll offer some preliminary 
answers to my own question here. First, I was able to refactor that list 
comprehension so that it uses negligible memory (too low for 
memory_profiler to register). The original version looked like this and 
consumed over 1MiB of memory each time it ran:

    p_here = [p for p in cpaths
              if loc_id in db.steps[int(p['steps'][0])].locations]

The refactored version looks like this:

    pid_here = [p['path2steps']['path_id'] for p
                in db((db.path2steps.step_id == db.steps.id) &
                      (db.steps.locations.contains(loc_id))
                      ).iterselect(db.path2steps.path_id,
                                   db.steps.locations)
                if loc_id in p['steps']['locations']
                ]
    p_here = [p for p in cpaths if p['id'] in pid_here]

It looks less elegant, but it's *much* lighter on memory. Let me break down 
the changes I made. 

   1. I removed the db access (select) from the "if" condition, which was 
   evaluated on every iteration of the loop. Instead I access the db once 
   and iterate over the result.
   2. I used iterselect instead of select.
   3. In the iterselect I specified just the fields I'm actually going to 
   use, so that unused data doesn't go into memory.
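
As a rough illustration of those three changes outside web2py, here is a 
minimal sketch that uses Python's built-in sqlite3 module rather than the 
DAL. Everything in it is an assumption made for the example: the table 
layout, the sample rows, the '|'-delimited locations string, and the helper 
names (has_location, per_item_lookup, single_pass). It only shows the shape 
of the two approaches, not how the DAL behaves internally.

```python
import sqlite3

# Hypothetical stand-in schema: each "steps" row stores its locations as a
# '|'-delimited string, and "path2steps" links a path to a step.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE steps (id INTEGER PRIMARY KEY, locations TEXT, prompt TEXT);
    CREATE TABLE path2steps (path_id INTEGER, step_id INTEGER);
    INSERT INTO steps VALUES
        (1, '|3|7|', 'a'), (2, '|5|', 'b'), (3, '|7|9|', 'c');
    INSERT INTO path2steps VALUES (10, 1), (11, 2), (12, 3);
""")

# Made-up sample data in the spirit of the "cpaths" list from the post.
cpaths = [
    {"id": 10, "steps": [1]},
    {"id": 11, "steps": [2]},
    {"id": 12, "steps": [3]},
]
loc_id = 7


def has_location(locations, loc):
    """Exact membership test on the '|'-delimited locations string."""
    return str(loc) in locations.strip("|").split("|")


def per_item_lookup():
    """Original shape: one query per list item, fetching the whole row."""
    found = []
    for p in cpaths:
        row = conn.execute(
            "SELECT * FROM steps WHERE id = ?", (p["steps"][0],)
        ).fetchone()
        if has_location(row[1], loc_id):
            found.append(p)
    return found


def single_pass():
    """Refactored shape: one joined query, two columns, read row by row."""
    cursor = conn.execute(
        "SELECT path2steps.path_id, steps.locations "
        "FROM path2steps JOIN steps ON path2steps.step_id = steps.id "
        "WHERE steps.locations LIKE ?",
        ("%|{}|%".format(loc_id),),
    )
    # Iterating over the cursor avoids building the full result list first.
    pid_here = {path_id for path_id, locations in cursor
                if has_location(locations, loc_id)}
    # Second stage: pure-Python filtering, no further db access.
    return [p for p in cpaths if p["id"] in pid_here]
```

Both functions return the same paths. The sketch collects the ids in a set 
rather than a list so that the final membership test stays cheap as the 
number of ids grows.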

In order to make these changes I had to reorganize the logic significantly. 
Rather than trying to pinpoint my desired data set in one pass through the 
list comprehension, I first gather a larger (!) data set and then refine it 
in a second step that doesn't require db access. The details aren't 
important here. What surprised me, though, is that it was far more memory 
efficient to iterate over a single, stripped down iterselect than to make 
multiple selects. This is true even though the resulting list is larger and 
has to be pared down in a second stage.
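
To check this kind of difference without memory_profiler, something like 
the following sketch may help. It uses the standard library's tracemalloc 
module and a generator as a hypothetical stand-in for a row-by-row db 
cursor, so fake_rows, peak_bytes, materialized, and streamed are all 
invented names and the absolute numbers it prints say nothing about the 
DAL itself. It only compares holding a whole result set at once against 
consuming it one row at a time.

```python
import tracemalloc


def fake_rows(n):
    """Hypothetical stand-in for a db cursor: yields one row dict at a time."""
    for i in range(n):
        yield {"path_id": i, "locations": [i % 10, (i + 1) % 10]}


def peak_bytes(func):
    """Run func() and return the peak memory tracemalloc saw during the call."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def materialized(n=50000, loc_id=7):
    """Hold every row in a list first, then filter (select-like)."""
    rows = list(fake_rows(n))
    return [r["path_id"] for r in rows if loc_id in r["locations"]]


def streamed(n=50000, loc_id=7):
    """Filter while iterating, keeping one row alive at a time (iterselect-like)."""
    return [r["path_id"] for r in fake_rows(n) if loc_id in r["locations"]]


peak_list = peak_bytes(materialized)
peak_stream = peak_bytes(streamed)
print("materialized peak:", peak_list, "bytes")
print("streamed peak:    ", peak_stream, "bytes")
```

The two functions produce the same list of ids; only the peak memory 
differs, because the streamed version never keeps the unfiltered rows 
around.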

The larger takeaway for me is that each db access can be surprisingly 
expensive in terms of memory, at least in this app. It's worth it for me to 
organize my logic around minimizing db calls, even if the result is less 
elegant code.

On Monday, September 2, 2019 at 3:44:18 PM UTC-4, Ian W. Scott wrote:
>
> I'm trying to lower the memory use of an app and have some general 
> questions about how memory is used in DAL selects:
>
>
>    1. Am I right that the memory used while performing the select isn't 
>    released right away, even if the select isn't assigned to a variable? 
>    2. I'm aware of iterselect. Am I right that with iterselect the memory 
>    used is just enough to store one row of data (instead of the whole 
>    selected set)?
>    3. Does this mean that, generally, you want to perform as few separate 
>    selects as possible, unless you can use iterselect?
>    4. Selects seem to occupy a significant amount of memory, even when 
>    the result set is only one row. Is the memory use for the select 
>    determined by the table size or the result size?
>
> Here's an example of the kind of situation I'm working with. I'm using a 
> list comprehension to loop through a list, performing a select in each loop:
>
>     p_here = [p for p in cpaths
>               if loc_id in db.steps[int(p['steps'][0])].locations]
>
>
> When I run a memory profiler, this line results in over 1MB of memory 
> being occupied, and that memory isn't released for at least several 
> minutes. The table "steps" has about 3000 rows, so it's not enormous. The 
> result for each select is a single row and doesn't include a huge amount of 
> data (a few strings, ints, etc.). The "cpaths" list might have 50 or so 
> items. So is the memory issue emerging because (a) the memory use for each 
> select is determined by the table size, and (b) memory is being occupied 
> (and not released) separately for each iteration of the loop? Is there a 
> way to rewrite this so that it uses less memory?
>
