On 21Aug2019 21:26, Sarah Hembree <sarah12...@gmail.com> wrote:
> How do you chunk data? We came up with the below snippet. It works (with
> integer list data) for our needs, but it seems so clunky.
>
>    def _chunks(lst: list, size: int) -> list:
>        return [lst[x:x+size] for x in range(0, len(lst), size)]
>
> What do you do? Also, what about doing this lazily so as to keep memory
> drag at a minimum?

This looks pretty good to me. But as you say, it constructs the complete list of chunks and returns them all. For many chunks that is both slow and memory hungry.

If you want to conserve memory and return chunks in a lazy manner you can rewrite this as a generator. A first cut might look like this:

  from typing import Iterator

  def _chunks(lst: list, size: int) -> Iterator[list]:
      for x in range(0, len(lst), size):
          yield lst[x:x+size]

which makes _chunks() a generator function: calling it returns an iterator which yields each chunk one at a time. The body of the function is kept "running", but stalled. When you iterate over the return value of _chunks(), Python runs that stalled function until it yields a value, then stalls it again and hands you that value.

Modern Python has a thing called a "generator expression". Your original function uses a "list comprehension": it constructs a list of values and returns that list. In many cases, particularly for very long lists, that can be both slow and memory hungry. You can rewrite such a thing like this:

   from typing import Iterator

   def _chunks(lst: list, size: int) -> Iterator[list]:
       return (lst[x:x+size] for x in range(0, len(lst), size))

Replacing the square brackets with parentheses turns this into a generator expression. It returns an iterator instead of a list, which functions like the generator function I sketched, and generates the chunks lazily.
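
One thing to keep in mind with either lazy form: the iterator can be consumed only once. A quick sketch, with made-up sample data and illustrative variable names:

```python
import types

def _chunks(lst, size):
    # Parentheses, not square brackets: this builds a generator expression.
    return (lst[x:x+size] for x in range(0, len(lst), size))

gen = _chunks(list(range(7)), 3)

is_generator = isinstance(gen, types.GeneratorType)   # True
first_pass = list(gen)    # [[0, 1, 2], [3, 4, 5], [6]]
second_pass = list(gen)   # []: the generator is already exhausted
```

If you need to walk the chunks more than once, call _chunks() again to get a fresh iterator, or keep the list version.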

Cheers,
Cameron Simpson <c...@cskk.id.au>