Re: Question on mappartitionwithsplit

2014-08-17 Thread Chengi Liu
Hi, thanks for the response. In the second case (f2), foo will have to be declared globally, right? My function is something like: def indexing(splitIndex, iterator): count = 0 offset = sum(offset_lists[:splitIndex]) if splitIndex else 0 indexed = [] for i, e in enumerate(iterator):

Re: Question on mappartitionwithsplit

2014-08-17 Thread Mohit Singh
Building on what Davies Liu said, how about something like: def indexing(splitIndex, iterator, offset_lists): count = 0 offset = sum(offset_lists[:splitIndex]) if splitIndex else 0 indexed = [] for i, e in enumerate(iterator): index = count + offset + i for j, ele in
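
A minimal sketch of the idea in the message above, under stated assumptions: offset_lists is taken to hold the number of elements in each partition (the thread does not show how it is built), and the body of indexing is simplified because the quoted snippet is cut off. Spark is not required to try it; the partitions are plain Python lists. In PySpark, the two-argument wrapper would be the function handed to mapPartitionsWithSplit (mapPartitionsWithIndex in later releases).

```python
def indexing(split_index, iterator, offset_lists):
    # Offset = total number of elements in all earlier partitions.
    offset = sum(offset_lists[:split_index]) if split_index else 0
    for i, e in enumerate(iterator):
        yield (offset + i, e)

# Hypothetical data standing in for an RDD with three partitions.
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
offset_lists = [len(p) for p in partitions]

# Wrapper with the (index, iterator) signature; offset_lists is
# captured by the closure instead of being declared globally.
f = lambda split_index, iterator: indexing(split_index, iterator, offset_lists)

result = []
for idx, part in enumerate(partitions):
    result.extend(f(idx, iter(part)))

print(result)
```

The wrapper keeps indexing free of global state: the extra argument travels with the function that is passed around.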

Re: Question on mappartitionwithsplit

2014-08-17 Thread Davies Liu
On Sun, Aug 17, 2014 at 11:21 AM, Chengi Liu chengi.liu...@gmail.com wrote: Hi, thanks for the response. In the second case (f2), foo will have to be declared globally, right? My function is something like: def indexing(splitIndex, iterator): count = 0 offset =

Re: Question on mappartitionwithsplit

2014-08-17 Thread Josh Rosen
Has anyone tried using functools.partial (https://docs.python.org/2/library/functools.html#functools.partial) with PySpark? If it works, it might be a nice way to address this use case. On Sun, Aug 17, 2014 at 7:35 PM, Davies Liu dav...@databricks.com wrote: On Sun, Aug 17, 2014 at 11:21 AM,
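
For illustration, here is how functools.partial could bind the extra argument. This is only a sketch of the binding itself in plain Python: whether PySpark serializes a partial object correctly is exactly the open question in the message above, so nothing here should be read as confirming that it does. The names sizes and bound, and the sample data, are hypothetical.

```python
from functools import partial

def indexing(split_index, iterator, offset_lists):
    # Offset = total number of elements in all earlier partitions.
    offset = sum(offset_lists[:split_index])
    return [(offset + i, e) for i, e in enumerate(iterator)]

# Hypothetical data standing in for an RDD with two partitions.
partitions = [[10, 20], [30, 40, 50]]
sizes = [len(p) for p in partitions]

# partial fixes offset_lists by keyword, leaving a callable that
# takes (split_index, iterator), the shape a per-partition function needs.
bound = partial(indexing, offset_lists=sizes)

indexed = [pair
           for idx, part in enumerate(partitions)
           for pair in bound(idx, iter(part))]

print(indexed)
```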