On Thu, Aug 7, 2014 at 12:06 AM, Rok Roskar <rokros...@gmail.com> wrote:
> sure, but if you knew that a numpy array went in on one end, you could safely 
> use it on the other end, no? Perhaps it would require an extension of the RDD 
> class and overriding the collect() method.
>
>> Could you give a short example about how numpy array is used in your project?
>>
>
> sure -- basically our main data structure is a container class (acts like a 
> dictionary) that holds various arrays that represent particle data. Each 
> particle has various properties, position, velocity, mass etc. you get at 
> these individual properties by calling something like
>
> s['pos']
>
> where 's' is the container object and 'pos' is the name of the array. A 
> really common use case then is to select particles based on their properties 
> and do some plotting, or take a slice of the particles, e.g. you might do
>
> r = np.sqrt((s['pos']**2).sum(axis=1))
> ind = np.where(r < 5)
> plot(s[ind]['x'], s[ind]['y'])
>
> Internally, the various arrays are kept in a dictionary -- I'm hoping to 
> write a class that keeps them in an RDD instead. To the user, this would have 
> to be transparent, i.e. if the user wants to get at the data for specific 
> particles, she would just have to do
>
> s['pos'][1,5,10]
>
> for example, and the data would be fetched for her from the RDD just like it 
> would be if she were simply using the usual single-machine version. This is 
> why the writing to/from files when retrieving data from the RDD really is a 
> no-go -- can you recommend how this can be circumvented?

An RDD is distributed by design, so accessing individual items in an
RDD directly by key or index will not be easy. I don't think you can
map this interface onto an RDD and still get the behavior the user
expects -- random access, for example, would be very, very slow.

To parallelize the computation, most of the work should be done as
transformations of RDDs. At the end, fetch the results from the RDD
with collect(), then do the plotting locally. Could this kind of
workflow work for your case?
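A minimal sketch of that workflow, using plain Python lists to stand in for RDD partitions (the particle data and the 5-unit radius cut are made up for illustration; in actual PySpark this would be something like sc.parallelize(...).filter(...).collect()):

```python
import math

# Hypothetical particle positions (x, y, z). In Spark these would live
# in an RDD; here they are shown as a list of partitions.
partitions = [
    [(1.0, 2.0, 2.0), (10.0, 0.0, 0.0)],   # partition 0
    [(0.0, 3.0, 4.0), (6.0, 8.0, 0.0)],    # partition 1
]

def keep_close(particles, r_max=5.0):
    # The equivalent of a filter() transformation:
    # keep particles whose radius is less than r_max.
    return [p for p in particles if math.sqrt(sum(c * c for c in p)) < r_max]

# Apply the "transformation" per partition, then "collect" to the driver.
filtered = [keep_close(part) for part in partitions]
collected = [p for part in filtered for p in part]

# Only now, on the small collected result, pull out columns for plotting.
xs = [p[0] for p in collected]
ys = [p[1] for p in collected]
```

The point is that the expensive selection runs where the data lives, and only the (much smaller) filtered result crosses over to the driver for plotting.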

Davies
