Re: Using Java functions in Spark

2014-06-07 Thread Oleg Proudnikov
Increasing the number of partitions on the data file solved the problem.
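A minimal sketch of why more partitions can help: splitting the key set across more tasks shrinks each task's share of the large values. The class and method names below are hypothetical, and the round-robin split only stands in for what Spark's own partitioner does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: splitting a key set into more partitions
// so each task reads and holds fewer of the large values at a time,
// similar in spirit to calling RDD.repartition(numPartitions).
public class PartitionDemo {

    // Distribute keys round-robin into numPartitions buckets.
    public static List<List<String>> partition(List<String> keys, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < keys.size(); i++) {
            parts.get(i % numPartitions).add(keys.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k1", "k2", "k3", "k4", "k5", "k6");
        // With 3 partitions instead of 1, each task handles 2 keys
        // instead of 6, so its peak memory for the values is a third.
        for (List<String> part : partition(keys, 3)) {
            System.out.println(part);
        }
    }
}
```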

Re: Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov
Additional observation: the map and mapValues steps are pipelined and executed, as expected, in pairs. This means there is a simple sequence of steps for each value of K: first the read from Cassandra, then the processing. This is exactly the behaviour of a normal Java loop with these two steps inside the body.
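The loop equivalence described above can be sketched in plain Java. The names here (readFromStore, process) are placeholders for the Cassandra read and the mapValues function; the point is only that, within one partition, each key goes through both steps back-to-back rather than in two full passes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pipelining: per key, the "map" (read) and
// "mapValues" (process) steps run as one sequence, exactly like a plain
// Java loop with both steps in the body.
public class PipelineDemo {

    // Stand-in for the Cassandra read: returns the large values for a key.
    static List<String> readFromStore(String key) {
        return List.of(key + "-blob1", key + "-blob2");
    }

    // Stand-in for the mapValues processing step: large blocks -> small list.
    static List<Integer> process(List<String> blobs) {
        List<Integer> lengths = new ArrayList<>();
        for (String b : blobs) {
            lengths.add(b.length());
        }
        return lengths;
    }

    public static List<List<Integer>> run(List<String> keys) {
        List<List<Integer>> results = new ArrayList<>();
        for (String key : keys) {
            // Step 1 (map): read the values for this key.
            List<String> blobs = readFromStore(key);
            // Step 2 (mapValues): process them immediately,
            // before moving on to the next key.
            results.add(process(blobs));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b")));
    }
}
```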

Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov
Hi All, I am passing Java static methods into the RDD transformations map and mapValues. The first map goes from a simple string K to a pair (K, V), where V is a Java ArrayList of large text strings, 50K each, read from Cassandra. mapValues then processes these text blocks into very small ArrayLists. Th
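A self-contained sketch of the shape described above, using java.util.stream in place of the Spark RDD API (so it runs without a cluster). The static methods are passed by method reference, mirroring how they would be wrapped for map and mapValues; fetch and summarize are hypothetical names, and the Cassandra read is faked with placeholder strings.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: static methods used as the map and mapValues
// functions. fetch() plays the role of the map from K to (K, V) with V a
// list of large text blocks; summarize() plays the mapValues step that
// reduces them to very small lists.
public class StaticFnDemo {

    // map step: key -> (key, list of large text blocks). A real version
    // would read the blocks from Cassandra.
    static Map.Entry<String, List<String>> fetch(String key) {
        return new SimpleEntry<>(key, List.of(key + "-text-block"));
    }

    // mapValues step: large text blocks -> small summary list.
    static List<Integer> summarize(List<String> blocks) {
        return blocks.stream().map(String::length).collect(Collectors.toList());
    }

    public static Map<String, List<Integer>> run(List<String> keys) {
        return keys.stream()
                .map(StaticFnDemo::fetch)              // like rdd.map(...)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> summarize(e.getValue()))); // like pairRdd.mapValues(...)
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("k1", "k2")));
    }
}
```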