Increasing the number of partitions on the data file solved the problem.
On 6 June 2014 18:46, Oleg Proudnikov wrote:
An additional observation: map and mapValues are pipelined and executed,
as expected, in pairs. This means there is a simple sequence of steps for
each value of K: first the read from Cassandra, then the processing. This
is exactly the behaviour of a normal Java loop with these two steps inside.
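
To make the comparison concrete, here is a minimal plain-Java sketch of the
loop I have in mind. It does not use Spark or Cassandra at all: readBlocks
and process are hypothetical stand-ins for the function passed to map and
the function passed to mapValues, and the trace list exists only to show
the order in which the two steps run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PipelinedLoopSketch {

    // Records the order in which the steps run, so the interleaving is visible.
    static final List<String> trace = new ArrayList<>();

    // Hypothetical stand-in for the Cassandra read: returns the text blocks for a key.
    static List<String> readBlocks(String key) {
        trace.add("read:" + key);
        return Arrays.asList(key + " first block", key + " second block");
    }

    // Hypothetical stand-in for the mapValues step: reduces the blocks to a small list.
    static List<Integer> process(List<String> blocks) {
        trace.add("process");
        List<Integer> lengths = new ArrayList<>();
        for (String block : blocks) {
            lengths.add(block.length());
        }
        return lengths;
    }

    // The plain-loop equivalent: read, then process, one key at a time.
    static Map<String, List<Integer>> run(List<String> keys) {
        Map<String, List<Integer>> results = new LinkedHashMap<>();
        for (String key : keys) {
            List<String> blocks = readBlocks(key);  // step 1 (the "map")
            results.put(key, process(blocks));      // step 2 (the "mapValues")
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("k1", "k2")));
        System.out.println(trace);
    }
}
```

In this sketch the read and the processing alternate key by key, and only
the small result list is kept for each key; the large blocks for one key
are no longer referenced once that key has been processed.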
Hi All,
I am passing Java static methods into RDD transformations map and
mapValues. The first map is from a simple string K into a (K,V) where V is
a Java ArrayList of large text strings, 50K each, read from Cassandra.
mapValues then processes these text blocks into very small ArrayLists.
Th