Re: Cassandra number of Tasks

2015-05-12 Thread Vijay Pawnarkar
Thanks! We can roughly approximate the number of rows returned by where(), and as a result we can approximate the number of partitions, so the repartition approach will work. Let's say the .where() had resulted in a widely varying number of rows: we would not have been able to approximate the # of partitions, that would ca

Re: Cassandra number of Tasks

2015-05-11 Thread ayan guha
Hi, I think pushing the filter up would be best. Essentially, I would suggest having smallish partitions and filtering the data. Then repartition the 10k records using numPartition=10 and then write to Cassandra. Best, Ayan On Mon, May 11, 2015 at 5:03 PM, Akhil Das wrote: > Did you try repartitioning? You
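
As a rough illustration of the sizing mentioned above (10k records, numPartition=10), here is a minimal Java sketch of deriving a partition count from an estimated row count. The class name `PartitionEstimate`, the method `partitionsFor`, and the target of 1,000 rows per partition are assumptions made for illustration only; none of them come from the thread or from the connector.

```java
// Hypothetical helper, not part of Spark or the Spark Cassandra connector.
public final class PartitionEstimate {
    private PartitionEstimate() {}

    /**
     * Ceiling division of the estimated row count by the target rows per
     * partition, clamped so the result is at least 1.
     */
    public static int partitionsFor(long estimatedRows, long targetRowsPerPartition) {
        if (estimatedRows < 0 || targetRowsPerPartition <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and target must be > 0");
        }
        long full = estimatedRows / targetRowsPerPartition;
        long partitions = (estimatedRows % targetRowsPerPartition == 0) ? full : full + 1;
        // Clamp to [1, Integer.MAX_VALUE] because partition counts are ints.
        return (int) Math.max(1L, Math.min((long) Integer.MAX_VALUE, partitions));
    }

    public static void main(String[] args) {
        // Assumed numbers: ~10k filtered rows, 1,000 rows per partition.
        System.out.println(partitionsFor(10_000L, 1_000L));
    }
}
```

The resulting value could then be passed to `JavaRDD.repartition(int)` on the filtered RDD before writing. This only helps when the filtered row count can be estimated reasonably well ahead of time.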

Re: Cassandra number of Tasks

2015-05-11 Thread Akhil Das
Did you try repartitioning? You might end up spending a lot of time on GC though. Thanks Best Regards On Fri, May 8, 2015 at 11:59 PM, Vijay Pawnarkar wrote: > I am using the Spark Cassandra connector to work with a table with 3 > million records. Using .where() API to work with only a cer

Cassandra number of Tasks

2015-05-08 Thread Vijay Pawnarkar
I am using the Spark Cassandra connector to work with a table with 3 million records. I am using the .where() API to work with only certain rows in this table. The where clause filters the data to 1 rows. CassandraJavaUtil.javaFunctions(sparkContext) .cassandraTable(KEY_SPACE, MY_TABLE, CassandraJavaUtil