Re: DataStax Spark driver performance for analytics workload

2017-10-10 Thread Javier García-Valdecasas Bernal
Hi, The spark-cassandra-connector does push down filters when there are valid clauses. Pushdown filters go directly to Cassandra, so if your model fits your queries, you won't end up reading or scanning the full table, only those partitions that match your query. You can check which clauses are …
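
For illustration, a minimal sketch of checking pushdown through the connector's DataFrame API; the contact point, keyspace (ks), table (events), and partition key (user_id) are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("pushdown-demo")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
      .getOrCreate()

    // Read a hypothetical table ks.events whose partition key is user_id.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "events"))
      .load()

    // An equality filter on the partition key is eligible for pushdown,
    // so Cassandra only reads the matching partition.
    val filtered = df.filter(df("user_id") === "42")

    // The physical plan lists the predicates that were pushed down
    // (look for PushedFilters in the Cassandra scan node).
    filtered.explain()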

Re: DataStax Spark driver performance for analytics workload

2017-10-10 Thread Stone Fang
@kurt greaves I doubt that we need to read all the data. It is common to have a huge number of records in a Cassandra cluster; if we load all the data, how can we analyse it? On Mon, Oct 9, 2017 at 9:49 AM, kurt greaves wrote: > spark-cassandra-connector will provide the best way to achieve what you > want, howe…

Re: DataStax Spark driver performance for analytics workload

2017-10-08 Thread kurt greaves
spark-cassandra-connector will provide the best way to achieve what you want; however, under the hood it's still going to result in reading all the data, and because of the way Cassandra works it will essentially read the same SSTables multiple times from random points. You might be able to tune to …
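
For context, a full scan through the connector's RDD API looks like the sketch below; the keyspace and table names are hypothetical, and the split-size setting is one commonly cited tuning knob for connector 1.x/2.x (worth verifying against the exact connector version in use):

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("full-scan-demo")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
      // Larger input splits mean fewer Spark partitions per scan,
      // which can reduce how often the same SSTables are re-read.
      .set("spark.cassandra.input.split.size_in_mb", "512")

    val sc = new SparkContext(conf)

    // cassandraTable scans every token range, so every SSTable backing
    // ks.events is eventually read; there is no predicate to push down.
    val rows = sc.cassandraTable("ks", "events")
    println(rows.count())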

DataStax Spark driver performance for analytics workload

2017-10-06 Thread eugene miretsky
Hello, When doing analytics in Spark, a common pattern is to load either the whole table into memory or filter on some columns. This is a good pattern for column-oriented files (Parquet) but seems to be a huge anti-pattern in C*. Most common Spark operations will result in one of (a) query without …
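
To make the anti-pattern concrete, here is a sketch of a filter the connector cannot push down; the schema is hypothetical, with event_type assumed to be a regular, non-indexed column:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder()
      .appName("antipattern-demo")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
      .getOrCreate()

    // event_type is not a partition key, clustering column, or indexed
    // column in this hypothetical schema, so the predicate cannot be
    // pushed down: Spark reads the whole table and filters it in memory.
    val slow = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "events"))
      .load()
      .filter(col("event_type") === "click")

    slow.explain() // expect an empty PushedFilters list for this predicate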