The Spark connector doesn't run "select * from table;". It splits the table into token ranges and reads each range separately (see https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/partitioner/CassandraPartition.scala#L14)
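To make the token-range idea concrete, here is a rough Python sketch of the general technique, not the connector's actual code. It assumes the Murmur3Partitioner token bounds, and the helper names (split_token_ring, range_query) and the keyspace, table, and column names are made up for illustration. The real connector is written in Scala and its split sizing and statement handling differ.

```python
# Illustrative sketch only: NOT the spark-cassandra-connector implementation.
# Assumes Murmur3Partitioner, whose tokens are signed 64-bit integers.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Divide the full token ring into contiguous (start, end] ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN
    # Evenly spaced lower bounds, then close the last range at MAX_TOKEN.
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build the CQL text for one token range (hypothetical helper)."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} "
        f"AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    # Each range becomes an independent query that a separate task can run.
    for start, end in split_token_ring(4):
        print(range_query("my_ks", "my_table", "id", start, end))
```

Each generated query touches only one slice of the ring, so the reads can run in parallel and be directed to the nodes that own that slice, instead of one coordinator scanning the whole table.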

Jacques-Henri Berthemet at "Thu, 25 Jul 2019 14:18:57 +0000" wrote:
JB> Hi Asad,
JB> That’s because of the way Spark works. Essentially, when you execute a Spark job, it pulls the full content of the datastore (Cassandra
JB> in your case) in it RDDs and works with it “in memory”. While Spark uses “data locality” to read data from the nodes that have the
JB> required data on its local disks, it’s still reading all data from Cassandra tables. To do so it’s sending ‘select * from Table ALLOW
JB> FILTERING’ query to Cassandra.
JB> From Spark you don’t have much control on the initial query to fill the RDDs, sometimes you’ll read the whole table even if you only
JB> need one row.
JB> Regards,
JB> Jacques-Henri Berthemet
JB> From: "ZAIDI, ASAD A" <az1...@att.com>
JB> Reply to: "user@cassandra.apache.org" <user@cassandra.apache.org>
JB> Date: Thursday 25 July 2019 at 15:49
JB> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
JB> Subject: Performance impact with ALLOW FILTERING clause.
JB> Hello Folks,
JB> I was going thru documentation and saw at many places saying ALLOW FILTERING causes performance unpredictability. Our developers says
JB> ALLOW FILTERING clause is implicitly added on bunch of queries by spark-Cassandra connector and they cannot control it; however at the
JB> same time we see unpredictability in application performance – just as documentation says.
JB> I’m trying to understand why would a connector add a clause in query when this can cause negative impact on database/application
JB> performance. Is that data model that is driving connector make its decision and add allow filtering to query automatically or if there
JB> are other reason this clause is added to the code. I’m not a developer though I want to know why developer don’t have any control on
JB> this to happen.
JB> I’ll appreciate your guidance here.
JB> Thanks
JB> Asad

--
With best wishes, Alex Ott
Solutions Architect EMEA, DataStax
http://datastax.com/