Re: spark sql and cassandra. spark generate 769 tasks to read 3 lines from cassandra table

2015-06-17 Thread Serega Sheypak
So, there is some input: the problem could be in spark-sql-thriftserver. When I submit the SQL query from the Spark console, it takes 10 seconds and uses a reasonable number of tasks. import com.datastax.spark.connector._; val cc = new CassandraSQLContext(sc); cc.sql("select su.user_id from appdata.site_u

Re: spark sql and cassandra. spark generate 769 tasks to read 3 lines from cassandra table

2015-06-17 Thread Serega Sheypak
>version We are on DSE 4.7 (Cassandra 2.1) and Spark 1.2.1. >cqlsh select * from site_users returns fast, subsecond, only 3 rows. >Can you show some code how you're doing the reads? dse beeline !connect ... select * from site_users --table has 3 rows, several columns in each row. Spark runs 769 t

Re: spark sql and cassandra. spark generate 769 tasks to read 3 lines from cassandra table

2015-06-17 Thread Yana Kadiyska
Can you show some code for how you're doing the reads? Have you successfully read other stuff from Cassandra (i.e., do you have a lot of experience with this path and this particular table is causing issues, or are you trying to figure out the right way to do a read)? What version of Spark and Cassandra

Re: spark sql and cassandra. spark generate 769 tasks to read 3 lines from cassandra table

2015-06-17 Thread Serega Sheypak
Hi, can somebody suggest a way to reduce the number of tasks? 2015-06-15 18:26 GMT+02:00 Serega Sheypak : > Hi, I'm running spark sql against Cassandra table. I have 3 C* nodes, Each > of them has spark worker. > The problem is that spark runs 869 task to read 3 lines: select bar from > foo. >

spark sql and cassandra. spark generate 769 tasks to read 3 lines from cassandra table

2015-06-15 Thread Serega Sheypak
Hi, I'm running Spark SQL against a Cassandra table. I have 3 C* nodes, and each of them runs a Spark worker. The problem is that Spark runs 869 tasks to read 3 lines: select bar from foo. I've tried these properties: #try to avoid 769 tasks per dummy select foo from bar query spark.cassandra.input.split.si