Hi Yana, Not sure whether you already solved this issue. As far as I know, the DataFrame support in Spark Cassandra connector was added in version 1.3. The first milestone release of SCC v1.3 was just announced.
Mohammed From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com] Sent: Tuesday, May 26, 2015 1:31 PM To: user@spark.apache.org Subject: Need some Cassandra integration help Hi folks, for those of you working with Cassandra, wondering if anyone has been successful processing a mix of Cassandra and hdfs data. I have a dataset which is stored partially in HDFS and partially in Cassandra (schema is the same in both places) I am trying to do the following: val dfHDFS = sqlContext.parquetFile("foo.parquet") val cassDF = cassandraContext.sql("SELECT * FROM keyspace.user") dfHDFS.unionAll(cassDF).count This is failing for me with the following - Exception in thread "main" java.lang.AssertionError: assertion failed: No plan for CassandraRelation TableDef(yana_test,test_connector,ArrayBuffer(ColumnDef(key,PartitionKeyColumn,IntType,false)),ArrayBuff er(),ArrayBuffer(ColumnDef(value,RegularColumn,VarCharType,false))), None, None at scala.Predef$.assert(Predef.scala:179) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:278) at org.apache.spark.sql.catalyst.planning.QueryPlanner$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$anonfun$14.apply(SparkStrategies.scala:292) at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$anonfun$14.apply(SparkStrategies.scala:292) at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:292) at org.apache.spark.sql.catalyst.planning.QueryPlanner$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:152) at org.apache.spark.sql.catalyst.planning.QueryPlanner$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:1092) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:1090) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:1096) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:1096) at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:887) at org.apache.spark.sql.DataFrame.count(DataFrame.scala:899) at com.akamai.rum.test.unit.SparkTest.sparkPerfTest(SparkTest.java:123) at com.akamai.rum.test.unit.SparkTest.main(SparkTest.java:37) Is there a way to pull the data out of cassandra on each executor but not try to push logic down into casandra?