Hello, 

I have an issue. I would like to save some data to Cassandra using Spark. 

First, I loaded data from Elasticsearch into Spark, which gives me an
org.elasticsearch.spark.rdd.ScalaEsRDD containing entries like this:


(AU1rN9uN4PGB4YTCSXr7,Map(@timestamp -> 2015-05-19T08:08:41.541Z, @version
-> 1, type -> test-xm, loglevel -> INFO, thread -> 
ajp-crmprod-fr-002%2F10.2.53.38-8009-44, ID_Echange -> 1432022921395,
SessionID -> 2188abc692ad1e0b62cbb6de2b875f91, ProcessID ->
1432022920a560000f00000000009212, IP -> 54.72.65.68, proxy -> 54.72.65.68,
ContactID -> 2221538663, Login -> 54509705, messageType -> <<)

And I have several rows like this. I can saveToCassandra into a table defined
as (name text , map<text, text>).
However, I cannot run queries on the map column, because Cassandra does not
support that.
So I did something like this:

rddvalues.take(200000).foreach( a => {
  val collection = sc.parallelize(Seq((a.get("timestamp").get,
    a.getOrElse("proxy", null))))
  collection.saveToCassandra("test", "sparkes")
})

It works, but it is VERY slow. And when I try this instead:

rddvalues.foreach( a => {
  val collection = sc.parallelize(Seq((a.get("timestamp").get,
    a.getOrElse("proxy", null))))
  collection.saveToCassandra("test", "sparkes")
})

I get this error:

org.apache.spark.SparkException: Task not serializable
        at
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at
org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1622)
        at org.apache.spark.rdd.RDD.foreach(RDD.scala:797)

Do you have any idea?
To sum up, I would like to put my map into a Cassandra table, starting from my
rddvalues: org.apache.spark.rdd.RDD[scala.collection.Map[String,Any]]
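For what it's worth, a likely cause of the "Task not serializable" error is
that the closure passed to rddvalues.foreach captures sc: the SparkContext
lives only on the driver and cannot be serialized and shipped to executors.
The take(200000) version works because it runs on the driver, but it launches
a separate Spark job per row, which would also explain the slowness. A minimal
sketch of a single-job alternative, assuming the target table is
test.sparkes with columns (name, proxy) as in the snippets above, and using
the "@timestamp" key shown in the sample row (adjust if "timestamp" was
intended):

```scala
import com.datastax.spark.connector._  // spark-cassandra-connector

// rddvalues: RDD[scala.collection.Map[String, Any]] as in the question.
// Plain RDD transformation -- no SparkContext is captured, so the
// closure serializes fine and ships to the executors.
val pairs = rddvalues.map { m =>
  (m("@timestamp").toString, m.getOrElse("proxy", null))
}

// One distributed write instead of one Spark job per row.
pairs.saveToCassandra("test", "sparkes", SomeColumns("name", "proxy"))
```

The column names in SomeColumns are an assumption on my part; they need to
match the actual Cassandra table definition.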


Best regards, 






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-cassandra-tp22994.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
