Hey Jayant,

In my previous mail I mentioned a GitHub gist,
https://gist.github.com/rssvihla/6577359860858ccb0b33, which does something
very similar to what I want to do, but it uses Scala for Spark.

Hence my question (reiterating from the previous mail): *Is the current
approach possible if I write the code in Scala?*

Why does javaFunctions() need the SparkContext? Because for each row in the
RDD, I am making a get call to the data store (Cassandra). The reason I am
fetching only selective data (that I will update later) is that Cassandra
doesn't provide range queries, so I thought fetching the complete data
might be expensive.

On Thu, Oct 23, 2014 at 11:22 AM, Jayant Shekhar <jay...@cloudera.com> wrote:

> +1 to Sean.
>
> Is it possible to rewrite your code to not use the SparkContext inside an
> RDD? Or why does javaFunctions() need the SparkContext?
>
> On Thu, Oct 23, 2014 at 10:53 AM, Localhost shell <
> universal.localh...@gmail.com> wrote:
>
>> Bang on, Sean.
>>
>> Before sending the issue mail, I was able to remove the compilation
>> error by making it final, but then got:
>>
>> Caused by: java.io.NotSerializableException:
>> org.apache.spark.api.java.JavaSparkContext (as you mentioned)
>>
>> Now, regarding your suggestion of changing the business logic:
>>
>> 1. *Is the current approach possible if I write the code in Scala?* I
>> think probably not, but I wanted to check with you.
>>
>> 2. Brief steps of what the code is doing:
>>
>>    1. Get raw sessions data from the datastore (C*)
>>    2. Process the raw sessions data
>>    3. Iterate over the processed data (derived from #2), fetch the
>>       previously aggregated data from the store for those row keys, and
>>       add the values from this batch to the previous batch's values
>>    4. Save back the updated values
>>
>> *This GitHub gist might explain it in more detail:
>> https://gist.github.com/rssvihla/6577359860858ccb0b33, and it does a
>> similar thing in Scala.*
>> I am trying to achieve a similar thing in Java, using a Spark batch job
>> with C* as the datastore.
>>
>> I have attached the Java code file to provide you some code details.
>> (In case I was not able to explain the problem clearly, the code will
>> be handy.)
>>
>> The reason I am fetching only selective data (that I will update later)
>> is that Cassandra doesn't provide range queries, so I thought fetching
>> the complete data might be expensive.
>>
>> It would be great if you could share your thoughts.
>>
>> On Thu, Oct 23, 2014 at 1:48 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> In Java, javaSparkContext would have to be declared final in order for
>>> it to be accessed inside an inner class like this. But this would
>>> still not work, as the context is not serializable. You should rewrite
>>> this so you are not attempting to use the Spark context inside an RDD.
>>>
>>> On Thu, Oct 23, 2014 at 8:46 AM, Localhost shell
>>> <universal.localh...@gmail.com> wrote:
>>> > Hey all,
>>> >
>>> > I am unable to access objects declared and initialized outside the
>>> > call() method of a JavaRDD function.
>>> >
>>> > In the code snippet below, the call() method makes a fetch call to
>>> > C*, but since javaSparkContext is defined outside the call() method's
>>> > scope, the compiler gives a compilation error:
>>> >
>>> > stringRdd.foreach(new VoidFunction<String>() {
>>> >     @Override
>>> >     public void call(String str) throws Exception {
>>> >         JavaRDD<String> vals =
>>> >             javaFunctions(javaSparkContext)
>>> >                 .cassandraTable("schema", "table", String.class)
>>> >                 .select("val");
>>> >     }
>>> > });
>>> >
>>> > In other languages I have used a closure to do this, but I am not
>>> > able to achieve the same here.
>>> >
>>> > Can someone suggest how to achieve this in the current code context?
>>> >
>>> >
>>> > --Unilocal
>>
>>
>> --
>> --Unilocal
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>

--
--Unilocal
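[Editor's note] The failure Sean describes can be reproduced without Spark at
all. Spark ships each function object to the executors via Java
serialization, so everything an anonymous inner class captures must itself be
Serializable; JavaSparkContext is not, hence the NotSerializableException
even after the `final` fix. A minimal plain-Java sketch of the mechanism
(FakeContext, VoidFunction, and the class name here are hypothetical
stand-ins, not Spark APIs):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureCaptureDemo {

    // Stand-in for JavaSparkContext: deliberately NOT Serializable.
    static class FakeContext {
    }

    // Stand-in for Spark's function interfaces, which extend Serializable;
    // that is why Spark must serialize everything a function captures.
    interface VoidFunction<T> extends Serializable {
        void call(T t) throws Exception;
    }

    // 'final' lets the anonymous class compile, but the returned object
    // still holds a hidden reference to ctx.
    static VoidFunction<String> makeClosure() {
        final FakeContext ctx = new FakeContext();
        return new VoidFunction<String>() {
            @Override
            public void call(String str) {
                System.out.println(ctx + " " + str);
            }
        };
    }

    // True iff Java serialization of o succeeds -- the same step Spark
    // performs when it ships a closure to the executors.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("closure serializable? "
            + canSerialize(makeClosure()));
    }
}
```

The fix is therefore structural, as the thread concludes: keep all uses of
the SparkContext on the driver (for example, read the Cassandra table into
its own RDD first and join it with the sessions RDD), or open a plain
database connection inside the function instead of capturing the context.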
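[Editor's note] Setting Spark and Cassandra aside, the read-merge-write
cycle in steps 3 and 4 of the mail boils down to merging this batch's
per-key aggregates into the previously stored ones. A minimal sketch with
plain Java maps (the key and value names are made up for illustration; in
the real job the "stored" side would come from the selective C* fetch):

```java
import java.util.HashMap;
import java.util.Map;

public class AggregateMerge {

    // Merge this batch's per-key values into the previously stored
    // aggregates. Keys absent from the batch are left untouched,
    // mirroring the selective fetch described in the mail.
    static Map<String, Long> merge(Map<String, Long> stored,
                                   Map<String, Long> batch) {
        Map<String, Long> updated = new HashMap<>(stored);
        for (Map.Entry<String, Long> e : batch.entrySet()) {
            updated.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Long> stored = new HashMap<>();
        stored.put("user1", 10L);
        stored.put("user2", 5L);

        Map<String, Long> batch = new HashMap<>();
        batch.put("user1", 3L);   // existing key: values are summed
        batch.put("user3", 7L);   // new key: inserted as-is

        Map<String, Long> updated = merge(stored, batch);
        System.out.println(updated.get("user1")); // 13
        System.out.println(updated.get("user2")); // 5
        System.out.println(updated.get("user3")); // 7
    }
}
```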