Calling JavaPairRDD.first after calling JavaPairRDD.groupByKey results in NullPointerException

2014-06-10 Thread Gaurav Jain
I am getting a strange null pointer exception when trying to list the first entry of a JavaPairRDD after calling groupByKey on it. Following is my code: JavaPairRDD, List> KeyToAppList = KeyToApp.distinct().groupByKey(); // System.out.println("First

Using custom class as a key for groupByKey() or reduceByKey()

2014-06-15 Thread Gaurav Jain
I have a simple Java class as follows, that I want to use as a key while applying groupByKey or reduceByKey functions: private static class FlowId { public String dcxId; public String trxId; public String msgType; pub
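
A common stumbling block with custom key classes is that grouping relies on `equals` and `hashCode`, which default to object identity. The following is a minimal sketch, not the poster's full class: it uses only the three fields visible in the truncated snippet above, and `FlowIdDemo.groupByKey` is a hypothetical local stand-in (backed by a `HashMap`) rather than Spark's own `groupByKey`. The key is also made `Serializable` on the assumption that it would be shipped between processes.

```java
import java.io.Serializable;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Key class with value-based equality. Only the fields visible in the post are used.
final class FlowId implements Serializable {
    private static final long serialVersionUID = 1L;

    public final String dcxId;
    public final String trxId;
    public final String msgType;

    FlowId(String dcxId, String trxId, String msgType) {
        this.dcxId = dcxId;
        this.trxId = trxId;
        this.msgType = msgType;
    }

    // Two keys are equal when all of their fields are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FlowId)) return false;
        FlowId other = (FlowId) o;
        return Objects.equals(dcxId, other.dcxId)
                && Objects.equals(trxId, other.trxId)
                && Objects.equals(msgType, other.msgType);
    }

    // Must agree with equals: equal keys produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(dcxId, trxId, msgType);
    }
}

final class FlowIdDemo {
    // Hypothetical local stand-in for grouping by key, backed by a HashMap.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> grouped = new HashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<FlowId, Integer>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleEntry<>(new FlowId("d1", "t1", "REQ"), 1));
        pairs.add(new AbstractMap.SimpleEntry<>(new FlowId("d1", "t1", "REQ"), 2));
        pairs.add(new AbstractMap.SimpleEntry<>(new FlowId("d2", "t1", "REQ"), 3));
        // The two entries with equal field values land in the same group.
        System.out.println(groupByKey(pairs).size());
    }
}
```

Without the two overrides, the first two entries would be treated as distinct keys even though their fields match.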

Re: Java updateStateByKey

2014-06-16 Thread Gaurav Jain
Probably not the answer you are looking for, but you can always include the 'key' in each of the 'New Values' itself. Something like: class myVal { T myData; T key; } and in your updateStateByKey function, access the 'key' as val.key (which would be the same for each of the items in List
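
The idea in this reply can be sketched without Spark on the classpath. Everything named here is an assumption for illustration: `KeyedVal`, `RunningTotal`, and `KeyedUpdate.update` are hypothetical, and `java.util.Optional` stands in for whichever `Optional` type the Spark version in use expects. The method only mirrors the general shape of an update function (new values plus previous state in, new state out).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Value wrapper that carries its own key, as suggested in the reply.
final class KeyedVal<K, T> {
    final K key;
    final T data;

    KeyedVal(K key, T data) {
        this.key = key;
        this.data = data;
    }
}

// Hypothetical per-key state: remembers the key and a running total.
final class RunningTotal {
    final String key;
    final int total;

    RunningTotal(String key, int total) {
        this.key = key;
        this.total = total;
    }
}

final class KeyedUpdate {
    // Shaped like an update function: (new values, previous state) -> new state.
    static Optional<RunningTotal> update(List<KeyedVal<String, Integer>> newValues,
                                         Optional<RunningTotal> state) {
        if (newValues.isEmpty()) {
            return state; // no new data in this batch: keep the previous state
        }
        // Every item in the list belongs to the same key, so the first one suffices.
        String key = newValues.get(0).key;
        int sum = 0;
        for (KeyedVal<String, Integer> v : newValues) {
            sum += v.data;
        }
        int previous = state.map(s -> s.total).orElse(0);
        return Optional.of(new RunningTotal(key, previous + sum));
    }

    public static void main(String[] args) {
        List<KeyedVal<String, Integer>> batch = Arrays.asList(
                new KeyedVal<String, Integer>("a", 1),
                new KeyedVal<String, Integer>("a", 2));
        Optional<RunningTotal> next = update(batch, Optional.<RunningTotal>empty());
        System.out.println(next.get().key + " " + next.get().total);
    }
}
```

The cost of this approach is that the key is duplicated in every value, which may matter when keys are large.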

Accessing the per-key state maintained by updateStateByKey for transformation of JavaPairDStream

2014-06-16 Thread Gaurav Jain
Hello Spark Streaming Experts, I have a use case where a bunch of log entries come in, say every 10 seconds (the batch interval). I create a JavaPairDStream[K,V] from these log entries. Now, there are two things I want to do with this JavaPairDStream: 1. Use key-dependent state (updated by u

Re: rdd.cache() is not faster?

2014-06-18 Thread Gaurav Jain
KTH talks about this: http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf Best - Gaurav Jain Master's Student, D-INFK ETH Zurich

Re: rdd.cache() is not faster?

2014-06-18 Thread Gaurav Jain
eady) to reduce memory usage. Playing around with different Storage levels (MEMORY_ONLY_SER, for example) might also help. Best Gaurav Jain Master's Student, D-INFK ETH Zurich Email: jaing at student dot ethz dot ch