RE: NullPointerException from '.count.foreachRDD'

2014-08-20 Thread anoldbrain
Thank you for the reply. I had implemented my InputDStream to return None when there was no data; after changing it to return an empty RDD, the exception is gone. I am curious why all the other processing worked correctly with my old, incorrect implementation, with or without data. My actual code, without…
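
Below is a minimal sketch of the change being described, assuming a custom InputDStream roughly along these lines; the class name and the fetchBatchSomehow() helper are hypothetical, not the poster's actual code. The point is that compute() answers an empty batch with Some(empty RDD) instead of None.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{StreamingContext, Time}
    import org.apache.spark.streaming.dstream.InputDStream

    class MyInputDStream(streamingCtx: StreamingContext)
      extends InputDStream[String](streamingCtx) {

      override def start(): Unit = {}  // set up the underlying source here
      override def stop(): Unit = {}   // tear it down here

      override def compute(validTime: Time): Option[RDD[String]] = {
        val batch: Seq[String] = fetchBatchSomehow()  // hypothetical helper
        if (batch.isEmpty)
          Some(streamingCtx.sparkContext.emptyRDD[String])  // empty RDD, not None
        else
          Some(streamingCtx.sparkContext.parallelize(batch))
      }

      // Placeholder for whatever actually pulls a batch of records.
      private def fetchBatchSomehow(): Seq[String] = Seq.empty
    }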

Re: NullPointerException from '.count.foreachRDD'

2014-08-20 Thread anoldbrain
Looking at the source code of DStream.scala:

> /**
>  * Return a new DStream in which each RDD has a single element generated
>  * by counting each RDD of this DStream.
>  */
> def count(): DStream[Long] = {
>   this.map(_ => (null, 1L))
>     .transform(_.union(context.sparkContext…
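
For reference, the full method in the Spark 1.x source reads approximately as below (reconstructed from DStream.scala, so treat the details as approximate). The relevant point: count() is built on transform(), and the transform closure is handed the parent RDD for the batch; if a custom InputDStream answers an empty batch with None, that parent RDD can end up as null, and null.union(...) is the NullPointerException seen in the thread.

    /**
     * Return a new DStream in which each RDD has a single element generated
     * by counting each RDD of this DStream.
     */
    def count(): DStream[Long] = {
      this.map(_ => (null, 1L))
          .transform(_.union(context.sparkContext.makeRDD(Seq((null, 0L)), 1)))
          .reduceByKey(_ + _)
          .map(_._2)
    }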

Re: Improving Spark multithreaded performance?

2014-06-27 Thread anoldbrain
I have not used this myself; I only watched a presentation on it at Spark Summit 2013. https://github.com/radlab/sparrow https://spark-summit.org/talk/ousterhout-next-generation-spark-scheduling-with-sparrow/ This is pure conjecture from your high scheduling latency and the size of your cluster, but it seems one way…

Re: Need help. Spark + Accumulo => Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-23 Thread anoldbrain
Assuming "this should not happen", I don't want to have to keep building a custom version of spark for every new release, thus preferring the workaround. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-Spark-Accumulo-Error-java-lang-NoSuchMethodErr

Re: Need help. Spark + Accumulo => Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-23 Thread anoldbrain
Found a workaround: adding "SPARK_CLASSPATH=.../commons-codec-xxx.jar" to spark-env.sh.
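
For completeness, the workaround is a single line in conf/spark-env.sh; the jar path and version are kept as placeholders exactly as elided above.

    # conf/spark-env.sh: put a commons-codec jar with the needed methods on the
    # driver/executor classpath (the workaround relies on this taking precedence
    # over the copy bundled in the assembly).
    export SPARK_CLASSPATH=.../commons-codec-xxx.jar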

Re: Need help. Spark + Accumulo => Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-22 Thread anoldbrain
I used Java Decompiler to check the included "org.apache.commons.codec.binary.Base64" .class file (in the spark-assembly jar), and for both "encodeBase64" and "decodeBase64" there are only the (byte[]) versions and no encodeBase64String/decodeBase64(String). I have encountered the reported issue. This confl…
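
A quick way to confirm the same thing without a decompiler is a small classpath check along these lines (a sketch, not from the thread; Base64Check is a made-up name):

    import org.apache.commons.codec.binary.Base64

    object Base64Check {
      def main(args: Array[String]): Unit = {
        // List every encodeBase64*/decodeBase64* overload the loaded class provides.
        classOf[Base64].getMethods
          .filter(m => m.getName.startsWith("encodeBase64") || m.getName.startsWith("decodeBase64"))
          .foreach(println)

        // Show which jar the class was actually loaded from.
        val src = Option(classOf[Base64].getProtectionDomain.getCodeSource)
        println(src.map(_.getLocation).getOrElse("unknown code source"))
      }
    }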

Re: Need help. Spark + Accumulo => Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-22 Thread anoldbrain
I checked the META-INF/DEPENDENCIES file in the spark-assembly jar from the official 1.0.0 binary release for CDH4, and found one "commons-codec" entry:

From: 'The Apache Software Foundation' (http://jakarta.apache.org)
  - Codec (http://jakarta.apache.org/commons/codec/) commons-codec:commons-codec:jar:…

Re: How to use FlumeInputDStream in spark cluster?

2014-03-21 Thread anoldbrain
It is my understanding that there is no way to make FlumeInputDStream work in a cluster environment with the current release. Switching to Kafka, if you can, would be my suggestion, although I have not used KafkaInputDStream myself. There is a big difference between the Kafka and Flume InputDStreams: KafkaInputDStream…
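
A sketch of the difference in how the two receivers are created, and why placement matters only for Flume; the hostnames, ports, group and topic names below are placeholders, not values from the thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils
    import org.apache.spark.streaming.kafka.KafkaUtils

    object ReceiverPlacementSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("sketch"), Seconds(30))

        // Flume receiver: binds worker-host:41414 on whichever worker it is
        // scheduled on, so the Flume sink must point at exactly that machine.
        val flumeStream = FlumeUtils.createStream(ssc, "worker-host", 41414)

        // Kafka receiver: connects out to ZooKeeper/brokers, so it behaves the
        // same no matter which worker it lands on.
        val kafkaStream = KafkaUtils.createStream(ssc, "zk-host:2181", "my-group", Map("my-topic" -> 1))

        flumeStream.count().print()
        kafkaStream.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }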

Re: How to use FlumeInputDStream in spark cluster?

2014-03-21 Thread anoldbrain
Hi, this is my summary of the gap between expected and actual behavior for FlumeEventCount spark://<master>:7077 <host> <port>.

Expected: an 'agent' listening on <host>:<port> (bound to it). In the context of Spark, this agent should be running on one of the slaves, specifically the slave whose IP/hostname is <host>.

Observed: …

NullPointerException from 'Count' on DStream

2014-02-25 Thread anoldbrain
Dear all, I encountered a NullPointerException running a simple program like the one below:

> val sparkconf = new SparkConf()
>   .setMaster(master)
>   .setAppName("myapp")
>   // and other setups
>
> val ssc = new StreamingContext(sparkconf, Seconds(30))
> val flume = new FlumeInputDStream(ssc, f…
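
A hedged reconstruction of what a complete version of that program presumably looks like; the master URL, host, and port are placeholders, and FlumeUtils.createStream is used here in place of constructing FlumeInputDStream directly. Calling .count().foreachRDD(...) on the stream is what hit the NullPointerException on batches with no data:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils

    object FlumeCountRepro {
      def main(args: Array[String]): Unit = {
        val sparkconf = new SparkConf()
          .setMaster("spark://master-host:7077")
          .setAppName("myapp")
          // and other setups

        val ssc = new StreamingContext(sparkconf, Seconds(30))
        val flume = FlumeUtils.createStream(ssc, "worker-host", 41414)

        // One count per 30-second batch; first() pulls the single Long element.
        flume.count().foreachRDD { rdd =>
          println("events in batch: " + rdd.first())
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }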