Re: NullPointerException from '.count.foreachRDD'

2014-08-20 Thread anoldbrain
Looking at the source code of DStream.scala:

    /**
     * Return a new DStream in which each RDD has a single element generated
     * by counting each RDD of this DStream.
     */
    def count(): DStream[Long] = {
      this.map(_ => (null, 1L))
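For completeness, here is the rest of the method as I remember it from the 1.0-era source (treat it as a close paraphrase rather than an exact quote):

    def count(): DStream[Long] = {
      this.map(_ => (null, 1L))
          .transform(_.union(context.sparkContext.makeRDD(Seq((null, 0L)), 1)))
          .reduceByKey(_ + _)
          .map(_._2)
    }

The transform step unions the interval's RDD with a freshly created one-element RDD, so count() needs an actual RDD for every batch interval; if the upstream compute() produced no RDD at all, there is nothing to union with, which is the likely origin of the NullPointerException discussed in the follow-up below.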

Re: NullPointerException from '.count.foreachRDD'

2014-08-20 Thread anoldbrain
Thank you for the reply. I had implemented my InputDStream to return None when there was no data; after changing it to return an empty RDD, the exception is gone. I am curious why all other processing worked correctly with my old, incorrect implementation, with or without data. My actual code,
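A minimal sketch of the fix, assuming a receiver-less custom InputDStream (MyInputDStream and fetchBatch are illustrative names, not the code from the thread):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{StreamingContext, Time}
    import org.apache.spark.streaming.dstream.InputDStream

    class MyInputDStream(ssc_ : StreamingContext) extends InputDStream[String](ssc_) {

      override def start(): Unit = {}
      override def stop(): Unit = {}

      override def compute(validTime: Time): Option[RDD[String]] = {
        val batch: Seq[String] = fetchBatch(validTime) // hypothetical read from the source
        // Return an empty RDD rather than None when the interval has no data:
        // count() runs a transform over every interval's RDD, and an interval
        // with no RDD at all is what triggered the NullPointerException.
        Some(ssc_.sparkContext.makeRDD(batch))
      }

      // Placeholder: collect whatever arrived during this batch interval.
      private def fetchBatch(validTime: Time): Seq[String] = Seq.empty
    }

Most other operators simply skip an interval whose compute() returned None, which would explain why everything else appeared to work with the old implementation.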

Re: Improving Spark multithreaded performance?

2014-06-27 Thread anoldbrain
I have not used this myself; I only watched a presentation on it at Spark Summit 2013:

https://github.com/radlab/sparrow
https://spark-summit.org/talk/ousterhout-next-generation-spark-scheduling-with-sparrow/

This is pure conjecture from your high scheduling latency and the size of your cluster, but it seems one way

Re: Need help. Spark + Accumulo = Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-23 Thread anoldbrain
I used Java Decompiler to inspect the org.apache.commons.codec.binary.Base64 .class file included in the spark-assembly jar: for both encode and decode, only the (byte[]) variants are present; there is no encodeBase64String(byte[]) or decodeBase64(String). I have encountered the reported issue. This conflicts
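For illustration: encodeBase64String(byte[]) and decodeBase64(String) were only added in commons-codec 1.4, which matches what the decompiler shows (the bundled copy behaves like 1.3). Code compiled against 1.4+ can sidestep the missing methods by sticking to the byte-array variants; the string contents below are illustrative:

    import org.apache.commons.codec.binary.Base64

    val payload: Array[Byte] = "hello".getBytes("UTF-8")

    // Works on 1.3 and later; Base64.encodeBase64String(payload) would
    // throw NoSuchMethodError against the 1.3-style class in the assembly.
    val encoded: String = new String(Base64.encodeBase64(payload), "UTF-8")

    // Decode through the byte-array variant, which 1.3 also provides.
    val decoded: Array[Byte] = Base64.decodeBase64(encoded.getBytes("UTF-8"))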

Re: Need help. Spark + Accumulo = Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-23 Thread anoldbrain
Found a workaround: adding SPARK_CLASSPATH=.../commons-codec-xxx.jar to spark-env.sh (sketched below).
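A sketch of that spark-env.sh entry, with the path and version purely illustrative:

    # spark-env.sh: put a newer commons-codec on the classpath so its classes
    # are picked up ahead of the copy baked into the spark-assembly jar.
    SPARK_CLASSPATH=/opt/libs/commons-codec-1.7.jar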

Re: Need help. Spark + Accumulo = Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-23 Thread anoldbrain
Assuming this should not happen, I don't want to have to keep building a custom version of Spark for every new release, so I prefer the workaround.

Re: How to use FlumeInputDStream in spark cluster?

2014-03-21 Thread anoldbrain
Hi, this is my summary of the gap between expected and actual behavior when running

    FlumeEventCount spark://spark_master_hostname:7077 address port

Expected: an 'agent' listening on address:port (i.e., bound to it). In the context of Spark, this agent should be running on one of the slaves, which should be
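For reference, a condensed sketch of what the FlumeEventCount driver does (batch interval and output format are from memory of the bundled example, so treat it as an approximation):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils

    object FlumeEventCount {
      def main(args: Array[String]): Unit = {
        val Array(master, host, port) = args
        val ssc = new StreamingContext(master, "FlumeEventCount", Seconds(2))

        // The receiver behind this DStream is the 'agent' in question: it must
        // bind host:port on whichever worker it is scheduled on, so the Flume
        // sink has a fixed address to push events to.
        val stream = FlumeUtils.createStream(ssc, host, port.toInt)

        stream.count().map(cnt => "Received " + cnt + " flume events.").print()
        ssc.start()
        ssc.awaitTermination()
      }
    }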

Re: How to use FlumeInputDStream in spark cluster?

2014-03-21 Thread anoldbrain
It is my understanding that there is no way to make FlumeInputDStream work in a cluster environment with the current release. Switching to Kafka, if you can, would be my suggestion, although I have not used KafkaInputDStream myself. There is a big difference between the Kafka and Flume InputDStreams: the Kafka receiver is a client that connects out to the brokers and pulls data, so it can run on any worker, whereas the Flume receiver is a server that must bind a specific address:port for the Flume agent to push to.
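A sketch of the pull-based Kafka equivalent (the ZooKeeper quorum, consumer group, and topic name are illustrative):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext("spark://spark_master_hostname:7077",
                                   "KafkaEventCount", Seconds(2))

    // The receiver connects out to ZooKeeper/the brokers from whichever worker
    // it lands on; nothing in the cluster needs to bind a well-known address.
    val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "event-count-group",
                                         Map("events" -> 1))

    stream.count().print()
    ssc.start()
    ssc.awaitTermination()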