Looking at the source code of DStream.scala:
/**
 * Return a new DStream in which each RDD has a single element generated
 * by counting each RDD of this DStream.
 */
def count(): DStream[Long] = {
  this.map(_ => (null, 1L))
      .transform(_.union(context.sparkContext.makeRDD(Seq((null, 0L)), 1)))
      .reduceByKey(_ + _)
      .map(_._2)
}
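For context, a quick usage sketch: count() emits one Long per batch interval. The socket source, host, port, and batch duration below are illustrative, not from the original question.

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

val ssc = new StreamingContext("local[2]", "CountSketch", Seconds(5))
val lines = ssc.socketTextStream("localhost", 9999)
val counts: DStream[Long] = lines.count()  // one element per batch: that batch's record count
counts.print()
ssc.start()
ssc.awaitTermination()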
Thank you for the reply. I had implemented my InputDStream to return None when
there was no data. After changing it to return an empty RDD, the exception is
gone.
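For reference, here is a minimal sketch of that change; the class name and the
fetch logic are illustrative stand-ins, not my actual code:

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

class MyInputDStream(ssc_ : StreamingContext)
    extends InputDStream[String](ssc_) {

  override def start(): Unit = {}
  override def stop(): Unit = {}

  // The old, incorrect version returned None when a batch had no data.
  // Returning Some(empty RDD) instead gives downstream operators an RDD
  // to work with on every batch.
  override def compute(validTime: Time): Option[RDD[String]] = {
    val batch = fetchBatch(validTime)
    if (batch.isEmpty) Some(ssc_.sparkContext.emptyRDD[String])
    else Some(ssc_.sparkContext.parallelize(batch))
  }

  // Stub standing in for the real data source.
  private def fetchBatch(time: Time): Seq[String] = Seq.empty
}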
I am curious, though: why did all the other processing work correctly with my
old, incorrect implementation, with or without data? My actual code:
I have not used this; I only watched a presentation on it at Spark Summit 2013.
https://github.com/radlab/sparrow
https://spark-summit.org/talk/ousterhout-next-generation-spark-scheduling-with-sparrow/
This is pure conjecture based on your high scheduling latency and the size of
your cluster, but it seems one way
I used Java Decompiler to inspect the included
org.apache.commons.codec.binary.Base64 .class file (in the spark-assembly jar),
and for both encodeBase64 and decodeBase64 there is only a (byte[]) version;
there is no encodeBase64(String)/decodeBase64(String).
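As a quicker check than decompiling, one can list the overloads the JVM
actually resolves at runtime. A sketch (the object name is illustrative; run it
with the same spark-assembly jar on the classpath):

import org.apache.commons.codec.binary.Base64

object CodecOverloadCheck extends App {
  // Which jar did Base64 actually load from?
  println(classOf[Base64].getProtectionDomain.getCodeSource.getLocation)
  // List every visible encodeBase64/decodeBase64 overload. If only the
  // (byte[]) variants show up, an old commons-codec (the String overloads
  // were added in codec 1.4) is shadowing the one your application needs.
  classOf[Base64].getMethods
    .filter(m => m.getName == "encodeBase64" || m.getName == "decodeBase64")
    .foreach(println)
}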
I have encountered the reported issue as well, and found a workaround: adding
SPARK_CLASSPATH=.../commons-codec-xxx.jar to spark-env.sh.
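Concretely, the line looks like the following; the jar path and codec version
here are placeholders for whatever your application actually needs:

# in conf/spark-env.sh
export SPARK_CLASSPATH=/path/to/commons-codec-1.7.jar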
Assuming this should not happen, I don't want to have to keep building a
custom version of Spark for every new release, so I prefer the workaround.
Hi,
This is my summary of the gap between expected behavior and actual behavior.
FlumeEventCount spark://spark_master_hostname:7077 address port
Expected: an 'agent' listening on address:port (i.e., bound to it). In the
context of Spark, this agent should be running on one of the slaves, which
should be
It is my understanding that there is no way to make FlumeInputDStream work in
a cluster environment with the current release. Switching to Kafka, if you
can, would be my suggestion, although I have not used KafkaInputDStream. There
is a big difference between the Kafka and Flume InputDStreams:
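To make that difference concrete, here is a hedged sketch; the hostnames,
port, ZooKeeper quorum, group id, and topic are illustrative, and as said
above I have not run the Kafka side myself:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.kafka.KafkaUtils

object FlumeVsKafkaSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("FlumeVsKafkaSketch"), Seconds(10))

    // Flume: the receiver BINDS to hostname:port, so the Flume agent's sink
    // must point at whichever specific worker ends up hosting the receiver.
    val flumeStream = FlumeUtils.createStream(ssc, "worker-hostname", 41414)

    // Kafka: the receiver CONNECTS OUT to ZooKeeper/the brokers, so it can
    // be scheduled on any worker without reconfiguring the data source.
    val kafkaStream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "my-consumer-group", Map("my-topic" -> 1))

    flumeStream.count().print()
    kafkaStream.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}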