Have you tried rdd.distinct()?
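For reference, a minimal PySpark-style sketch of what `distinct()` does on an RDD. This assumes a live `SparkContext` named `sc` (e.g. inside spark-submit or pyspark); the sample data is made up:

```python
# Sketch only: assumes a running SparkContext `sc`.
rdd = sc.parallelize([("AAPL", 101.2), ("AAPL", 101.2), ("MSFT", 62.5)])

# distinct() shuffles the data and keeps one copy of each element.
deduped = rdd.distinct()
print(deduped.count())  # 2 distinct elements
```

Note that `distinct()` introduces a shuffle, which matters for a 5-second batch duration.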
>
> On Sun, Nov 13, 2016 at 8:28 AM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>> Can you come up with a minimal reproducible example?
>>
>> Probably unrelated, but why are you doing a union of 3 streams?
>>
>> On Sat
Do you see failed tasks or other errors?
> Output actions like foreach aren't exactly-once and will be retried on
> failures.
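Since output actions are at-least-once, the usual remedy is to make the write idempotent so a retried task cannot duplicate output. A plain-Python sketch of the idea (the Spark machinery is stubbed out; `write_batch`, the in-memory `sink`, and the key scheme are hypothetical):

```python
# Hypothetical idempotent sink: keyed on (batch_time, partition_id), so a
# retried task overwrites its own earlier attempt instead of appending a
# second copy of the same records.
sink = {}

def write_batch(batch_time, partition_id, records):
    # Re-running this with the same key is harmless (idempotent write).
    sink[(batch_time, partition_id)] = list(records)

# First attempt for batch 1000, partition 0:
write_batch(1000, 0, [("AAPL", 101.2), ("MSFT", 62.5)])
# Spark retries the task after a failure; the write overwrites itself:
write_batch(1000, 0, [("AAPL", 101.2), ("MSFT", 62.5)])

total = sum(len(v) for v in sink.values())
print(total)  # 2, not 4: the retry did not duplicate output
```

In a real job the key would typically go into the external store itself (e.g. a primary key in a database), not an in-memory dict.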
>
> On Nov 12, 2016 06:36, "dev loper" <spark...@gmail.com> wrote:
>
Dear fellow Spark Users,
My Spark Streaming application (Spark 2.0, on an AWS EMR YARN cluster)
listens to campaigns based on live stock feeds, and the batch duration is 5
seconds. The application uses Kafka DirectStream, and based on the feed
source there are three streams. As given in the code
I request your help to identify the issue.
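For context, the shape described above (three Kafka direct streams, one per feed source, unioned before processing) would look roughly like this in PySpark on Spark 2.0; the topic names and broker address below are placeholders, not from the original post:

```python
# Sketch only: assumes a StreamingContext `ssc` with a 5-second batch duration.
from pyspark.streaming.kafka import KafkaUtils

kafka_params = {"metadata.broker.list": "broker:9092"}  # placeholder broker

# One direct stream per feed source (topic names are hypothetical):
s1 = KafkaUtils.createDirectStream(ssc, ["feedA"], kafka_params)
s2 = KafkaUtils.createDirectStream(ssc, ["feedB"], kafka_params)
s3 = KafkaUtils.createDirectStream(ssc, ["feedC"], kafka_params)

# Union into a single DStream for downstream processing:
unioned = ssc.union(s1, s2, s3)
```

If the three topics share the same processing logic, a single direct stream subscribed to all three topics avoids the union entirely.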
On Fri, Apr 29, 2016 at 7:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Please use the following syntax:
>
> --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///local/file/log4j.properties"
>
> FYI
>
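For completeness, a minimal log4j.properties that the --conf line above could point at. This is a sketch: the appender layout is a common choice, not something prescribed by the thread, and the file must exist at the given path on each executor node:

```properties
# Minimal executor-side log4j configuration (sketch).
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```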
Hi Spark Team,
I have asked the same question on Stack Overflow, but no luck yet.
http://stackoverflow.com/questions/36923949/where-to-find-logs-within-spark-rdd-processing-function-yarn-cluster-mode?noredirect=1#comment61419406_36923949
I am running my Spark application on a YARN cluster. No
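One general note on the question above: in yarn-cluster mode, log statements inside RDD functions run on the executors, so they land in the YARN container logs rather than on the driver console. After the application finishes, the aggregated logs can be pulled with the yarn CLI (the application id below is a placeholder):

```shell
# Aggregated container logs for a finished application (placeholder app id):
yarn logs -applicationId application_1461900000000_0001
```

While the application is still running, the same logs are visible per-container through the YARN ResourceManager web UI.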