Hi,
With the current design, event logs are not ideal for long-running streaming
applications, so it is better to disable them. There was a proposal to split
the event logs by size/job/query for long-running applications; I am not sure
whether there was any follow-up on that.
Regards,
Sha
Hi.
There is a workaround for that.
You can disable event logs for Spark Streaming applications.
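For reference, a minimal sketch of the workaround (the application name is
hypothetical); the same setting can also be passed to spark-submit as
--conf spark.eventLog.enabled=false:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Disable event logging so a long-running streaming job does not keep
// growing an .inprogress file in the event log directory.
val conf = new SparkConf()
  .setAppName("long-running-streaming-app") // hypothetical name
  .set("spark.eventLog.enabled", "false")
val ssc = new StreamingContext(conf, Seconds(10))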
On Tue, Jul 16, 2019 at 1:08 PM raman gugnani
wrote:
> Hi,
>
> I have long running spark streaming jobs.
> Event log directories are getting filled with .inprogress files.
> Is there a fix or workaround for Spark Streaming?
Hi,
I have long running spark streaming jobs.
Event log directories are getting filled with .inprogress files.
Is there a fix or workaround for Spark Streaming?
There is also a JIRA raised for this issue:
https://issues.apache.org/jira/browse/SPARK-22783
--
Raman Gugnani
85888
From: roshan joe
Date: Monday, October 30, 2017 at 7:53 PM
To: "user@spark.apache.org"
Subject: share datasets across multiple spark-streaming applications for lookup
Hi,
What is the recommended way to share datasets across multiple spark-streaming
applications, so that the incoming data can be looked up against this shared
dataset?
The shared dataset is also incrementally refreshed and stored on S3. Below
is the scenario.
Streaming App-1 consumes data from [...] with multiple Apps doing lookups
simultaneously. Are there better options? Thank you.
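One common pattern, sketched below under assumptions (the S3 path, record
layout, and socket source are hypothetical), is to let each streaming
application re-read the shared dataset from S3 once per batch and join the
incoming records against it; the dataset is then "shared" through S3 rather
than through Spark itself:

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val spark = SparkSession.builder.appName("lookup-app").getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(60))

// Stand-in source; key each incoming record by its first field.
val events = ssc.socketTextStream("localhost", 9999)
  .map(line => (line.split(",")(0), line))

val joined = events.transform { rdd =>
  // Re-read the incrementally refreshed dataset once per batch, so each
  // application sees the latest version without coordinating with the others.
  val lookup = spark.read.parquet("s3a://my-bucket/shared-lookup") // hypothetical path
    .rdd
    .map(row => (row.getString(0), row.getString(1)))
  rdd.join(lookup)
}

joined.print()
ssc.start()
ssc.awaitTermination()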
It depends on the use case.
Spark always depends on resource availability.
As long as you have the resources to accommodate them, you can run as many
Spark / Spark Streaming applications as you like.
Thanks,
Divya
On 15 December 2016 at 08:42, shyla deshpande
wrote:
> How many Spark streaming applications can be run at a time on a Spark
> cluster?
How many Spark streaming applications can be run at a time on a Spark
cluster?
Is it better to have 1 spark streaming application to consume all the Kafka
topics or have multiple streaming applications when possible to keep it
simple?
Thanks
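For what it's worth, a single application can subscribe to several topics at
once. A minimal sketch with the Kafka 0.10 direct stream (broker address,
group id, and topic names are hypothetical):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val conf = new SparkConf().setAppName("all-topics-consumer")
val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "all-topics-consumer"
)

// One direct stream subscribed to all topics of interest.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent,
  Subscribe[String, String](Seq("topicA", "topicB", "topicC"), kafkaParams))

stream.map(record => (record.topic, record.value)).print()
ssc.start()
ssc.awaitTermination()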
Hello,
We are facing a large scheduling delay in our Spark Streaming application,
and we are not sure how to debug why the delay is happening. We have already
applied all the tuning we know of on the Spark side.
Can someone advise how to debug the cause of the delay, and share some tips
for resolving it, please?
--
Regards
Hemalatha
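One way to see where the time goes is a StreamingListener that logs per-batch
timing: if processing time regularly exceeds the batch interval, batches queue
up and the scheduling delay climbs. A sketch, assuming a DStream application
with an existing StreamingContext ssc:

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Log scheduling delay and processing time for every completed batch.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    println(s"batch=${info.batchTime} " +
      s"schedulingDelayMs=${info.schedulingDelay.getOrElse(-1L)} " +
      s"processingTimeMs=${info.processingDelay.getOrElse(-1L)} " +
      s"numRecords=${info.numRecords}")
  }
})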
We deployed Spark Streaming applications to a standalone cluster. After a
cluster restart, all the deployed applications were gone, and I could not see
any applications in the Spark Web UI.
How can we make the Spark Streaming applications durable, so that they
auto-restart after a cluster restart?
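On standalone, one option is submitting in cluster deploy mode with
--supervise, which asks the master to restart the driver if it dies; to
survive a full master restart the master also needs recovery enabled
(spark.deploy.recoveryMode). A sketch, with a hypothetical class, jar, and
master URL:

spark-submit \
  --master spark://master:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar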
[...] with spark context, but how to run it from code automatically, when a
condition comes true, without actually using spark-submit?
Is it possible?
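For what it's worth, later Spark releases added
org.apache.spark.launcher.SparkLauncher for exactly this. A sketch (jar path,
main class, and master URL are hypothetical):

import org.apache.spark.launcher.SparkLauncher

// Launch the application programmatically once some condition holds,
// instead of shelling out to spark-submit.
def maybeLaunch(conditionIsTrue: Boolean): Unit = {
  if (conditionIsTrue) {
    val handle = new SparkLauncher()
      .setAppResource("/path/to/my-streaming-app.jar") // hypothetical jar
      .setMainClass("com.example.MyStreamingApp")      // hypothetical class
      .setMaster("spark://master:7077")
      .startApplication()
    println(s"launched, state=${handle.getState}")
  }
}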
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Applications-tp16976p17453.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
spork-streaming/blob/master/src/org/apache/pig/backend/hadoop/executionengine/spark_streaming/SparkStreamingLauncher.java#L183
and all, this is a pretty big project actually.
Cc'ing Helena for more information on this.
TD
On Thu, Oct 23, 2014 at 6:30 AM, Saiph Kappa wrote:
> What is the application about? I couldn't find any proper description
> regarding the purpose of killrweather (I mean, other than just integrating
> Spark with Cassandra). Do you know if the slides of that tutorial are
> available somewhere?
What is the application about? I couldn't find any proper description
regarding the purpose of killrweather (I mean, other than just integrating
Spark with Cassandra). Do you know if the slides of that tutorial are
available somewhere?
Thanks!
On Wed, Oct 22, 2014 at 6:58 PM, Sameer Farooqui
wrote:
Hi Saiph,
Patrick McFadin and Helena Edelson from DataStax taught a tutorial at NYC
Strata last week where they created a prototype Spark Streaming + Kafka
application for time series data.
You can see the code here:
https://github.com/killrweather/killrweather
On Tue, Oct 21, 2014 at 4:33 PM,
Hi,
I have been trying to find a fairly complex application that makes use of
the Spark Streaming framework. I checked public github repos but the
examples I found were too simple, only comprising simple operations like
counters and sums. On the Spark summit website, I could find very
interesting
What you seek is what happens "out of the box" (unless I'm
misunderstanding the question).

On Wed, Oct 1, 2014 at 4:13 AM, Chia-Chun Shih
wrote:

>> Hi,
>>
>> Are there any code examples demonstrating spark streaming applications
>> which depend on states? That is, last-run *updateStateByKey* results are
>> used as inputs.
>>
>> Thanks.
Hi,
Are there any code examples demonstrating spark streaming applications
which depend on states? That is, last-run *updateStateByKey* results are
used as inputs.
Thanks.
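A minimal sketch of the usual pattern, a stateful word count in which the
previous batch's counts feed into the next (the source and checkpoint path
are hypothetical):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("stateful-wordcount")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint") // state requires checkpointing

val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split("\\s+"))

// For each key, combine this batch's values with the state carried over
// from the last run of updateStateByKey.
val counts = words.map(w => (w, 1)).updateStateByKey[Int] {
  (newValues: Seq[Int], state: Option[Int]) =>
    Some(newValues.sum + state.getOrElse(0))
}

counts.print()
ssc.start()
ssc.awaitTermination()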
More importantly, why are you asking this question? :)
Also let me generalize the answer by saying that most applications that do
some useful computations use map-like operations. And by map-like
operations I mean simple operations like map, filter, flatMap,
mapPartitions. The only category of appl
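To illustrate the map-like operations mentioned above on a DStream (assuming
an existing StreamingContext ssc and a stand-in socket source):

val lines = ssc.socketTextStream("localhost", 9999)

val processed = lines
  .map(_.trim)                         // element-wise transform
  .filter(_.nonEmpty)                  // drop blank lines
  .flatMap(_.split("\\s+"))            // one element in, many out
  .mapPartitions(_.map(_.toLowerCase)) // per-partition variant of map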
Hi,
On Wed, Oct 1, 2014 at 12:20 AM, Saiph Kappa wrote:
> But most applications use transformations, and map in particular, correct?
>
Yes, I would claim that most applications that do some useful computation
use map().
Tobias
Hi Saiph,
Map is used for transformation on your input RDD. If you don't need
transformation of your input, you don't need to use map.
Thanks,
Liquan
On Mon, Sep 29, 2014 at 10:15 AM, Saiph Kappa wrote:
> Hi,
>
> Do all spark streaming applications use the map operation? or the
> majority of them?
Hi,
Do all spark streaming applications use the map operation? or the majority
of them?
Thanks.
Hi,
by now I understand a bit better how spark-submit and YARN play together,
and how the Spark driver and slaves work together on YARN.
Now for my use case, as described at <
https://spark.apache.org/docs/latest/submitting-applications.html>, I would
probably have an end-user-facing gateway that
In the current state of Spark Streaming, creating separate Java processes,
each having its own streaming context, is probably the best approach to
dynamically adding and removing input sources. All of these should be able
to use a YARN cluster for resource allocation.
On Wed, Sep 3, 2014 at 6:30 PM
Hi,
I am not sure if "multi-tenancy" is the right word, but I am thinking about
a Spark application where multiple users can, say, log into some web
interface and specify a data processing pipeline with a streaming source,
processing steps, and output.
Now, as far as I know, there can be only one St