It depends on how much memory is available and how much data you are
processing. Please provide the data size and cluster details so we can help.
On Fri, Aug 14, 2020 at 12:54 AM km.santanu wrote:
> Hi
> I am using Kafka stateless structured streaming. I have enabled a watermark
> of 1 hour. After long running
packaged as part of my main jar.
> It worked well both on driver and locally, but not on executors.
>
> Regards,
>
> Bryan Jeffrey
>
>
> ------
> *From:* Srinivas V
> *Sent:* Saturday,
>
> --
> *From:* Srinivas V
> *Sent:* Friday, June 26, 2020 9:47:52 PM
> *To:* Bryan Jeffrey
> *Cc:* user
> *Subject:* Re: Metrics Problem
>
It should work when you are giving hdfs path as long as your jar exists in
the path.
Your error looks more like a security issue (Kerberos) or missing Hadoop
dependencies, I think; your error says:
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation
On Fri, Jun 26, 2020 at 8:44 PM Brya
Cool. Are you not using a watermark?
Also, is it possible to start listening to offsets from a specific date/time?
Regards
Srini
On Sat, Jun 27, 2020 at 6:12 AM Eric Beabes
wrote:
> My apologies... After I set the 'maxOffsetsPerTrigger' to a value such as
> '20' it started working. Hopefully
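For reference, a sketch of where that option goes on the Kafka source; the broker address and topic name below are placeholders, not values from this thread:

```scala
// Sketch: capping how many Kafka offsets each micro-batch reads, so a large
// backlog does not overwhelm the first batch. Broker/topic names are made up.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("rate-limited-reader").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "events")
  .option("maxOffsetsPerTrigger", "20") // read at most ~20 offsets per micro-batch
  .load()
```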
Does stateful structured streaming work on a stand-alone Spark cluster with a
few nodes? Does it need HDFS? If not, how do I get it working without HDFS?
Regards
Srini
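On the HDFS question: state lives in executor memory and is checkpointed to a fault-tolerant, HDFS-compatible location. A hedged sketch, assuming a shared mount reachable from every node (the path and the `wordCounts` DataFrame are placeholders):

```scala
// Sketch: stateful queries need a checkpoint location every node can reach.
// Without HDFS this could be a shared NFS mount or an object store path;
// "file:///shared/checkpoints/app1" and `wordCounts` are placeholders.
val query = wordCounts.writeStream
  .outputMode("update")
  .option("checkpointLocation", "file:///shared/checkpoints/app1")
  .format("console")
  .start()
```

On a single node a local path works for testing, but it is not fault-tolerant across machines.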
icn+1,")
> .load()
>
> After acquiring the DataFrame, you can union them and treat all the data
> with a single process.
>
> val unifiedData = df_cluster1.union(df_cluster2)
> // apply further transformations on `unifiedData`
>
> kr, Gerard.
>
>
> :
>
>
&
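Filling out the truncated snippet above as a fuller sketch: one Kafka source per cluster, unioned into a single DataFrame for one query. Broker addresses and topic names are placeholders:

```scala
// Sketch (placeholder brokers/topics): one source per Kafka cluster, unioned
// so a single query processes data from both clusters.
val df_cluster1 = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "cluster1-broker:9092")
  .option("subscribe", "topicA")
  .load()

val df_cluster2 = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "cluster2-broker:9092")
  .option("subscribe", "topicB")
  .load()

val unifiedData = df_cluster1.union(df_cluster2)
// apply further transformations on `unifiedData`
```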
Thanks for the quick reply. This may work, but I have about 5 topics to
listen to right now. I am trying to keep all the topics in an array in a
properties file and read them all at once. This way it is dynamic: you have
one code block like below, and you can add or delete topics from
the config file.
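A sketch of that dynamic-topic idea (the property key and topic names are made up): the Kafka source's `subscribe` option accepts a comma-separated list, so the array from the properties file can be joined into one string.

```scala
// Sketch: read a comma-separated topic list from a properties file and
// subscribe to all topics at once. The key "topics" and names are made up.
import java.io.StringReader
import java.util.Properties

def topicsFromConfig(props: Properties): String =
  props.getProperty("topics", "").split(",").map(_.trim).filter(_.nonEmpty).mkString(",")

val props = new Properties()
props.load(new StringReader("topics=orders, payments ,shipments")) // stands in for the real file
val subscribeList = topicsFromConfig(props)

// Then on the Kafka source (placeholder broker):
// spark.readStream.format("kafka")
//   .option("kafka.bootstrap.servers", "broker1:9092")
//   .option("subscribe", subscribeList)
//   .load()
```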
Hello,
In Structured Streaming, is it possible to have one Spark application with
one query consume topics from multiple Kafka clusters?
I am trying to consume two topics, each from a different Kafka cluster, but it
reports one of the topics as an unknown topic and the job keeps running
without com
Something Something <
> mailinglist...@gmail.com> wrote:
>
>> Right, this is exactly how I have it right now. The problem is that in
>> cluster mode 'myAcc' does NOT get distributed. Try it out in cluster mode
>> and you will see what I mean.
>>
>>
>> > > > .mapGroupsWithState(timeoutConf =
>> > > > GroupStateTimeout.ProcessingTimeTimeout)(func = mappingFunc)
>> > > >
>> > > > val query = wordCounts.writeStream
>> > > > .outputMode(OutputMode.Update)
>> > > > ...
elUpdate call(String productId, Iterator<Event>
> eventsIterator, GroupState<StateInfo> state) {
> }
> }
>
> On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote:
>
>>
>> Yes, accumulators are updated in the call method of StateUpdateTask, e.g.
>> when state times out or when the data
>
> On Fri, May 29, 2020 at 6:51 AM Srinivas V wrote:
>
>> Yes, it is an application-specific class. This is how Java Spark Functions
>> work.
>> You can refer to this code in the documentation:
>> https://github.com/apache/spark/blob/master/examples/src/main/java/o
application-specific class. Does it
> have an 'updateState' method or something? I googled but couldn't find any
> documentation about doing it this way. Can you please direct me to some
> documentation? Thanks.
>
> On Thu, May 28, 2020 at 4:43 AM Srinivas V wrote:
>
>&g
ue)
> Thread.sleep(5 * 1000)
> }
>
> //query.awaitTermination()
> ```
>
> And the accumulator value updated can be found from driver stdout.
>
> --
> Cheers,
> -z
>
> On Thu, 28 May 2020 17:12:48 +0530
> Srinivas V wrote:
>
> > yes, I
'Stateful Structured Streaming'. In other streaming
> applications it works as expected.
>
>
>
> On Wed, May 27, 2020 at 9:01 AM Srinivas V wrote:
>
>> Yes, I am talking about Application specific Accumulators. Actually I am
>> getting the values printed in
in
> your code? I am talking about the Application Specific Accumulators. The
> other standard counters such as 'event.progress.inputRowsPerSecond' are
> getting populated correctly!
>
> On Mon, May 25, 2020 at 8:39 PM Srinivas V wrote:
>
>> Hello,
>> Even for me
Hello,
Even for me it comes as 0 when I print it in onQueryProgress. I use
LongAccumulator as well. Yes, it prints on my local machine but not on the
cluster. One consolation is that when I send metrics to Grafana, the values
do show up there.
On Tue, May 26, 2020 at 3:10 AM Something Something <
mailinglis
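For what it's worth, a sketch of the accumulator wiring that usually matters in this local-vs-cluster situation (names assumed, not from the thread): the accumulator must be registered through the SparkContext, executors update only local copies, and the merged value is reliable only when read on the driver.

```scala
// Sketch (assumed names): register the accumulator via the SparkContext so
// executor-side updates are merged back to the driver.
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.LongAccumulator

val spark = SparkSession.builder.appName("acc-demo").getOrCreate()
val eventCount: LongAccumulator = spark.sparkContext.longAccumulator("eventCount")

// Executors call eventCount.add(1) inside the state-update function.
// Read the merged total on the driver, e.g. in a StreamingQueryListener:
// override def onQueryProgress(e: QueryProgressEvent): Unit =
//   println(s"events so far: ${eventCount.value}")
```

Reading `eventCount.value` inside executor code (rather than on the driver) is one common reason it prints 0 in cluster mode.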
Hello,
I am listening to a Kafka topic through Spark Structured Streaming [2.4.5].
After processing messages for a few minutes, I am getting the below
NullPointerException. I have three beans used here: 1. Event 2. StateInfo
3. SessionUpdateInfo. I suspect that the problem is with StateInfo,
when it is wri
If you are talking about the total number of objects the state can hold, that
depends on the executor memory you have on your cluster, apart from the rest
of the memory required for processing. The state is stored in HDFS and
retrieved while processing the next events.
If you maintain a million objects with e
Can anyone else answer the below questions on performance tuning for
Structured Streaming?
@Jacek?
On Sun, May 3, 2020 at 12:07 AM Srinivas V wrote:
> Hi Alex, I read the book; it is a good one, but I don't see the things I
> strongly want to understand.
> You are right on the partition and t
With Kafka, every partition in Kafka is mapped to a Spark
> partition. And in Spark, every partition is mapped to a task. But you can
> use `coalesce` to decrease the number of Spark partitions, so you'll have
> fewer tasks...
>
> Srinivas V at "Sat, 18 Apr 2020 10:32:33 +0
of
> each partition is a separate task that needs to be executed, so you need to
> plan the number of cores correspondingly.
>
> Srinivas V at "Thu, 16 Apr 2020 22:49:15 +0530" wrote:
> SV> Hello,
> SV> Can someone point me to a good video or document which talks about
Hello,
Can someone point me to a good video or document which talks about
performance tuning for a structured streaming app?
I am especially looking at listening to Kafka topics, say 5 topics each
with 100 partitions.
Trying to figure out the best cluster size and the number of executors and
cores required.
Check if your broker details are correct, and verify that you have network
connectivity from your client box to the Kafka broker server host.
On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh
wrote:
> Hi,
> I have a spark streaming application where Kafka is producing
> records but unfortunately spa
non-getter methods of the fields defined? Still, how is that
causing the state object to get corrupted so much?
On Sat, Mar 28, 2020 at 7:46 PM Srinivas V wrote:
> Ok, I will try to create some simple code to reproduce, if I can. Problem
> is that I am adding this code in an existing big projec
of the business logic which you may want to
> redact, and provide the full source code which reproduces the bug?
>
> On Sat, Mar 28, 2020 at 8:11 PM Srinivas V wrote:
>
>> Sorry for typos , correcting them below
>>
>> On Sat, Mar 28, 2020 at 4:39 PM Srinivas V wrot
Sorry for typos , correcting them below
On Sat, Mar 28, 2020 at 4:39 PM Srinivas V wrote:
> Sorry, I was just changing some names so as not to send the exact names.
> Please ignore that. I have really been struggling with this for a couple of
> days. Can this happen due to
> 1. some of the values b
consistent across multiple JVM
>> runs to make it work properly, but I suspect it doesn't retain the order.
>>
>> On Fri, Mar 27, 2020 at 10:28 PM Srinivas V wrote:
>>
>>> I am listening to Kafka topic with a structured streaming application
>>> with
I am listening to a Kafka topic with a structured streaming application in
Java, testing it on my local Mac.
When I retrieve the GroupState object with
state.get(), it gives some random values for the fields in the object;
some are interchanged, some are defaults, and some are junk values.
See
from failure or killing, the group id changes, but the
> starting offset will be the last one you consumed last time.
>
> Srinivas V wrote on Thu, Mar 19, 2020 at 12:36 PM:
>
>> Hello,
>> 1. My Kafka consumer name is randomly being generated by spark structured
>> streaming. Can I ov
Hello,
1. My Kafka consumer name is randomly generated by Spark Structured
Streaming. Can I override this?
2. When testing in development, when I stop my streaming Kafka consumer job
for a couple of days and then try to start it again, the job keeps failing
due to missing offsets, as the offse
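For both questions above, newer Spark versions (3.0+, worth checking against your version's Kafka integration guide) expose source options that may help; the broker, topic, and timestamp below are placeholders:

```scala
// Sketch, assuming Spark 3.0+ Kafka source options (verify for your version).
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "events")
  .option("groupIdPrefix", "my-app") // 1. control the generated consumer group name
  // 2. restart from a point in time (epoch millis, per partition) instead of
  //    from offsets that may have aged out of the topic:
  .option("startingOffsetsByTimestamp", """{"events": {"0": 1584576000000}}""")
  .option("failOnDataLoss", "false") // don't abort when old offsets are gone
  .load()
```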
Anyone using this combination for prod? I am planning to use it for a use case
with 15000 events per second from a few Kafka topics. Though the events are
big, I would just take the businessIds, frequency, and first and last event
timestamps and save these into mapGroupsWithState. I need to keep them for