There is just one copy in memory. No different than if you have two
variables pointing to the same dict.
On Mon, May 3, 2021 at 7:54 AM Bode, Meikel, NMA-CFD <
meikel.b...@bertelsmann.de> wrote:
> Hi all,
>
>
>
> when broadcasting a large dict containing several million entries to
> executors what
That’s the max size of a byte array in Java: it is limited by the array length, which is
defined as an int, and in most JVMs arrays can’t hold more than
Int.MaxValue - 8 elements. Another way to overcome this is to create multiple
broadcast variables.
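A rough sketch of that workaround, assuming an existing SparkContext `sc` and a hypothetical loadBigMap() that builds the oversized lookup table (the chunk count and the lookup helper are illustrative only):

// Split one oversized map into smaller pieces and broadcast each piece,
// so no single serialized broadcast approaches the ~2 GB array limit.
val bigMap: Map[String, String] = loadBigMap()            // hypothetical loader
val numChunks = 4                                         // pick based on the data size
val chunkSize = math.max(1, bigMap.size / numChunks)
val broadcasts = bigMap.grouped(chunkSize).toSeq.map(chunk => sc.broadcast(chunk))

// Executor-side lookup: check each chunk until the key is found.
def lookup(key: String): Option[String] =
  broadcasts.view.flatMap(_.value.get(key)).headOption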
On Sunday, August 5, 2018, klrmowse wrote:
> i don't need m
i don't need more, per se... i just need to watch the size of the variable;
then, if it's within the size limit, go ahead and broadcast it; if not, then
i won't broadcast...
so, that would be a yes then? (2 GB, or which is it exactly?)
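One way to "watch the size" is Spark's SizeEstimator utility; a sketch, assuming an existing SparkContext `sc` and a hypothetical buildLookupTable(). Note that it estimates the in-memory object size, not the serialized size, so treat the threshold as a rough guard rather than an exact 2 GB check:

import org.apache.spark.util.SizeEstimator

val candidate: Map[String, String] = buildLookupTable()   // hypothetical
val estimatedBytes = SizeEstimator.estimate(candidate)
val maxBytes = Int.MaxValue - 8L                          // the array limit discussed above

if (estimatedBytes < maxBytes) {
  val bc = sc.broadcast(candidate)                        // small enough: broadcast it
  // ... use bc.value inside transformations ...
} else {
  // too big: fall back to a join against a distributed dataset instead
}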
I think if you need more than that, you should be thinking about something other
than a broadcast variable anyway ...
> On 5. Aug 2018, at 16:51, klrmowse wrote:
>
> is it currently still ~2GB (Integer.MAX_VALUE) ??
>
> or am i misinformed, since that's what google-search and scouring this
> mailing lis
broadcast_var is only assigned inside foo(), so I think you need to declare it `global` there:
def foo():
    global broadcast_var      # refer to the module-level name instead of creating a local
    broadcast_var = sc.broadcast(var)
On Fri, May 13, 2016 at 3:53 PM, abi wrote:
> def kernel(arg):
>     input = broadcast_var.value + 1
>     # some processing with input
>
> def
Hi,
How is the ContextCleaner different from spark.cleaner.ttl? Is
spark.cleaner.ttl still needed when there is a ContextCleaner in the Streaming job?
Thanks,
Swetha
I'm glad that I could help :)
On 19 Aug 2015 at 8:52 AM, "Shenghua(Daniel) Wan"
wrote:
> +1
>
> I wish I had read this blog earlier. I am using Java and have just
> implemented a singleton producer per executor/JVM during the day.
> Yes, I did see that NonSerializableException when I was debugging
+1
I wish I had read this blog earlier. I am using Java and have just
implemented a singleton producer per executor/JVM during the day.
Yes, I did see that NonSerializableException when I was debugging the code
...
Thanks for sharing.
On Tue, Aug 18, 2015 at 10:59 PM, Tathagata Das wrote:
> I
It's a cool blog post! Tweeted it!
Broadcasting the configuration necessary for lazily instantiating the
producer is a good idea.
Nitpick: The first code example has an extra `}` ;)
On Tue, Aug 18, 2015 at 10:49 PM, Marcin Kuthan
wrote:
> As long as the Kafka producer is thread-safe you don't need
As long as the Kafka producer is thread-safe you don't need any pool at all.
Just share a single producer on every executor. Please look at my blog post
for more details. http://allegro.tech/spark-kafka-integration.html
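For reference, a minimal sketch of that pattern (the broker address, topic name, and the RDD element type are made up for illustration, and the properties would normally come from your configuration rather than being hard-coded):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// One producer per executor JVM: the lazy val is initialized at most once
// per JVM, the first time a task running on that executor touches it.
object KafkaSink {
  lazy val producer: KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }
}

// No pool and no broadcast of the producer itself (rdd assumed to be an RDD[String]).
rdd.foreachPartition { partition =>
  partition.foreach { msg =>
    KafkaSink.producer.send(new ProducerRecord[String, String]("events", msg))
  }
}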
On 19 Aug 2015 at 2:00 AM, "Shenghua(Daniel) Wan"
wrote:
> All of you are right.
>
>
All of you are right.
I was trying to create too many producers. My idea was to create a pool (for
now the pool contains only one producer) shared by all the executors.
After I realized it was related to the serialization issues (though I did
not find clear clues in the source code to indicate the b
Why are you even trying to broadcast a producer? A broadcast variable is
some immutable piece of serializable DATA that can be used for processing
on the executors. A Kafka producer is neither DATA nor immutable, and
definitely not serializable.
The right way to do this is to create the producer in the executors themselves.
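A sketch of that split, with only the immutable configuration broadcast and the producer built on the executor side; here a producer is created per partition for simplicity, though the per-JVM singleton discussed elsewhere in this thread avoids repeated construction. The broker, topic, and `rdd` element type are illustrative:

import scala.collection.JavaConverters._
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Broadcast only plain, serializable configuration data ...
val producerConf: Map[String, Object] = Map(
  "bootstrap.servers" -> "broker1:9092",
  "key.serializer"    -> "org.apache.kafka.common.serialization.StringSerializer",
  "value.serializer"  -> "org.apache.kafka.common.serialization.StringSerializer"
)
val confBc = sc.broadcast(producerConf)

// ... and build the non-serializable producer on the executors.
rdd.foreachPartition { partition =>                 // rdd assumed to be an RDD[String]
  val producer = new KafkaProducer[String, String](confBc.value.asJava)
  partition.foreach(msg => producer.send(new ProducerRecord[String, String]("events", msg)))
  producer.close()
}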
I wouldn't expect a Kafka producer to be serializable at all... among other
things, it has a background thread
On Tue, Aug 18, 2015 at 4:55 PM, Shenghua(Daniel) Wan wrote:
> Hi,
> Did anyone see java.util.ConcurrentModificationException when using
> broadcast variables?
> I encountered this exce
1. Same way, using static fields in a class.
2. Yes, same way.
3. Yes, you can do that. To differentiate "first time" vs. "continue",
you have to build your own semantics. For example, if the location in HDFS
where you are supposed to store the offsets does not have any data, that means it's
probably the first run.
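A sketch of that check, assuming the offsets are written to a file whose path is known to the job (the path and the surrounding logic are placeholders):

import org.apache.hadoop.fs.{FileSystem, Path}

val offsetsPath = new Path("hdfs:///my-app/offsets")        // illustrative location
val fs = FileSystem.get(sc.hadoopConfiguration)

// "First time" vs. "continue": offsets present and non-empty means continue.
val haveSavedOffsets = fs.exists(offsetsPath) && fs.getFileStatus(offsetsPath).getLen > 0

if (haveSavedOffsets) {
  // read the stored offsets and pass them to createDirectStream(...)
} else {
  // first run: start from the position configured for the consumer
}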
1. How to do it in Java?
2. Can broadcast objects also be created in the same way after checkpointing?
3. Is it safe if I disable checkpointing and write offsets at the end of each batch
to HDFS in my code, and somehow specify in my job to use these offsets for
creating the Kafka stream the first time? How can I specify
Rather than using the accumulator directly, what you can do is something like
this to lazily create an accumulator and use it (it will get lazily recreated
if the driver restarts from a checkpoint):
dstream.transform { rdd =>
  val accum = SingletonObject.getOrCreateAccumulator() // singleton-object method to lazily create/get the accumulator
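A sketch of what such a singleton could look like (the object and accumulator names are placeholders; this uses the newer LongAccumulator API, so on older 1.x releases the call would be sc.accumulator instead):

import org.apache.spark.SparkContext
import org.apache.spark.util.LongAccumulator

object SingletonObject {
  @volatile private var instance: LongAccumulator = _

  // Lazily create the accumulator; after a driver restart from a checkpoint
  // the field is null again, so the first batch recreates it.
  def getOrCreateAccumulator(): LongAccumulator = {
    if (instance == null) synchronized {
      if (instance == null) {
        instance = SparkContext.getOrCreate().longAccumulator("recordCounter")
      }
    }
    instance
  }
}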
That's great! Thanks
On Tuesday, July 28, 2015, Ted Yu wrote:
> If I understand correctly, there would be one value in the executor.
>
> Cheers
>
> On Tue, Jul 28, 2015 at 4:23 PM, Jonathan Coveney wrote:
>
>> i am running in coarse grained mode, let's say with 8 cores per executor.
>
If I understand correctly, there would be one value in the executor.
Cheers
On Tue, Jul 28, 2015 at 4:23 PM, Jonathan Coveney
wrote:
> i am running in coarse grained mode, let's say with 8 cores per executor.
>
> If I use a broadcast variable, will all of the tasks in that executor
> share the
Could you by any chance be using getOrCreate for the StreamingContext
creation?
I've seen this happen when I tried to first create the Spark context, then
create the broadcast variables, and then recreate the StreamingContext from
the checkpoint directory. So the worker process cannot find the
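For completeness, a sketch of the lazily (re)created broadcast pattern, analogous to the accumulator singleton shown earlier in this digest (the object name, table contents, and loader are all placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object LookupTable {
  @volatile private var instance: Broadcast[Map[String, String]] = _

  // Recreated lazily on the driver, e.g. after recovery from a checkpoint.
  def getOrCreate(sc: SparkContext): Broadcast[Map[String, String]] = {
    if (instance == null) synchronized {
      if (instance == null) instance = sc.broadcast(loadTable())
    }
    instance
  }

  private def loadTable(): Map[String, String] = Map("k" -> "v")   // placeholder data
}

// Typical use inside a streaming transformation:
// dstream.transform { rdd => val table = LookupTable.getOrCreate(rdd.sparkContext); ... }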
Ah, sorry, sorry, my brain just glitched… I sent some wrong information:
it's not “spark.cores.max” but the minPartitions in sc.textFile()
Best,
--
Nan Zhu
On Monday, July 21, 2014 at 7:17 PM, Tathagata Das wrote:
> That is definitely weird. spark.cores.max should not affect things when they
>
That is definitely weird. spark.cores.max should not affect things when they
are running in local mode.
And I am trying to think of scenarios that could cause a broadcast
variable used in the current job to fall out of scope, but they all seem
very far-fetched. So I am really curious to see the code w
Hi, TD,
I think I got more insights to the problem
in the Jenkins test file, I mistakenly passed a wrong value to spark.cores.max,
which is much larger than the expected value
(I passed the master address as local[6], and spark.cores.max as 200).
If I set a more consistent value, everything goes
Hi, TD,
Thanks for the reply
I tried to reproduce this in a simpler program, but no luck.
However, the program is very simple: it just loads some files from HDFS and
writes them to HBase….
---
It seems that the issue only appears when I run the unit test in Jenkins (it does
not fail every time, i
The ContextCleaner cleans up data and metadata related to RDDs and
broadcast variables, only when those variables are not in scope and get
garbage-collected by the JVM. So if the broadcast variable in question is
probably somehow going out of scope even before the job using the broadcast
variable i
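In other words, as long as the driver keeps a live reference to the Broadcast handle, it cannot be garbage-collected and the ContextCleaner will not remove it; a minimal sketch (the names and data are placeholders):

import org.apache.spark.broadcast.Broadcast

// Hold the handle somewhere long-lived on the driver for as long as jobs use it.
object BroadcastHolder {
  var lookup: Broadcast[Map[String, Int]] = _
}

BroadcastHolder.lookup = sc.broadcast(Map("a" -> 1))
// ... later, inside transformations: BroadcastHolder.lookup.value ...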
Hi Praveen:
It may be easier for other people to help you if you provide more details about
what you are doing. It may be worthwhile to also mention which Spark version
you are using. And if you can share the code which doesn't work for you, that
may also give others more clues as to what you a