Re: Broadcast Variable

2021-05-03 Thread Sean Owen
There is just one copy in memory. No different than if you have two variables pointing to the same dict.

On Mon, May 3, 2021 at 7:54 AM Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de> wrote:
> Hi all,
>
> when broadcasting a large dict containing several million entries to
> executors what
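A minimal PySpark sketch of Sean's point (app name and data are made up): the broadcast dict is serialized to each executor once, and every task on that executor reads the same single in-memory copy through .value.

    from pyspark import SparkContext

    sc = SparkContext(appName="broadcast-demo")
    # One copy per executor; all tasks on that executor share it via .value.
    lookup = sc.broadcast({"a": 1, "b": 2})

    result = sc.parallelize(["a", "b", "a"]).map(lambda k: lookup.value.get(k, 0)).collect()
    print(result)  # [1, 2, 1]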

Re: Broadcast variable size limit?

2018-08-05 Thread Vadim Semenov
That's the max size of a byte array in Java, limited by the length field, which is defined as an integer; in most JVMs arrays can't hold more than Int.MaxValue - 8 elements. Another way to overcome this is to create multiple broadcast variables.

On Sunday, August 5, 2018, klrmowse wrote:
> i don't need m
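A sketch of Vadim's multiple-broadcast workaround in PySpark (helper names are mine; string keys assumed): shard the dict so each piece serializes well under the byte-array cap, and route lookups by the same deterministic hash.

    import zlib

    def shard_of(key, n):
        # deterministic across driver and executors, unlike Python's salted hash()
        return zlib.crc32(key.encode("utf-8")) % n

    def broadcast_in_shards(sc, big_dict, n_shards=4):
        shards = [dict() for _ in range(n_shards)]
        for k, v in big_dict.items():
            shards[shard_of(k, n_shards)][k] = v
        return [sc.broadcast(s) for s in shards]

    def shard_lookup(broadcasts, key):
        return broadcasts[shard_of(key, len(broadcasts))].value[key]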

Re: Broadcast variable size limit?

2018-08-05 Thread klrmowse
i don't need more, per se... i just need to watch the size of the variable; then, if it's within the size limit, go ahead and broadcast it; if not, then i won't broadcast... so, that would be a yes then? (2 GB, or which is it exactly?)
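One way to do that size check in PySpark (a sketch; the threshold reflects the JVM byte-array limit discussed in this thread): measure the pickled size of the object before deciding to broadcast.

    import pickle

    MAX_BROADCAST_BYTES = 2**31 - 1 - 8  # ~Int.MaxValue minus the JVM array overhead

    def maybe_broadcast(sc, obj):
        size = len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
        if size < MAX_BROADCAST_BYTES:
            return sc.broadcast(obj)
        return None  # too large: fall back to another strategy, e.g. a join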

Re: Broadcast variable size limit?

2018-08-05 Thread Jörn Franke
I think if you need more than that, you should think about something other than a broadcast variable anyway ...

On 5. Aug 2018, at 16:51, klrmowse wrote:
> is it currently still ~2GB (Integer.MAX_VALUE) ??
>
> or am i misinformed, since that's what google-search and scouring this
> mailing lis

Re: broadcast variable not picked up

2016-05-16 Thread Davies Liu
broadcast_var is only defined in foo(), I think you should have `global` for it.

    def foo():
        global broadcast_var
        broadcast_var = sc.broadcast(var)

On Fri, May 13, 2016 at 3:53 PM, abi wrote:
> def kernel(arg):
>     input = broadcast_var.value + 1
>     #some processing with input
>
> def
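A runnable version of the corrected pattern (a minimal sketch; variable names taken from the thread):

    from pyspark import SparkContext

    sc = SparkContext(appName="global-broadcast")
    var = 41

    def foo():
        global broadcast_var              # bind at module scope, not local to foo()
        broadcast_var = sc.broadcast(var)

    def kernel(arg):
        # resolved as a module-level global when the closure is shipped to executors
        return broadcast_var.value + 1 + arg

    foo()
    print(sc.parallelize([1, 2, 3]).map(kernel).collect())  # [43, 44, 45]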

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2015-09-10 Thread swetha
Hi,

How is the ContextCleaner different from spark.cleaner.ttl? Is spark.cleaner.ttl still needed when there is a ContextCleaner in the Streaming job?

Thanks,
Swetha

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-19 Thread Marcin Kuthan
I'm glad that I could help :)

On 19 Aug 2015, 8:52 AM, "Shenghua(Daniel) Wan" wrote:
> +1
>
> I wish I had read this blog earlier. I am using Java and have just
> implemented a singleton producer per executor/JVM during the day.
> Yes, I did see that NonSerializableException when I was debugging

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Shenghua(Daniel) Wan
+1 I wish I had read this blog earlier. I am using Java and have just implemented a singleton producer per executor/JVM during the day. Yes, I did see that NonSerializableException when I was debugging the code ... Thanks for sharing.

On Tue, Aug 18, 2015 at 10:59 PM, Tathagata Das wrote:
> I

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Tathagata Das
It's a cool blog post! Tweeted it! Broadcasting the configuration necessary for lazily instantiating the producer is a good idea. Nitpick: the first code example has an extra `}` ;)

On Tue, Aug 18, 2015 at 10:49 PM, Marcin Kuthan wrote:
> As long as the Kafka producer is thread-safe you don't need

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Marcin Kuthan
As long as the Kafka producer is thread-safe you don't need any pool at all. Just share a single producer on every executor. Please look at my blog post for more details. http://allegro.tech/spark-kafka-integration.html

On 19 Aug 2015, 2:00 AM, "Shenghua(Daniel) Wan" wrote:
> All of you are right.
>
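A minimal sketch of the same share-one-producer-per-executor idea transposed to PySpark, assuming the kafka-python client and made-up broker/topic names (the blog post itself covers the Scala/Java side): the producer is created lazily on first use and then reused by every partition processed in that worker.

    from kafka import KafkaProducer  # assumed dependency: kafka-python

    _producer = None  # one instance per executor worker process

    def get_producer():
        global _producer
        if _producer is None:
            _producer = KafkaProducer(bootstrap_servers="broker:9092")  # assumed address
        return _producer

    def send_partition(records):
        producer = get_producer()  # created once, then reused
        for value in records:
            producer.send("events", value)  # assumed topic; values must be bytes
        producer.flush()

    # rdd.foreachPartition(send_partition)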

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Shenghua(Daniel) Wan
All of you are right. I was trying to create too many producers. My idea was to create a pool (for now the pool contains only one producer) shared by all the executors. After I realized it was related to the serializable issues (though I did not find clear clues in the source code to indicate the b

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Tathagata Das
Why are you even trying to broadcast a producer? A broadcast variable is some immutable piece of serializable DATA that can be used for processing on the executors. A Kafka producer is neither DATA nor immutable, and definitely not serializable. The right way to do this is to create the producer in

Re: broadcast variable of Kafka producer throws ConcurrentModificationException

2015-08-18 Thread Cody Koeninger
I wouldn't expect a Kafka producer to be serializable at all... among other things, it has a background thread.

On Tue, Aug 18, 2015 at 4:55 PM, Shenghua(Daniel) Wan wrote:
> Hi,
> Did anyone see java.util.ConcurrentModificationException when using
> broadcast variables?
> I encountered this exce

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
1. Same way, using static fields in a class.
2. Yes, same way.
3. Yes, you can do that. To differentiate "first time" v/s "continue", you have to build your own semantics. For example, if the location in HDFS where you are supposed to store the offsets does not have any data, that means it's probably
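A sketch of the semantics in point 3 (every name here is hypothetical; wire in your own HDFS client): the presence or absence of data at the offsets location distinguishes a first run from a restart.

    OFFSETS_PATH = "hdfs:///app/kafka-offsets"  # hypothetical location

    def hdfs_path_has_data(path):
        raise NotImplementedError  # hypothetical helper: check the path via your HDFS client

    def load_offsets(path):
        raise NotImplementedError  # hypothetical helper: read the per-partition offsets you stored

    def starting_offsets():
        if hdfs_path_has_data(OFFSETS_PATH):
            return load_offsets(OFFSETS_PATH)  # "continue": resume from stored offsets
        return None  # "first time": start the stream from its defaults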

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Shushant Arora
1. How to do it in Java?
2. Can broadcast objects also be created in the same way after checkpointing?
3. Is it safe if I disable checkpointing, write offsets at the end of each batch to HDFS in my code, and somehow specify in my job to use this offset for creating the Kafka stream the first time? How can I specify

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
Rather than using the accumulator directly, what you can do is something like this to lazily create an accumulator and use it (it will get lazily recreated if the driver restarts from a checkpoint):

    dstream.transform { rdd =>
      val accum = SingletonObject.getOrCreateAccumulator() // single object method to
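The same lazily-created singleton pattern, sketched in PySpark (names are mine): the accumulator is built on first use, so after a driver restart from a checkpoint it is simply created afresh rather than restored stale.

    _accum = None  # driver-side singleton

    def get_or_create_accumulator(sc):
        global _accum
        if _accum is None:
            _accum = sc.accumulator(0)
        return _accum

    def count_records(rdd):
        accum = get_or_create_accumulator(rdd.context)
        def tag(row):
            accum.add(1)  # counts rows as a side effect
            return row
        return rdd.map(tag)

    # dstream.transform(count_records)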

Re: broadcast variable question

2015-07-28 Thread Jonathan Coveney
That's great! Thanks

On Tuesday, July 28, 2015, Ted Yu wrote:
> If I understand correctly, there would be one value in the executor.
>
> Cheers
>
> On Tue, Jul 28, 2015 at 4:23 PM, Jonathan Coveney wrote:
>
>> i am running in coarse grained mode, let's say with 8 cores per executor.
>

Re: broadcast variable question

2015-07-28 Thread Ted Yu
If I understand correctly, there would be one value in the executor.

Cheers

On Tue, Jul 28, 2015 at 4:23 PM, Jonathan Coveney wrote:
> i am running in coarse grained mode, let's say with 8 cores per executor.
>
> If I use a broadcast variable, will all of the tasks in that executor
> share the

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-09-23 Thread RodrigoB
Could you by any chance be using getOrCreate for the StreamingContext creation? I've seen this happen when I tried to first create the Spark context, then create the broadcast variables, and then recreate the StreamingContext from the checkpoint directory. So the worker process cannot find the
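A PySpark sketch of the ordering RodrigoB describes (checkpoint path and app name assumed): create the broadcast variable inside the setup function passed to getOrCreate, so it exists in whichever context ends up being used.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    CHECKPOINT_DIR = "hdfs:///tmp/checkpoint"  # assumed path

    def create_context():
        sc = SparkContext(appName="recoverable-app")
        ssc = StreamingContext(sc, 10)  # 10-second batches
        ssc.checkpoint(CHECKPOINT_DIR)
        lookup = sc.broadcast({"a": 1})  # created together with the context
        # ... define the DStream pipeline here, closing over `lookup` ...
        return ssc

    # Recovers from the checkpoint if present, otherwise calls create_context()
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)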

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
Ah, sorry, sorry, my brain is just damaged... I sent some wrong information: not "spark.cores.max" but the minPartitions in sc.textFile().

Best,

--
Nan Zhu

On Monday, July 21, 2014 at 7:17 PM, Tathagata Das wrote:
> That is definitely weird. spark.cores.max should not affect things when they
>

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Tathagata Das
That is definitely weird. spark.cores.max should not affect things when they are running in local mode. And I am trying to think of scenarios that could cause a broadcast variable used in the current job to fall out of scope, but they all seem very far-fetched. So I am really curious to see the code w

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
Hi TD,

I think I got more insight into the problem. In the Jenkins test file, I mistakenly passed a wrong value to spark.cores.max, which is much larger than the expected value (I passed the master address as local[6], and spark.cores.max as 200). If I set a more consistent value, everything goes

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Nan Zhu
Hi TD,

Thanks for the reply. I tried to reproduce this in a simpler program, but no luck. However, the program has been very simple, just loading some files from HDFS and writing them to HBase...

---

It seems that the issue only appears when I run the unit test in Jenkins (not fail every time, i

Re: broadcast variable get cleaned by ContextCleaner unexpectedly ?

2014-07-21 Thread Tathagata Das
The ContextCleaner cleans up data and metadata related to RDDs and broadcast variables, but only when those variables are out of scope and get garbage-collected by the JVM. So the broadcast variable in question is probably somehow going out of scope even before the job using the broadcast variable i

Re: Broadcast variable in Spark Java application

2014-07-07 Thread Cesar Arevalo
Hi Praveen:

It may be easier for other people to help you if you provide more details about what you are doing. It may be worthwhile to also mention which Spark version you are using. And if you can share the code which doesn't work for you, that may also give others more clues as to what you a