Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Mich Talebzadeh
Indeed. Apologies for going on a tangent.

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Maksim Grinman
Not that this discussion is not interesting (it is), but it has strayed pretty far from my original question, which was: how do I prevent Spark from dumping huge full Java thread dumps when an executor appears not to be doing anything (in my case, there's a loop where it sleeps waiting for a

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Mich Talebzadeh
OK basically, do we have a scenario where Spark, or for that matter any cluster manager, can deploy a new node (after the loss of an existing node) with the view of running the failed tasks on the new executor(s) deployed on that newly spun-up node?

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Holden Karau
We don’t block scaling up after node failure in classic Spark if that’s the question. On Fri, Feb 4, 2022 at 6:30 PM Mich Talebzadeh wrote: > From what I can see in auto scaling setup, you will always need a min of > two worker nodes as primary. It also states and I quote "Scaling primary >

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Mich Talebzadeh
From what I can see in the auto scaling setup, you will always need a min of two worker nodes as primary. It also states, and I quote, "Scaling primary workers is not recommended due to HDFS limitations which result in instability while scaling. These limitations do not exist for secondary workers". So

Re: Spark 3.1 Json4s-native jar compatibility

2022-02-04 Thread Amit Sharma
Thanks Sean/Martin, my bad, the Spark version was actually 3.0.1, so using json4s 3.6.6 fixed the issue. Thanks Amit On Fri, Feb 4, 2022 at 3:37 PM Sean Owen wrote: > My guess is that something else you depend on is actually bringing in a > different json4s, or you're otherwise mixing library/Spark

Re: Spark 3.1 Json4s-native jar compatibility

2022-02-04 Thread Sean Owen
My guess is that something else you depend on is actually bringing in a different json4s, or you're otherwise mixing library/Spark versions. Use mvn dependency:tree or equivalent on your build to see what you actually bring in. You probably do not need to include json4s at all, as it is in Spark
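Sean's suggestion can be narrowed to just the json4s artifacts (a sketch, assuming a Maven build; for sbt builds the built-in `dependencyTree` task gives the equivalent view):

```shell
# Show only json4s artifacts in the resolved dependency tree,
# revealing which of your dependencies pulls them in.
mvn dependency:tree -Dincludes=org.json4s
```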

Re: Spark 3.1 Json4s-native jar compatibility

2022-02-04 Thread Amit Sharma
Martin/Sean, I changed it to 3.7.0-M5 and am still getting the same error: Exception in thread "streaming-job-executor-0" java.lang.NoSuchMethodError: org.json4s.ShortTypeHints$.apply$default$2()Ljava/lang/String; Thanks Amit On Fri, Feb 4, 2022 at 9:03 AM Martin

Re: how can I remove the warning message

2022-02-04 Thread Martin Grigorov
Hi, This is a JVM warning, as Sean explained. You cannot control it via loggers. You can disable it by passing --illegal-access=permit to java. Read more about it at https://softwaregarden.dev/en/posts/new-java/illegal-access-in-java-16/ On Sun, Jan 30, 2022 at 4:32 PM Sean Owen wrote: > This
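For a spark-submit launch, the JVM flag Martin mentions would be passed through the driver and executor Java options (a sketch; the application jar name is a placeholder, and note the flag only exists on Java 9 through 16, as it was removed in Java 17):

```shell
# Pass --illegal-access=permit to the JVMs Spark launches
# (valid on Java 9-16 only; removed in Java 17).
spark-submit \
  --conf "spark.driver.extraJavaOptions=--illegal-access=permit" \
  --conf "spark.executor.extraJavaOptions=--illegal-access=permit" \
  your-app.jar
```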

Re: Spark 3.1 Json4s-native jar compatibility

2022-02-04 Thread Martin Grigorov
Hi, Amit said that he uses Spark 3.1, so the link should be https://github.com/apache/spark/blob/branch-3.1/pom.xml#L879 (3.7.0-M5) @Amit: check your classpath; there may be more than one jar of this dependency on it. On Thu, Feb 3, 2022 at 10:53 PM Sean Owen wrote: > You can look it up: >
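A quick way to look for duplicate json4s jars, assuming a standard Spark distribution layout; the assembly jar name below is a hypothetical placeholder:

```shell
# json4s jars shipped with the Spark distribution itself
ls "$SPARK_HOME/jars" | grep -i json4s

# json4s classes bundled into your own fat jar (name is a placeholder)
unzip -l my-app-assembly.jar | grep 'org/json4s' | head
```

If both show json4s but at different versions, the NoSuchMethodError above is the expected symptom.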

Re: Python performance

2022-02-04 Thread Sean Owen
Yes, in the sense that any transformation that can be expressed in the SQL-like DataFrame API will push down to the JVM, and take advantage of other optimizations, avoiding the data movement to/from Python and more. But you can't do this if you're expressing operations that are not in the

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Sean Owen
I have not seen stack traces under autoscaling, so not even sure what the error in question is. There is always delay in acquiring a whole new executor in the cloud as it usually means a new VM is provisioned. Spark treats the new executor like any other, available for executing tasks. On Fri,

Re: Spark on K8s : property similar to yarn.max.application.attempt

2022-02-04 Thread Mich Talebzadeh
Not as far as I know. If your driver pod fails, then you need to resubmit the job. I cannot see what else can be done. HTH

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Mich Talebzadeh
Thanks for the info. My concern has always been how Spark handles autoscaling (adding new executors) when the load pattern changes. I have tried to test this by setting the following parameters (Spark 3.1.2 on GCP) spark-submit --verbose \ ... --conf

Spark on K8s : property similar to yarn.max.application.attempt

2022-02-04 Thread Pralabh Kumar
Hi Spark Team, I am running Spark on K8s and looking for a property/mechanism similar to yarn.max.application.attempt. I know this is not really a Spark question, but I thought someone might have faced a similar issue. Basically, if my driver pod fails, I want it to be retried on a different

Re: Python performance

2022-02-04 Thread Bitfox
Please see this test of mine: https://blog.cloudcache.net/computing-performance-comparison-for-words-statistics/ Don't use Python RDDs; use DataFrames instead. Regards On Fri, Feb 4, 2022 at 5:02 PM Hinko Kocevar wrote: > I'm looking into using Python interface with Spark and came across this >

Python performance

2022-02-04 Thread Hinko Kocevar
I'm looking into using the Python interface with Spark and came across this [1] chart showing some performance hit when going with Python RDDs. The data is ~7 years old and for an older version of Spark. Is this still the case with more recent Spark releases? I'm trying to understand what to expect from