Re: DropNa in Spark for Columns

2021-02-26 Thread Vitali Lupusor
Hello Chetan, I don’t know about Scala, but in PySpark there is no elegant way of dropping NAs along the column axis. Here is a possible solution to your problem:
>>> data = [(None, 1, 2), (0, None, 2), (0, 1, 2)]
>>> columns = ('A', 'B', 'C')
>>> d

DropNa in Spark for Columns

2021-02-26 Thread Chetan Khatri
Hi Users, What is the equivalent of df.dropna(axis='columns') from Pandas in Spark/Scala? Thanks
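Since the question asks about Scala, here is a minimal sketch of one possible equivalent, assuming the goal is to drop every column that contains at least one null (the helper name dropNullColumns is illustrative, not a built-in API):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, when}

def dropNullColumns(df: DataFrame): DataFrame = {
  // Count the nulls in every column in a single pass over the data.
  val nullCounts = df
    .select(df.columns.map(c => count(when(col(c).isNull, c)).alias(c)): _*)
    .first()

  // Keep only the columns whose null count is zero.
  val keep = df.columns.filter(c => nullCounts.getAs[Long](c) == 0L)
  df.select(keep.map(col): _*)
}

Note this requires a full pass over the data to find the null counts, so it is noticeably more expensive than the Pandas one-liner.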

unsubscribe

2021-02-26 Thread Roland Johann
unsubscribe

Re: Spark closures behavior in local mode in IDEs

2021-02-26 Thread Sheel Pancholi
Thanks Owen. Agreed! The only explanation that I "made peace with" is that a static/singleton Scala "object", being static/singleton natively, does not require any serialization and would be available across the threads within the JVM, and would require serialization only when this singleton would need

Re: Spark closures behavior in local mode in IDEs

2021-02-26 Thread Sean Owen
Yeah, this is a good question. It certainly has to do with executing within the same JVM, but even I'd have to dig into the code to explain why the spark-sql version operates differently, as that also appears to be local. To be clear, this 'shouldn't' work; it just happens not to fail in local execution.

Re: Spark closures behavior in local mode in IDEs

2021-02-26 Thread Sheel Pancholi
I am afraid that might at best be partially true. What would explain spark-shell in local mode also throwing the same error? It should have run fine by that logic. On digging more, it became apparent why this was happening. When you run your code simply by adding libraries to your code and running in loca

Re: Spark closures behavior in local mode in IDEs

2021-02-26 Thread Lalwani, Jayesh
Yes, as you found, in local mode Spark won't serialize your objects; it will just pass the reference to the closure. This means that it is possible to write code that works in local mode but doesn't when you run it distributed.
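Whatever explains the local-mode behaviour, the usual way to keep a closure cluster-safe is to avoid capturing the enclosing, possibly non-serializable object at all. A minimal sketch of that pattern (class and value names are illustrative; an existing SparkContext is assumed):

import org.apache.spark.SparkContext

// Illustrative holder that is NOT Serializable.
class Settings(val offset: Int)

object ClosureSafety {
  def addOffset(sc: SparkContext, settings: Settings): Unit = {
    // Risky: using `settings.offset` inside the lambda makes the closure
    // capture `settings` itself, so the whole non-serializable object
    // would have to be shipped with the task.
    // sc.parallelize(1 to 4).map(_ + settings.offset).foreach(println)

    // Safer: copy the needed field into a local val first; the closure
    // then captures only an Int, which serializes trivially.
    val offset = settings.offset
    sc.parallelize(1 to 4).map(_ + offset).foreach(println)
  }
}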

Re: Issue after change to 3.0.2

2021-02-26 Thread Bode, Meikel, NMA-CFD
Hi Sean. You are right. We are using Docker images for our Spark cluster. The generation of the worker image did not succeed, and therefore the old 3.0.1 image was still in use. Thanks, Best, Meikel

Re: Issue after change to 3.0.2

2021-02-26 Thread Sean Owen
That looks to me like you have two different versions of Spark in use somewhere here, like the cluster and driver versions aren't quite the same. Check your classpaths?
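One way to confirm whether the driver and the executors are really on the same build is to compare versions from inside a job. A minimal sketch (the object name is illustrative):

import org.apache.spark.sql.SparkSession

object VersionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("version-check").getOrCreate()
    val sc = spark.sparkContext

    // Version the driver JVM is running.
    println(s"Driver Spark version:    ${sc.version}")

    // Versions reported by whichever executors pick up the tasks.
    val executorVersions = sc
      .parallelize(1 to 100, 10)
      .map(_ => org.apache.spark.SPARK_VERSION)
      .distinct()
      .collect()
    println(s"Executor Spark versions: ${executorVersions.mkString(", ")}")

    spark.stop()
  }
}

If the two printed values differ (e.g. 3.0.2 on the driver and 3.0.1 on the executors), the cluster images or classpaths are out of sync.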

Re: Issue after change to 3.0.2

2021-02-26 Thread Mich Talebzadeh
So you have upgraded to Spark 3.0.2? How are you running your PySpark job? Is it through a Python virtual env or spark-submit? It sounds like it cannot create an executor. Can you run it in local mode? spark-submit --master local[1] --deploy-mode client. Also check the values of PYSPARK_PYTHON and PYSPARK_D

Spark closures behavior in local mode in IDEs

2021-02-26 Thread Sheel Pancholi
Hi, I am observing weird behavior of Spark and closures in local mode on my machine vs. a 3-node cluster (Spark 2.4.5). Following is the piece of code:

object Example {
  val num = 5
  def myfunc = {
    sc.parallelize(1 to 4).map(_ + num).foreach(println)
  }
}

I expected this to fail regardless since
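For anyone who wants to reproduce this outside a notebook or IDE, a self-contained version of the snippet might look like the following. The SparkSession setup is an assumption, since the original message does not show where sc comes from:

import org.apache.spark.sql.SparkSession

object Example {
  val num = 5

  def main(args: Array[String]): Unit = {
    // Assumed setup: a plain SparkSession so the example can be submitted
    // as-is with spark-submit, in local mode or against a cluster.
    val spark = SparkSession.builder().appName("closure-capture-test").getOrCreate()
    val sc = spark.sparkContext

    // Note: foreach(println) prints on the executors, so on a real cluster
    // the output lands in the executor logs, not on the driver console.
    sc.parallelize(1 to 4).map(_ + num).foreach(println)

    spark.stop()
  }
}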

Issue after change to 3.0.2

2021-02-26 Thread Bode, Meikel, NMA-CFD
Hi All, after changing to 3.0.2 I face the following issue. Thanks for any hint on that issue. Best, Meikel

df = self.spark.read.json(path_in)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 300, in json
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_