[ https://issues.apache.org/jira/browse/SPARK-23842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
holdenk resolved SPARK-23842.
-----------------------------
    Resolution: Won't Fix

Not supported by the current design; alternatives do exist, though.

> accessing java from PySpark lambda functions
> ---------------------------------------------
>
>                 Key: SPARK-23842
>                 URL: https://issues.apache.org/jira/browse/SPARK-23842
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>              Labels: cloudpickle, py4j, pyspark
>
> Copied from https://github.com/bartdag/py4j/issues/311, but it seems to be
> more of a Spark issue than a py4j one.
>
> We have a third-party Java library that is distributed to Spark executors
> through the {{--jars}} parameter.
> We want to call a static Java method from that library on the executor side
> inside Spark's {{map()}}, or create an object of one of the library's classes
> inside a {{mapPartitions()}} call.
> None of the approaches has worked so far. Spark tries to serialize everything
> it sees in a lambda function and distribute it to the executors.
> I am aware of the older py4j issue/question
> [#171|https://github.com/bartdag/py4j/issues/171], but that discussion isn't
> helpful here.
> We thought to create a reference to that class through a call like
> {{genmodel = spark._jvm.hex.genmodel}} and then use py4j to expose the
> library's functionality inside PySpark executors' lambda functions.
> We don't want Spark to try to serialize the Spark session variable {{spark}}
> or its reference to the py4j gateway {{spark._jvm}} (that leads to the
> expected non-serializable exceptions), so we tried to "trick" Spark into not
> serializing them by nesting the {{genmodel = spark._jvm.hex.genmodel}} call
> inside an {{exec()}} call.
> That led to another problem: neither {{spark}} (the Spark session) nor
> {{sc}} (the Spark context) is available inside executors' lambda functions.
> So we're stuck and don't know how to call a generic Java class through py4j
> on the executor side (from within {{map}} or {{mapPartitions}} lambda
> functions).
> This would be easier from Scala/Java, which can call the third-party
> library's methods directly, but our users ask for a way to do the same from
> PySpark.
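One of the alternatives alluded to in the resolution is to keep the Java call entirely inside the executor JVM instead of driving it from py4j: wrap the third-party static method in a small Java UDF class, ship it with {{--jars}} as before, and register it from PySpark with {{spark.udf.registerJavaFunction}} (available from Spark 2.3). The sketch below is a minimal, hedged illustration of that pattern; the package and class name {{com.example.ScoreUDF}} and the column names are placeholders, not anything from the original report or from the library mentioned there.

{code:python}
# Minimal sketch, assuming Spark 2.3+ and a hypothetical Java wrapper class
# com.example.ScoreUDF compiled into the jar that is already passed via --jars:
#
#   package com.example;
#   import org.apache.spark.sql.api.java.UDF1;
#   public class ScoreUDF implements UDF1<String, Double> {
#       @Override
#       public Double call(String features) {
#           // call the third-party static method here and return its result
#       }
#   }
#
# The Python side never touches py4j on the executors; the UDF runs in the
# executor JVM, where the --jars classpath is available.

from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Register the Java UDF by class name; the class is resolved from the
# executor/driver classpath and is never serialized from Python.
spark.udf.registerJavaFunction("score", "com.example.ScoreUDF", DoubleType())

df = spark.createDataFrame([("a",), ("b",)], ["features"])
df.selectExpr("score(features) AS prediction").show()
{code}

The trade-off of this approach is that the wrapper has to be written and built on the JVM side, but it avoids the fundamental problem described above: executor-side Python workers have no py4j gateway, so {{spark._jvm}} can never be reached from a {{map()}} or {{mapPartitions()}} lambda.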