Caused by: org.apache.spark.SparkException: Task not serializable

That's the answer :)
What are you trying to save? Is it empty or None / null?

On Wed, Jan 10, 2018 at 4:58 PM, Liana Napalkova <liana.napalk...@eurecat.org> wrote:

> Hello,
>
> Has anybody faced the following problem in PySpark? (Python 2.7.12):
>
> df.show() # works fine and shows the first 5 rows of DataFrame
>
> df.write.parquet(outputPath + '/data.parquet', mode="overwrite") # throws the error
>
> The last line throws the following error:
>
> py4j.protocol.Py4JJavaError: An error occurred while calling o794.parquet.
> : org.apache.spark.SparkException: Job aborted.
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
>
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
> at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:123)
> at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:248)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:127)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:127)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>
> Caused by: org.apache.spark.SparkException: Task not serializable
> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:794)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:793)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>
> Caused by: java.lang.IllegalArgumentException
> at java.nio.Buffer.position(Buffer.java:244)
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:153)
> at java.nio.ByteBuffer.get(ByteBuffer.java:715)
>
> Caused by: java.nio.BufferUnderflowException
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
> at java.nio.ByteBuffer.get(ByteBuffer.java:715)
> at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes(Binary.java:405)
> at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytesUnsafe(Binary.java:414)
> at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.writeObject(Binary.java:484)
> at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> Thanks.
>
> L.
>
> ------------------------------
> DISCLAIMER: Privileged/Confidential Information may be contained in this
> message. If you are not the addressee indicated in this message you should
> destroy this message, and notify us immediately to the following address:
> le...@eurecat.org. If the addressee of this message does not consent to
> the use of Internet e-mail and message recording, please notify us
> immediately.
> ------------------------------
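For anyone reading a trace like the quoted one: a long Java stack trace is easier to follow if you pull out only the "Caused by:" lines and read them as a chain from the outermost failure to the innermost. Below is a minimal sketch in plain Python (no Spark needed). The names TRACE, CAUSE_RE and cause_chain are made up for illustration, and TRACE is just a shortened excerpt of the quoted trace, not the full output.

```python
import re

# Shortened excerpt of the quoted trace (hypothetical variable name).
# Only the exception lines and a few frames are kept.
TRACE = """\
py4j.protocol.Py4JJavaError: An error occurred while calling o794.parquet.
: org.apache.spark.SparkException: Job aborted.
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:123)
Caused by: org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Buffer.java:244)
Caused by: java.nio.BufferUnderflowException
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
    at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.writeObject(Binary.java:484)
"""

# Matches "Caused by: <exception class>" with an optional ": <message>" part.
CAUSE_RE = re.compile(r"^\s*Caused by:\s*([\w.$]+)(?::\s*(.*))?$")


def cause_chain(trace):
    """Return [(exception_class, message), ...] in the order the causes appear."""
    chain = []
    for line in trace.splitlines():
        m = CAUSE_RE.match(line)
        if m:
            chain.append((m.group(1), (m.group(2) or "").strip()))
    return chain


chain = cause_chain(TRACE)
# Print the chain indented by depth, outermost cause first.
for depth, (cls, msg) in enumerate(chain):
    print("  " * depth + cls + (": " + msg if msg else ""))
```

Read this way, "Task not serializable" sits in the middle of the chain, and the last cause listed in the quoted trace is the java.nio.BufferUnderflowException raised under Binary$ByteBufferBackedBinary.writeObject. That is where I would look first, though the excerpt alone does not say which value was being serialized.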