Hello,

Has anybody run into the following problem in PySpark (Python 2.7.12)?

    df.show()  # works fine and shows the first rows of the DataFrame

    df.write.parquet(outputPath + '/data.parquet', mode="overwrite")  # throws the error

The last line throws the following error:


py4j.protocol.Py4JJavaError: An error occurred while calling o794.parquet.
: org.apache.spark.SparkException: Job aborted.
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)

Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:123)
        at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:248)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)

Caused by: org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:794)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:793)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)

Caused by: java.lang.IllegalArgumentException
        at java.nio.Buffer.position(Buffer.java:244)
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:153)
        at java.nio.ByteBuffer.get(ByteBuffer.java:715)

Caused by: java.nio.BufferUnderflowException
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
        at java.nio.ByteBuffer.get(ByteBuffer.java:715)
        at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytes(Binary.java:405)
        at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.getBytesUnsafe(Binary.java:414)
        at org.apache.parquet.io.api.Binary$ByteBufferBackedBinary.writeObject(Binary.java:484)
        at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)

Thanks.

L.
