Re: No space left on device

2018-08-21 Thread Gourav Sengupta
Hi, The best part about Spark is that it also shows you which configuration to tweak. If you are using EMR, check that the "spark.local.dir" configuration points to the right location in the cluster. If a disk is mounted across all the systems with a common path (you can do that
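
As a rough illustration of the "spark.local.dir" check suggested above, the following sketch compares free space across candidate scratch directories on a node and picks the roomiest one. The helper names and the /mnt/spark-scratch path are hypothetical, not taken from the thread.

```python
import os
import shutil
import tempfile


def free_space_report(candidate_dirs):
    """Return {path: free_bytes} for every candidate directory that exists."""
    report = {}
    for path in candidate_dirs:
        if os.path.isdir(path):
            report[path] = shutil.disk_usage(path).free
    return report


def pick_local_dir(candidate_dirs):
    """Return the existing directory with the most free space, or None."""
    report = free_space_report(candidate_dirs)
    if not report:
        return None
    return max(report, key=report.get)


if __name__ == "__main__":
    # Hypothetical candidates; replace with the mount points on your nodes.
    candidates = [tempfile.gettempdir(), "/mnt/spark-scratch"]
    for path, free in free_space_report(candidates).items():
        print(f"{path}: {free / 1e9:.1f} GB free")
    print("candidate for spark.local.dir:", pick_local_dir(candidates))
```

The chosen path could then be passed with --conf spark.local.dir=<path>. Note that on YARN (which EMR uses) the cluster manager's own local-directory setting takes precedence over this property, so the YARN configuration may be the one to inspect.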

CBO not predicting cardinality on partition columns for Parquet tables

2018-08-21 Thread rajat mishra
Hi All, I have an external table in Spark whose underlying data files are in Parquet format. The table is partitioned. When I try to compute the statistics for a query where the partition column is in the where clause, the statistics returned contain only the sizeInBytes and not the row count.
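
For context, this is a minimal sketch of how table-level or partition-level statistics are usually collected in Spark SQL before the cost-based optimizer can use them. The table name db.events, the partition column dt, and both helper names are hypothetical. Whether this produces a row count in the poster's case is not something the preview settles.

```python
def analyze_sql(table, partition_spec=None):
    """Build an ANALYZE TABLE statement, optionally for a single partition."""
    sql = f"ANALYZE TABLE {table}"
    if partition_spec:
        # Partition values are rendered as quoted string literals.
        parts = ", ".join(f"{k}='{v}'" for k, v in partition_spec.items())
        sql += f" PARTITION ({parts})"
    return sql + " COMPUTE STATISTICS"


def collect_stats(spark, table, partition_spec=None):
    """Enable the CBO on the given SparkSession and collect statistics."""
    spark.conf.set("spark.sql.cbo.enabled", "true")
    spark.sql(analyze_sql(table, partition_spec))


if __name__ == "__main__":
    # Hypothetical table and partition; prints the statements only.
    print(analyze_sql("db.events"))
    print(analyze_sql("db.events", {"dt": "2018-08-21"}))
```

collect_stats expects an existing SparkSession. The statistics it gathers can then be inspected with DESCRIBE EXTENDED on the table or partition.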

Insert a pyspark dataframe in postgresql

2018-08-21 Thread dimitris plakas
Hello everyone, here is a case that I am facing: I have a pyspark application whose last step is to create a pyspark dataframe with two columns (column1, column2). This dataframe has only one row, and I want this row to be inserted into a postgres db table. In every run this line in the datafra
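
One common way to do what the post describes is the DataFrame JDBC writer in append mode. The sketch below is an assumption about the setup, not the poster's code: the host, database, table, credentials, and helper names are all hypothetical, and the PostgreSQL JDBC driver jar must be on the Spark classpath.

```python
def postgres_jdbc_target(host, port, database, user, password):
    """Build the JDBC URL and connection properties for a PostgreSQL database."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }
    return url, properties


def append_to_postgres(df, table, url, properties):
    """Append the rows of a pyspark DataFrame to an existing PostgreSQL table."""
    # mode="append" adds rows without dropping or recreating the table.
    df.write.jdbc(url=url, table=table, mode="append", properties=properties)


if __name__ == "__main__":
    # Hypothetical connection details, shown only to illustrate the call shape.
    url, props = postgres_jdbc_target("localhost", 5432, "mydb", "spark", "secret")
    print(url)
    # append_to_postgres(df, "public.results", url, props)  # df: the one-row DataFrame
```

Because the DataFrame holds a single row, the append is one small insert per run. The dataframe's column names need to match the target table's columns.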

Re: Structured Streaming on Kubernetes

2018-08-21 Thread puneetloya
Thanks for putting together such a comprehensive observation about Spark on Kubernetes. The Mesos deployment of Spark has a property called spark.mesos.extra.cores. The property means: * Set the extra number of cores for an executor to advertise. This does not result in more cores allocated. It instead means tha

Re: No space left on device

2018-08-21 Thread Vitaliy Pisarev
The other time I encountered this, I solved it by throwing more resources at it (a stronger cluster). I was not able to understand the root cause, though. I'd be happy to hear deeper insight as well. On Mon, Aug 20, 2018 at 7:08 PM, Steve Lewis wrote: > > We are trying to run a job that has pr