Re: Static partitioning in partitionBy()

2019-05-07 Thread Felix Cheung
You could df.filter(col("c") === "c1").write.partitionBy("c").save. It could get some data skew problems but might work for you.
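A minimal sketch of the filter-then-write approach Felix describes, in Scala, assuming a DataFrame df with a partition column c and hypothetical input/output paths. Because only one distinct value of c survives the filter, every row lands in a single partition directory, which is where the skew he mentions comes from:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("static-partition-write").getOrCreate()
    val df = spark.read.parquet("/data/source") // hypothetical input

    // Keep only the rows for the static partition, then let partitionBy
    // lay them out under /data/target/c=c1.
    df.filter(col("c") === "c1")
      .write
      .partitionBy("c")
      .mode("append") // "overwrite" here would truncate the other partitions too
      .save("/data/target")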

Need guidance on Spark Session Termination.

2019-05-07 Thread Nasrulla Khan Haris
Hi fellow Spark devs, I am pretty new to Spark core and I am looking for some answers to my use case. I have a DataSource V2 API connector; in my connector we create temporary files on blob storage. Can you please suggest places where I can look if I want to delete the temporary files on
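For what it's worth, in the Spark 2.4-era DataSource V2 write path the driver-side DataSourceWriter's commit() and abort() callbacks are natural hooks for removing temporary files, since exactly one of them fires once per write job. A rough Scala sketch, with deleteTempFiles() standing in as a hypothetical helper around the connector's blob-storage client:

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.sources.v2.writer.{DataSourceWriter, DataWriterFactory, WriterCommitMessage}

    class BlobStoreWriter(tempDir: String) extends DataSourceWriter {

      // Task-side writer creation elided; this sketch focuses on cleanup.
      override def createWriterFactory(): DataWriterFactory[InternalRow] = ???

      // Runs once on the driver after every task has committed: promote
      // the temp files to their final location, then remove the leftovers.
      override def commit(messages: Array[WriterCommitMessage]): Unit = {
        // ... move/commit output to its final location ...
        deleteTempFiles(tempDir)
      }

      // Runs if the job fails: the temp files are all that exists, so drop them.
      override def abort(messages: Array[WriterCommitMessage]): Unit =
        deleteTempFiles(tempDir)

      // Hypothetical helper: list and delete everything under dir
      // using the blob store's SDK.
      private def deleteTempFiles(dir: String): Unit = ()
    }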

Re: Hive Hash in Spark

2019-05-07 Thread Bruce Robbins
Mildly off-topic: From a *correctness* perspective only, it seems Spark can read bucketed Hive tables just fine. I am ignoring the fact that Spark doesn't take advantage of the bucketing. Is that a fair assessment? Or is it more complicated than that? Also, Spark has code to prevent an

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-07 Thread Bobby Evans
I am +1. On Tue, May 7, 2019 at 1:37 PM Thomas Graves wrote: > Hi everyone, > > I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs > for extended Columnar Processing Support. The proposal is to extend > the support to allow for more columnar processing. We had previous > vote

[VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-07 Thread Thomas Graves
Hi everyone, I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs for extended Columnar Processing Support. The proposal is to extend the support to allow for more columnar processing. We had previous vote and discussion threads and have updated the SPIP based on the comments to

Re: Static partitioning in partitionBy()

2019-05-07 Thread Burak Yavuz
It depends on the data source. Delta Lake (https://delta.io) allows you to do it with the .option("replaceWhere", "c = c1"). With other file formats, you can write directly into the partition directory (tablePath/c=c1), but you lose atomicity. On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia
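Two hedged sketches of what Burak describes, reusing the df and column c from the thread with hypothetical paths; the Delta Lake form assumes c is a string column, and the plain-Parquet form drops c because its value is encoded in the directory name (this direct write is the non-atomic variant he warns about):

    import org.apache.spark.sql.functions.col

    // Delta Lake: atomically replace only the rows matching the predicate.
    df.filter(col("c") === "c1")
      .write
      .format("delta")
      .mode("overwrite")
      .option("replaceWhere", "c = 'c1'")
      .save("/data/delta_table")

    // Plain Parquet: write straight into the partition directory.
    df.filter(col("c") === "c1")
      .drop("c")
      .write
      .mode("overwrite")
      .parquet("/data/table/c=c1")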

Static partitioning in partitionBy()

2019-05-07 Thread Shubham Chaurasia
Hi All, Is there a way I can provide static partitions in partitionBy()? Like: df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save The above code gives the following error, as it tries to find a column named `c=c1` in df. org.apache.spark.sql.AnalysisException: Partition column `c=c1`

Re: [METRICS] Metrics names inconsistent between executions

2019-05-07 Thread Stavros Kontopoulos
Hi, With jmx_exporter and Prometheus you can always rewrite the metrics patterns on the fly. Btw if you use Grafana it's easy to filter things even without the rewrite. If this is a custom dashboard you can always group metrics based on the
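For illustration, a rough jmx_exporter rule of the sort Stavros alludes to. Spark's JMX sink prefixes driver metrics with the per-run application ID, so the raw names differ between executions; a rewrite can move the ID into a label so the metric name itself stays stable. The pattern below is an assumption about how your metric names look and will likely need adjusting:

    rules:
      # Capture the changing application ID into a label so that
      # "app-20190507-0001.driver.BlockManager..." and the next run's
      # equivalent surface as the same stable metric name.
      - pattern: "metrics<name=(\\S+)\\.driver\\.(\\S+)><>Value"
        name: "spark_driver_$2"
        labels:
          app_id: "$1"

If you would rather fix it at the source, Spark also has a spark.metrics.namespace setting that can replace the application-ID prefix in metric names.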