Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-01 Thread Kapoor, Rohit
Hi Huaxin, Thanks a lot for your response. Do I need to write a custom data source reader (in my case, for PostgreSQL) using the Spark DS v2 APIs, instead of the standard spark.read.format(“jdbc”)? Thanks, Rohit
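[For context: Spark 3.2 can route JDBC reads through DS v2 by registering a JDBC table catalog, so a fully custom reader is not required. A minimal sketch follows, assuming a local PostgreSQL database named "testdb" containing an "emp" table; the catalog name, credentials, and connection URL are placeholders, and the pushDownAggregate option is enabled explicitly since it is opt-in in 3.2.]

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("dsv2-jdbc-pushdown")
    .master("local[*]")
    // Register a DS v2 JDBC catalog named "postgres".
    .config("spark.sql.catalog.postgres",
      "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
    .config("spark.sql.catalog.postgres.url",
      "jdbc:postgresql://localhost:5432/testdb?user=test&password=test")
    .config("spark.sql.catalog.postgres.driver", "org.postgresql.Driver")
    // Aggregate push down is off by default in 3.2; opt in for this catalog.
    .config("spark.sql.catalog.postgres.pushDownAggregate", "true")
    .getOrCreate()

  // Tables are addressed as <catalog>.<schema>.<table>; this scan goes
  // through the DS v2 code path, so the aggregate is eligible for push down.
  val df = spark.sql(
    "SELECT dept, MAX(salary) AS max_salary FROM postgres.public.emp GROUP BY dept")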

Re: [Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-01 Thread huaxin gao
Hi Rohit, Thanks for testing this. Seems to me that you are using DS v1. We only support aggregate push down in DS v2. Could you please try again using DS v2 and let me know how it goes? Thanks, Huaxin
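[To check whether push down actually happened, inspect the physical plan; when the DS v2 path is used and the aggregate is pushed, the scan node lists the pushed expressions. A sketch, reusing the hypothetical "postgres" catalog from above; the exact rendering of the plan varies by version:]

  // Assuming the DS v2 JDBC catalog "postgres" has been registered.
  spark.sql("SELECT dept, MAX(salary) FROM postgres.public.emp GROUP BY dept")
    .explain()
  // With push down, the scan node reports something like:
  //   PushedAggregates: [MAX(SALARY)], PushedGroupByColumns: [DEPT]
  // With the DS v1 reader, the aggregate instead shows up as HashAggregate
  // stages executed inside Spark after a full table scan.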

[Spark SQL]: Aggregate Push Down / Spark 3.2

2021-11-01 Thread Kapoor, Rohit
Hi, I am testing the aggregate push down for JDBC after going through the JIRA - https://issues.apache.org/jira/browse/SPARK-34952 I have the latest Spark 3.2 set up in local mode on my laptop, with PostgreSQL v14 also running locally. I am trying a basic aggregate query on the “emp” table that has 10
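[For reference, the standard spark.read.format("jdbc") reader goes through DS v1, where the aggregate is computed in Spark rather than pushed to PostgreSQL. A minimal sketch of this kind of test, with placeholder connection details:]

  // DS v1 JDBC read: the aggregate below is NOT pushed to PostgreSQL.
  val emp = spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/testdb")
    .option("dbtable", "emp")
    .option("user", "test")
    .option("password", "test")
    .load()

  // The group-by runs as a Spark HashAggregate over the scanned rows,
  // which is why no push down is observed on this path.
  emp.groupBy("dept").max("salary").explain()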

[Spark DataFrame]: How to solve data skew after repartition?

2021-11-01 Thread ly
When Spark writes data to storage systems like HDFS or S3, it can produce a large number of small files. A common way to solve this problem is to repartition before writing the results. However, this may cause data skew: if the number of distinct values of the repartition key is low, some partitions end up with far more data than others.
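[One common mitigation, offered here as an illustration rather than something from the thread: add a random salt to the repartition expression so that each low-cardinality key value is split across several partitions. A sketch, assuming a DataFrame df with a hypothetical low-cardinality column "date" and a placeholder output path:]

  import org.apache.spark.sql.functions.{col, rand}

  // Repartition on (key, salt) instead of key alone: each distinct key
  // value is spread across roughly 8 partitions rather than a single one.
  val salted = df.repartition(
    200,                          // target number of output partitions
    col("date"),                  // hypothetical low-cardinality key
    (rand() * 8).cast("int")      // random salt in [0, 8)
  )
  salted.write.partitionBy("date").parquet("s3a://bucket/path")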