Hi Huaxin,
Thanks a lot for your response. Do I need to write a custom data source reader
(in my case, for PostgreSQL) using the Spark DS v2 APIs, instead of the
standard spark.read.format("jdbc")?
Thanks,
Rohit
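
(For context: Spark 3.2 exposes a built-in DS v2 JDBC path that can be used
without writing a custom reader, by registering the JDBC table catalog. Below
is a minimal sketch in Scala, not the verified setup from this thread; the
catalog name "postgres", the connection details, and the dept/salary columns
are assumptions for illustration.)

  // Minimal sketch: reads through a catalog-qualified table go via DS v2,
  // so eligible aggregates can be pushed down to PostgreSQL.
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("jdbc-v2-agg-pushdown")
    // Register PostgreSQL behind the built-in DS v2 JDBC catalog.
    .config("spark.sql.catalog.postgres",
      "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
    .config("spark.sql.catalog.postgres.url",
      "jdbc:postgresql://localhost:5432/testdb")
    .config("spark.sql.catalog.postgres.driver", "org.postgresql.Driver")
    .config("spark.sql.catalog.postgres.user", "postgres")
    .config("spark.sql.catalog.postgres.password", "postgres")
    // Aggregate push down is opt-in in Spark 3.2.
    .config("spark.sql.catalog.postgres.pushDownAggregate", "true")
    .getOrCreate()

  // "public" is the assumed PostgreSQL schema holding the emp table.
  spark.sql("SELECT dept, MAX(salary) FROM postgres.public.emp GROUP BY dept")
    .show()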
From: huaxin gao
Date: Monday, 1 November 2021 at 11:32 PM
To: Kapoor, Rohit
Hi Rohit,
Thanks for testing this. Seems to me that you are using DS v1. We only
support aggregate push down in DS v2. Could you please try again using DS
v2 and let me know how it goes?
Thanks,
Huaxin
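
(One way to check whether an aggregate was actually pushed down, sketched
under the same assumed catalog setup as above, is to inspect the physical
plan; with DS v2 the scan node lists the pushed aggregates.)

  // Sketch: table and column names are assumptions.
  val df = spark.sql(
    "SELECT dept, MAX(salary) FROM postgres.public.emp GROUP BY dept")
  df.explain()
  // With push down in effect, the v2 scan is expected to report something
  // like:  PushedAggregates: [MAX(salary)]
  // On the DS v1 path (spark.read.format("jdbc")) no such entry appears, and
  // the aggregation runs in Spark instead of PostgreSQL.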
On Mon, Nov 1, 2021 at 10:39 AM Chao Sun wrote:
> ---------- Forwarded message ---------
Hi,
I am testing the aggregate push down for JDBC after going through the JIRA -
https://issues.apache.org/jira/browse/SPARK-34952
I have the latest Spark 3.2 set up in local mode (laptop).
I have PostgreSQL v14 running locally on my laptop. I am trying a basic
aggregate query on the "emp" table that has 10
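
(For reference, a read issued through spark.read.format("jdbc") takes the
DS v1 path, where aggregate push down is not supported; a sketch with assumed
connection details and columns:)

  // Sketch of the standard DS v1 JDBC read; the groupBy below is computed in
  // Spark rather than pushed to PostgreSQL. Connection details and the
  // dept/salary columns are assumptions.
  val emp = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/testdb")
    .option("dbtable", "emp")
    .option("user", "postgres")
    .option("password", "postgres")
    .load()

  emp.groupBy("dept").max("salary").show()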
When Spark loads data into storage systems such as HDFS and S3, it can
result in a large number of small files. To solve this problem, a common
method is to repartition before writing the results. However, this may cause
data skew. If the number of distinct values of the repartition key is l
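
(A sketch of the repartition-before-write pattern described above; the paths,
partition count, and key column are assumptions:)

  import org.apache.spark.sql.functions.col

  // Assumed input: a DataFrame with an event_date key column.
  val events = spark.read.parquet("s3a://bucket/events/in")

  // Repartitioning on the key caps the number of output files (roughly one
  // per partition), but if a few key values dominate, their partitions grow
  // oversized: the data skew the paragraph warns about.
  events.repartition(200, col("event_date"))
    .write
    .mode("overwrite")
    .parquet("s3a://bucket/events/out")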