You can always list the S3 output path, of course.
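For example, a minimal sketch with boto3 (the bucket name "my-bucket" and prefix "output/" are hypothetical stand-ins for the real s3_output location); it collects the distinct partition prefixes from the object keys without reading any data:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

partitions = set()
for page in paginator.paginate(Bucket="my-bucket", Prefix="output/"):
    for obj in page.get("Contents", []):
        # Keep only the Hive-style partition components of each key,
        # e.g. day=2020-06-20/hour=1/country=US
        parts = [p for p in obj["Key"].split("/") if "=" in p]
        if parts:
            partitions.add("/".join(parts))

for p in sorted(partitions):
    print(p)

This touches only object metadata, so it never scans the data itself.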
On Thu, Jun 25, 2020 at 7:52 AM Tzahi File wrote:
> Hi,
>
> I'm using PySpark to write a df to S3, using the following command:
> df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)
>
> Is there any way to get the partitions created?
You can use the catalog APIs; see the following:
https://stackoverflow.com/questions/54268845/how-to-check-the-number-of-partitions-of-a-spark-dataframe-without-incurring-the/54270537
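For instance, if you register the output as an external table, the catalog can enumerate its partitions without scanning the data; a minimal sketch, assuming a running SparkSession named spark with a Hive-compatible metastore (the table name "events" and the value column are hypothetical):

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS events
        (value STRING, day STRING, hour INT, country STRING)
    USING parquet
    PARTITIONED BY (day, hour, country)
    LOCATION '{s3_output}'
""")
# Discover the partition directories that already exist under that location.
spark.sql("MSCK REPAIR TABLE events")
# Prints rows like: day=2020-06-20/hour=1/country=US
spark.sql("SHOW PARTITIONS events").show(truncate=False)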
On Thu, Jun 25, 2020 at 6:19 AM Tzahi File wrote:
> I don't want to query with a distinct on the partitioned columns; the df
> contains over 1 billion records. I just want to know the partitions that
> were created.
I don't want to query with a distinct on the partitioned columns; the df
contains over 1 billion records.
I just want to know the partitions that were created.
On Thu, Jun 25, 2020 at 4:04 PM Jörn Franke wrote:
> By doing a select on the df?
>
> On 25.06.2020 at 14:52, Tzahi File wrote:
>
By doing a select on the df?
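Presumably a distinct select over the partition columns is meant here; a minimal sketch against the df from the quoted mail (note it does scan the underlying data, which is the cost Tzahi's reply above objects to):

# Distinct combinations of the partition columns.
for row in df.select("day", "hour", "country").distinct().collect():
    print(f"day={row['day']}/hour={row['hour']}/country={row['country']}")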
> On 25.06.2020 at 14:52, Tzahi File wrote:
>
> Hi,
>
> I'm using PySpark to write a df to S3, using the following command:
> df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)
>
> Is there any way to get the partitions created?
Hi,
I'm using PySpark to write a df to S3, using the following command:
df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)
Is there any way to get the partitions created?
e.g.
day=2020-06-20/hour=1/country=US
day=2020-06-20/hour=2/country=US
..
--
Tzahi File