Not sure I follow. If my output is my/path/output, then the Spark metadata
will be written to my/path/output/_spark_metadata. All my data will also be
stored under my/path/output, so there's no way to split it?
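For context, the job is essentially the standard file sink. A minimal sketch, assuming a placeholder bucket/paths and a toy source rather than the real one:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-sink-sketch").getOrCreate()

# Toy source just so the sketch is self-contained; the real job reads an
# actual stream.
events = spark.readStream.format("rate").load()

# The file sink writes part-* data files under the output path and keeps its
# commit log under <output path>/_spark_metadata.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/my/path/output")          # placeholder
    .option("checkpointLocation", "s3a://my-bucket/my/path/checkpoint")
    .start()
)
query.awaitTermination()

So the part-* files and the _spark_metadata/ directory both end up under my/path/output, i.e. under the same prefix.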
On Thu, Apr 13, 2023 at 1:14 PM "Yuri Oleynikov (יורי אולייניקוב)" <yur...@gmail.com> wrote:
Yeah, but can't you use the following?

1. For data files: My/path/part-
2. For partitioned data: my/path/partition=

Best regards
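For concreteness, that suggestion amounts to scoping the expiration rules to those object-name prefixes, e.g. with boto3 (a rough sketch; the bucket name, full prefixes, and retention days are placeholders):

import boto3

s3 = boto3.client("s3")

# Two expiration rules scoped to the data-file prefixes only; nothing here
# matches my/path/output/_spark_metadata/, so the sink's log is left alone.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-data-files",
                "Filter": {"Prefix": "my/path/output/part-"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},  # "X days" in the thread
            },
            {
                "ID": "expire-partitioned-data",
                "Filter": {"Prefix": "my/path/output/partition="},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            },
        ]
    },
)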
The problem is that when specifying two lifecycle policies for the same path,
the one with the shorter retention wins :(
https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html#lifecycle-config-conceptual-ex4
"You might specify an S3 Lifecycle configuration in which ..."
My naïve assumption was that specifying a lifecycle policy for _spark_metadata
with a longer retention would solve the issue.
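Something roughly like this (a boto3 sketch; the bucket name and day counts are placeholders):

import boto3

s3 = boto3.client("s3")

# One short-retention rule for everything under the output path, plus a
# longer-retention rule for the _spark_metadata prefix.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-output",
                "Filter": {"Prefix": "my/path/output/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            },
            {
                "ID": "keep-spark-metadata-longer",
                "Filter": {"Prefix": "my/path/output/_spark_metadata/"},
                "Status": "Enabled",
                "Expiration": {"Days": 365},
            },
        ]
    },
)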
Best regards
> On 13 Apr 2023, at 11:52, Yuval Itzchakov wrote:
Hi everyone,
I am using Spark's FileStreamSink in order to write files to S3. On the S3
bucket, I have a lifecycle policy that deletes data older than X days from
the bucket so that it doesn't grow indefinitely. My problem starts with
Spark jobs that don't have frequent data. What will happen ...