+1, thank you Dongjoon
Thanks
Jie Yang
On 2025/08/26 02:52:35 Kent Yao wrote:
> +1, thank you Dongjoon
>
> On Tue, Aug 26, 2025 at 10:15, Cheng Pan wrote:
>
> > +1, thank you for driving this.
> >
> > Thanks,
> > Cheng Pan
> >
> >
> >
> > On Aug 26, 2025, at 00:31, Dongjoon Hyun wrote:
> >
> > Hi, All.
> >
+1, thank you Dongjoon
On Tue, Aug 26, 2025 at 10:15, Cheng Pan wrote:
> +1, thank you for driving this.
>
> Thanks,
> Cheng Pan
>
>
>
> On Aug 26, 2025, at 00:31, Dongjoon Hyun wrote:
>
> Hi, All.
>
> Since the Apache Spark 4.0.0 tag was created in May, more than three
> months have passed.
>
> https://g
+1, thank you for driving this.
Thanks,
Cheng Pan
> On Aug 26, 2025, at 00:31, Dongjoon Hyun wrote:
>
> Hi, All.
>
> Since the Apache Spark 4.0.0 tag was created in May, more than three months
> have passed.
>
> https://github.com/apache/spark/releases/tag/v4.0.0 (2025-05-19)
>
> So f
Hi Pedro,
Glad it helped.
A couple of quick hints while you implement:
1) Configurable padding + N manifests
- Add two knobs (defaults shown):
- stateStore.rocksdb.gc.paddingMs = 12 (HDFS: 60–120s; S3/GCS: 120–300s)
- stateStore.rocksdb.gc.protectedVersions = 3 (union o
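A minimal sketch of how these two knobs might interact when selecting deletion candidates. Everything here is an assumption drawn from this thread: the knob names, the `Version` model, and the filter logic are illustrative, not actual Spark state store code.

```scala
// Sketch only: knob names and the Version/file model are taken from this
// thread's proposal, not from the real Spark implementation.
object GcKnobsSketch {
  val paddingMs: Long = 120000L    // hypothetical stateStore.rocksdb.gc.paddingMs
  val protectedVersions: Int = 3   // hypothetical stateStore.rocksdb.gc.protectedVersions

  case class Version(id: Long, files: Set[String])

  /** Union of files referenced by the newest N version manifests:
    * these are never deletion candidates. */
  def protectedFiles(versions: Seq[Version]): Set[String] =
    versions.sortBy(-_.id).take(protectedVersions).flatMap(_.files).toSet

  /** An orphan is deletable only if it is unreferenced by the protected
    * versions AND older than the oldest tracked file by at least paddingMs. */
  def deletable(name: String, modTimeMs: Long,
                versions: Seq[Version], oldestTrackedMs: Long): Boolean =
    !protectedFiles(versions).contains(name) &&
      modTimeMs < oldestTrackedMs - paddingMs

  def main(args: Array[String]): Unit = {
    val vs = Seq(Version(10, Set("a.sst")), Version(11, Set("a.sst", "b.sst")),
                 Version(12, Set("b.sst", "c.sst")), Version(13, Set("c.sst")))
    // With N = 3, versions 11..13 are protected, so a.sst survives via v11.
    assert(protectedFiles(vs) == Set("a.sst", "b.sst", "c.sst"))
    // An unreferenced, sufficiently old file is deletable.
    assert(deletable("zzz.sst", 0L, vs, oldestTrackedMs = 1000000L))
    // A recent unreferenced file stays inside the padding window and is kept.
    assert(!deletable("zzz.sst", 950000L, vs, oldestTrackedMs = 1000000L))
    println("ok")
  }
}
```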
+1 Thank you, @Dongjoon Hyun
On Mon, Aug 25, 2025 at 18:32, Dongjoon Hyun wrote:
> Hi, All.
>
> Since the Apache Spark 4.0.0 tag was created in May, more than three
> months have passed.
>
> https://github.com/apache/spark/releases/tag/v4.0.0 (2025-05-19)
>
> So far, 124 commits (mostly bug f
Thanks. I think a relatively simple fix can be to include the zip file's
modification time in the filtering condition too. If the SST's modification
timestamp is earlier than any version x's zip file modification time, it is
kept.
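The condition Siying describes could be sketched as follows; the function name and the flat `Seq` of zip modification times are illustrative assumptions, not the actual state store file layout.

```scala
// Sketch of the proposed filter: keep an SST if its modification time is
// earlier than ANY retained version's zip file modification time, since a
// zip written after the SST may still reference it.
object ZipMtimeFilterSketch {
  def shouldKeep(sstModTimeMs: Long, zipModTimesMs: Seq[Long]): Boolean =
    zipModTimesMs.exists(zipMs => sstModTimeMs < zipMs)

  def main(args: Array[String]): Unit = {
    val zips = Seq(5000L, 7000L) // mtimes of retained versions' zip files
    // Older than a zip: some retained version may reference it, so keep it.
    assert(shouldKeep(4000L, zips))
    // Newer than every zip: no retained version can reference it, so it is
    // eligible for the normal orphan check.
    assert(!shouldKeep(8000L, zips))
    println("ok")
  }
}
```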
Thanks,
Siying
On Mon, Aug 25, 2025 at 11:29 AM Pedro Miguel Duar
Hi Siying, thanks for your reply.
We currently run with "spark.speculation: false" so it is not speculative
execution. This is because the partition gets assigned to two different
executors on subsequent stages. In StateStore.scala, in the doMaintenance()
function, provider.doMaintenance() is called
Thanks for your reply!
Yes this helps. I think adding a time padding will help prevent deleting
files that are incorrectly labeled as orphaned in the current
implementation. This only happens if two executors run maintenance at
nearly the exact same time. I'll look into implementing a fix.
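The race in question can be illustrated with a small, made-up simulation: executor A snapshots its tracked-file list while executor B has just written a new SST that A never tracked. Without padding, B's fresh file can look "older than everything A tracks" and get deleted; with padding, only genuinely old files go. The model and values here are invented for illustration.

```scala
// Illustration only: a toy model of the two-executor maintenance race.
object MaintenanceRaceSketch {
  /** Delete candidates strictly older than the oldest tracked file,
    * minus a configurable padding window. */
  def orphansToDelete(candidates: Map[String, Long],  // name -> mtime ms
                      oldestTrackedMs: Long,
                      paddingMs: Long): Set[String] =
    candidates.collect {
      case (name, mtime) if mtime < oldestTrackedMs - paddingMs => name
    }.toSet

  def main(args: Array[String]): Unit = {
    val oldestTrackedByA = 100000L
    // B's fresh SST was written just before A's oldest tracked file, so to
    // A it appears older than everything it tracks.
    val candidates = Map("fresh-from-B.sst" -> 99000L, "truly-old.sst" -> 10000L)
    // No padding: the fresh file is wrongly labeled an orphan and deleted.
    assert(orphansToDelete(candidates, oldestTrackedByA, paddingMs = 0L)
      == Set("fresh-from-B.sst", "truly-old.sst"))
    // A 60s padding keeps the fresh file; only the genuinely old one goes.
    assert(orphansToDelete(candidates, oldestTrackedByA, paddingMs = 60000L)
      == Set("truly-old.sst"))
    println("ok")
  }
}
```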
On Mon
I suspect that this problem will be mitigated with checkpoint structure V2
( https://issues.apache.org/jira/browse/SPARK-49374
https://github.com/apache/spark/blob/bc36a7db43f287af536bb2767d7d9f1d70bc799f/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2656
). The motivatio
In your statement:
"Instead of simply older, should there be some padding to allow for
maintenance being executed simultaneously on two executors? Something like
at least 60s older than the oldest tracked file."
What you need to do is to add a time padding before deleting orphans, which
is a goo
Hi, All.
Since the Apache Spark 4.0.0 tag was created in May, more than three months
have passed.
https://github.com/apache/spark/releases/tag/v4.0.0 (2025-05-19)
So far, 124 commits (mostly bug fixes) have been merged into the branch-4.0
branch.
$ git log --oneline v4.0.0...HEAD | wc -
Hi all,
I opened a sub-task (SPARK-53368) of SPARK-51162 to track future discussions.
Here's a link[1] to the new JIRA issue. I created a subtask of SPARK-51162
instead of SPARK-51342 since the latter is already a subtask.
Thanks for taking the time to consider this enhancement!
Best Regards,