[GitHub] [beam] iemejia commented on pull request #13021: [BEAM-10670] Make Spark by default execute Read.Bounded using SplittableDoFn.

2020-10-09 Thread GitBox
iemejia commented on pull request #13021: URL: https://github.com/apache/beam/pull/13021#issuecomment-705812846 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [beam] iemejia commented on pull request #13021: [BEAM-10670] Make Spark by default execute Read.Bounded using SplittableDoFn.

2020-10-08 Thread GitBox
iemejia commented on pull request #13021: URL: https://github.com/apache/beam/pull/13021#issuecomment-705822583 > @iemejia How the number of partitions is calculated different during the SDF initial split than what we do with SourceRDD. If I understood correctly the initial split is

[GitHub] [beam] iemejia commented on pull request #13021: [BEAM-10670] Make Spark by default execute Read.Bounded using SplittableDoFn.

2020-10-08 Thread GitBox
iemejia commented on pull request #13021: URL: https://github.com/apache/beam/pull/13021#issuecomment-705812846 Yes definitely! For the watermark part of my comment what I was expecting is that because we now need to [deal with WatermarkEstimator + ProcessContinuation](https://gith

[GitHub] [beam] iemejia commented on pull request #13021: [BEAM-10670] Make Spark by default execute Read.Bounded using SplittableDoFn.

2020-10-07 Thread GitBox
iemejia commented on pull request #13021: URL: https://github.com/apache/beam/pull/13021#issuecomment-704825742 I am comparing the results of current master vs this PR in batch mode and the improvements are so big that I am even confused about how it can be so different; is it partitioning le

[GitHub] [beam] iemejia commented on pull request #13021: [BEAM-10670] Make Spark by default execute Read.Bounded using SplittableDoFn.

2020-10-07 Thread GitBox
iemejia commented on pull request #13021: URL: https://github.com/apache/beam/pull/13021#issuecomment-704808515 Run Spark Runner Nexmark Tests

[GitHub] [beam] iemejia commented on pull request #13021: [BEAM-10670] Make Spark by default execute Read.Bounded using SplittableDoFn.

2020-10-06 Thread GitBox
iemejia commented on pull request #13021: URL: https://github.com/apache/beam/pull/13021#issuecomment-704528075 Run Spark Runner Nexmark Tests