[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-10-06 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-704435983 > I see, so it is the full switch from Read.Bounded/Unbounded to SDF by default. Can you get this one green so we can test it and then merge it? I would like to see if there is

[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-10-05 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-703777328 > > @iemejia Since streaming is effectively broken due to the lack of support for watermark holds, what do you think about enabling SDF for Spark and having it work only in batch?

[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-10-01 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-702422811 @iemejia Since streaming is effectively broken due to the lack of support for watermark holds, what do you think about enabling SDF for Spark and having it work only in batch?

[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-10-01 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-702422068 > I noticed the phenomenon of microbatches producing results early too, back when I was trying to enable the Read.Unbounded tests. I could not understand why, and I thought

[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-09-18 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-695156605 @iemejia I figured out that the issue is that watermark holds aren't implemented for Spark, so the first batch completes, which computes new watermarks, so the watermark hold that

[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-09-17 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-694659139 @iemejia I have updated the code and added a `SparkProcessedKeyedElements` using `updateStateByKey` to evaluate a splittable DoFn. I based the logic off of the

[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-08-17 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-675089898 Run Spark Runner Nexmark Tests

[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-08-17 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-675058103

[GitHub] [beam] lukecwik commented on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

2020-08-17 Thread GitBox
lukecwik commented on pull request #12603: URL: https://github.com/apache/beam/pull/12603#issuecomment-675057856 Spark Runner Nexmark Tests