Re: Returning multiple PCollections from a PTransform

2020-08-04 Thread Robert Bradshaw
Yes, this is explicitly supported. You can return named tuples and dictionaries (with PCollections as values) as well. On Tue, Aug 4, 2020 at 5:00 PM Harrison Green wrote: > > Hi all, > > I've run into a situation where I would like to return two PCollections > during a PTransform. I am aware

Returning multiple PCollections from a PTransform

2020-08-04 Thread Harrison Green
Hi all, I've run into a situation where I would like to return two PCollections during a PTransform. I am aware of the ParDo.with_outputs construct but in this case, the PCollections are the flattened results of several other transforms and it would be cleaner to just return multiple PCollections

Re: Request Throttling in OSSIO

2020-08-04 Thread Praveen K Viswanathan
Thanks Luke. I will go through them and come back if I have any questions. Regards, Praveen On Tue, Aug 4, 2020 at 3:55 PM Luke Cwik wrote: > Take a look at the WatchGrowthFn[1] and also the in-progress Kafka PR[2]. > > 1: >

Re: Request Throttling in OSSIO

2020-08-04 Thread Luke Cwik
Take a look at the WatchGrowthFn[1] and also the in-progress Kafka PR[2]. 1: https://github.com/apache/beam/blob/6612b24ada9382706373db547b5606d6e0496b02/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L787 2: https://github.com/apache/beam/pull/11749 On Tue, Aug 4, 2020

Re: Request Throttling in OSSIO

2020-08-04 Thread Praveen K Viswanathan
Thanks for the suggestions Luke. As you know, we are just starting and should be able to switch to SplittableDoFn, if that's the future of Beam IO Connectors. The SplittableDoFn page has the design details but it would be great if we can look into an IO connector built using SplittableDoFn for

Re: Stateful Pardo Question

2020-08-04 Thread jmac...@godaddy.com
So, after some additional digging, it appears that Beam does not consistently check for timer expiry before calling process. As a result, the watermark may have moved beyond your timer expiry, and if you're counting on the timer callback happening at the time you set it

Re: Unknown accumulator coder error when running cross-language SpannerIO Write

2020-08-04 Thread Boyuan Zhang
Hi Piotr, Are you developing against the head of Beam master? Can you share your code? The x-lang transform can be tested with the Flink runner, where SDF is also supported, such as https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_runner_test.py#L205-L261 On Tue,

Re: Needed help identifying a error in running a SDF

2020-08-04 Thread Boyuan Zhang
Hi Mayank, Which runner do you want to run your pipeline on? You should add the 'beam_fn_api' experiment when you launch the pipeline: --experiments=beam_fn_api. In your code: class TestDoFn(beam.DoFn): def process( self, element, restriction_tracker=beam.DoFn.RestrictionParam(

Re: Making reporting bugs/feature request easier

2020-08-04 Thread Alex Amato
May I suggest we print a URL (and a message) that you can use to file bugs, on the command line when you run a Beam pipeline. (And in any other user interface we use for Beam; some of the runner-specific UIs may want to link to this as well.) On Tue, Aug 4, 2020 at 9:16 AM Alexey Romanenko wrote:

Needed help identifying a error in running a SDF

2020-08-04 Thread Mayank Ketkar
Hello Team, I was hoping to get anyone's help with an error I'm encountering in running SDF. I posted the question on Stack Overflow (includes code): https://stackoverflow.com/questions/63252327/error-in-running-apache-beam-python-splittabledofn However I am receiving an error RuntimeError:

Re: Chronically flaky tests

2020-08-04 Thread Robert Bradshaw
I'm in favor of a quarantine job whose tests are called out prominently as "possibly broken" in the release notes. As a follow-up, +1 to exploring better tooling to track at a fine-grained level exactly how flaky these tests are (and hopefully detect if/when they go from flaky to just plain

Re: Git commit history: "fixup" commits

2020-08-04 Thread Udi Meiri
https://github.com/marketplace/actions/gs-commit-message-checker On Tue, Aug 4, 2020 at 10:25 AM Robert Bradshaw wrote: > +1, thanks for the reminder. > > This should be really easy to automate, using > https://developer.github.com/webhooks/event-payloads/#pull_request to > give a warning when

Re: Git commit history: "fixup" commits

2020-08-04 Thread Robert Bradshaw
+1, thanks for the reminder. This should be really easy to automate, using https://developer.github.com/webhooks/event-payloads/#pull_request to give a warning when the change history is not sufficiently "clean." I'm not sure where to host this though (or if it could be integrated into

Re: Git commit history: "fixup" commits

2020-08-04 Thread Rui Wang
+1 thanks Alexey. My apologies that I merged such a case recently (but not intentionally). I tried to use the "squash and merge" button with a consolidated commit message. After clicking the button, GitHub showed "failed to merge" and gave a retry button, and after clicking that retry button,

Re: Unknown accumulator coder error when running cross-language SpannerIO Write

2020-08-04 Thread Piotr Szuberski
Is there a simple way to register the splittable DoFn for cross-language usage? It's a bit of a black box to me right now. The most meaningful logs for Flink are the ones I pasted and the following: apache_beam.utils.subprocess_server: INFO: b'[grpc-default-executor-0] WARN

Re: Git commit history: "fixup" commits

2020-08-04 Thread Alexey Romanenko
Yes, good point, thanks Valentyn. > On 4 Aug 2020, at 18:29, Valentyn Tymofieiev wrote: > > +1, thanks, Alexey. > > Also a reminder from the contributor guide: do not use the default GitHub > commit message for merge commits, which looks like: > > Merge pull request #1234 from

Could not run side_input in a streaming pipeline

2020-08-04 Thread Chuong Nguyen
Hi dev team, I want to add a side_input to a ParDo transform. The side_input is a table from BigQuery. However, I am facing a weird issue. I can run this pipeline with *DirectRunner* on a local machine but I am not able to run it with *DataflowRunner*. The error message is: ``` Job did

Re: Git commit history: "fixup" commits

2020-08-04 Thread Valentyn Tymofieiev
+1, thanks, Alexey. Also a reminder from the contributor guide: do not use the default GitHub commit message for merge commits, which looks like: Merge pull request #1234 from some_user/transient_branch_name Instead, add the commit message into the subject line, for example: "Merge pull request
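
The default merge subject quoted above has a regular shape, so it could be detected mechanically. The sketch below is only an illustration: the regular expression and the function name `is_default_merge_subject` are assumptions for this example, not an existing Beam or GitHub check.

```python
import re

# Assumed shape of GitHub's default merge subject, based on the example above:
# "Merge pull request #<number> from <user>/<branch>" with nothing after it.
DEFAULT_MERGE_SUBJECT = re.compile(r"^Merge pull request #\d+ from \S+/\S+$")


def is_default_merge_subject(subject):
    """Return True if the subject line looks like an unedited GitHub merge message."""
    return DEFAULT_MERGE_SUBJECT.match(subject.strip()) is not None
```

A pre-merge script could call this on the first line of each merge commit and warn the committer when it returns `True`.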

Re: Making reporting bugs/feature request easier

2020-08-04 Thread Alexey Romanenko
Great topic, thanks Griselda for raising this question. I’d prefer to keep Jira as the single main issue tracker and use the other suggested ways, like emails, Git issues, a web form or a dedicated Slack channel, as different interfaces designed to simplify how users submit an issue. But

Re: Chronically flaky tests

2020-08-04 Thread Tyson Hamilton
On Thu, Jul 30, 2020 at 6:24 PM Ahmet Altay wrote: > I like: > *Include ignored or quarantined tests in the release notes* > *Run flaky tests only in postcommit* (related? *Separate flaky tests into > quarantine job*) > The quarantine job would allow them to run in presubmit still, we would

Re: Chronically flaky tests

2020-08-04 Thread Etienne Chauchot
Hi all, +1 on pinging the assigned person. For the flakes I know of (ESIO and CassandraIO), they are due to the load of the CI server. These IOs are tested using real embedded backends because those backends are complex and we need relevant tests. Countermeasures have been taken (retrial

Git commit history: "fixup" commits

2020-08-04 Thread Alexey Romanenko
Hi all, I’d like to draw your attention to our Git commit history and a related issue. A while ago I noticed that it started getting not very clear and quite verbose compared to how it was before. We have quite a significant number of recent commits like “fix”, “address comments”,
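
To make the kind of commit described above concrete, here is a small sketch that flags low-information subject lines in a list of commit subjects. The set `NOISY_SUBJECTS` and both function names are hypothetical choices for this example; the `fixup! ` and `squash! ` prefixes are the ones Git itself writes for `git commit --fixup` and `--squash`.

```python
import re

# Hypothetical list of low-information subjects; a real policy would tune this.
NOISY_SUBJECTS = {"fix", "fixup", "address comments"}
# Prefixes that git generates for commits meant to be autosquashed before merging.
GIT_AUTOSQUASH_PREFIX = re.compile(r"^(fixup|squash)! ")


def is_fixup_subject(subject):
    """Return True if a commit subject looks like a leftover review fixup."""
    text = subject.strip()
    if GIT_AUTOSQUASH_PREFIX.match(text):
        return True
    return text.lower().rstrip(".") in NOISY_SUBJECTS


def noisy_commits(subjects):
    """Return the subjects that should probably be squashed before merging."""
    return [s for s in subjects if is_fixup_subject(s)]
```

Feeding it the output of something like `git log --format=%s` for a pull request's commits gives a quick list of candidates to squash.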

Re: No space left on device - beam-jenkins 1 and 7

2020-08-04 Thread Damian Gadomski
I did some research on the temporary directories. It seems that there's no single unified way of telling applications to use a specific path, nor any guarantee that all of them will use the dedicated custom directory. Badly behaving apps could always hardcode `/tmp`, e.g. Java ;) But, we should
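
As one concrete case of the point above, Python's standard `tempfile` module honours the `TMPDIR` environment variable, while a hardcoded path does not. The sketch below shows this. The function name `temp_dir_with_override` is made up for this example, and other tools on the Jenkins workers may read different variables or none at all.

```python
import os
import tempfile


def temp_dir_with_override(custom_dir):
    """Return the directory tempfile would use when TMPDIR points at custom_dir."""
    old_env = os.environ.get("TMPDIR")
    old_cache = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = custom_dir
        # tempfile caches its choice, so clear it to force TMPDIR to be re-read.
        tempfile.tempdir = None
        return tempfile.gettempdir()
    finally:
        # Restore the previous state so the override does not leak to callers.
        tempfile.tempdir = old_cache
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env


# A hardcoded path like this ignores TMPDIR entirely, which is the failure mode
# the message above warns about.
HARDCODED_TMP = "/tmp"
```

The override only takes effect if `custom_dir` exists and is writable; otherwise `tempfile` falls back to its other candidates.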