Re: [ANNOUNCE] New committer: Hannah Jiang

2020-01-29 Thread Robert Bradshaw
Congratulations, Hannah! On Wed, Jan 29, 2020 at 3:23 PM Chamikara Jayalath wrote: > Congrats Hannah! > > On Wed, Jan 29, 2020 at 9:22 AM Hannah Jiang > wrote: > >> Thanks everyone! >> It is a very rewarding journey and I am happy to be able to achieve a >> mini milestone. :) >> >> >> On Wed,

Re: [DISCUSS] Autoformat python code with Black

2020-01-27 Thread Robert Bradshaw
precommit job that >>> fails if any unformatted code is detected looks like too strict. What do >>> you think? >>> >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw wrote: >>>> >>>> Thanks! Now we get to debate what knobs to twiddle :

Re: [DISCUSS] Autoformat python code with Black

2020-01-23 Thread Robert Bradshaw
>> iteration. We will skip some of conversations about code style. >>>>>>>>> >>>> >>>>>>>>> >>>> ... >>>>>>>>> >>>>> >>>>>>>>> >>>>>

Re: Updating Metrics Counter in user defined thread

2020-01-21 Thread Robert Bradshaw
he thread local setup is that parallelism >> is typically handled by Beam, rather than introducing a separate >> threading model. Though, perhaps breaking out of this threading model is >> more common than we initially thought. >> >> I hope that's helpful, sorry we d

Re: [DISCUSS] Autoformat python code with Black

2020-01-21 Thread Robert Bradshaw
ires Python 3 to be run. I don’t know how >>>>> big obstacle it would be. >>>>> >>>>> >>>>> I believe there are two options how it would be possible to introduce >>>>> Black. First: just do it, it will hurt but then it would be

Re: [DISCUSS] Integrate Google Cloud AI functionalities

2020-01-21 Thread Robert Bradshaw
The current state is that it works, and a large amount of testing is being added [1], but the public API is still in flux (especially the java-as-callee side [2], and the specification of dependencies [3,4]). It is being actively worked on though. [1] https://github.com/apache/beam/pull/10051 [2]

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-21 Thread Robert Bradshaw
ranch and tag JIRA > issues with the all relevant releases that should be blocked on it. > > Ahmet > > On Tue, Jan 21, 2020 at 11:36 AM Udi Meiri wrote: >> >> I was not aware of https://issues.apache.org/jira/browse/BEAM-9123 or the PR >> on the release br

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-21 Thread Robert Bradshaw
The source tarball seems to be missing the commit at https://github.com/apache/beam/commit/a61dfbf4570e3adb30e15315c116751faeda897e On Tue, Jan 21, 2020 at 9:49 AM Ahmet Altay wrote: > > All, could you help with validations and voting? > > On Wed, Jan 15, 2020 at 6:14 PM Ahmet Altay wrote: >>

Re: Updating Metrics Counter in user defined thread

2020-01-17 Thread Robert Bradshaw
Yes, this is an issue with how counters are implemented, and there's no good workaround. (We could use inheritable thread locals in Java, but that assumes the lifetime of the thread does not outlive the lifetime of the DoFn, and would probably work poorly with threadpools). In the meantime, one

Re: Ordering of element timestamp change and window function

2020-01-16 Thread Robert Bradshaw
en the actual timestamp is the same. Semantically, emitting an element or >> a timestamped value with the same timestamp should have the same behaviour. >> >> What do you think? >> >> >> On Wed, Jan 15, 2020 at 4:04 PM Robert Bradshaw wrote: >>> >>

Re: Ordering of element timestamp change and window function

2020-01-15 Thread Robert Bradshaw
If an element is emitted with a timestamp, the window assignment is re-applied at that time. At least that's how it is in Python. You can emit the full windowed value (accepted without checking...), a timestamped value (in which case the window will be computed), or a plain old element (in which

Re: [PROPOSAL] Transition released containers to the official ASF dockerhub organization

2020-01-15 Thread Robert Bradshaw
>>> >>> Are there any concerns with this proposal? >>> >>> Thanks, >>> Hannah >>> >>> >>> >>> >>> On Fri, Jan 10, 2020 at 4:19 PM Ahmet Altay wrote: >>>> >>>> >>>> >>>>

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Robert Bradshaw
I think AfterWatermark in particular should *always* produce an ON_TIME pane, regardless of whether there were early panes. (It's less clear with non-watermark triggers like after count or processing time.) This makes it feel like the on time behavior is a property of the trigger, not the windowing

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Robert Bradshaw
On Mon, Jan 13, 2020 at 5:34 PM Chad Dombrova wrote: >> >> Pytype seems to detect attribute errors that mypy has not, so it acts as a >> kind-of linter in this case. >> Examples: >> https://github.com/apache/beam/pull/10528/files#diff-0cb34b4622b0b7d7256d28b1ee1d52fc >>

Re: Ask about beam pull requests

2020-01-13 Thread Robert Bradshaw
One thing you could do is ask for a history [1] of the file and see if there are any possible candidates (e.g. Apache Beam committers [2]). [1] https://github.com/ocworld/beam/blame/259f6174ce52e6317a5b4fe7ed3a126153d3/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py [2]

Re: Cleaning up SDK docker image tagging

2020-01-10 Thread Robert Bradshaw
e doc you linked already mentions how to customize tags, > maybe we could also recommend the user always makes their own tag whenever > changing a released image. I think we should discourage checking out the code and modifying the docker file in place, but that's another discussion. > On Fri,

Re: Cleaning up SDK docker image tagging

2020-01-10 Thread Robert Bradshaw
On Fri, Jan 10, 2020 at 12:48 PM Kyle Weaver wrote: > > > Shall we ALSO tag the image with git commit version for local build to keep > > track of obsolete images. > > This would mean we would have to be able to access the git commit from the > source, which might not be trivial (right now the

Re: [PROPOSAL] Transition released containers to the official ASF dockerhub organization

2020-01-10 Thread Robert Bradshaw
One downside is that, unlike many of these projects, we release a dozen or so containers. Is there exactly (and only) one level of namespacing/nesting we can leverage here? (This isn't a blocker, but something to consider.) On Fri, Jan 10, 2020 at 2:06 PM Hannah Jiang wrote: > > Thanks Ahmet for

Re: release scripts as interactive notebooks?

2020-01-10 Thread Robert Bradshaw
+1 to automating more, at least the creation and validation of release artifacts should all be completely automated. However signing should still be done by an individual--that's not something that (semantically) should be automated away. As much as I am a fan of jupyter notebooks, I think the

Re: [DISCUSS] Python static type checkers

2020-01-08 Thread Robert Bradshaw
I am fine with adding this as a linter. I would not want to block either (let alone both) until we have some experience with them. Hopefully, if our code is clean and correctly typed, it should pass both. Where it doesn't, I'm hopeful that the looseness provided by gradual typing will allow us to

Re: Jenkins jobs not running for my PR 10438

2020-01-07 Thread Robert Bradshaw
I agree. If this can't be done, perhaps we could have a basic suite of smoke tests (at least) run on TravisCI. On Tue, Jan 7, 2020 at 2:53 PM Kenneth Knowles wrote: > This new policy seems pretty unwelcoming. I would like to work with INFRA > to see if we can set up a sufficient sandbox that

Re: Python IO Connector

2020-01-06 Thread Robert Bradshaw
On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath wrote: > Regarding cross-language transforms, we need to add better documentation, > but for now you'll have to go with existing examples and tests. For example, > > >

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Robert Bradshaw
Thanks. That's the right one. The signatures (and everything else) all look good now. Changing my vote to a +1. On Mon, Jan 6, 2020 at 9:13 AM Mikhail Gryzykhin wrote: > KEYS files should be fixed now. > > On Mon, Jan 6, 2020 at 8:29 AM Robert Bradshaw > wrote: > >> Ye

Re: Dropping late data in DirectRunner

2020-01-03 Thread Robert Bradshaw
I agree; in fact we just recently enabled late data dropping in the direct runner in Python to be able to develop better tests for Dataflow. It should be noted, however, that in a distributed runner (absent the quiescence of TestStream) one can't *count* on late data being dropped at a

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-02 Thread Robert Bradshaw
(Other than that everything looks fine.) On Thu, Jan 2, 2020 at 4:44 PM Robert Bradshaw wrote: > > -1 > > I'm having trouble verifying the signatures on the release artifacts. > When I try to import the key from > https://dist.apache.org/repos/dist/release/beam/KEYS I get >

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-02 Thread Robert Bradshaw
-1 I'm having trouble verifying the signatures on the release artifacts. When I try to import the key from https://dist.apache.org/repos/dist/release/beam/KEYS I get pub rsa4096 2019-10-22 [SC] 79552F5C2FD869A08E097F96841855FB73AFFC7F uid [ unknown] Mikhail Gryzykhin (mikhail)

Re: [BEAM-9015] Adding pyXX-cloud instead of pyXX-gcp and pyXX-aws

2019-12-23 Thread Robert Bradshaw
Makes sense to me. On Mon, Dec 23, 2019 at 3:33 PM Pablo Estrada wrote: > > Hi all, > a couple of contributors [1][2] have been kind enough to add support for s3 > filesystem[3] for the Python SDK. Part of this involved adding a tox task > called py37-aws, to install the relevant dependencies

Re: [PROPOSAL] python precommit timeouts

2019-12-20 Thread Robert Bradshaw
On Fri, Dec 20, 2019 at 3:15 PM Udi Meiri wrote: > > ITs will have a different timeout, but they're still not migrated to pytest > so unaffected at the moment. > > So I created a PR and already seemed to find an issue. One test timed out > while scanning the local filesystem. > It seems that it

Re: Is org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders supposed to be supported ?

2019-12-20 Thread Robert Bradshaw
The problem here is that T and Nullable are two different types, but are not distinguished as such in the Java type system (and hence are treated interchangeably there, modulo certain cases where one can use a @Nullable annotation). They also have incompatible encodings. In my view, it is the

Re: Performance drops in Python PortableRunner tests

2019-12-20 Thread Robert Bradshaw
Yes, it is possible that this had an influence--Reads are now all implemented as SDFs and Creates involve a reshuffle to better redistribute data. This much of a change is quite surprising. Where is the pipeline for, say, "Python | ParDo | 2GB, 100 byte records, 10 iterations | Batch" and how does

Re: Artifact staging in cross-language pipelines

2019-12-17 Thread Robert Bradshaw
>>> >>> > >> It's rather high-level. We may want to add more details once >>> we have >>> > >> finalized the design. Feel free to make comments and edits. >>> > > >>> > > >>> >

Re: Root logger configuration

2019-12-17 Thread Robert Bradshaw
The generally expected behavior is that if you don't do anything, logging goes to stderr. Logging to non-root loggers breaks this. (Arguably it's a bug in the Python logging libraries to have this inconsistency, but so be it...) On the other hand, if you do set something up, that is respected. I

Re: [DISCUSS] BIP reloaded

2019-12-16 Thread Robert Bradshaw
Additional process is a two-edged sword: it can help move stuff forward, to the correct decision, but it can also add significant overhead. I think there are many proposals for which the existing processes of deriving consensus (over email, possibly followed by a formal vote or lazy consensus)

Re: Root logger configuration

2019-12-13 Thread Robert Bradshaw
ers do? > Best > -P. > > On Fri, Dec 13, 2019 at 10:55 AM Robert Bradshaw wrote: >> >> Thanks for looking into this. >> >> I'm not sure unconditionally calling logging.basicConfig() on module >> import is the correct solution--this prevents modules that wi

Re: [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2019-12-13 Thread Robert Bradshaw
+1 (binding) On Fri, Dec 13, 2019 at 10:23 AM Pablo Estrada wrote: > > +1 (binding) > > On Fri, Dec 13, 2019 at 8:47 AM Maximilian Michels wrote: >> >> +1 (binding) >> >> On 13.12.19 17:10, Jeff Klukas wrote: >> > +1 (non-binding) >> > >> > On Thu, Dec 12, 2019 at 11:58 PM Kenneth Knowles > >

Re: Root logger configuration

2019-12-13 Thread Robert Bradshaw
Thanks for looking into this. I'm not sure unconditionally calling logging.basicConfig() on module import is the correct solution--this prevents modules that wish to set up handlers in place of the default handler from being able to do so. (This is why logging.basicConfig is lazily done at the

Re: Poor Python 3.x performance on Dataflow?

2019-12-06 Thread Robert Bradshaw
This is very surprising--I would expect the times to be quite similar. Do you have profiles for where the (difference in) time is spent? With differences like these, I wonder if there are issues with container setup (e.g. some things not being installed or cached) for Python 3. On Fri, Dec 6, 2019

Re: [RELEASE] Tracking 2.18

2019-12-05 Thread Robert Bradshaw
Yeah, so I saw... On Thu, Dec 5, 2019 at 4:31 PM Udi Meiri wrote: > > Sorry Robert the release was already cut yesterday. > > > > On Thu, Dec 5, 2019 at 8:37 AM Ismaël Mejía wrote: >> >> Colm, I just merged your PR and cherry picked it into 2.18.0 >> https://github.com/apache/beam/pull/10296 >>

Re: Request for review of PR [Beam-8564]

2019-12-03 Thread Robert Bradshaw
Is there a way to wrap this up as an optional dependency with multiple possible providers, if there's no good library satisfying all of the conditions (in particular (1))? On Tue, Dec 3, 2019 at 9:47 AM Luke Cwik wrote: > > I was hoping that someone in the community would provide some

Re: Cleaning up Approximate Algorithms in Beam

2019-11-26 Thread Robert Bradshaw
9 at 06:01, Ahmet Altay wrote: > >> >> >> On Mon, Nov 18, 2019 at 10:57 AM Robert Bradshaw >> wrote: >> >>> On Sun, Nov 17, 2019 at 5:16 PM Reza Rokni wrote: >>> >>>> *Ahmet: FWIW, There is a python implementation only for this >>>>

Re: consurrent PRs

2019-11-26 Thread Robert Bradshaw
On Tue, Nov 26, 2019 at 6:15 AM Etienne Chauchot wrote: > > Hi guys, > > I wanted your opinion about something: > > I have 2 concurrent PRs that do the same: > > https://github.com/apache/beam/pull/10010 > > https://github.com/apache/beam/pull/10025 > > The first one is a bit better because it

Re: Beam Testing Tools FAQ

2019-11-26 Thread Robert Bradshaw
Thanks! On Tue, Nov 26, 2019 at 7:43 AM Łukasz Gajowy wrote: > > Hi all, > > our documentation (either confluence or the website docs) describes how to > create various integration and performance tests - there already are core > operations tests, nexmark and IO test documentation pages.

Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Robert Bradshaw
ninterruptibles.java:196) >> at >> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312) >> >> This happens when I use /opt/apache/beam/boot to start the worker in process >> environment, as it

Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-22 Thread Robert Bradshaw
On Thu, Nov 21, 2019 at 7:05 PM David Cavazos wrote: > > > I created this Google Form > > if everyone is okay with it to make it easier to both vote and view the > results :) >

Re: Side inputs not working in CombineGlobally (Python)

2019-11-21 Thread Robert Bradshaw
Thanks for the report. You can work around this by specifying without_defaults() on the global combine (as the default is computed at pipeline construction time). Note that even in the cases where it works, it disables combiner lifting, so side inputs in combiners is generally discouraged. And

Re: Default values not supported in Combine.globally() if not windowed by GlobalWindows

2019-11-21 Thread Robert Bradshaw
und. >> >> Kenn >> >> On Thu, Nov 21, 2019 at 10:32 AM Reuven Lax wrote: >>> >>> In particular, since windows can be data based (e.g. session windows) the >>> set of windows is not always knowable in advance. >>> >>> On Thu, No

Re: Default values not supported in Combine.globally() if not windowed by GlobalWindows

2019-11-21 Thread Robert Bradshaw
TIMESTAMP, MAX_TIMESTAMP) if start < end], though admittedly most have no data :) The statement is, however, true in general. > On Thu, Nov 21, 2019 at 10:29 AM Robert Bradshaw wrote: >> >> The semantics are a bit undefined--the sane extension of the model to >> support thi

Re: Default values not supported in Combine.globally() if not windowed by GlobalWindows

2019-11-21 Thread Robert Bradshaw
The semantics are a bit undefined--the sane extension of the model to support this is that the default value would show up in every window (otherwise which window would the default value belong to), but we don't have support for enumerating windows (or truncating infinite collections in batch

Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-20 Thread Robert Bradshaw
On Tue, Nov 19, 2019 at 6:43 PM Kenneth Knowles wrote: > > Please cast your votes of approval [1] for animals you would support as Beam > mascot. The animal with the most approval will be identified as the favorite. > > *** Vote for as many as you like, using this checklist as a template >

Re: Improve container support

2019-11-19 Thread Robert Bradshaw
Good to know. I added an icon and link. Good pointer about trying to see what we can do to make these official. On Tue, Nov 19, 2019 at 11:16 AM Kyle Weaver wrote: > > We do link from https://beam.apache.org/documentation/runtime/environments/. > > On Tue, Nov 19, 2019 at 10:

Re: Improve container support

2019-11-19 Thread Robert Bradshaw
We should probably add a link to these from our site as well, for visibility. On Tue, Nov 19, 2019 at 10:56 AM Kyle Weaver wrote: > > +1 Thanks for bringing that up Chad, I had the same problem locating the > docker images on Docker hub (searching "apachebeam" is the only way that > seems to

Re: Cleaning up Approximate Algorithms in Beam

2019-11-18 Thread Robert Bradshaw
or possibly only required during update until we figure out a good way for the runner to plug this in appropriately. Rob/Kenn: On Combiner discussion, should we tie action items from the needs > of this thread to this larger discussion? > > Cheers > Reza > > On Fri, 15 Nov 2019 at 08:32

Re: Cleaning up Approximate Algorithms in Beam

2019-11-14 Thread Robert Bradshaw
On Thu, Nov 14, 2019 at 1:06 AM Kenneth Knowles wrote: > Wow. Nice summary, yes. Major calls to action: > > 0. Never allow a combiner that does not include the format of its state > clear in its name/URN. The "update compatibility" problem makes their > internal accumulator state essentially

Re: Python Precommit duration pushing 2 hours

2019-11-14 Thread Robert Bradshaw
>>> [interactive] dependencies as extra dependencies in tests_require: >>> https://github.com/apache/beam/pull/10068 >>> >>> On Mon, Nov 11, 2019 at 2:15 PM Robert Bradshaw wrote: >>>> >>>> On Fri, Nov 8, 2019 at 5:45 PM Ahmet Altay wr

Re: [Discuss] Beam mascot

2019-11-13 Thread Robert Bradshaw
#37 from the sketches was the cuttlefish, which would put it at (with 4 votes) the most popular so far. I do like the firefly too. On Wed, Nov 13, 2019 at 12:03 PM Gris Cuevas wrote: > > Hi everyone, so exciting to see this convo taking off! > > I loved Alex's firefly! -- it can have so many

Re: Type of builtin PTransform/PCollection metrics

2019-11-13 Thread Robert Bradshaw
On Wed, Nov 13, 2019 at 10:56 AM Maximilian Michels wrote: > > > Are you referring specifically to? > > * beam:metric:element_count:v1 > > * beam:metric:pardo_execution_time:start_bundle_msecs:v1 > > * beam:metric:pardo_execution_time:process_bundle_msecs:v1 > > *

Re: [discuss] Using a logger hierarchy in Python

2019-11-13 Thread Robert Bradshaw
I would be in favor of using module-level loggers as well. I think per-class would be overkill and unlike Java not everything is in a class, as well as being more conventional in Python (where modules are generally seen as the unit of compilation, vs. Java where classes are the unit of compilation
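
A minimal sketch of what module-level loggers give you, using only the standard `logging` module. The package name `example_pkg` is hypothetical; inside a real module the logger would typically be obtained with `logging.getLogger(__name__)`, which yields the same dotted name.

```python
import logging

# Package-level logger (hypothetical package name).
pkg_logger = logging.getLogger("example_pkg")

# Equivalent to logging.getLogger(__name__) inside example_pkg/io/files.py.
module_logger = logging.getLogger("example_pkg.io.files")

# Dotted names form a hierarchy, so one setting on the package logger
# applies to every module-level logger underneath it.
pkg_logger.setLevel(logging.ERROR)

print(module_logger.parent is pkg_logger)
print(module_logger.getEffectiveLevel() == logging.ERROR)
```

Users can still tune a single module by setting a level on its own logger, without touching the rest of the package.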

Re: On processing event streams

2019-11-12 Thread Robert Bradshaw
One concern with (1) is that it may not be cheap to do for all runners. There also seems to be the implication that in batch elements would be 100% in order but in streaming kind-of-in-order is OK, which would lead to pipelines being developed/tested against stronger guarantees than are generally

Re: [Portability] Turn off artifact staging?

2019-11-12 Thread Robert Bradshaw
is probably shipping dependencies in another way anyways. On Tue, Nov 12, 2019 at 5:03 PM Robert Bradshaw wrote: > > Certainly there's a lot to be re-thought in terms of artifact staging, > especially when it comes to cross-language pipelines. I think it would > make sense to ha

Re: [Portability] Turn off artifact staging?

2019-11-12 Thread Robert Bradshaw
Certainly there's a lot to be re-thought in terms of artifact staging, especially when it comes to cross-language pipelines. I think it would make sense to have a special retrieval token for the "empty" manifest, which would mean a staging directory would never have to be set up if no artifacts

Re: Behavior of TimestampCombiner?

2019-11-12 Thread Robert Bradshaw
> Thanks for confirming. >> >> Since it is unexpected behavior, I shall look into jira if it is already on >> radar, if not, will create one. >> >> On Mon, Nov 11, 2019 at 6:11 PM Robert Bradshaw wrote: >>> >>> The END_OF_WINDOW is indeed 9.99 (or

Re: [Discuss] Beam mascot

2019-11-12 Thread Robert Bradshaw
5) - (37), (51), (48), (53) go into the direction of cuttlefish. >> >> From the new ones I like (52) because of the eyes. (53) If we want to >> move into the direction of a water animal, the small ones are quite >> recognizable. Also, (23) and (36) are kinda cute. >> >

Re: Date/Time Ranges & Protobuf

2019-11-12 Thread Robert Bradshaw
I agree about it being a tagged union in the model (together with actual_time(...) - epsilon). It's not just a performance hack though, it's also (as discussed elsewhere) a question of being able to find an embedding into existing datetime libraries. The real question here is whether we should

Re: Behavior of TimestampCombiner?

2019-11-11 Thread Robert Bradshaw
The END_OF_WINDOW is indeed 9.99 (or, in Java, 9.999000), but the results for LATEST and EARLIEST should be 9 and 0 respectively. On Mon, Nov 11, 2019 at 5:34 PM Ruoyun Huang wrote: > > Hi, Folks, > > I am trying to understand the behavior of TimestampCombiner. I have a > test like
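
A minimal pure-Python sketch of the three combiner behaviours named above. It is not Beam code: the test being discussed is cut off in this snippet, so the fixed window [0 s, 10 s) and the elements at 0 s through 9 s are assumptions, and timestamps use millisecond granularity to match the Java value quoted above. The function name `combined_timestamp` is hypothetical.

```python
# Timestamps are integers in milliseconds to avoid floating-point rounding.
WINDOW_END_MS = 10_000  # assumed fixed window [0 s, 10 s)
ELEMENT_TIMESTAMPS_MS = [seconds * 1000 for seconds in range(10)]  # assumed inputs


def combined_timestamp(kind, element_timestamps_ms, window_end_ms):
    """Return the output timestamp for one window under the given policy."""
    if kind == "END_OF_WINDOW":
        # Last representable instant that is still inside the window.
        return window_end_ms - 1
    if kind == "LATEST":
        return max(element_timestamps_ms)
    if kind == "EARLIEST":
        return min(element_timestamps_ms)
    raise ValueError("unknown combiner: %s" % kind)


for kind in ("END_OF_WINDOW", "LATEST", "EARLIEST"):
    millis = combined_timestamp(kind, ELEMENT_TIMESTAMPS_MS, WINDOW_END_MS)
    print(kind, millis / 1000)
```

Under these assumptions only END_OF_WINDOW depends on the window bounds; the other two depend solely on the element timestamps.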

Re: Any chance I could help do code reviews in beam

2019-11-11 Thread Robert Bradshaw
That'd be great. go/github so people can find/identify you. On Mon, Nov 11, 2019 at 11:05 AM Luke Cwik wrote: > > What is your github user id so people could tag you as a reviewer? > > On Mon, Nov 11, 2019 at 11:02 AM Brandon Pollack wrote: >> >> I might have some off time once in a while and

Re: [Discuss] Beam mascot

2019-11-11 Thread Robert Bradshaw
Cuttlefish are cool, but I don't know how recognizable they are, and they don't scream "fast" or "stream-y" or "parallel processing" to me (not that that's a requirement...) I like that firefly, nice working the logo into the trailing beam of light. On Mon, Nov 11, 2019 at 5:03 PM Udi Meiri

Re: Python Precommit duration pushing 2 hours

2019-11-11 Thread Robert Bradshaw
rentee the tests specifically run *without* installing gcp and *without* compiling with Cython.) > On Fri, Nov 8, 2019 at 11:09 AM Robert Bradshaw wrote: >> >> Just saw another 2-hour timeout: >> https://builds.apache.org/job/beam_PreCommit_Python_Commit/9440/ , so >>

Re: Key encodings for state requests

2019-11-11 Thread Robert Bradshaw
nd KV come to mind) that all Runners/SDKs are required to understand, and (3) runners properly coerce coders they do not understand into coders that they do if they need to pull out and act on the bytes. The more coders the runner/SDK understands, the less often it needs to do this. > jincheng sun 于2019年

Re: Cython unit test suites running without Cythonized sources

2019-11-08 Thread Robert Bradshaw
On Thu, Nov 7, 2019 at 6:25 PM Chad Dombrova wrote: > > Hi, > Answers inline below, > >>> It's unclear from the nose source[1] whether it's calling build_py and >>> build_ext, or just build_ext. It's also unclear whether the result of that >>> build is actually used. When python setup.py

Re: Python Precommit duration pushing 2 hours

2019-11-08 Thread Robert Bradshaw
2019 at 1:55 PM Ahmet Altay wrote: >>>> >>>> PR for the proposed change: https://github.com/apache/beam/pull/9985 >>>> >>>> On Mon, Nov 4, 2019 at 1:35 PM Udi Meiri wrote: >>>>> >>>>> +1 >>>>>

Re: [discuss] More dimensions for the Capability Matrix

2019-11-08 Thread Robert Bradshaw
On Fri, Nov 8, 2019 at 9:46 AM Brian Hulette wrote: > > > Does it make sense to do this? > I think this makes a lot of sense. Plus it's a good opportunity to refresh > the UX of [1]. > > > what's a good way of doing it? Should we expand the existing Capability > > Matrix to support SDKs as

Re: Detecting resources to stage

2019-11-08 Thread Robert Bradshaw
Note that resources are more properly tied to specific operations and stages, not to the entire pipeline. This is especially true in the face of libraries (which should have the ability to declare their own resources) and cross-language. On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy wrote: > > I

Re: Key encodings for state requests

2019-11-08 Thread Robert Bradshaw
uld just replicate the implicit LengthPrefixCoder behavior >> we have for general wire transfer also for state requests. Option (2) I >> suppose is the most implicit and runner-specific, should probably be >> avoided in the long run. >> >> So I'd probably opt for (1) and I w

Re: Cython unit test suites running without Cythonized sources

2019-11-07 Thread Robert Bradshaw
Does python setup.py nosetests invoke build_ext (or, more generally, build)? It's possible cython is present, but the build step is not invoked which would explain the skip for slow_coders_test. The correct test is being used in

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-07 Thread Robert Bradshaw
need to > add ValueOnlyWindowedValueCoder to the StandardCoders and all the SDK harness > should be aware of this coder. There are not many changes actually. > > Please feel free to correct me if there is anything incorrect. :) > > Besides, I'm not quite clear about the consistency

Re: Key encodings for state requests

2019-11-07 Thread Robert Bradshaw
"coder that the runner does not understand." This knowledge is only in the runner. Also has the downside of (2). > Option (2) seems like the most practical. > > -Max > > On 06.11.19 17:26, Robert Bradshaw wrote: > > On Wed, Nov 6, 2019 at 2:55 AM Maximilian Michels wrote

Re: Deprecate some or all of TestPipelineOptions?

2019-11-06 Thread Robert Bradshaw
+1 to all of these are probably obsolete at this point and would be nice to remove. On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles wrote: > > Good find. I think TestPipelineOptions is from very early days. It makes > sense to me that these are all obsolete. Some guesses, though I haven't dug

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-06 Thread Robert Bradshaw
Yes, the portability framework is designed to support this, and possibly even more efficient transfers of data than element-by-element as per the wire coder specified in the IO port operators. I left some comments on the doc as well, and would also prefer approach 2. On Wed, Nov 6, 2019 at 11:03

Re: Key encodings for state requests

2019-11-06 Thread Robert Bradshaw
ng in runner/SDK interactions, > > requiring length-prefix only when there is an opaque or dynamic-length > > value? I assume you mean that at runtime the worker for a given engine > > does not know? > > > > Kenn > > > > On Tue, Nov 5, 2019 at 3:19 PM

Re: RFC: python static typing PR

2019-11-05 Thread Robert Bradshaw
Sounds like we have consensus. Let's move forward. I'll follow up with the discussions on the PRs themselves. On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw wrote: > > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova wrote: > > > >> Do you believe that a future mypy plugin c

Re: Key encodings for state requests

2019-11-05 Thread Robert Bradshaw
The Coder used for State/Timers in a StatefulDoFn is pulled out of the input PCollection. If a Runner needs to partition by this coder, it should ensure the coder of this PCollection matches with the Coder used to create the serialized bytes that are used for partitioning (whether or not this is

Re: Embedding expansion service for cross language in the runner

2019-11-05 Thread Robert Bradshaw
solution right now but it's not a very clean solution in > our case. Yeah, it's pretty ugly. > @Robert Bradshaw, yes, one cannot construct the "whole" pipeline first and > pass it to the runner, but can't we easily combine the job server and > expansion serv

Re: Embedding expansion service for cross language in the runner

2019-11-04 Thread Robert Bradshaw
available at pipeline construction time, but there is > no need to run a separate service. > > Thomas > > On Mon, Nov 4, 2019 at 12:03 PM Robert Bradshaw wrote: >> >> On Mon, Nov 4, 2019 at 11:54 AM Chamikara Jayalath >> wrote: >> > >> > O

Re: Embedding expansion service for cross language in the runner

2019-11-04 Thread Robert Bradshaw
On Mon, Nov 4, 2019 at 11:54 AM Chamikara Jayalath wrote: > > On Mon, Nov 4, 2019 at 11:01 AM Hai Lu wrote: >> >> Hi, >> >> We're looking into leveraging the cross language pipeline feature in our >> Beam pipelines on Samza runner. While the feature seems to work well, the >> PTransform

Re: aggregating over triggered results

2019-11-01 Thread Robert Bradshaw
en all the data, or late, in case the watermark was wrong (watermarks can be heuristic as perfect certainty might be too slow/expensive)). > On Wed, Oct 30, 2019 at 5:37 PM Robert Bradshaw wrote: >> >> On Tue, Oct 29, 2019 at 7:01 PM Aaron Dixon wrote: >> > >> >

Re: Python SDK timestamp precision

2019-11-01 Thread Robert Bradshaw
hat breaking change, because > >> outputs of windows would become "3:00:00.000" instead of "2:59:59.999" > >> (but I like the first one much more! :)) > > Yes, this is the "minus epsilon" idea, but assigning this as a bit on > > the Window

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Robert Bradshaw
Yes. If someone starts up the job server manually, they would have to manually specify LOOPBACK if they want it. Python's FlinkRunner does not use a pre-configured job server, it starts one up itself (making the default scenario simple). On Thu, Oct 31, 2019 at 10:19 AM Thomas Weise wrote: > >

Re: Python SDK timestamp precision

2019-10-31 Thread Robert Bradshaw
the batching DoFn that batches up elements (with their respective metadata), calls an external service on the full batch, and then emits the results (with the appropriate, cached, metadata). > On 10/30/19 10:32 PM, Robert Bradshaw wrote: > > On Wed, Oct 30, 2019 at 2:00 AM Jan Lukavský wr

Re: Rethinking the Flink Runner modes

2019-10-30 Thread Robert Bradshaw
seems we could guard using LOOPBACK it on this flag + [local] or [auto]. > Another option would > be, to only support it when the mode is set to "[local]". Well, I'd really like to support it by default... > On 30.10.19 21:05, Robert Bradshaw wrote: > > One more questi

Re: aggregating over triggered results

2019-10-30 Thread Robert Bradshaw
st 10-minute intervals that have no event then there are further optimizations one can do. > Thanks for help in understanding these details. I want to make good use of > Beam and hope to contribute back at some point (docs/writing etc), once I can > come to terms with all of these pieces. > >

Re: RFC: python static typing PR

2019-10-30 Thread Robert Bradshaw
On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova wrote: > >> Do you believe that a future mypy plugin could replace pipeline type checks >> in Beam, or are there limits to what it can do? > > mypy will get us quite far on its own once we completely annotate the beam > code. That said, my PR does

Re: Python SDK timestamp precision

2019-10-30 Thread Robert Bradshaw
for an element is needed (e.g. one can have elements in a single window, especially the global window that have different timestamps), but this could be interesting to explore. It could definitely get rid of the "minus epsilon" weirdness, though I don't think it completely solves the granu

Re: Rethinking the Flink Runner modes

2019-10-30 Thread Robert Bradshaw
One more question: https://issues.apache.org/jira/browse/BEAM-8396 still seems valuable, but with [auto] as the default, how should we detect whether LOOPBACK is safe to enable from Python? On Wed, Oct 30, 2019 at 11:53 AM Robert Bradshaw wrote: > > Sounds good to me. > > One t

Re: Rethinking the Flink Runner modes

2019-10-30 Thread Robert Bradshaw
Sounds good to me. One thing I don't understand is what it means for "CLI or REST API context [to be] present." Where does this context come from? A config file in a standard location on the user's machine? Or is this something that is only present when a user uploads a jar and then Flink runs it

Re: Python SDK timestamp precision

2019-10-29 Thread Robert Bradshaw
as nanos under the hood leaks out (and I have trouble seeing how it won't) I'd lean towards just using them directly (e.g. Java Instant) rather than wrapping it. > Having a cleverly resolution-independent system is interesting and maybe > extremely future proof but maybe preparing for a very dist

Re: aggregating over triggered results

2019-10-29 Thread Robert Bradshaw
No matter how the problem is structured, computing 30 day aggregations for every 10 minute window requires storing at least 30day/10min = ~4000 sub-aggregations. In Beam, the elements themselves are not stored in every window, only the intermediate aggregates. I second Luke's suggestion to try it
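
The arithmetic above can be checked directly, and the idea of keeping intermediate aggregates instead of raw elements can be sketched in a few lines. This is a plain-Python illustration with a hypothetical `rolling_sums` helper, not how Beam implements sliding windows, and it assumes a sum, which can be updated by subtracting the bucket that falls out of the window.

```python
from collections import deque
from datetime import timedelta

# 30 days of 10-minute periods: the "~4000" sub-aggregations mentioned above.
buckets_per_window = timedelta(days=30) // timedelta(minutes=10)
print(buckets_per_window)


def rolling_sums(bucket_sums, window_size):
    """Yield the sum over the most recent `window_size` buckets.

    Only one number per bucket is retained, never the raw elements.
    """
    recent = deque()
    total = 0
    for bucket_sum in bucket_sums:
        recent.append(bucket_sum)
        total += bucket_sum
        if len(recent) > window_size:
            total -= recent.popleft()  # drop the bucket that left the window
        yield total


# Tiny usage example with a window of two buckets.
print(list(rolling_sums([1, 2, 3, 4], 2)))
```

For aggregations without an inverse (for example a maximum), the retained sub-aggregates would have to be recombined rather than subtracted, but the storage bound stays the same.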

Re: Python Precommit duration pushing 2 hours

2019-10-29 Thread Robert Bradshaw
https://github.com/apache/beam/pull/9925 On Tue, Oct 29, 2019 at 10:24 AM Udi Meiri wrote: > > I don't have the bandwidth right now to tackle this. Feel free to take it. > > On Tue, Oct 29, 2019 at 10:16 AM Robert Bradshaw wrote: >> >> The Python SDK does as well. The

Re: Rethinking the Flink Runner modes

2019-10-28 Thread Robert Bradshaw
Thanks for bringing this to the list. Some comments below, though it would be good to get additional feedback beyond those that have been participating on the PR, if any. I'd like to see this issue resolved before 2.17 as changing the public API once it's released will be harder. On Mon, Oct 28,

Re: RFC: python static typing PR

2019-10-28 Thread Robert Bradshaw
Thanks, Chad, this has been a herculean task. I'm excited for the additional tooling and documentation explicit types can bring to our code, even if tooling such as mypy isn't able to do as much inference for obvious cases as I would like. This will, of course, put another burden on developers in

Re: JIRA priorities explaination

2019-10-25 Thread Robert Bradshaw
s who will rely on "Fix Version" to > find which release actually fixes the issue. A pass over open bugs with a Fix > Version set to next release (as currently done by a release manager) helps to > make sure that unfixed bugs won't have Fix Version tag of the upcoming > rel
