Re: [PROPOSAL] Preparing for Beam 2.15.0 release

2019-07-17 Thread Kyle Weaver
+1 Thanks for taking the lead on this! > concurrent votes on three releases Don't worry, concurrency is our specialty :) Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com | +1650203 On Wed, Jul 17, 2019 at 7:28 PM Kenneth Knowles wrote: > +1 thanks for keeping up

Re: [PROPOSAL] Preparing for Beam 2.15.0 release

2019-07-17 Thread Kenneth Knowles
+1 thanks for keeping up the cadence. I hope everyone is also ready for concurrent votes on three releases... :-) On Wed, Jul 17, 2019 at 5:03 PM Ahmet Altay wrote: > +1 Thank you for keeping the cadence! > > On Wed, Jul 17, 2019 at 2:00 PM Rui Wang wrote: > >> +1. Thanks Yifan to take it

Re: [PROPOSAL] Preparing for Beam 2.15.0 release

2019-07-17 Thread Ahmet Altay
+1 Thank you for keeping the cadence! On Wed, Jul 17, 2019 at 2:00 PM Rui Wang wrote: > +1. Thanks Yifan to take it over! > > > Rui > > On Wed, Jul 17, 2019 at 1:56 PM Alan Myrvold wrote: > >> +1 Thanks for keeping the release cadence going. I like to see regular >> releases happening. >> >>

About to run a seed job on PR/9093.

2019-07-17 Thread Valentyn Tymofieiev
I will need to run a seed job to test Jenkins job config changes. This can cause some disruption in Python postcommit test suites. Feel free to reach out if you happen to be affected and it blocks your work. Thank you, Valentyn

Re: Proposal: Add permanent url to community metrics dashboard

2019-07-17 Thread Mikhail Gryzykhin
Thank you Alan, that's an interesting link. The latest Grafana version in Docker is v6.2.5, so the issues on that list are not applicable. We should be fine on this front. We should update the container version of Grafana running on the service, though. @Pablo I feel it's best for PMC to start conversation with

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Kyle Weaver
+1 to faster Go SDK iteration! Well-deserved, Rebo Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com | +1650203 On Wed, Jul 17, 2019 at 2:44 PM Robert Burke wrote: > Thanks all! Hopefully this does mean reduced latency to merge when folks > send me Go SDK reviews.

Re: Proposal: Add permanent url to community metrics dashboard

2019-07-17 Thread Alan Myrvold
Are all of the CVE issues fixed at the version in use? https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=grafana XSS isn't much of a concern until there is a hostname associated. On Wed, Jul 17, 2019 at 2:17 PM Pablo Estrada wrote: > I'd like to move this forward. Mikhail, would you be

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Robert Burke
Thanks all! Hopefully this does mean reduced latency to merge when folks send me Go SDK reviews. Let's get Beam GOing! On Wed, Jul 17, 2019, 11:22 AM Melissa Pashniak wrote: > > Congratulations! > > > On Wed, Jul 17, 2019 at 6:06 AM Alexey Romanenko > wrote: > >> Congratulations, Robert! >> >>

Re: Proposal: Add permanent url to community metrics dashboard

2019-07-17 Thread Pablo Estrada
I'd like to move this forward. Mikhail, would you be interested in filing an issue with Infra to see if it's possible? I can do it if you prefer. It seems that the concerns related to these dashboards showing up in search results have been addressed. Does the community have any other concern

Re: [PROPOSAL] Preparing for Beam 2.15.0 release

2019-07-17 Thread Rui Wang
+1. Thanks, Yifan, for taking it over! Rui On Wed, Jul 17, 2019 at 1:56 PM Alan Myrvold wrote: > +1 Thanks for keeping the release cadence going. I like to see regular > releases happening. > > On Wed, Jul 17, 2019 at 11:01 AM Yifan Zou wrote: > >> Hello Beam community! >> >> Beam 2.15 release

Re: [PROPOSAL] Preparing for Beam 2.15.0 release

2019-07-17 Thread Alan Myrvold
+1 Thanks for keeping the release cadence going. I like to see regular releases happening. On Wed, Jul 17, 2019 at 11:01 AM Yifan Zou wrote: > Hello Beam community! > > Beam 2.15 release branch cut date is July 31 according to the release > calendar [1]. I would like to volunteer myself to do

Re: [DISCUSS] Reconciling SetState in Java and Python

2019-07-17 Thread Rakesh Kumar
Here is the PR (https://github.com/apache/beam/pull/9090) for the SetState. Feel free to review it. On Mon, Jul 15, 2019 at 11:01 PM Rakesh Kumar wrote: > Hi, > > I noticed that SetState is implemented in Java SDK but not implemented in > Python SDK. I have filed the jira ticket >

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-17 Thread sridhar inuog
Thanks, Pablo! Looking forward to it! Hopefully it will be recorded as well. On Wed, Jul 17, 2019 at 2:50 PM Pablo Estrada wrote: > Yes! So I will be working on a small feature request for Java's > BigQueryIO: https://issues.apache.org/jira/browse/BEAM-7607 > > Maybe I'll do something for

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-17 Thread Pablo Estrada
Yes! So I will be working on a small feature request for Java's BigQueryIO: https://issues.apache.org/jira/browse/BEAM-7607 Maybe I'll do something for Python next month. : ) Best -P. On Wed, Jul 17, 2019 at 12:32 PM Rakesh Kumar wrote: > +1, I really appreciate this initiative. It would be

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-17 Thread Rakesh Kumar
+1, I really appreciate this initiative. It would be really helpful for newbies like me. Is it possible to list the things that you are planning to cover? On Tue, Jul 16, 2019 at 11:19 AM Yichi Zhang wrote: > Thanks for organizing this Pablo, it'll be very helpful! > > On Tue, Jul

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-17 Thread Gleb Kanterov
> > Suppose one assigns a sharding function to a PCollection. Is it lazy, > or does it induce a reshuffle right at that point? In either case, > once the ShardingFn has been applied, how long does it remain in > effect? Does it prohibit the runner (or user) from doing subsequent > resharding

Re: pubsub -> IO

2019-07-17 Thread Eugene Kirpichov
I think full-blown SDF is not needed for this - someone just needs to implement a MongoDbIO.readAll() variant, using a composite transform. The regular pattern for this sort of thing will do (ParDo split, reshuffle, ParDo read). Whether it's worth replacing MongoDbIO.read() with a redirect to
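
The pattern named above (ParDo split, reshuffle, ParDo read) can be sketched in plain Python without the Beam SDK. This is only an illustration of the three steps, not MongoDbIO or any Beam API: FAKE_DB, split, reshuffle, read and read_all are hypothetical names, and the "collections" are in-memory lists standing in for MongoDB.

```python
import random
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical in-memory stand-in for MongoDB collections (assumption).
FAKE_DB: Dict[str, List[int]] = {
    "orders": list(range(10)),
    "users": list(range(100, 104)),
}

# A read request: (collection name, start index, end index), end exclusive.
Request = Tuple[str, int, int]


def split(request: Request, chunk: int = 4) -> Iterator[Request]:
    """Step 1 (split): break one read request into smaller sub-ranges."""
    collection, start, end = request
    for lo in range(start, end, chunk):
        yield (collection, lo, min(lo + chunk, end))


def reshuffle(items: Iterable[Request], seed: int = 0) -> List[Request]:
    """Step 2 (reshuffle): redistribute the sub-ranges.

    In a real runner this step lets work be rebalanced across workers;
    here it is only simulated by shuffling a list.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def read(sub_range: Request) -> Iterator[int]:
    """Step 3 (read): fetch the documents covered by one sub-range."""
    collection, start, end = sub_range
    yield from FAKE_DB[collection][start:end]


def read_all(requests: Iterable[Request]) -> List[int]:
    """Compose the three steps, as a composite transform would."""
    sub_ranges = [s for r in requests for s in split(r)]
    return [doc for s in reshuffle(sub_ranges) for doc in read(s)]
```

Calling read_all([("orders", 0, 10), ("users", 0, 4)]) returns every document from both ranges, in an order that depends on the shuffle. The point of the sketch is that the set of requests can arrive as data (for example from Pub/Sub messages) rather than being fixed when the pipeline is built.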

Re: Using the BigQuery Storage API

2019-07-17 Thread sridhar inuog
Hi John, I encountered similar errors while going through the following guide https://cwiki.apache.org/confluence/display/BEAM/Using+IntelliJ+IDE and I missed this step: open *File | Settings...*; in the left pane, navigate to *Build, Execution, Deployment | Build Tools | Gradle | Runner*.

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Melissa Pashniak
Congratulations! On Wed, Jul 17, 2019 at 6:06 AM Alexey Romanenko wrote: > Congratulations, Robert! > > On 17 Jul 2019, at 14:49, Tim Robertson wrote: > > Congratulations Robert! > > On Wed, Jul 17, 2019 at 2:47 PM Gleb Kanterov wrote: > >> Congratulations, Robert! >> >> On Wed, Jul 17, 2019

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-17 Thread Robert Bradshaw
On Wed, Jul 17, 2019 at 4:26 PM Gleb Kanterov wrote: > > I find there is an interesting point in the comments brought by Ahmed > Eleryan. Similar to WindowFn, having a concept of ShardingFn, that enables > users to implement a class for sharding data. Each Beam node can have > ShardingFn set,

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-17 Thread Robert Bradshaw
It is not possible to implement SMB on top of the various top-level SomeFileIO.{write,read} PTransforms. One needs the internal details. It seems we should re-use (and expose) the existing FileSinks as a parameter to SMBSink (and also port the old-style sinks to use these). We also need the

Re: Write-through-cache in State logic

2019-07-17 Thread Rakesh Kumar
I checked the Python SDK [1] and it has a similar implementation to the Java SDK. I would agree with Thomas. With a high-volume event stream and a bigger cluster, network calls can potentially become a bottleneck. @Robert I am interested to see the proposal. Can you provide me the link of the

Re: [DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-17 Thread Robert Bradshaw
Sounds good. I think the high level bit is that whoever merges should *think* about what they're putting in the history, even if it's just pausing to think "should I squash or merge this PR" rather than just clicking the button. On Wed, Jul 17, 2019 at 4:59 PM Valentyn Tymofieiev wrote: > >

Re: [DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-17 Thread Valentyn Tymofieiev
Thanks everyone for the discussion and your thoughts. Here's my summary: We don't have to be too prescriptive about who does what and when if we keep these goals in mind: 1. When a PR is being merged, each commit should clearly do something that it states, and a commit should do just one thing.

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-17 Thread Reza Rokni
*Can we use [slowly changing lookup cache] approach if the source is [HDFS (or) HIVE] (data is changing), where the PCollection cannot fit into Memory in BeamSQL?* It can depend on the runner; in streaming mode on the Dataflow runner, the side input needs to fit into memory. On Wed, 17 Jul 2019 at

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-17 Thread rahul patwari
Hi, Please add me as a contributor to the Beam Issue Tracker. I would like to work on this feature. My ASF Jira Username: "rahul8383" Thanks, Rahul On Wed, Jul 17, 2019 at 1:06 AM Rui Wang wrote: > Another approach is to let BeamSQL support it natively, as the title of > this thread says:

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-17 Thread Gleb Kanterov
I find there is an interesting point in the comments brought by Ahmed Eleryan. Similar to WindowFn, having a concept of ShardingFn, that enables users to implement a class for sharding data. Each Beam node can have ShardingFn set, similar to WindowFn (or WindowingStrategy). Sinks and sources are

Re: [Python] Read Hadoop Sequence File?

2019-07-17 Thread Ryan Skraba
Hello! I dug a bit into this (not a FileIO expert), and it looks like LocalFileSystem only matches globs in file names (not directories): https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java#L251 Perhaps related:

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Alexey Romanenko
Congratulations, Robert! > On 17 Jul 2019, at 14:49, Tim Robertson wrote: > > Congratulations Robert! > > On Wed, Jul 17, 2019 at 2:47 PM Gleb Kanterov > wrote: > Congratulations, Robert! > > On Wed, Jul 17, 2019 at 1:50 PM Robert Bradshaw

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Tim Robertson
Congratulations Robert! On Wed, Jul 17, 2019 at 2:47 PM Gleb Kanterov wrote: > Congratulations, Robert! > > On Wed, Jul 17, 2019 at 1:50 PM Robert Bradshaw > wrote: > >> Congratulations! >> >> On Wed, Jul 17, 2019, 12:56 PM Katarzyna Kucharczyk < >> ka.kucharc...@gmail.com> wrote: >> >>>

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Gleb Kanterov
Congratulations, Robert! On Wed, Jul 17, 2019 at 1:50 PM Robert Bradshaw wrote: > Congratulations! > > On Wed, Jul 17, 2019, 12:56 PM Katarzyna Kucharczyk < > ka.kucharc...@gmail.com> wrote: > >> Congratulations! :) >> >> On Wed, Jul 17, 2019 at 12:46 PM Michał Walenia < >>

Re: Circular dependencies between DataflowRunner and google cloud IO

2019-07-17 Thread Michał Walenia
By separate package, I mean a distinct Gradle module with a separate set of dependencies. I know about the movement away from Perfkit and I plan to create Jenkins jobs utilizing only Gradle tasks for the tests. Lukasz, should I add you as a reviewer for the pull request? Thank you for the

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Robert Bradshaw
Congratulations! On Wed, Jul 17, 2019, 12:56 PM Katarzyna Kucharczyk wrote: > Congratulations! :) > > On Wed, Jul 17, 2019 at 12:46 PM Michał Walenia < > michal.wale...@polidea.com> wrote: > >> Congratulations, Robert! :) >> >> On Wed, Jul 17, 2019 at 12:45 PM Łukasz Gajowy >> wrote: >> >>>

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Katarzyna Kucharczyk
Congratulations! :) On Wed, Jul 17, 2019 at 12:46 PM Michał Walenia wrote: > Congratulations, Robert! :) > > On Wed, Jul 17, 2019 at 12:45 PM Łukasz Gajowy wrote: > >> Congratulations! :) >> >> On Wed, Jul 17, 2019 at 04:30 Rakesh Kumar wrote: >> >>> Congrats Rob!!! >>> >>> On Tue, Jul 16,

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Łukasz Gajowy
Congratulations! :) On Wed, Jul 17, 2019 at 04:30 Rakesh Kumar wrote: > Congrats Rob!!! > > On Tue, Jul 16, 2019 at 10:24 AM Ahmet Altay wrote: > >> Hi, >> >> Please join me and the rest of the Beam PMC in welcoming a new committer: >> Robert >> Burke. >> >> Robert has been contributing to

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-17 Thread Michał Walenia
Congratulations, Robert! :) On Wed, Jul 17, 2019 at 12:45 PM Łukasz Gajowy wrote: > Congratulations! :) > > On Wed, Jul 17, 2019 at 04:30 Rakesh Kumar wrote: > >> Congrats Rob!!! >> >> On Tue, Jul 16, 2019 at 10:24 AM Ahmet Altay wrote: >> >>> Hi, >>> >>> Please join me and the rest of the

Re: pubsub -> IO

2019-07-17 Thread Ryan Skraba
Hello! To clarify, you want to do something like this? PubSubIO.read() -> extract mongodb collection and range -> MongoDbIO.read(collection, range) -> ... If I'm not mistaken, it isn't possible with the implementation of MongoDbIO (based on BoundedSource interface, requiring the collection to

Re: pubsub -> IO

2019-07-17 Thread Chaim Turkel
Any ideas? On Mon, Jul 15, 2019 at 11:04 PM Rui Wang wrote: > > +u...@beam.apache.org > > > -Rui > > On Mon, Jul 15, 2019 at 6:55 AM Chaim Turkel wrote: >> >> Hi, >> I am looking to write a pipeline that reads from a mongo collection. >> I would like to listen to a pubsub that will have a