Re: [EXT] Re: using avro instead of json for BigQueryIO.Write

2019-11-26 Thread Chamikara Jayalath
I don't believe so, please create one (we can dedup if we happen to find another issue). Even better if you can contribute to fix this :) Thanks, Cham On Tue, Nov 26, 2019 at 7:07 PM Chuck Yang wrote: > Has anyone looked into implementing this for the Python SDK? It would > be nice to have it

Re: cython test instability

2019-11-26 Thread Chad Dombrova
yeah, I've excised both test_requires and setup_requires in my test simplification PR: https://github.com/apache/beam/pull/10038 I'm happy to see those go sooner rather than later, as it'll reduce the scope of my PR. The rest of my PR is about ensuring that build dependencies like cython and

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Chamikara Jayalath
On Tue, Nov 26, 2019 at 6:17 PM Reza Rokni wrote: > Hi Alexey, > > With regards to @Experimental there are a couple of discussions around its > usage ( or rather over usage! ) on dev@. It is something that we need to > clean up ( some of those IO are now being used on production env for >

Re: cython test instability

2019-11-26 Thread Udi Meiri
I'm not sure where the errors with the simplegeneric and timeloop .eggs directories come from, but I did figure out that they don't get installed as eggs if you add them to the "test" extras in setup.py, e.g.: extras_require={ 'docs': ['Sphinx>=1.5.2,<2.0'], 'test':
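
The snippet above is cut off in the archive, so the following is only a minimal sketch of what declaring test-only dependencies through `extras_require` can look like. The `'docs'` entry is taken from the message; the contents of `'test'` and the helper `requirements_for` are assumptions made for illustration, not the actual Beam setup.py.

```python
# Hypothetical fragment of a setup.py. Only the 'docs' entry comes from the
# message above; the 'test' list is an assumption for illustration.
EXTRAS_REQUIRE = {
    'docs': ['Sphinx>=1.5.2,<2.0'],
    'test': ['simplegeneric', 'timeloop'],  # assumed, not the real list
}


def requirements_for(extra):
    """Return a copy of the requirement list declared for one extra."""
    return list(EXTRAS_REQUIRE.get(extra, []))


# In a real setup.py the dict would be handed to setuptools:
#   setuptools.setup(..., extras_require=EXTRAS_REQUIRE)
# and the extra would then be installed by pip, for example:
#   pip install -e .[test]
test_requirements = requirements_for('test')
```

Declaring the packages as an extra means pip resolves them into the active environment, rather than setuptools fetching them on demand during a `setup.py` command.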

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Reza Rokni
Hi Alexey, With regards to @Experimental there are a couple of discussions around its usage ( or rather over usage! ) on dev@. It is something that we need to clean up ( some of those IO are now being used on production env for years!). Cheers Reza On Wed, 27 Nov 2019 at 04:50, Luke Cwik

Re: [discuss] Using a logger hierarchy in Python

2019-11-26 Thread Pablo Estrada
Ah I'll try to add this tomorrow before going out for the weekend. -P. On Wed, Nov 20, 2019 at 12:15 PM Valentyn Tymofieiev wrote: > Based on my recent debugging experience for > https://issues.apache.org/jira/browse/BEAM-8651, I think it may be > helpful to include thread IDs, into the log
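
As a rough illustration of the suggestion to include thread IDs in log output, the standard `logging` module exposes `%(thread)d` and `%(threadName)s` as format fields. The logger name `example.worker`, the format string, and the helper `make_logger` below are assumptions for this sketch, not what Beam uses.

```python
import io
import logging
import threading


def make_logger(name, stream):
    """Return a logger whose records carry the thread ID and thread name."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example output confined to `stream`
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s [%(thread)d %(threadName)s] %(name)s: %(message)s"))
    logger.handlers = [handler]
    return logger


buf = io.StringIO()
log = make_logger("example.worker", buf)  # hypothetical logger name
log.info("processing bundle")
output = buf.getvalue()
```

Because child loggers inherit handlers from their parents by default, a format like this set once on a top-level logger would apply across a logger hierarchy.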

Re: cython test instability

2019-11-26 Thread Chad Dombrova
Sorry wrong link: https://github.com/apache/beam/pull/9915 On Tue, Nov 26, 2019 at 5:12 PM Udi Meiri wrote: > I looked at #9959 but it doesn't seem to modify setup.py? > The additional eggs for timeloop etc. are troubling though. Not sure where > those come from. > > On Tue, Nov 26, 2019 at

Re: cython test instability

2019-11-26 Thread Udi Meiri
I looked at #9959 but it doesn't seem to modify setup.py? The additional eggs for timeloop etc. are troubling though. Not sure where those come from. On Tue, Nov 26, 2019 at 4:59 PM Chad Dombrova wrote: > Is setup_requires being used somewhere else, because I'm still getting > errors after

Re: cython test instability

2019-11-26 Thread Chad Dombrova
Is setup_requires being used somewhere else, because I'm still getting errors after removing it from sdks/python/setup.py. I removed it from this PR: https://github.com/apache/beam/pull/9959 Here's the gradle scan: https://scans.gradle.com/s/oinh5xpaly3dk/failure#top=0 The error shows up

Re: cython test instability

2019-11-26 Thread Udi Meiri
Chad, I believe the answer is the "setup_requires" line is causing the sdks/python/.eggs directory to be created. This command fails with the setup_requires line (same Errno 17), but succeeds without it: $ \rm -r .eggs/; ../../gradlew installGcpTest [~8 failed tasks] $ ls .eggs

Re: real real-time beam

2019-11-26 Thread Kenneth Knowles
On Tue, Nov 26, 2019 at 1:00 AM Jan Lukavský wrote: > > I will not try to formalize this notion in this email. But I will note > that since it is universally assured, it would be zero cost and > significantly safer to formalize it and add an annotation noting it was > required. It has nothing to

Re: cython test instability

2019-11-26 Thread Chad Dombrova
It seems like the offending packages are those that only have source distributions (i.e. no wheels). But why are the eggs being installed in sdks/python/.eggs instead of into the virtualenv created by setupVirtualenv gradle task or by tox? On Tue, Nov 26, 2019 at 3:59 PM Udi Meiri wrote: >

Re: cython test instability

2019-11-26 Thread Udi Meiri
Basically, I believe what's happening is that a new Gradle task was added that uses setup.py but doesn't have the same dependency on some main setup.py task that all others depend on (list sdist). On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri wrote: > Correction: the error is not gone after

Re: cython test instability

2019-11-26 Thread Udi Meiri
Correction: the error is not gone after removing the line. I get instead: error: [Errno 17] File exists: '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg' On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri wrote: > I managed to recreate one of the issues with this

Re: cython test instability

2019-11-26 Thread Udi Meiri
I managed to recreate one of the issues with this command: ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo "python setup.py -q nosetests --tests apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; done This reliably gives me: OSError: [Errno 17] File exists:
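
The error class reported here, `[Errno 17] File exists`, is what the operating system returns when a second caller tries to create a directory that already exists. The sketch below reproduces that error code in isolation; it is not the setuptools code path, and the helper `create_dir_strict` and the temporary `.eggs` path are assumptions for illustration.

```python
import errno
import os
import tempfile


def create_dir_strict(path):
    """Create `path`; return EEXIST if it already exists, otherwise None."""
    try:
        os.mkdir(path)
        return None
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise
        return exc.errno


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, ".eggs")
    first = create_dir_strict(target)   # first caller creates the directory
    second = create_dir_strict(target)  # second caller hits "File exists"
    # A creation call that tolerates an existing directory does not fail.
    os.makedirs(target, exist_ok=True)
```

When two processes run the strict variant at the same time, whichever one loses the race sees the error, which would be consistent with a failure that only appears under parallel execution.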

Re: cython test instability

2019-11-26 Thread Chad Dombrova
Thanks for looking into this. It seems like it might be something to do with data that is cached on the Jenkins slaves between runs, which may be what prevents this from showing up locally? If your theory about setuptools is correct, and it sounds likely, we should be able to lock down the

Re: cython test instability

2019-11-26 Thread Ahmet Altay
I tried to debug but did not make much progress. I cannot reproduce locally, however all python precommits and postcommits are failing. One guess is, setuptools released a new version that does not support eggs a few days ago, that might be the cause (

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Luke Cwik
I suggested the wrapper because sometimes the intent of the APIs can be translated easily but this is not always the case. Good to know that it is all marked @Experimental. On Tue, Nov 26, 2019 at 12:30 PM Cam Mach wrote: > Thank you, Alex for sharing the information, and Luke for the

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Cam Mach
Thank you, Alex for sharing the information, and Luke for the questions. I like the idea of just deprecating the V1 IOs and maintaining only the V2 IOs, so we can support whoever wants to continue with V1. Just as Alex said, a lot of users, including my teams :-) , use the V1 IOs in production for real

Update on push-down for SQL IOs.

2019-11-26 Thread Kirill Kozlov
Hello everyone! I have been working on a push-down feature and would like to give a brief update on what is done and what is still in the works. *Things that are done*: General API for SQL IOs to provide information about what filters/projects they support [1]: - *Filter* can be unsupported, supported

Re: Beam Testing Tools FAQ

2019-11-26 Thread Pablo Estrada
Very cool. Thanks Lukasz! On Tue, Nov 26, 2019 at 9:41 AM Alan Myrvold wrote: > Nice, thanks! > > On Tue, Nov 26, 2019 at 8:04 AM Robert Bradshaw > wrote: > >> Thanks! >> >> On Tue, Nov 26, 2019 at 7:43 AM Łukasz Gajowy wrote: >> > >> > Hi all, >> > >> > our documentation (either confluence

Re: Cleaning up Approximate Algorithms in Beam

2019-11-26 Thread Robert Bradshaw
I think this thread is sufficient. On Mon, Nov 25, 2019 at 5:59 PM Reza Rokni wrote: > Hi, > > So do we need a vote for the final list of actions? Or is this thread > enough to go ahead and raise the PR's? > > Cheers > > Reza > > On Tue, 26 Nov 2019 at 06:01, Ahmet Altay wrote: > >> >> >> On

Re: [UPDATE] Preparing for Beam 2.17.0 release

2019-11-26 Thread Mikhail Gryzykhin
Hello everybody, The release branch is green except for the Gradle build, which times out and fails on Go tests that look flaky. I'll go over the remaining PRs and Jiras today and do final test validation. I will start the RC process afterwards. --Mikhail On Fri, Nov 22, 2019 at 9:29 PM Jan Lukavský wrote: >

Re: Beam Testing Tools FAQ

2019-11-26 Thread Alan Myrvold
Nice, thanks! On Tue, Nov 26, 2019 at 8:04 AM Robert Bradshaw wrote: > Thanks! > > On Tue, Nov 26, 2019 at 7:43 AM Łukasz Gajowy wrote: > > > > Hi all, > > > > our documentation (either confluence or the website docs) describes how > to create various integration and performance tests - there

Re: Contributor Permission for Beam Jira tickets

2019-11-26 Thread Pablo Estrada
I've added you as a contributor! Thanks! -P. On Mon, Nov 25, 2019 at 11:13 PM David Song wrote: > Hi, > > This is David from DataPLS EngProd team (wintermelons@). I am working on > integration tests with some Beam runners over Dataflow. > Can someone add me as a contributor for the Beam's Jira

Re: Failed retrieving service account

2019-11-26 Thread Pablo Estrada
Great catch. Thanks Yifan! On Tue, Nov 26, 2019 at 8:54 AM Tomo Suzuki wrote: > Thank you very much. Looking forward to the next dependency report email. > > Regards, > Tomo > > On Mon, Nov 25, 2019 at 4:17 PM Yifan Zou wrote: > >> Hi, >> >> I've looked into this issue and found that the

Re: cython test instability

2019-11-26 Thread Luke Cwik
I also started to see this on PRs that I'm reviewing. BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py and egg_info, but this looks different than all of those, so I filed BEAM-8831. On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova wrote: > Actually, it looks like I'm

Re: Failed retrieving service account

2019-11-26 Thread Tomo Suzuki
Thank you very much. Looking forward to the next dependency report email. Regards, Tomo On Mon, Nov 25, 2019 at 4:17 PM Yifan Zou wrote: > Hi, > > I've looked into this issue and found that the default service account was > removed during the weekend for some reason log viewer >

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Alexey Romanenko
AFAICT, all AWS SDK V1 IOs (SnsIO, SqsIO, DynamoDBIO, KinesisIO) are marked as "Experimental". So, it should not be a problem to gracefully deprecate and finally remove them. We already followed a similar procedure for “HadoopInputFormatIO”, which was renamed to just “HadoopFormatIO” (since it

Re: consurrent PRs

2019-11-26 Thread Robert Bradshaw
On Tue, Nov 26, 2019 at 6:15 AM Etienne Chauchot wrote: > > Hi guys, > > I wanted your opinion about something: > > I have 2 concurrent PRs that do the same: > > https://github.com/apache/beam/pull/10010 > > https://github.com/apache/beam/pull/10025 > > The first one is a bit better because it

Re: Beam Testing Tools FAQ

2019-11-26 Thread Robert Bradshaw
Thanks! On Tue, Nov 26, 2019 at 7:43 AM Łukasz Gajowy wrote: > > Hi all, > > our documentation (either confluence or the website docs) describes how to > create various integration and performance tests - there already are core > operations tests, nexmark and IO test documentation pages.

Re: consurrent PRs

2019-11-26 Thread Maximilian Michels
Hi Etienne, That is hard to tell from the outside. Based on the activity in the PRs, it looks like you already chose the second PR (#10025). You should know best which one to merge. Make a call. Cheers, Max On 26.11.19 15:14, Etienne Chauchot wrote: Hi guys, I wanted your opinion about

Beam Testing Tools FAQ

2019-11-26 Thread Łukasz Gajowy
Hi all, our documentation (either confluence or the website docs) describes how to create various integration and performance tests - there already are core operations tests

consurrent PRs

2019-11-26 Thread Etienne Chauchot
Hi guys, I wanted your opinion about something: I have 2 concurrent PRs that do the same: https://github.com/apache/beam/pull/10010 https://github.com/apache/beam/pull/10025 The first one is a bit better because it addresses a deprecation that the other does not address. Except that they

Re: Full stream-stream join semantics

2019-11-26 Thread David Morávek
Yes, in the batch case with long-term historical data, this would be O(n^2), as it is basically a bubble sort. If you have a large number of updates for a single key, this would be super expensive. Kenn, can this be re-implemented with your solution? On Tue, Nov 26, 2019 at 1:10 PM Jan Lukavský wrote: >
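
To make the quadratic cost concrete, here is a small sketch that keeps a buffer sorted by inserting one element at a time and counts how many elements have to be shifted. It uses insertion sort rather than bubble sort, and it is an assumption about the shape of the problem, not the code discussed in the thread; the name `insertion_sort_count` is hypothetical.

```python
def insertion_sort_count(values):
    """Sort a copy of `values` by insertion; return (sorted_list, shifts)."""
    out = []
    shifts = 0
    for v in values:
        i = len(out)
        out.append(v)
        # Move larger elements one slot right until v's position is found.
        while i > 0 and out[i - 1] > v:
            out[i] = out[i - 1]
            i -= 1
            shifts += 1
        out[i] = v
    return out, shifts


n = 100
# Worst case: elements arrive in exactly the reverse of the desired order.
ordered, shifts = insertion_sort_count(range(n, 0, -1))
```

In this worst case the k-th arriving element is shifted past all k earlier ones, so the total is n(n-1)/2 shifts, which grows quadratically with the number of updates per key.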

Re: Full stream-stream join semantics

2019-11-26 Thread Jan Lukavský
Functionally yes. But this straightforward solution is not working for me for two main reasons: - it either blows state in batch case or the time complexity of the sort would be O(n^2) (and reprocessing several years of dense time-series data makes it a no go) - it is not reusable for

Re: Full stream-stream join semantics

2019-11-26 Thread David Morávek
Hi, I think what Jan has in mind would look something like this, if implemented in user code. Am I right? D. On Tue, Nov 26, 2019 at 10:23 AM Jan Lukavský wrote: > > On 11/25/19 11:45 PM, Kenneth Knowles wrote: > > > > On Mon,

Re: Artifact staging in cross-language pipelines

2019-11-26 Thread Maximilian Michels
Hey Heejong, I don't think so. It would be great to push this forward. Thanks, Max On 26.11.19 02:49, Heejong Lee wrote: Hi, Is anyone actively working on artifact staging extension for cross-language pipelines? I'm thinking I can contribute to it in coming Dec. If anyone has any progress

Re: Full stream-stream join semantics

2019-11-26 Thread Jan Lukavský
On 11/25/19 11:45 PM, Kenneth Knowles wrote: On Mon, Nov 25, 2019 at 1:56 PM Jan Lukavský wrote: Hi Rui, > Hi Kenn, you think stateful DoFn based join can emit joined rows that never to be retracted because in stateful DoFn case joined rows will be

Re: real real-time beam

2019-11-26 Thread Jan Lukavský
> I will not try to formalize this notion in this email. But I will note that since it is universally assured, it would be zero cost and significantly safer to formalize it and add an annotation noting it was required. It has nothing to do with event time ordering, only trigger firing