Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Kenneth Knowles
It is too soon to argue whether an API is complex or not. There has been no specific API proposed. I think the problem statement is real - you need to be able to read and write bigger-than-memory state. It seems we have multiple runners that don't support it, perhaps because of our API. You might

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Xinyu Liu
If I understand correctly, using weak references will help clean up the Java objects once GC kicks in. In the case of a kv-store like RocksDB, the Java iterator is just a JNI interface to the underlying C iterator, so we need to explicitly invoke close to release the in-memory snapshot data, which can b

Re: Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Ankur Goenka
In my experience, the effect of white space in a comment is inconsistent: for certain commands it matters, while for others it doesn't. On Thu, May 10, 2018 at 5:43 PM Valentyn Tymofieiev wrote: > +1 to writing a Beam Jenkins spellbook. > I have observed that Jenkins commands sometimes don't work for

Re: Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Valentyn Tymofieiev
+1 to writing a Beam Jenkins spellbook. I have observed that Jenkins commands sometimes don't work the first time. Why could this be? Do line endings at the end of a command matter? On Thu, May 10, 2018 at 1:24 PM, Andrew Pilloud wrote: > It would be great to have the set of "Run {Java,Python

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Lukasz Cwik
I don't agree. I believe you can track the iterators/iterables that are created and freed by using weak references and reference queues (or other methods). Having a few people work 10x as hard to provide a good implementation is much better than having 100s or 1000s of users suffering through a mor
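A minimal sketch of the weak-reference approach described above, in plain Java. `NativeResource`, `TrackedRef` and `ResourceTracker` are hypothetical names, not Beam or RocksDB APIs; the resource stands in for a native iterator or snapshot. Each user-facing iterator is registered through a `WeakReference` bound to a `ReferenceQueue`; once the GC finds the iterator unreachable the reference is enqueued, and polling the queue releases the resource. Because the JVM makes no promise about when (or whether) that happens, the sketch also assumes a hypothetical `closeAll()` fallback, for example at the end of a bundle.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class WeakRefTrackerDemo {

    /** Hypothetical stand-in for a native handle such as a store iterator plus its snapshot. */
    static final class NativeResource implements AutoCloseable {
        private boolean released = false;

        @Override
        public void close() { released = true; } // idempotent: safe to call more than once

        boolean isReleased() { return released; }
    }

    /** Weak reference to a user-facing iterator that remembers which resource to free. */
    static final class TrackedRef extends WeakReference<Object> {
        // Must NOT point back at the iterator, or the iterator never becomes unreachable.
        final NativeResource resource;

        TrackedRef(Object iterator, NativeResource resource, ReferenceQueue<Object> queue) {
            super(iterator, queue);
            this.resource = resource;
        }
    }

    static final class ResourceTracker {
        private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // The TrackedRef objects themselves must stay strongly reachable until processed.
        private final Set<TrackedRef> live = Collections.newSetFromMap(new IdentityHashMap<>());

        void register(Object iterator, NativeResource resource) {
            live.add(new TrackedRef(iterator, resource, queue));
        }

        /** Frees resources whose iterators the GC has already found unreachable. */
        int drainCollected() {
            int freed = 0;
            TrackedRef ref;
            while ((ref = (TrackedRef) queue.poll()) != null) {
                live.remove(ref);
                ref.resource.close();
                freed++;
            }
            return freed;
        }

        /** Hypothetical fallback (e.g. at bundle end): free everything still tracked. */
        int closeAll() {
            int freed = 0;
            for (TrackedRef ref : live) {
                ref.resource.close();
                freed++;
            }
            live.clear();
            return freed;
        }

        int liveCount() { return live.size(); }
    }

    public static void main(String[] args) {
        ResourceTracker tracker = new ResourceTracker();
        NativeResource resource = new NativeResource();
        Object iterator = new Object(); // stands in for the Iterable/Iterator given to user code
        tracker.register(iterator, resource);
        iterator = null; // user code drops the iterator without closing it
        System.gc();     // only a hint: collection and enqueueing are not guaranteed
        System.out.println("freed by GC: " + tracker.drainCollected());
        System.out.println("freed by fallback: " + tracker.closeAll());
        System.out.println("released: " + resource.isReleased());
    }
}
```

Java 9 and later package the same idea as `java.lang.ref.Cleaner`; either way, release timing stays at the mercy of the collector, which is the gap the explicit-close proposal in this thread tries to close.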

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Xinyu Liu
Loading/evicting blocks will help reduce the cache memory footprint, but we still won't be able to release the underlying resources. We can definitely add heuristics to help release the resources as you mentioned, but there is no accurate way to track all the iterators/iterables created and free them up

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Lukasz Cwik
Users won't reliably close/release the resources, and forcing them to will make the user experience worse. It would make a lot more sense to use a file format which allows random access and use a cache to load/evict blocks of the state from memory. If that is not possible, use an iterable which frees
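A minimal sketch of the load/evict idea: a least-recently-used cache of state blocks. It assumes state is stored in fixed-size blocks addressable by index and uses a hypothetical `loader` function to read one block from a random-access file; eviction relies on `LinkedHashMap` in access order with `removeEldestEntry`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

public class BlockCacheDemo {

    /** LRU cache of state blocks keyed by block index (sketch, not a Beam API). */
    static final class BlockCache {
        private final LongFunction<byte[]> loader; // hypothetical: reads one block from disk
        private final LinkedHashMap<Long, byte[]> blocks;
        private int loads = 0;

        BlockCache(int maxBlocks, LongFunction<byte[]> loader) {
            this.loader = loader;
            // accessOrder=true: iteration order runs from least to most recently used.
            this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                    return size() > maxBlocks; // evict the LRU block once over capacity
                }
            };
        }

        byte[] get(long blockIndex) {
            byte[] block = blocks.get(blockIndex); // a hit also refreshes recency
            if (block == null) {
                block = loader.apply(blockIndex);
                loads++;
                blocks.put(blockIndex, block);
            }
            return block;
        }

        int loads() { return loads; }

        int cachedBlocks() { return blocks.size(); }
    }

    public static void main(String[] args) {
        BlockCache cache = new BlockCache(2, i -> new byte[] {(byte) i});
        cache.get(0);
        cache.get(1);
        cache.get(0); // block 0 becomes most recently used
        cache.get(2); // evicts block 1, the least recently used
        cache.get(0); // still cached
        cache.get(1); // must be reloaded
        System.out.println("loads=" + cache.loads() + " cached=" + cache.cachedBlocks()); // loads=4 cached=2
    }
}
```

This bounds heap usage, but as Xinyu's reply points out, it does not by itself release the native resources behind an iterator that is still open.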

Support close of the iterator/iterable created from MapState/SetState

2018-05-10 Thread Xinyu Liu
Hi, folks, I'm in the middle of implementing MapState and SetState in our Samza runner. We noticed that the state returns a Java Iterable for reading entries, keys, etc. For state backed by a file-based kv store like RocksDB, we need to be able to let users explicitly close iterator/iterable t
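The proposal amounts to handing user code something closeable. A minimal sketch, assuming a hypothetical `CloseableIterator` interface (not part of the Beam state API): it extends both `Iterator` and `AutoCloseable`, so callers can scope it with try-with-resources, and the backing store gets a deterministic point at which to release native memory. `SnapshotIterator` only simulates a store-backed iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CloseableStateIteratorDemo {

    /** Hypothetical interface: an Iterator that also owns a resource the caller must release. */
    interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // no checked exception, so try-with-resources needs no catch block
    }

    /** Stand-in for a store-backed iterator; here it only records whether it was closed. */
    static final class SnapshotIterator implements CloseableIterator<String> {
        private final Iterator<String> delegate;
        private boolean closed = false;

        SnapshotIterator(List<String> snapshot) {
            this.delegate = new ArrayList<>(snapshot).iterator();
        }

        @Override
        public boolean hasNext() {
            if (closed) {
                throw new IllegalStateException("iterator already closed");
            }
            return delegate.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return delegate.next();
        }

        @Override
        public void close() {
            // A RocksDB-backed implementation would release its native iterator/snapshot here.
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    public static void main(String[] args) {
        SnapshotIterator iterator = new SnapshotIterator(List.of("a", "b", "c"));
        StringBuilder seen = new StringBuilder();
        try (CloseableIterator<String> keys = iterator) {
            while (keys.hasNext()) {
                seen.append(keys.next());
            }
        } // close() runs here even if the loop throws
        System.out.println(seen + " closed=" + iterator.isClosed()); // prints "abc closed=true"
    }
}
```

The trade-off raised in the replies is visible here: correct release now depends on every caller remembering the `try` block.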

Re: Pubsub to Beam SQL

2018-05-10 Thread Anton Kedin
Shared the doc. There is already a table provider for Kafka with CSV records. The implementation at the moment doesn't touch the IO itself, just wraps it. Implementing Kafka JSON records can be as easy as wrapping KafkaIO with JsonToRow

Re: Pubsub to Beam SQL

2018-05-10 Thread Reuven Lax
I think it's even easier for other sources. PubSub is a tricky one (for us at least) because Dataflow overrides the Beam-native PubSub source with something different. Kafka is a pure Beam source. On Thu, May 10, 2018 at 1:39 PM Ismaël Mejía wrote: > Hi, Jumping a bit late to this discussion. This so

Re: Pubsub to Beam SQL

2018-05-10 Thread Ismaël Mejía
Hi, Jumping a bit late to this discussion. This sounds super nice. But I could not access the document. How hard would it be to do this for other 'unbounded' sources, e.g. Kafka ? On Sat, May 5, 2018 at 2:56 AM Andrew Pilloud wrote: > I don't think we should jump to adding a extension, but TBLPRO

Re: Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Andrew Pilloud
It would be great to have the set of "Run {Java,Python,Go} PreCommit" commands documented in the contributors guide as well. Those match up to the jobs automatically run on every PR and are the ones I use most. There is no security: anyone can run them, including 'Run Seed Job'. That one seems like a good one to docu

Re: Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Jean-Baptiste Onofré
That's a good idea. It could be helpful. Regards JB On 05/10/2018 09:34 PM, Ankur Goenka wrote: > These actions are documented in groovy scripts > like  > https://github.com/apache/beam/blob/b7bfca9b196699a096786506777a49237e0b3776/.test-infra/jenkins/job_beam_PostCommit_Python_ValidatesContainer

Re: Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Kenneth Knowles
It does seem useful to be able to automatically scrape the job definitions to dump out the markdown to publish on the web site. It would probably make the groovy definitions more readable to factor that out into some table of constants with longer descriptions, anyhow. Kenn On Thu, May 10, 2018 a

Re: Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Ankur Goenka
These actions are documented in groovy scripts like https://github.com/apache/beam/blob/b7bfca9b196699a096786506777a49237e0b3776/.test-infra/jenkins/job_beam_PostCommit_Python_ValidatesContainer_Dataflow.groovy#L36 We can create a tool which can just go over these scripts and print the comments eli
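A rough sketch of the scraping tool suggested here and in Kenn's reply. The input format is an assumption: it supposes each groovy job definition passes its trigger phrase as a single-quoted string to a helper whose name ends in `Trigger`. The helper name `hypotheticalPullRequestTrigger` and the job name are made up, and a real tool would need a pattern matched to the actual scripts. The output is a markdown table suitable for the contribution guide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TriggerPhraseCatalog {

    // Assumption: phrases appear as the quoted last argument of a "...Trigger(...)" call.
    private static final Pattern PHRASE =
            Pattern.compile("\\w*Trigger\\s*\\([^)]*?'([^']+)'\\s*\\)");

    /** Returns the trigger phrases found in one groovy job definition. */
    static List<String> extractPhrases(String groovySource) {
        List<String> phrases = new ArrayList<>();
        Matcher m = PHRASE.matcher(groovySource);
        while (m.find()) {
            phrases.add(m.group(1));
        }
        return phrases;
    }

    /** Renders (job name, phrase) pairs as a markdown table. */
    static String toMarkdown(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("| Job | Trigger phrase |\n|---|---|\n");
        for (String[] row : rows) {
            sb.append("| ").append(row[0]).append(" | `").append(row[1]).append("` |\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not copied from a real Beam job definition.
        String source = "job('example_precommit_job') {\n"
                + "  hypotheticalPullRequestTrigger(delegate, 'Run Java PreCommit')\n"
                + "}\n";
        List<String[]> rows = new ArrayList<>();
        for (String phrase : extractPhrases(source)) {
            rows.add(new String[] {"example_precommit_job", phrase});
        }
        System.out.print(toMarkdown(rows));
    }
}
```

Keeping the phrases in one table of constants, as Kenn suggests, would make a regex like this unnecessary: the tool could read the table directly.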

Re: Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Kenneth Knowles
It seems extremely useful to have a concise catalog of these commands in the contribution guide. I think new contributors will perhaps not be reading the testing guide but just want Jenkins to do the work for them. Kenn On Thu, May 10, 2018 at 12:27 PM Huygaa Batsaikhan wrote: > Hi devs, > > We

Re: Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Jean-Baptiste Onofré
Hi Huygaa, Kenn already sent a mail about that some days ago. AFAIR, the contribution guide has been updated (or there is a corresponding PR): https://beam.apache.org/contribute/contribution-guide/ Regards JB On 05/10/2018 09:26 PM, Huygaa Batsaikhan wrote: > Hi devs, > > We can run various jenkins com

Documenting Github PR jenkins trigger phrases

2018-05-10 Thread Huygaa Batsaikhan
Hi devs, We can run various jenkins commands (precommit, postcommit, performance tests) directly from the Github Pull Request UI by commenting phrases such as "retest this please". Unfortunately, this tool is not documented. I am adding some brief documentation to https://beam.apache.org/contribute/testi

Re: triggers in direct runner

2018-05-10 Thread Kenneth Knowles
Hi Vaclav, Slightly stale but still maybe a good reference is https://s.apache.org/beam-triggers. Triggers are per key and window, and they give the runner permission to fire but do not require it. The runner can thus amortize the cost of output. The bundle is the unit of commit in Beam so firing
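A toy model of the behaviour described above, in plain Java rather than the Beam API: trigger state is kept per key and window, evaluating the trigger on each element only grants permission to fire, and panes are actually emitted once, when the bundle commits. `PaneBuffer`, the element-count condition and the discarding behaviour are illustrative assumptions; this is not how the direct runner's `ReduceFnRunner` is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BundleFiringDemo {

    /** Toy per-(key, window) pane buffer: triggers grant permission, output waits for the bundle. */
    static final class PaneBuffer {
        private final int elementCountThreshold; // simple "after N elements" trigger condition
        private final Map<String, List<Integer>> buffered = new LinkedHashMap<>();
        private final Set<String> mayFire = new LinkedHashSet<>();

        PaneBuffer(int elementCountThreshold) {
            this.elementCountThreshold = elementCountThreshold;
        }

        /** Called per element: the trigger is evaluated here, but nothing is emitted. */
        void processElement(String key, long window, int value) {
            String keyWindow = key + "@" + window; // trigger state is scoped to key AND window
            List<Integer> pane = buffered.computeIfAbsent(keyWindow, kw -> new ArrayList<>());
            pane.add(value);
            if (pane.size() >= elementCountThreshold) {
                mayFire.add(keyWindow); // permission to fire, not an obligation
            }
        }

        /** Called when the bundle commits: one pane per (key, window) that may fire. */
        List<String> finishBundle() {
            List<String> emitted = new ArrayList<>();
            for (String keyWindow : mayFire) {
                // Discarding mode: fired elements are removed from the buffer.
                emitted.add(keyWindow + "=" + buffered.remove(keyWindow));
            }
            mayFire.clear();
            return emitted;
        }
    }

    public static void main(String[] args) {
        PaneBuffer buffer = new PaneBuffer(2);
        buffer.processElement("a", 0, 1);
        buffer.processElement("a", 0, 2); // trigger condition met, but nothing is emitted yet
        buffer.processElement("a", 0, 3);
        buffer.processElement("b", 0, 4); // below threshold: stays buffered
        System.out.println(buffer.finishBundle()); // prints [a@0=[1, 2, 3]]
    }
}
```

Key `a` satisfies the trigger on its second element, yet its single pane carries all three elements because output waits for the bundle to commit, which is the pattern Vaclav reports seeing on the direct runner.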

triggers in direct runner

2018-05-10 Thread Plajt, Vaclav
Hello Beam Devs, I'm working on DSL-Euphoria, and I found that when a GroupByKey transform is executed on the direct runner, window triggers are evaluated element-wise (ReduceFnRunner#processElement) but not actually fired element-wise. They are fired (a pane is emitted) when the whole batch of elements is

Jenkins build is back to normal : beam_Release_Gradle_NightlySnapshot #34

2018-05-10 Thread Apache Jenkins Server
See