Re: Callbacks/other functions run after a PDone/output transform

2017-12-01 Thread Chet Aldrich
So I agree generally with the idea that returning a PCollection makes all of this easier, so that arbitrary additional functions can be added, but what exactly would write functions return in a PCollection that would make sense? The whole idea is that we’ve written to an external source and now

Re: Using JDBC IO read transform, running out of memory on DataflowRunner.

2017-11-30 Thread Chet Aldrich
> , set the JDBC fetch size parameter in your client application.
>
> On Wed, Nov 29, 2017 at 1:41 PM Chet Aldrich <mailto:chet.aldr...@postmates.com>> wrote:
> Hey all,
>
> I’m running a Dataflow job that uses the JDBC IO transform to pull in a bunch of data (

Using JDBC IO read transform, running out of memory on DataflowRunner.

2017-11-29 Thread Chet Aldrich
Hey all, I’m running a Dataflow job that uses the JDBC IO transform to pull in a bunch of data (20 million rows, for reference) from Redshift, and I’m noticing that I’m getting an OutOfMemoryError on the Dataflow workers once I reach around 4 million rows. It seems like given the code that I’m reading i

Re: Slack Channel

2017-11-16 Thread Chet Aldrich
If you wouldn’t mind, I’d like an invite as well.

Chet

> On Nov 16, 2017, at 4:58 PM, Jacob Marble wrote:
>
> Me too, if you don't mind.
>
> Jacob
>
> On Thu, Nov 9, 2017 at 2:09 PM, Lukasz Cwik wrote:
> Invite sent, welcome.
>
> On Thu, Nov 9, 2017 at 2:08 PM, Fr

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-16 Thread Chet Aldrich
>> ta ES IO (es_hadoop) on that particular point. Besides, I think we also need to deal with the id at read time, not only at write time. I'll give some details in the ticket.
>>
>> On 15/11/2017 at 20:08, Chet Aldrich wrote:
>>> Given that this seems li

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Chet Aldrich
> ; bundle, so the whole bundle will be retried, leading to duplicate documents.
> Thanks for raising that! I'm CCing the dev list so that someone can correct me on the checkpointing mechanism if I'm missing something. Besides, I'm thinking about forcing the user to provide

Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-14 Thread Chet Aldrich
Hello all! So I’ve been using the ElasticsearchIO sink for a project (unfortunately it’s Elasticsearch 5.x, and so I’ve been messing around with the latest RC) and I’m finding that it doesn’t allow for changing the document ID, but only lets you pass in a record, which means that the document