Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-16 Thread Chet Aldrich
Sure, I’d be happy to take it on. My JIRA ID is chetaldrich. We can continue the discussion on that ticket. Chet > On Nov 16, 2017, at 7:57 AM, Etienne Chauchot wrote: > Chet, FYI, here is the ticket and the design proposal: https://issues.apache.org/jira/browse/BEAM-3201

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-16 Thread Etienne Chauchot
Chet, FYI, here is the ticket and the design proposal: https://issues.apache.org/jira/browse/BEAM-3201. If you still want to code that improvement, give me your JIRA ID and I will assign the ticket to you. Otherwise I can code it myself. Best, Etienne On 16/11/2017 at 09:19, Etienne Chauch

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-16 Thread NerdyNick
I'd add to solution A here: what about also supporting a user function that provides the ID given the record? I say this because I'm also starting to look into how to get the ESIO writer to support a dynamic index based on information contained within the event. For which just looking
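
To make the user-function idea concrete, here is a minimal sketch using only the Java standard library. It is not the ElasticsearchIO API: the class `IdFnSketch`, the method `bulkEntry`, and the two function parameters are hypothetical names chosen for illustration. It only assumes that the writer ultimately sends Elasticsearch bulk requests, where each document is preceded by an action line that may carry `_index` and `_id`.

```java
import java.util.function.Function;

public class IdFnSketch {

    /** Wraps a value in JSON quotes with minimal escaping (backslash and double quote only). */
    public static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds one bulk "index" entry (action line plus document line).
     * indexFn and idFn are hypothetical user-supplied functions that derive
     * the target index and the document ID from the record itself.
     */
    public static String bulkEntry(
            String jsonDocument,
            Function<String, String> indexFn,
            Function<String, String> idFn) {
        String action = "{ \"index\" : { \"_index\" : " + quote(indexFn.apply(jsonDocument))
                + ", \"_id\" : " + quote(idFn.apply(jsonDocument)) + " } }";
        // The bulk format is newline-delimited: action line, then the document source.
        return action + "\n" + jsonDocument + "\n";
    }

    public static void main(String[] args) {
        String doc = "{\"user\":\"chet\",\"event\":\"login\"}";
        // Illustrative functions only: a fixed index name and a hash-based ID.
        System.out.print(bulkEntry(
                doc,
                d -> "events-2017",
                d -> Integer.toHexString(d.hashCode())));
    }
}
```

Because both functions receive the record, the same hook would cover the dynamic-index case as well as the document-ID case.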

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-16 Thread Jean-Baptiste Onofré
I think it's the most elegant approach: the user should be able to decide which ID field they want to use. Regards JB On 11/16/2017 09:24 AM, Etienne Chauchot wrote: +1, that is what I had in mind; if I recall correctly, this is what the es_hadoop connector does. On 15/11/2017 at 20:22, Tim Robertso

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-16 Thread Etienne Chauchot
+1, that is what I had in mind; if I recall correctly, this is what the es_hadoop connector does. On 15/11/2017 at 20:22, Tim Robertson wrote: Hi Chet, I'll be a user of this, so thank you. It seems reasonable although - did you consider letting folk name the document ID field explicitly? It

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-16 Thread Etienne Chauchot
Hi, Thanks for the offer, I'd be happy to review your PR. Just wait a bit until I have opened a proper ticket for that. I still need to think more about the design. Among other things, I have to check what the ES dev team did in their other big data connector (es_hadoop) on that particular point. Besides,

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Tim Robertson
Hi Chet, I'll be a user of this, so thank you. It seems reasonable although - did you consider letting folk name the document ID field explicitly? It would avoid an unnecessary transformation and might be simpler: // instruct the writer to use a provided document ID ElasticsearchIO.write

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Chet Aldrich
Given that this seems like a change that should probably happen, and I’d like to help contribute if possible, a few questions and my current opinion: So I’m leaning towards approach B here, which is: > b. (a bit less user friendly) PCollection<KV> with K as an id. But forces the > user to do a Pard
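
As a rough illustration of approach B, the sketch below turns (id, JSON) pairs into an Elasticsearch bulk payload using only the Java standard library. It is an assumption-laden sketch, not ElasticsearchIO code: `KeyedBulkSketch` and `bulkPayload` are hypothetical names, and a plain `List<Map.Entry<String, String>>` stands in for the keyed collection the writer would receive.

```java
import java.util.List;
import java.util.Map;

public class KeyedBulkSketch {

    /**
     * Builds a newline-delimited bulk payload in which each entry's key is
     * used as the document _id and each value is the JSON document source.
     */
    public static String bulkPayload(List<Map.Entry<String, String>> keyedDocs) {
        StringBuilder payload = new StringBuilder();
        for (Map.Entry<String, String> entry : keyedDocs) {
            // Minimal escaping so the key is a valid JSON string.
            String id = entry.getKey().replace("\\", "\\\\").replace("\"", "\\\"");
            payload.append("{ \"index\" : { \"_id\" : \"").append(id).append("\" } }\n");
            payload.append(entry.getValue()).append("\n");
        }
        return payload.toString();
    }

    public static void main(String[] args) {
        // Illustrative data only.
        List<Map.Entry<String, String>> docs = List.of(
                Map.entry("a1", "{\"user\":\"chet\"}"),
                Map.entry("a2", "{\"user\":\"tim\"}"));
        System.out.print(bulkPayload(docs));
    }
}
```

With this shape the user does the keying step before the write, which is the extra work approach B asks for. In exchange the writer needs no knowledge of the document's fields.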

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Etienne Chauchot
Yes, exactly. Actually, it arose from a discussion we had with Romain about ESIO. On 15/11/2017 at 10:08, Jean-Baptiste Onofré wrote: I think it's also related to the discussion Romain raised on the dev mailing list (gap between batch size, checkpointing & bundles). Regards JB On 11/15/2

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Tim Robertson
Hi Chet, +1 for interest in this from me too. If it helps, I'd have expected a) to be the implementation (e.g. something like "_id" being used if present), with handling multiple delivery being the responsibility of the developer. Thanks, Tim On Wed, Nov 15, 2017 at 10:08 AM, Jean-Baptiste Onofré

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Jean-Baptiste Onofré
I think it's also related to the discussion Romain raised on the dev mailing list (gap between batch size, checkpointing & bundles). Regards JB On 11/15/2017 09:53 AM, Etienne Chauchot wrote: Hi Chet, What you say is totally true, docs written using ElasticSearchIO will always have an ES gen

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Etienne Chauchot
Hi Chet, What you say is totally true: docs written using ElasticsearchIO will always have an ES-generated ID. But that might change in the future; indeed, it might be a good thing to allow the user to pass an ID. After just 5 seconds of thinking, I see 3 possible designs for that. a. (simplest) use a

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Vilhelm von Ehrenheim
I second this. In my case I have a lot of stateful updates in my stream, so it doesn't work at all. I ended up using Kafka in between to mitigate the issue, but it would be great to have an IO connector that writes on keys directly. I think a PR is always welcome. :) / Vilhelm On 15 Nov 2017 05:16, "Chet

Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-14 Thread Chet Aldrich
Hello all! So I’ve been using the ElasticsearchIO sink for a project (unfortunately it’s Elasticsearch 5.x, and so I’ve been messing around with the latest RC) and I’m finding that it doesn’t allow for changing the document ID, but only lets you pass in a record, which means that the document