Hi Wout,

This is great, thank you. I wrote the partial update support you reference
and I'll be happy to mentor you through your first PR - welcome aboard. Can
you please open a Jira to reference this work and we'll assign it to you?

In the partial update Jira we discussed having "_xxx" fields in the document
trigger actions, but opted to avoid that. Based on that discussion, an
ActionFn would likely be the preferred approach. Would that be possible?
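
As a rough illustration of the ActionFn idea, a minimal sketch might look like
the following. The names `BulkActionSketch`, `Action`, `ActionFn` and
`toBulkEntry` are hypothetical and are not part of ElasticsearchIO. The sketch
assumes the document id has already been extracted and needs no JSON escaping.
It only shows how a user-supplied function, rather than an "_action" field in
the document, could pick the bulk API action.

```java
import java.io.Serializable;

// Hypothetical sketch: none of these names exist in ElasticsearchIO.
class BulkActionSketch {

  /** The bulk API actions discussed in this thread. */
  enum Action { INDEX, UPDATE, DELETE }

  /** Hypothetical user-supplied function that picks the action for a document. */
  interface ActionFn extends Serializable {
    Action apply(String documentJson);
  }

  /**
   * Builds the newline-delimited bulk entry for one document.
   * Assumes the id was extracted beforehand and needs no JSON escaping.
   */
  static String toBulkEntry(String documentJson, String id, ActionFn actionFn) {
    String meta = "{\"_id\":\"" + id + "\"}";
    switch (actionFn.apply(documentJson)) {
      case DELETE:
        // A delete entry is the action line only; no document source follows.
        return "{\"delete\":" + meta + "}\n";
      case UPDATE:
        // A partial update wraps the document in a "doc" element.
        return "{\"update\":" + meta + "}\n{\"doc\":" + documentJson + "}\n";
      default:
        // Index: action line followed by the document itself.
        return "{\"index\":" + meta + "}\n" + documentJson + "\n";
    }
  }
}
```

A caller could then pass something like `doc -> isDeleted(doc) ? Action.DELETE
: Action.INDEX`, where `isDeleted` is whatever check fits the pipeline's data.
This keeps the decision out of the document itself, in line with the outcome of
the earlier Jira discussion.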

It will be important to provide unit and integration tests as well.

Please be aware that there is a branch and work underway for ES6 already
which is rather different on the write() path so this may become redundant
rather quickly.

Thanks,
Tim

@timrobertson100 on the Beam slack channel



On Fri, Jul 27, 2018 at 2:53 PM, Wout Scheepers <
wout.scheep...@vente-exclusive.com> wrote:

> Hey all,
>
>
>
> A while ago, I patched ElasticsearchIO to be able to do partial updates
> and deletes.
>
> However, I did not consider my patch pull-request-worthy, as the JSON
> parsing was inefficient (each document was parsed twice).
>
>
>
> Since Beam 2.5.0, partial updates are supported, so the only thing I’m
> missing is the ability to send bulk *delete* requests.
>
> We’re using entity updates for event sourcing in our data lake and need to
> persist deleted entities in elastic.
>
> We’ve been using my patch in production for the last year, but I would
> like to contribute to get the functionality we need into one of the next
> releases.
>
>
>
> I’ve created a gist that works for me, but is still inefficient (parsing
> twice: once to check the ‘_action’ field, once to get the metadata).
>
> Each document I want to delete needs an additional ‘_action’ field with
> the value ‘delete’. It doesn’t matter that the document still contains the
> redundant field, as the delete action only requires the metadata.
>
> I’ve added the method isDelete() and made some changes to the
> processElement() method.
>
> https://gist.github.com/wscheep/26cca4bda0145ffd38faf7efaf2c21b9
>
>
>
> I would like to make my solution more generic to fit into the current
> ElasticsearchIO and create a proper pull request.
>
> As this would be my first pull request for Beam, can anyone point me in
> the right direction before I spend too much time creating something that
> will be rejected?
>
>
>
> Some questions at the top of my mind are:
>
>    - Is it a good idea to make the ‘action’ part of the bulk API
>    generic?
>    - Should it be even more generic? (e.g.: set an ‘ActionFn’ on the
>    ElasticsearchIO)
>    - If I want to avoid parsing twice, the parsing should be done outside
>    of the getDocumentMetaData() method. Would this be acceptable?
>    - Is it possible to avoid passing the action as a field in the
>    document?
>    - Is there another or better way to get the delete functionality in
>    general?
>
>
>
> All feedback is more than welcome.
>
>
> Cheers,
> Wout
>
>
>
>
>
