Github user nielsbasjes commented on the issue:
https://github.com/apache/flink/pull/2332
FYI:
Closely related to this issue, I submitted an enhancement on the HBase
side to support these kinds of use cases much better:
https://issues.apache.org/jira/browse/HBASE-19486
Github user nragon commented on the issue:
https://github.com/apache/flink/pull/2332
Indeed, I'm also not aware of how users use it. I would take a similar
approach to the C* (Cassandra) sink, which is already made for streaming and
seems to use an existing async API from DataStax. But again, I couldn't find acc
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
So I think, at a first level, let us have only Put/Delete mutations for
streaming? Since I am not aware of how Flink users are currently interacting
with HBase, I'm not sure what HBase ops should be s
Github user nragon commented on the issue:
https://github.com/apache/flink/pull/2332
No, only Puts. Aggregation is coming from a reduce which itself
aggregates, keeping recent values in HBase, more like snapshots I think. During
our analysis, sending data to HBase was only reliable
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
@nragon
I agree. But does your use case have Increments/Appends?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your pro
Github user nragon commented on the issue:
https://github.com/apache/flink/pull/2332
From my point of view, my sample works fine (my use case)
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
I can take this up and come up with a design doc. Reading through the comments
here and the final decision points, I think only Puts/Deletes can be considered
idempotent. But Increments/Decrements can
Github user nragon commented on the issue:
https://github.com/apache/flink/pull/2332
I've made a custom solution which works for my use cases. Note that the
attached code is not working, because it's only a skeleton.
This prototype uses asynchbase and tries to manage throttling is
Github user fpompermaier commented on the issue:
https://github.com/apache/flink/pull/2332
Anyone working on this? An HBase streaming sink would be a very nice
addition...
Github user nragon commented on the issue:
https://github.com/apache/flink/pull/2332
Currently trying to fix some throttling issues, reported here
https://github.com/OpenTSDB/asynchbase/issues/162
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @ramkrish86 , sorry, I don't have the bandwidth to work on it.
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
@delding
Are you still working on this?
@nragon
Let me know how I can be of help here. I can work on this PR too, since I have
some context on the existing PR, though it is stale.
Github user nragon commented on the issue:
https://github.com/apache/flink/pull/2332
Hi, I'm testing one myself with a third-party library,
http://opentsdb.github.io/asynchbase.
I'm following a similar approach to the Cassandra sink. Testing it as we speak.
Thanks
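The Cassandra-sink-style pattern mentioned here (asynchronous writes with a bound on in-flight requests, which is where the throttling problems reported later in this thread tend to surface) could be sketched like this. This is a generic stand-alone model with invented names, not asynchbase's actual API:

```python
# Sketch: an async sink that caps in-flight writes. Once max_pending is
# reached, new writes first drain the pending batch (a crude form of
# backpressure) before being accepted -- the throttling concern discussed
# for asynchbase-based sinks.

class ThrottledAsyncSink:
    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.in_flight = []          # writes sent but not yet acknowledged

    def write(self, record):
        if len(self.in_flight) >= self.max_pending:
            self.flush()             # block until outstanding writes complete
        self.in_flight.append(record)

    def flush(self):
        done, self.in_flight = self.in_flight, []
        return done                  # pretend the client acked this batch

sink = ThrottledAsyncSink(max_pending=2)
sink.write("a")
sink.write("b")
sink.write("c")                      # triggers a drain of "a" and "b" first
print(sink.in_flight)                # -> ['c']
```

A real implementation would track acknowledgements via callbacks rather than draining synchronously, but the capacity bound is the same idea.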
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @nragon, the PR seems to be stale.
Are you interested in contributing a streaming HBase sink?
Github user nragon commented on the issue:
https://github.com/apache/flink/pull/2332
Any updates on this sink?
Github user nragon commented on the issue:
https://github.com/apache/flink/pull/2332
Hi :)
I needed to use HBase as a sink, so I decided to take a look at this pull and
use it.
Current changes that might be interesting: like the HBase connector for
DataSet, it's possible to define
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/2332
Thanks for these discussions @delding and @ramkrish86. I think this is
getting a bit hard to follow. What do you think about writing a design doc that
describes the supported HBase features and consi
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
OK. Then it should be clearly documented now that the sink supports only
Puts/Deletes. So in future, can the sink be updated with new APIs? I don't know
the procedure here in case new APIs have t
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
I agree it might be overkill. But in the case of having a sink that only
supports Put/Delete, it would be better to have ordered execution than not;
after all, HBase has this API, so there could be som
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
Within a single row you need a guarantee of the order of execution? I agree
Append/Increment are non-idempotent in certain failure cases, but there is a
nonce generator for that. I should say I have not
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @ramkrish86 , I'm thinking of replacing batch() with mutateRow() because it
provides atomic, ordered mutations for a single row. It only supports Put
and Delete, which should be fine since only Put a
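For readers unfamiliar with the trade-off discussed here: batch() gives no ordering guarantee across the actions in one call, while mutateRow() applies its Put/Delete list to a single row atomically and in order. A minimal sketch of why ordering matters for a put-then-delete sequence on the same cell (a plain Python stand-in with an invented `run_mutations` helper, not the HBase client):

```python
# Sketch: a Put followed by a Delete on the same cell yields different final
# states depending on execution order -- which is why an unordered batch()
# is risky for a sink, and an ordered, atomic mutateRow() is attractive.

def run_mutations(mutations, row):
    """Apply (kind, column, value) mutations to one row, strictly in order."""
    for kind, col, val in mutations:
        if kind == "put":
            row[col] = val
        elif kind == "delete":
            row.pop(col, None)
    return row

ordered = [("put", "cf:c", "v1"), ("delete", "cf:c", None)]

print(run_mutations(ordered, {}))        # delete ran last: cell gone -> {}
print(run_mutations(ordered[::-1], {}))  # put ran last -> {'cf:c': 'v1'}
```

With mutateRow() the first outcome is guaranteed; with batch() either could happen.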
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
@delding
Do you have use cases with Append/Increment?
I think with the batch() API we are not sure of the order of execution of
the batch() on the HBase server, but still it would help us
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/2332
@nielsbasjes I want to avoid excluding a version that is still in use by
existing Flink users. I do not have insight into which HBase versions are
currently in use. If (basically) everybody is on 1.1.x
Github user nielsbasjes commented on the issue:
https://github.com/apache/flink/pull/2332
@fhueske: What is "older"?
I would like a clear statement about the (minimal) supported versions of
HBase.
I would see 1.1.x as old enough, or do you see 0.98 as still required?
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/2332
@nielsbasjes We would break compatibility with older HBase versions. Before
we do that, we should start a discussion on the user and dev mailing lists.
Github user nielsbasjes commented on the issue:
https://github.com/apache/flink/pull/2332
@fhueske Should the TableInputFormat be updated to use the HBase 1.1.2 API
as well? It would make things a bit cleaner.
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/2332
@ramkrish86 that is correct.
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
> This scheme does not provide exactly-once delivery guarantees, however at
any given point in time the table would be in a state as if the updates were
only sent once. This is the same guarantee
Github user nielsbasjes commented on the issue:
https://github.com/apache/flink/pull/2332
Yes, as long as everything is compatible with HBase 1.1.2, it's fine for me.
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Thanks for very detailed and helpful comments, let me work on it :-)
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/2332
Don't you lose any guarantees regarding the order of mutations the moment you
use asynchronous updates anyway?
The WriteAheadSink should only be used if you want to deal with
non-deterministic p
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/2332
Thanks for looking into the Cassandra connector. Maybe @zentol (who
implemented the Cassandra connector) can comment on how an HBase sink can be
made fault-tolerant and what code could be reused for tha
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @ramkrish86 , CassandraTupleWriteAheadSink extends GenericWriteAheadSink
that deals with the checkpoint mechanism
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
I agree with @delding here.
>* Method that does a batch call on Deletes, Gets, Puts, Increments and
Appends.
* The ordering of execution of the actions is not defined. Meaning
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
`#2330 updates the version of the batch TableInputFormat to HBase 1.1.2. I
think we should use the same version here.`
Valid point. But is it possible to upgrade to an even newer version like 1.1
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
I can change back to HBase 1.1.2. The reason for using 2.0.0-SNAPSHOT is
this bug (https://issues.apache.org/jira/browse/HBASE-14963). It's
only fixed for versions 1.3.0+ and 2.0.0, and my exam
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @fhueske , in HBase, writes to a single row have an ACID guarantee. The
exactly-once semantic can be implemented the way the CassandraSink did: storing
input records in a state backend and flushing to HBa
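The approach described (buffer records in a state backend, flush to the store only once a checkpoint completes) can be sketched roughly as follows. This is a simplified stand-alone model with invented class and method names, not Flink's actual `GenericWriteAheadSink` API:

```python
# Sketch of a checkpoint-buffered ("write-ahead") sink: records are staged
# per checkpoint and only written to the store once that checkpoint is
# confirmed complete. If the job fails before the notification, the staged
# records can be replayed from state -- which is safe as long as the writes
# are idempotent, e.g. HBase Puts.

class WriteAheadSink:
    def __init__(self, store):
        self.store = store          # stands in for an HBase table
        self.pending = {}           # checkpoint_id -> buffered records
        self.current = []           # records since the last checkpoint

    def invoke(self, record):
        self.current.append(record)             # stage only, do not write

    def snapshot_state(self, checkpoint_id):
        self.pending[checkpoint_id] = self.current
        self.current = []

    def notify_checkpoint_complete(self, checkpoint_id):
        for row, value in self.pending.pop(checkpoint_id, []):
            self.store[row] = value             # flush committed records

store = {}
sink = WriteAheadSink(store)
sink.invoke(("r1", "v1"))
sink.snapshot_state(1)
print(store)                         # nothing written before the checkpoint: {}
sink.notify_checkpoint_complete(1)
print(store)                         # flushed after completion: {'r1': 'v1'}
```

This gives the "table looks as if each update was sent once" guarantee quoted earlier in the thread, rather than true exactly-once delivery.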
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/2332
I also noticed that the PR pulls in an HBase SNAPSHOT dependency. This is
not possible. We cannot compile against a development version. #2330 updates
the version of the batch TableInputFormat to HBa
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @delding, I'm sorry that I did not mention this earlier, but I just
noticed that the `HBaseSink` does not implement any logic for checkpointing and
fault-tolerance.
The checkpointing log
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @fhueske , I have updated this PR to address your comments. In this
change, only one MutationActions and two ArrayLists are needed for the entire
stream; MutationActions is now resettable. But I repla
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @ramkrish86 , I squashed my recent commit into the one you reviewed
before, then merged the upstream and force-pushed the commits, which is why
you only see two commits. The first commit actually
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
@delding
I am not able to see the recent changes at all. Under the 'commits' tab it
shows 2 commits. If I open the merge commit, I am not able to see the changes
that you are describing here. Can you poi
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @ramkrish86 , I squashed the commits before pushing. Sorry for making
those recent changes hard to see. Could you still review this PR when
you have time?
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
How should I view your latest commit? I am not able to see the changes that
you have mentioned above. Thanks @delding
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi, I have updated the PR based on @ramkrish86 's comments.
1. rename MutationActionList to MutationActions, which has a public method
createMutations that returns HBase Mutations
2. ma
Github user rmetzger commented on the issue:
https://github.com/apache/flink/pull/2332
Cool. Let me know if you have any questions or if you need any help.
I'll keep an eye on this PR and the Bahir-flink repository.
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @rmetzger , Thanks for your comment. I am totally ok with contributing
the connector to Apache Bahir :-)
Github user rmetzger commented on the issue:
https://github.com/apache/flink/pull/2332
@delding: Thanks a lot for opening a pull request for a streaming HBase
connector for Flink.
Would you be okay with contributing the connector to Apache Bahir, instead
of Apache Flink?
The re
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @ramkrish86 , thanks for your reviews and detailed comments :-)
I will update this PR soon.
Github user ramkrish86 commented on the issue:
https://github.com/apache/flink/pull/2332
Thanks for the update @delding . I can check this PR more closely tomorrow
my time. Had a glance at it.
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
BTW, the example runs well on my machine when I set the hbase-client version to
2.0.0-SNAPSHOT, but the travis-ci build failed due to "Failed to collect
dependencies at org.apache.habse:hbase-client:jar:2.0.
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi, I have updated this PR, adding Javadoc and an example as @tzulitai
advised, and creating enums for PUT, DELETE, INCREMENT and APPEND as
@ramkrish86 advised.
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi @tzulitai , thank you for all the comments :-) I will update the PR and
add an example in the next few days.
Github user tzulitai commented on the issue:
https://github.com/apache/flink/pull/2332
@delding Thank you for working on this contribution. Since I'm not that
familiar with the HBase client API, I've only skimmed through the code for now.
Will try to review in detail later.
B
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
BTW, the failing tests are a result of this PR.
Github user delding commented on the issue:
https://github.com/apache/flink/pull/2332
Hi, this is my second PR to Flink; any comments or suggestions will be much
appreciated :-)