You could check whether it makes sense to write an HBase coprocessor that does the
upsert for you, if one does not exist already. Phoenix on HBase may support this
already.
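
For instance, if Phoenix is available on your HBase cluster, an upsert can be expressed
as a single UPSERT statement over JDBC. A minimal sketch in Scala; the ZooKeeper quorum,
table name and column names below are placeholders, not something from your setup:

    import java.sql.DriverManager

    object PhoenixUpsertSketch {
      def main(args: Array[String]): Unit = {
        // Phoenix exposes HBase tables through JDBC; UPSERT inserts the row
        // if it is new and overwrites the given columns if it already exists.
        val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
        try {
          val ps = conn.prepareStatement(
            "UPSERT INTO CUSTOMER (ID, NAME, CITY) VALUES (?, ?, ?)")
          ps.setString(1, "42")
          ps.setString(2, "Jane Doe")
          ps.setString(3, "Sydney")
          ps.executeUpdate()
          conn.commit() // Phoenix connections are not auto-commit by default
        } finally {
          conn.close()
        }
      }
    }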

Another alternative, if the records do not have a unique ID, is to put them into a
text index engine such as Solr or Elasticsearch, which in this case can do fast
matching with relevancy scores.
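
A rough sketch of that idea, asking Elasticsearch over its REST search API for the
closest existing record before deciding whether to update or insert. The host, the
index name "customers" and the field "name" are made-up placeholders:

    import java.net.{HttpURLConnection, URL}
    import scala.io.Source

    object FuzzyLookupSketch {
      // Returns the raw search response; the top hit (highest relevancy score)
      // is the closest existing record, if any.
      def bestMatch(name: String): String = {
        val query = s"""{"size": 1, "query": {"match": {"name": "$name"}}}"""
        val conn = new URL("http://localhost:9200/customers/_search")
          .openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setRequestProperty("Content-Type", "application/json")
        conn.setDoOutput(true)
        conn.getOutputStream.write(query.getBytes("UTF-8"))
        Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
      }

      def main(args: Array[String]): Unit =
        println(bestMatch("Jane Doe"))
    }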


You can also use Spark and Pig for this. However, I am not sure whether Spark is
suitable for these one-row lookups; the same holds for Pig.
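
If you do try Spark, you do not need to open a connection per record: a common pattern
is one HBase connection per partition, with a Get/Put per row. A sketch under the
assumption that the input file contains "id,name,city" lines and the HBase table is
"customer" with column family "cf" (all names here are placeholders):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.{SparkConf, SparkContext}

    object SparkHBaseUpsertSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hbase-upsert-sketch"))

        sc.textFile("hdfs:///input/records.csv").foreachPartition { rows =>
          // One HBase connection per partition, not per record.
          val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = conn.getTable(TableName.valueOf("customer"))
          val cf = Bytes.toBytes("cf")

          rows.foreach { line =>
            val Array(id, name, city) = line.split(",", -1)
            val existing = table.get(new Get(Bytes.toBytes(id)))
            val storedName = Option(existing.getValue(cf, Bytes.toBytes("name")))
              .map(b => Bytes.toString(b)).orNull
            val storedCity = Option(existing.getValue(cf, Bytes.toBytes("city")))
              .map(b => Bytes.toString(b)).orNull

            // Write only when the record is new or a field differs.
            if (storedName != name || storedCity != city) {
              val put = new Put(Bytes.toBytes(id))
              if (storedName != name) put.addColumn(cf, Bytes.toBytes("name"), Bytes.toBytes(name))
              if (storedCity != city) put.addColumn(cf, Bytes.toBytes("city"), Bytes.toBytes(city))
              table.put(put)
            }
          }
          table.close()
          conn.close()
        }
        sc.stop()
      }
    }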


On Wed, Sep 2, 2015 at 11:53 PM, ayan guha <guha.a...@gmail.com> wrote:

Hello group

I am trying to use Pig or Spark in order to achieve the following:

1. Write a batch process which will read from a file.
2. Look up HBase to see if the record exists. If so, compare the incoming values
with HBase and update the fields which do not match. Else create a new record.

My questions:
1. Is this a good use case for Pig or Spark?
2. Is there any way to read HBase for each incoming record in Pig without
writing MapReduce code?
3. In case of Spark, I think we have to connect to HBase for every record.
Is there any other way?
4. What is the best connector for HBase which gives this functionality?

Best

Ayan
