Ayan, thanks. Correct, I am not thinking in RDBMS terms; I am wearing NoSQL glasses!
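For what it's worth, the watermark semantics I am after are the same as sqoop's `--incremental lastmodified` with `--check-column` / `--last-value` in the command quoted below. A minimal sketch of that logic in plain Python (the row tuples and function name are hypothetical, standing in for HBase cell timestamps, since HBase has no row-level modification marker):

```python
# Hypothetical rows as (row_key, last_update) pairs; in HBase the closest
# equivalent is the per-cell timestamp, not a row-level marker.
def incremental_batch(rows, last_value):
    """Select rows modified after the saved watermark and advance it,
    mirroring sqoop's --incremental lastmodified / --last-value logic."""
    new_rows = [(key, ts) for key, ts in rows if ts > last_value]
    # The new watermark is the newest timestamp seen; if nothing is new,
    # the old watermark is kept for the next run.
    new_watermark = max((ts for _, ts in new_rows), default=last_value)
    return new_rows, new_watermark

rows = [("r1", 100), ("r2", 205), ("r3", 310)]
batch, watermark = incremental_batch(rows, last_value=200)
# batch -> [("r2", 205), ("r3", 310)], watermark -> 310
```

The weekly job would persist the returned watermark and pass it back in on the next run, so only newly arrived rows are exported to Hive.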
On Fri, Jan 6, 2017 at 3:23 PM, ayan guha <guha.a...@gmail.com> wrote:

> IMHO you should not "think" of HBase in RDBMS terms, but you can use
> ColumnFilters to filter out new records.
>
> On Fri, Jan 6, 2017 at 7:22 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
>> Hi Ayan,
>>
>> By incremental load from HBase I mean: a weekly batch job takes rows
>> from an HBase table and dumps them out to Hive. The next time the job
>> runs, it should pick up only the newly arrived rows.
>>
>> This is the same as using Sqoop for an incremental load from an RDBMS
>> to Hive with a command like the one below:
>>
>> sqoop job --create myssb1 -- import \
>>   --connect jdbc:mysql://<hostname>:<port>/sakila \
>>   --username admin --password admin \
>>   --driver=com.mysql.jdbc.Driver \
>>   --query "SELECT address_id, address, district, city_id, postal_code,
>>     alast_update, cityid, city, country_id, clast_update FROM
>>     (SELECT a.address_id as address_id, a.address as address,
>>       a.district as district, a.city_id as city_id,
>>       a.postal_code as postal_code, a.last_update as alast_update,
>>       c.city_id as cityid, c.city as city, c.country_id as country_id,
>>       c.last_update as clast_update
>>     FROM sakila.address a INNER JOIN sakila.city c
>>       ON a.city_id=c.city_id) as sub
>>     WHERE $CONDITIONS" \
>>   --incremental lastmodified --check-column alast_update --last-value 1900-01-01 \
>>   --target-dir /user/cloudera/ssb7 --hive-import --hive-table test.sakila \
>>   -m 1 --hive-drop-import-delims --map-column-java address=String
>>
>> Probably I am looking for a tool from the HBase incubator family which
>> does the job for me; an alternative approach would be reading the HBase
>> table into an RDD and saving the RDD to Hive.
>>
>> Thanks.
>>
>> On Thu, Jan 5, 2017 at 2:02 AM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi Chetan,
>>>
>>> What do you mean by incremental load from HBase? There is a timestamp
>>> marker for each cell, but not at the row level.
>>>
>>> On Wed, Jan 4, 2017 at 10:37 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>
>>>> Ted Yu,
>>>>
>>>> You understood wrong; I said incremental load from HBase to Hive,
>>>> in other words, incremental import from HBase.
>>>>
>>>> On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Incremental load traditionally means generating hfiles and using
>>>>> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load
>>>>> the data into hbase.
>>>>>
>>>>> For your use case, the producer needs to find rows where the flag
>>>>> is 0 or 1. After such rows are obtained, it is up to you how the
>>>>> result of processing is delivered to hbase.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> Ok, sure, will ask.
>>>>>>
>>>>>> But what would be a generic best-practice solution for incremental
>>>>>> load from HBase?
>>>>>>
>>>>>> On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> I haven't used Gobblin.
>>>>>>> You can consider asking the Gobblin mailing list about the first option.
>>>>>>>
>>>>>>> The second option would work.
>>>>>>>
>>>>>>> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello guys,
>>>>>>>>
>>>>>>>> I would like to understand different approaches for distributed
>>>>>>>> incremental load from HBase. Is there any *tool / incubator tool*
>>>>>>>> which satisfies this requirement?
>>>>>>>>
>>>>>>>> *Approach 1:*
>>>>>>>>
>>>>>>>> Write a Kafka producer, manually maintain a column flag for
>>>>>>>> events, and ingest with LinkedIn Gobblin to HDFS / S3.
>>>>>>>>
>>>>>>>> *Approach 2:*
>>>>>>>>
>>>>>>>> Run a scheduled Spark job: read from HBase, do transformations,
>>>>>>>> and maintain the flag column at the HBase level.
>>>>>>>>
>>>>>>>> In both approaches above, I need to maintain column-level flags,
>>>>>>>> such as 0 - default, 1 - sent, 2 - sent and acknowledged. So next
>>>>>>>> time the producer will take another batch of 1000 rows where the
>>>>>>>> flag is 0 or 1.
>>>>>>>>
>>>>>>>> I am looking for a best-practice approach with any distributed tool.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> - Chetan Khatri
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha

> --
> Best Regards,
> Ayan Guha
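PS: a minimal sketch of the flag lifecycle described in the thread above (0 - default, 1 - sent, 2 - sent and acknowledged), with a producer that takes batches of rows whose flag is 0 or 1. The dict-backed table and function names are hypothetical, only illustrating the bookkeeping, not an actual HBase API:

```python
# Flag values as described in the thread: 0 = default (new), 1 = sent,
# 2 = sent and acknowledged. The dict stands in for an HBase table; a real
# job would read and write the flag column with the HBase client instead.
FLAG_NEW, FLAG_SENT, FLAG_ACKED = 0, 1, 2

def next_batch(table, batch_size=1000):
    """Pick up to batch_size rows whose flag is 0 or 1 and mark them sent."""
    keys = [k for k, flag in table.items() if flag in (FLAG_NEW, FLAG_SENT)]
    batch = sorted(keys)[:batch_size]
    for k in batch:
        table[k] = FLAG_SENT
    return batch

def acknowledge(table, keys):
    """After downstream delivery (e.g. the Hive load) confirms, mark acked."""
    for k in keys:
        table[k] = FLAG_ACKED

table = {"r1": 0, "r2": 1, "r3": 2}
batch = next_batch(table, batch_size=2)
# r3 is already acknowledged, so only r1 and r2 are picked up again.
```

Rows stuck at flag 1 (sent but never acknowledged) are retried on the next run, which gives at-least-once delivery; exactly-once would need the acknowledgement step to be atomic with the Hive commit.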