Hi all, I'm using HBase 0.94.12 with Hadoop 1.0.4.
I'm trying to load ~3 GB of data into an HBase table using the CSV bulk load (MR) tool, and it is very slow: the MR job takes about 5x as long as a normal bulk load. Is that the best way to do it?

I also wonder whether it supports continual pre-splitting, meaning that before each bulk load I add new regions to the table (there is a sketch of what I mean below).

Another issue I have with the CSV bulk load is dynamic columns: I tried writing an empty value (actually "") into the CSV wherever a column has no data, but that contradicts one of the benefits of HBase, namely that null values are simply not stored.

Do you think UPSERTs in batches could work better? Can that handle 3 GB (uncompressed)? Has anyone done it from an MR context, with the reducer executing the UPSERT batches? (A sketch of that is below as well.)
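To make the pre-splitting question concrete, here is a minimal sketch of what I mean, assuming the 0.94 HBaseAdmin API; the table name ("MY_TABLE") and the split key are placeholders I would derive from the incoming file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitBeforeLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Before each bulk load, request a split at a key taken from
            // the new data, so the region exists before the data arrives.
            // Note split() is asynchronous: the master schedules it.
            // "MY_TABLE" and "row-20131101" are placeholders.
            admin.split(Bytes.toBytes("MY_TABLE"), Bytes.toBytes("row-20131101"));
        } finally {
            admin.close();
        }
    }
}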
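And this is roughly the batched-UPSERT loop I have in mind for the reducer. It is a sketch only, going through the Phoenix JDBC driver (UPSERT being Phoenix syntax, and assuming the driver is on the classpath and registered); the connection URL, table, columns, the 5000-row commit interval, and the sampleRecords() input stand-in are all placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedUpserts {
    public static void main(String[] args) throws Exception {
        // Placeholder URL pointing at the ZooKeeper quorum.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
        // With auto-commit off, the mutations are buffered client-side
        // and flushed to HBase on each commit().
        conn.setAutoCommit(false);
        PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO MY_TABLE (PK, COL1) VALUES (?, ?)");
        try {
            int rows = 0;
            for (String[] rec : sampleRecords()) { // stand-in for the reducer's input
                ps.setString(1, rec[0]);
                if (rec[1] != null) {
                    ps.setString(2, rec[1]);
                } else {
                    // Pass NULL rather than "" for a missing dynamic column.
                    ps.setNull(2, java.sql.Types.VARCHAR);
                }
                ps.executeUpdate();
                if (++rows % 5000 == 0) {
                    conn.commit(); // flush one batch of mutations
                }
            }
            conn.commit(); // flush the final partial batch
        } finally {
            conn.close();
        }
    }

    // Dummy input so the sketch is self-contained; in the real job this
    // would be the values arriving at the reducer.
    private static String[][] sampleRecords() {
        return new String[][] { { "row1", "a" }, { "row2", null } };
    }
}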
Thanks, Amit.