Hi,

I am trying to design a schema for some data to be moved from HDFS into HBase for 
real-time access.
Questions:

1. Is the new API recommended over the old API for bulk upload? If so, is 
the new API stable, and is there sample executable code available?

2. The data accumulates over time. I need to be able to retrieve the latest record 
before a particular date, without knowing in advance what that record's timestamp is.
   For example, I might need a user's profile data as of a month or a year ago. How can 
this be achieved in HBase in terms of schema design?

   a. If the column values are small, can I use versioning to keep up to 
100 values?

   b. Should I maintain a secondary index that maps each date to the latest 
date/timestamp at which profile data was generated for, or applicable to, that 
date? I would then use this information to build the user and timestamp key 
for the main table, which would have user_ts as the row key and the data in 
the columns.

   c. For the columns, how do I decide between using multiple columns within 
a single column family and using multiple column families?
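
To make the lookup in 2(b) concrete, here is a rough sketch in plain Python (no HBase calls) of the "latest record before a date" search over user_ts style row keys. It relies only on the fact that HBase keeps rows sorted by row key. The helper names row_key and latest_before, the "_" separator, and the 13-digit zero padding are all assumptions made for illustration, and user IDs are assumed not to contain "_".

```python
import bisect

def row_key(user, ts):
    # Hypothetical key layout: user id, "_", zero-padded timestamp.
    # Zero padding makes lexicographic order match numeric order.
    return f"{user}_{ts:013d}"

def latest_before(sorted_keys, user, cutoff_ts):
    """Return the newest key for `user` strictly before cutoff_ts, or None."""
    # Position where the cutoff key would be inserted in the sorted keys.
    i = bisect.bisect_left(sorted_keys, row_key(user, cutoff_ts))
    if i == 0:
        return None
    candidate = sorted_keys[i - 1]
    # The preceding key may belong to a different user; reject it if so.
    return candidate if candidate.startswith(user + "_") else None

# Stand-in for the sorted row keys of the main table.
keys = sorted(row_key(u, t) for u, t in [
    ("alice", 100), ("alice", 200), ("alice", 300), ("bob", 150),
])
print(latest_before(keys, "alice", 250))
```

With a layout like this, the lookup needs no prior knowledge of the exact timestamp, only the user and the cutoff date. Whether this is preferable to versioning or to a separate index table is exactly what I am unsure about.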


Thanks,
Avani Sharma

