Hi,
I am trying to design a schema for some data that will be moved from HDFS into
HBase for real-time access.
Questions:
1. Is the new API for bulk upload recommended over the old API? If yes, is
the new API stable, and is there sample executable code available?
2. The data accumulates over time. I need to be able to retrieve the latest
record at or before a particular date. Note that I do not know in advance what
the timestamp of that record would be. For example, I might need a user's
profile data as of a month or a year ago. How can this be achieved in HBase in
terms of schema?
a. If the column values are small in size, can I use versioning
for up to 100 values?
b. Should I maintain a secondary index that maps each date to the
latest date/timestamp at which profile data was generated or applicable for
that date? I would then use this information to build the user and timestamp
key for the main table, which would have user_ts as the row key and the data
in the columns.
c. For the columns, how do I decide between using multiple
columns within a column family and using multiple column families?
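
To make the user_ts row key idea in (b) more concrete, here is a rough sketch
of what I have in mind. It is only an illustration under my own assumptions,
not working HBase code: a TreeMap stands in for HBase's lexicographically
sorted rows, no HBase API is called, and the names RowKeySketch, rowKey and
latestAtOrBefore are made up. It also assumes user ids never contain "_" and
timestamps are non-negative milliseconds. The key stores Long.MAX_VALUE - ts so
that newer records sort first for a given user, which is one commonly suggested
way to find "the latest record at or before a date" with a single forward scan.

```java
import java.util.Locale;
import java.util.TreeMap;

public class RowKeySketch {
    // Hypothetical separator; assumes user ids never contain it.
    static final String SEP = "_";

    // Build "<user>_<reversed ts>". The reversed timestamp is zero-padded to
    // 19 digits so that string order matches numeric order.
    public static String rowKey(String user, long tsMillis) {
        return user + SEP + String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - tsMillis);
    }

    // Return the value of the newest row for `user` whose timestamp is
    // <= asOfMillis, or null if there is none. The TreeMap plays the role of
    // the sorted table; ceilingKey plays the role of a scan from a start row.
    public static String latestAtOrBefore(TreeMap<String, String> table,
                                          String user, long asOfMillis) {
        String start = rowKey(user, asOfMillis);
        String key = table.ceilingKey(start);
        // Reject keys that belong to a different user.
        if (key == null || !key.startsWith(user + SEP) || key.length() != start.length()) {
            return null;
        }
        return table.get(key);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("alice", 1000L), "profile-v1");
        table.put(rowKey("alice", 2000L), "profile-v2");
        table.put(rowKey("bob", 1500L), "bob-v1");

        System.out.println(latestAtOrBefore(table, "alice", 1999L)); // profile-v1
        System.out.println(latestAtOrBefore(table, "alice", 2000L)); // profile-v2
        System.out.println(latestAtOrBefore(table, "alice", 500L));  // null
    }
}
```

If this layout is reasonable, the secondary index in (b) might not be needed at
all, but I am not sure whether it holds up at scale or how it compares with
using versioning as in (a).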
Thanks,
Avani Sharma