I am about to do a bunch of Puts along the lines of the code below. (How
do I get the column count of a column family for a given row?)

    int lastcolVal = /* get count of columns somehow, I think */;
    for (int j = 0; j < 10; j++) {
        Put put = new Put("activities", lastcolVal, activityId[j]);
        context.write(accountNo, put);
    }

I am looking at the source code of Get.java and trying to work out how to
read in 100 columns, process them, discard them, read in the next 100,
process those, and so on (i.e., batching like in Hibernate, so I don't
blow up the memory). I guess I could read in one column at a time, but is
that expensive? I would tend to think so for very large sets.

If I have an account which has activity ids as columns, and I could have,
let's say, 2 billion activities on one account, is there a way to batch
read the columns from the column family so I don't blow up the memory?
(E.g., say I have 4 GB of RAM; 2 billion 4-byte ints would be about
8 GB.)

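To make the access pattern concrete, here is a rough sketch in plain Java of the batching I have in mind. It does not touch HBase at all: a TreeMap stands in for the row's columns, and ColumnBatchSketch, fakeRow, nextBatch and processAll are made-up names for illustration only. The idea is to hold at most one batch in memory and to resume from the last column seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ColumnBatchSketch {
    /** Hypothetical stand-in for one wide row: column name -> activity id. */
    public static NavigableMap<Long, Integer> fakeRow(int columns) {
        NavigableMap<Long, Integer> row = new TreeMap<>();
        for (int i = 0; i < columns; i++) {
            row.put((long) i, 1000 + i);
        }
        return row;
    }

    /** Returns up to batchSize columns strictly after lastSeen (null means start of row). */
    public static List<Map.Entry<Long, Integer>> nextBatch(
            NavigableMap<Long, Integer> row, Long lastSeen, int batchSize) {
        NavigableMap<Long, Integer> rest =
                (lastSeen == null) ? row : row.tailMap(lastSeen, false);
        List<Map.Entry<Long, Integer>> batch = new ArrayList<>(batchSize);
        for (Map.Entry<Long, Integer> entry : rest.entrySet()) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(entry);
        }
        return batch;
    }

    /** Walks the whole row batch by batch; returns {columns processed, batches, largest batch}. */
    public static long[] processAll(NavigableMap<Long, Integer> row, int batchSize) {
        long processed = 0;
        long batches = 0;
        long largest = 0;
        Long lastSeen = null;
        while (true) {
            List<Map.Entry<Long, Integer>> batch = nextBatch(row, lastSeen, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            processed += batch.size(); // the "process" step would go here
            batches++;
            largest = Math.max(largest, batch.size());
            // Remember where we stopped; the batch itself can now be discarded.
            lastSeen = batch.get(batch.size() - 1).getKey();
        }
        return new long[] {processed, batches, largest};
    }

    public static void main(String[] args) {
        long[] stats = processAll(fakeRow(250), 100);
        System.out.println(stats[0] + " columns in " + stats[1]
                + " batches, largest " + stats[2]);
    }
}
```

With 250 fake columns and a batch size of 100, this walks the row in batches of 100, 100 and 50. What I am after is the equivalent of nextBatch against a real column family.
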
To be honest, that for loop is a bit of a lie. As we get activities, we
will actually need to insert them so that they are in order by some kind
of date. I am not sure how I am going to do that yet (I definitely don't
want to grab 1 billion ids and sort them each time we reprocess).

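One idea I am toying with, as a rough sketch only: if columns within a row are kept sorted by the raw bytes of the column name (an assumption on my part about HBase), then putting the date at the front of the column name would keep activities in date order with no explicit sort. The plain-Java sketch below uses a TreeMap as a stand-in for the row. DateOrderedColumnsSketch, columnName, activityIdOf and idsInStoredOrder are made-up names, the 8-byte timestamp plus 4-byte id layout is just one possible encoding, and it assumes non-negative timestamps and Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class DateOrderedColumnsSketch {
    /** Column name = 8-byte big-endian timestamp followed by 4-byte activity id. */
    public static byte[] columnName(long epochMillis, int activityId) {
        return ByteBuffer.allocate(12).putLong(epochMillis).putInt(activityId).array();
    }

    /** Reads the activity id back out of a name built by columnName. */
    public static int activityIdOf(byte[] columnName) {
        return ByteBuffer.wrap(columnName).getInt(8);
    }

    /** Inserts activities in arrival order; returns their ids in stored (sorted) order. */
    public static List<Integer> idsInStoredOrder(long[] timestamps, int[] activityIds) {
        // Stand-in for one row: names are ordered by unsigned, byte-wise comparison.
        Comparator<byte[]> byteOrder = Arrays::compareUnsigned;
        TreeMap<byte[], Integer> row = new TreeMap<>(byteOrder);
        for (int i = 0; i < activityIds.length; i++) {
            row.put(columnName(timestamps[i], activityIds[i]), activityIds[i]);
        }
        List<Integer> ids = new ArrayList<>();
        for (byte[] name : row.keySet()) {
            ids.add(activityIdOf(name));
        }
        return ids;
    }

    public static void main(String[] args) {
        long[] timestamps = {3000L, 1000L, 2000L};
        int[] activityIds = {30, 10, 20};
        // Activities arrive out of order but come back ordered by timestamp.
        System.out.println(idsInStoredOrder(timestamps, activityIds));
    }
}
```

Appending the activity id after the timestamp keeps two activities with the same timestamp from overwriting each other. Whether this holds up against a real table is exactly what I am unsure about.
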
Thanks,
Dean