What are the best practices here for paging and slicing columns from a row?

So let's say I have 1,000,000 columns in a row.

I read the row but want to have one thread read columns 0 - 9999, a second
thread (an actor in my case) read columns 10000 - 19999, and so on, so I can
have 100 workers each processing 10,000 columns for every one of my rows.

If there is no API for this, is it something I should build a composite column
name for, populating the rows with a counter prefix? For example:

0000000:myoriginalcolumnnameX
0000001:myoriginalcolumnnameY
0000002:myoriginalcolumnnameZ
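
Then each worker would slice its own range of those counters with a start/end
predicate, roughly like this (a sketch against the Thrift get_slice API,
assuming an AsciiType/UTF8Type comparator and the 7-digit zero-padded counters
above; the class, method, and "MyCF" names are just placeholders):

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;

public class BucketSliceReader {

    /**
     * Reads worker N's share of one row: counters N*10000 through (N+1)*10000 - 1.
     * e.g. worker 1 slices from "0010000" (inclusive) up to "0020000"; every real
     * column name carries a ":originalname" suffix, so nothing sorts exactly at
     * "0020000" and the last name included is "0019999:...".
     */
    public static List<ColumnOrSuperColumn> readBucket(Cassandra.Client client,
                                                       String rowKey,
                                                       int workerId) throws Exception {
        ByteBuffer start  = ByteBufferUtil.bytes(String.format("%07d", workerId * 10000));
        ByteBuffer finish = ByteBufferUtil.bytes(String.format("%07d", (workerId + 1) * 10000));

        // a count of 10000 assumes one column per counter value; if a range
        // could hold more, this would have to page within the range as well
        SliceRange range = new SliceRange(start, finish, false, 10000);
        SlicePredicate predicate = new SlicePredicate().setSlice_range(range);
        ColumnParent parent = new ColumnParent("MyCF");

        return client.get_slice(ByteBufferUtil.bytes(rowKey), parent, predicate,
                                ConsistencyLevel.ONE);
    }
}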

Going the composite-name route with a start/end slice predicate would work, but
then the insertion/load of this data has to go through a single synchronized
point to generate the column names... I am not opposed to this, but I would
prefer that neither the load nor the processing of my data be bound by any
single lock (even a distributed one).
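
Concretely, the load side I am worried about looks something like this (a
sketch; the class and method names are mine, and the AtomicLong only covers a
single loader process, so with multiple loaders it has to become a
shared/distributed counter, which is exactly the bottleneck I would rather
avoid):

import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.utils.ByteBufferUtil;

public class CountedColumnWriter {

    // the single point of synchronization: every column name prefix
    // has to come from this one counter
    private final AtomicLong counter = new AtomicLong(0);

    private final Cassandra.Client client;
    private final ColumnParent parent = new ColumnParent("MyCF"); // placeholder column family

    public CountedColumnWriter(Cassandra.Client client) {
        this.client = client;
    }

    /** Writes one column as "0000042:myoriginalcolumnname" -> value. */
    public void write(String rowKey, String originalName, byte[] value) throws Exception {
        String compositeName = String.format("%07d:%s", counter.getAndIncrement(), originalName);

        Column column = new Column();
        column.setName(ByteBufferUtil.bytes(compositeName));
        column.setValue(ByteBuffer.wrap(value));
        column.setTimestamp(System.currentTimeMillis() * 1000); // microsecond timestamps

        client.insert(ByteBufferUtil.bytes(rowKey), parent, column, ConsistencyLevel.ONE);
    }
}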

Thanks!!!!

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/
