What are the best practices for paging and slicing columns from a row?

Let's say I have 1,000,000 columns in a row. I read the row but want one thread (an actor, in my case) to read columns 0 - 9999, a second to read 10000 - 19999, and so on, so that I can have 100 workers each processing 10,000 columns for each of my rows.

If there is no API for this, should I use a composite key and populate the rows with a counter prefix?

0000000:myoriginalcolumnnameX
0000001:myoriginalcolumnnameY
0000002:myoriginalcolumnnameZ

Going the composite key route and doing a start/end predicate would work, but then the insertion/load has to go through a single synchronized point to generate the column names. I am not opposed to this, but I would prefer that neither the load nor the processing of my data be bound by any single lock (even a distributed one).

Thanks!

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/
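For anyone following along, here is a rough sketch of the composite-name scheme described above, in plain Python with no database client involved. The helper names (composite_name, slice_bounds, columns_in_slice) and the 7-digit prefix width are assumptions made for illustration. The bounds are half-open (start inclusive, end exclusive); whether a given store's slice predicate is inclusive or exclusive on the end bound would need to be checked before reusing them.

```python
from bisect import bisect_left

WIDTH = 7  # assumed prefix width, matching the 0000000 style in the post


def composite_name(index, original_name, width=WIDTH):
    """Build a zero-padded composite column name such as 0000001:foo."""
    return f"{index:0{width}d}:{original_name}"


def slice_bounds(total_columns, chunk_size, width=WIDTH):
    """Return one (start, end_exclusive) name pair per worker.

    Because the prefix is fixed-width, plain string ordering of the
    bounds matches the numeric ordering of the counters.
    """
    bounds = []
    for first in range(0, total_columns, chunk_size):
        next_first = min(first + chunk_size, total_columns)
        bounds.append((f"{first:0{width}d}:", f"{next_first:0{width}d}:"))
    return bounds


def columns_in_slice(sorted_names, start, end_exclusive):
    """Select the names in [start, end_exclusive) from a sorted list."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_left(sorted_names, end_exclusive)
    return sorted_names[lo:hi]


if __name__ == "__main__":
    # 1,000,000 columns split into chunks of 10,000 gives 100 slices.
    bounds = slice_bounds(1_000_000, 10_000)
    print(len(bounds), bounds[0], bounds[-1])
```

Each worker would take one pair from slice_bounds and read only its own range, so the reads need no coordination. The sketch does not address the concern about generating the counter at load time; it only shows the read side.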