It's worse than just merging the tablet sources and iterating to find the offset, because the underlying sources may contain deleted records, old versions that an iterator filters out, and duplicates. It gets more complicated still if you are using combiners in the iterator stack.
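To illustrate the point, here is a minimal sketch in Python of a merged read over two sorted files. It is a toy model, not Accumulo's API: `Entry`, `sort_key`, `merged_view`, and `tablet_offset` are hypothetical names, the key is a single string rather than a full Accumulo key, only the newest version of each key is kept, and combiners are not modeled at all.

```python
import heapq
from typing import Iterator, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """Hypothetical, simplified stand-in for a stored key-value pair."""
    key: str
    timestamp: int
    deleted: bool  # True marks a delete entry
    value: str


def sort_key(e: Entry) -> Tuple[str, int, bool]:
    # Key ascending, newest timestamp first, deletes ahead of puts on a tie.
    return (e.key, -e.timestamp, not e.deleted)


def merged_view(files: List[List[Entry]]) -> Iterator[Tuple[str, str]]:
    """Yield the live (key, value) pairs a scan would see in this toy model."""
    last_key = None
    for e in heapq.merge(*files, key=sort_key):
        if e.key == last_key:
            continue  # older version, or shadowed by a newer delete
        last_key = e.key
        if not e.deleted:
            yield e.key, e.value


def tablet_offset(files: List[List[Entry]], key: str) -> Optional[int]:
    """Offset of `key` in the merged view; requires reading everything before it."""
    for i, (k, _) in enumerate(merged_view(files)):
        if k == key:
            return i
    return None


# Each file is already sorted by sort_key, as a file on disk would be.
file_a = [
    Entry("apple", 1, False, "a1"),
    Entry("cherry", 1, False, "c1"),
    Entry("date", 1, False, "d1"),
]
file_b = [
    Entry("apple", 2, True, ""),        # deletes the older "apple" in file_a
    Entry("banana", 2, False, "b2"),
    Entry("blueberry", 2, False, "bb2"),
    Entry("cherry", 3, False, "c3"),    # newer version of "cherry"
]

if __name__ == "__main__":
    # "cherry" is at offset 1 in file_a and 3 in file_b, but 2 in the merged view.
    print(tablet_offset([file_a, file_b], "cherry"))
```

Even in this toy model, neither file's local offset for "cherry" matches its position in the merged view, and finding that position means reading every entry that sorts before it.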
Your best bet is probably to perform this sort of indexing within an ingest framework that understands a bigger picture of how you will use the data you are ingesting.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Tue, Dec 4, 2012 at 12:45 PM, Josh Elser <[email protected]> wrote:

> I was thinking a little more on the subject, and convinced myself that I
> was wrong.
>
> Since many files on disk correspond to a tablet, the best you can get is
> the index of a key-value pair in a given file for a tablet. To get a sorted
> stream of key-value pairs for this tablet (to compute index offset for a
> key in a tablet), a merged read is performed over all of those files. Local
> key offset for a file is meaningless as it does not imply the correct
> offset for a tablet.
>
> On 12/3/12 9:30 PM, Josh Elser wrote:
>
>> Accumulo doesn't expose any internal offsets of Key-Value pairs through
>> the API. While it might be able to extrapolate some of this knowledge from
>> the underlying structure of Accumulo, that isn't the intent of what
>> Accumulo is trying to provide.
