On Fri, Jan 9, 2015 at 7:56 PM, Christopher <[email protected]> wrote:
> Another optimization you can try: instead of always seeking to the > computed next, you can advance internally inside your iterator by calling > its source's next method a few times. If you don't reach the next element > that you would have seek'd to in some reasonable number of iterations, you > can then seek. This also is a strategy that is hard to optimize: Do I need > to advance, on average 3 or 20 or 10000000 keys? How many before it would > have been more efficient to just seek? There's no easy answer. > Experimentation helps. > The VersioningIterator has a good example of this approach: https://github.com/apache/accumulo/blob/901d60ef1cf72c2d55c90746fce94e108a992d3b/core/src/main/java/org/apache/accumulo/core/iterators/user/VersioningIterator.java#L95 -Russ > > > -- > Christopher L Tubbs II > http://gravatar.com/ctubbsii > > On Fri, Jan 9, 2015 at 6:54 PM, Eugene Cheipesh <[email protected]> > wrote: > >> That’s would work well enough and is my next choice. >> >> The thought was, rows are stored in increasing order, so as long as I >> know when I walked off the edge, and flag the iterator as empty it’d be >> good. I’m just chasing the optimal in this case, but if it doesn’t exist, >> oh well. >> >> Thank you for the reference link, it’s very helpful. >> >> -- >> Eugene Cheipesh >> >> From: Russ Weeks <[email protected]> <[email protected]> >> Reply: [email protected] <[email protected]>> >> <[email protected]> >> Date: January 9, 2015 at 6:48:47 PM >> To: [email protected] <[email protected]>> >> <[email protected]> >> Subject: Re: Seeking Iterator >> >> Hi, Eugene, >> >> I think the conventional approach is to decompose your search area >> (bounding box?) into a set of scan ranges that introduce minimal extraneous >> curve segments, and then pass all those scan ranges into a BatchScanner. >> The excellent Accumulo Recipes site has an example[1]. Does this approach >> not work for you? >> >> In general, your custom iterator should never try to seek to a row id >> different from the current row id, because that row could be hosted by a >> different tablet server. >> >> -Russ >> >> 1: >> https://github.com/calrissian/accumulo-recipes/blob/master/store/geospatial-store/src/main/java/org/calrissian/accumulorecipes/geospatialstore/support/QuadTreeHelper.java#L33 >> >> On Fri, Jan 9, 2015 at 2:37 PM, Eugene Cheipesh <[email protected]> >> wrote: >> >>> Hello, >>> >>> I am attempting to write an Iterator based on a Z-curve index to search >>> through multi-dimensional data. Essentially, given a record that I have >>> encountered that is in the index range not in the multi-demensional query >>> range I have a way to generate the next candidate record, potentially far >>> ahead of the current point. >>> >>> Ideally I would be able to refine my search range with subsequent calls >>> to seek(). It appears that Accumulo will create an iterator for every RFile >>> (or some split other split point). The beginning of the range argument to >>> seek will be the record at beginning of this split (which is good), however >>> all instances of the iterator have the same, global range end (which is >>> bad). >>> >>> I need to avoid the case where I seek past the range boundary of each >>> individual iterator instance and throw a NullPointerException. Is there any >>> way to get enough information to achieve this? >>> >>> Thank you, >>> >>> -- >>> Eugene Cheipesh >>> >> >> >
