That is the idea I am playing with, optimizing subsequent calls to .next vs 
.reseek. 

Using the VersioningIterator as an I was able to get reseek working, the trick 
turned out to be to check getSource.hasTop a little more carefully. Thank you 
for that pointer. Disappointingly enough there does not appear to be a huge 
difference in performance from a Filter that performs the same checking without 
seeking forward.

I am attempting to use accumulo tracing to come up with an explanation but 
results are spotty. Query runtime is ~2.3s for 200k entries and I am only 
capturing tiny fraction of that through tracing. I am using AccumuloInputFormat 
to pull results into a spark job. Would I get better results if I was creating 
the BatchScanner directly? Otherwise what would be the best method to debug the 
iterator and query performance?

-- 
Eugene Cheipesh

From: Russ Weeks <[email protected]>
Reply: [email protected] <[email protected]>>
Date: January 9, 2015 at 11:32:13 PM
To: [email protected] <[email protected]>>
Subject:  Re: Seeking Iterator  

On Fri, Jan 9, 2015 at 7:56 PM, Christopher <[email protected]> wrote:
Another optimization you can try: instead of always seeking to the computed 
next, you can advance internally inside your iterator by calling its source's 
next method a few times. If you don't reach the next element that you would 
have seek'd to in some reasonable number of iterations, you can then seek. This 
also is a strategy that is hard to optimize: Do I need to advance, on average 3 
or 20 or 10000000  keys? How many before it would have been more efficient to 
just seek? There's no easy answer. Experimentation helps.

The VersioningIterator has a good example of this approach: 
https://github.com/apache/accumulo/blob/901d60ef1cf72c2d55c90746fce94e108a992d3b/core/src/main/java/org/apache/accumulo/core/iterators/user/VersioningIterator.java#L95

-Russ
 


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Fri, Jan 9, 2015 at 6:54 PM, Eugene Cheipesh <[email protected]> wrote:
That’s would work well enough and is my next choice.

 The thought was, rows are stored in increasing order, so as long as I know 
when I walked off the edge, and flag the iterator as empty it’d be good.  I’m 
just chasing the optimal in this case, but if it doesn’t exist, oh well.

Thank you for the reference link, it’s very helpful. 

-- 
Eugene Cheipesh

From: Russ Weeks <[email protected]>
Reply: [email protected] <[email protected]>>
Date: January 9, 2015 at 6:48:47 PM
To: [email protected] <[email protected]>>
Subject:  Re: Seeking Iterator

Hi, Eugene,

I think the conventional approach is to decompose your search area (bounding 
box?) into a set of scan ranges that introduce minimal extraneous curve 
segments, and then pass all those scan ranges into a BatchScanner. The 
excellent Accumulo Recipes site has an example[1]. Does this approach not work 
for you?

In general, your custom iterator should never try to seek to a row id different 
from the current row id, because that row could be hosted by a different tablet 
server.

-Russ

1: 
https://github.com/calrissian/accumulo-recipes/blob/master/store/geospatial-store/src/main/java/org/calrissian/accumulorecipes/geospatialstore/support/QuadTreeHelper.java#L33

On Fri, Jan 9, 2015 at 2:37 PM, Eugene Cheipesh <[email protected]> wrote:
Hello,

I am attempting to write an Iterator based on a Z-curve index to search through 
multi-dimensional data. Essentially, given a record that I have encountered 
that is in the index range not in the multi-demensional query range I have a 
way to generate the next candidate record, potentially far ahead of the current 
point.

Ideally I would be able to refine my search range with subsequent calls to 
seek(). It appears that Accumulo will create an iterator for every RFile (or 
some split other split point). The beginning of the range argument to seek will 
be the record at beginning of this split (which is good), however all instances 
of the iterator have the same, global range end (which is bad).

I need to avoid the case where I seek past the range boundary of each 
individual iterator instance and throw a NullPointerException. Is there any way 
to get enough information to achieve this?

Thank you,

-- 
Eugene Cheipesh



Reply via email to