Hi Mario,
If I understand you correctly, you imagine select() to be a streaming
operation. Actually, it is not, at least not immediately.
When select() is invoked, it creates an object that is a hybrid of a builder,
an Iterable and a Stream. As soon as you invoke an Iterable or Stream method on
it, it loses its other personalities.
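To illustrate the "hybrid" idea, here is a minimal sketch. HybridSelect is a hypothetical toy class, not the UIMA SelectFS API: builder methods only record settings and return the same object, and the settings are applied only when an Iterable or Stream method needs the data. Once stream() has been called, you hold a plain java.util.stream.Stream, which knows nothing about the builder methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical toy class, NOT the UIMA SelectFS API: one object that acts as
// a builder, an Iterable and a source of a Stream.
class HybridSelect<T> implements Iterable<T> {
    private final List<T> source;      // elements in "document order"
    private boolean backwards = false; // builder setting, recorded but not yet applied

    HybridSelect(List<T> source) {
        this.source = source;
    }

    // "Builder" personality: record the setting and return the same object.
    HybridSelect<T> backwards() {
        this.backwards = true;
        return this;
    }

    // Recorded settings are only applied when a terminal call needs the data.
    private List<T> materialize() {
        List<T> result = new ArrayList<>(source);
        if (backwards) {
            Collections.reverse(result);
        }
        return result;
    }

    // "Iterable" personality.
    @Override
    public Iterator<T> iterator() {
        return materialize().iterator();
    }

    // "Stream" personality: the caller now holds a plain java.util.stream.Stream,
    // which has no builder methods such as backwards().
    Stream<T> stream() {
        return materialize().stream();
    }

    List<T> asList() {
        return materialize();
    }
}
```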
Methods such as the following belong to the "builder" personality:
- following(x)
- coveredBy(x)
- covering(x)
- ...
- shifted(y)
- backwards()
- nonOverlapping()
- typePriority()
- ...
While you are working with the "builder" personality, the order in which the
methods are called has no effect. For example, the following calls are all
equivalent:
cas.select(Token.class).shifted(-1).following(t3).backwards()
cas.select(Token.class).following(t3).backwards().shifted(-1)
cas.select(Token.class).backwards().shifted(-1).following(t3)
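Why the order does not matter can be sketched with a builder in which every call just stores its argument in its own field. OrderFreeBuilder and settings() are hypothetical names for illustration; this is not how SelectFS is actually implemented.

```java
// Hypothetical sketch, not the UIMA implementation: builder calls only store
// their arguments; each call writes its own field, so call order is irrelevant.
class OrderFreeBuilder {
    private int shift = 0;
    private String followingAnchor = null;
    private boolean backwards = false;

    OrderFreeBuilder shifted(int n)       { shift = n; return this; }
    OrderFreeBuilder following(String fs) { followingAnchor = fs; return this; }
    OrderFreeBuilder backwards()          { backwards = true; return this; }

    // Summary of the recorded configuration, used to compare builders.
    String settings() {
        return "following=" + followingAnchor + ", shift=" + shift + ", backwards=" + backwards;
    }
}
```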
If you give conflicting instructions to the builder personality, the last
instruction should win, e.g.
cas.select(Token.class).following(t3).shifted(-1).preceding(t4)
should be equivalent to
cas.select(Token.class).preceding(t4).shifted(-1)
(... or, if there are bugs, it might do something unexpected ...)
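A minimal sketch of "the last instruction wins", assuming that following() and preceding() write to one shared positional-constraint slot while shifted() has a slot of its own. PositionalBuilder is a hypothetical class, not UIMA code.

```java
import java.util.Objects;

// Hypothetical sketch, not the UIMA code: following() and preceding() share one
// slot, so whichever is called last overwrites the other; shift is separate.
class PositionalBuilder {
    private String constraint; // e.g. "following t3" or "preceding t4"
    private int shift;

    PositionalBuilder following(String fs) { constraint = "following " + fs; return this; }
    PositionalBuilder preceding(String fs) { constraint = "preceding " + fs; return this; }
    PositionalBuilder shifted(int n)       { shift = n; return this; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PositionalBuilder)) return false;
        PositionalBuilder other = (PositionalBuilder) o;
        return Objects.equals(constraint, other.constraint) && shift == other.shift;
    }

    @Override
    public int hashCode() {
        return Objects.hash(constraint, shift);
    }
}
```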
Methods like coveredBy(x) or covering(x) set up bounds for the iterator that
SelectFS creates internally.
I think the initial idea for following(x)/preceding(x) was that they would not
define bounds, but IMHO that doesn't make much sense. From my perspective, they
also define bounds: either from the beginning of the document to x (preceding)
or from x to the end of the document (following). There is also the startAt(x)
method, which does not define a boundary; it just moves the iterator to a given
start position.
So while the following operations are bounded:
cas.select(Token.class).following(x).asList()
cas.select(Token.class).preceding(x).asList()
these operations are their respective unbounded counterparts:
cas.select(Token.class).startAt(x).asList()
cas.select(Token.class).startAt(x).backwards().asList()
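The difference between bounds and a start position can be modelled roughly like this. SelectionWindow is hypothetical, and the model assumes non-overlapping tokens, each identified by its index in a list kept in document order, so "following" simply means "later in the list". A bounded selection has a window [lo, hi) that iteration may never leave; startAt(x) keeps the whole document as its window and only moves the start position.

```java
// Hypothetical model, not UIMA code. Assumes non-overlapping tokens identified
// by their index in a list kept in document order.
class SelectionWindow {
    final int lo;    // lower bound (inclusive): iteration never goes below it
    final int hi;    // upper bound (exclusive): iteration never reaches it
    final int start; // where iteration begins inside the bounds

    SelectionWindow(int lo, int hi, int start) {
        this.lo = lo;
        this.hi = hi;
        this.start = start;
    }

    // Bounded: from just after x to the end of the document.
    static SelectionWindow following(int x, int size) { return new SelectionWindow(x + 1, size, x + 1); }

    // Bounded: from the beginning of the document to just before x.
    static SelectionWindow preceding(int x, int size) { return new SelectionWindow(0, x, 0); }

    // Not bounded: the whole document; only the start position is moved to x.
    static SelectionWindow startAt(int x, int size) { return new SelectionWindow(0, size, x); }

    // A cursor may only move to positions inside the bounds.
    boolean canMoveTo(int pos) { return pos >= lo && pos < hi; }
}
```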
The unbounded versions behave a bit differently from the bounded ones. For
example, preceding(x) returns annotations in document order, while
startAt(x).backwards() returns them in iteration order, i.e. walking backwards
from x. Also, following(x) and preceding(x) never include x in their results,
while startAt(x) should return x as the first entry in the result list.
I hope I explained this correctly, that it makes sense, and that it mostly
matches the implementation. I am still working on setting up a tighter test
suite to ensure it does ;)
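To pin down the intended results, here is a sketch over a plain list. It is a hypothetical model, not UIMA code: tokens are assumed to be non-overlapping and identified by their index x in document order. It is also an assumption that startAt(x).backwards(), like startAt(x), returns x as its first entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the semantics described above, not UIMA code.
// Assumes non-overlapping tokens identified by their index x in document order.
final class BoundedVsUnbounded {
    // Bounded: tokens after x, x excluded, document order.
    static <T> List<T> following(List<T> doc, int x) {
        return new ArrayList<>(doc.subList(x + 1, doc.size()));
    }

    // Bounded: tokens before x, x excluded, still in document order.
    static <T> List<T> preceding(List<T> doc, int x) {
        return new ArrayList<>(doc.subList(0, x));
    }

    // Unbounded: start at x (included) and walk forward.
    static <T> List<T> startAt(List<T> doc, int x) {
        return new ArrayList<>(doc.subList(x, doc.size()));
    }

    // Unbounded, backwards: start at x (included, by assumption) and walk
    // towards the beginning, so results come in iteration order.
    static <T> List<T> startAtBackwards(List<T> doc, int x) {
        List<T> out = new ArrayList<>();
        for (int i = x; i >= 0; i--) {
            out.add(doc.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("t0", "t1", "t2", "t3", "t4");
        System.out.println(preceding(doc, 2));        // [t0, t1]
        System.out.println(startAtBackwards(doc, 2)); // [t2, t1, t0]
        System.out.println(following(doc, 2));        // [t3, t4]
        System.out.println(startAt(doc, 2));          // [t2, t3, t4]
    }
}
```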
select() only really becomes a stream if you invoke stream() or a method from
the Stream interface (e.g. filter() or map()). It can also become a list, an
array, or an iterator. So the following is actually *not* possible:
select(Token.class).filter(t -> t.getCoveredText().equals("blah")).shifted(1)
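because "shifted()" is a method of the builder personality of SelectFS, while
"filter()" is a method of the Stream personality: once filter() has been
called, you are holding a Stream, which no longer offers the builder methods.
However, this would work: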
select(Token.class).filter(t -> t.getCoveredText().equals("blah")).skip(1)
because "skip()" is a method on Stream.
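The same situation in plain java.util.stream code. The Token class here is a hypothetical stand-in for an annotation type, not the UIMA Token. Note that skip(1) drops the first element of the already filtered stream.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an annotation type; carries only what the example needs.
class Token {
    final String text;
    final int begin;

    Token(String text, int begin) {
        this.text = text;
        this.begin = begin;
    }

    String getCoveredText() { return text; }
}

class StreamPersonality {
    // Begin offsets of all "blah" tokens except the first one.
    static List<Integer> skipFirstBlah(List<Token> tokens) {
        return tokens.stream()
                .filter(t -> t.getCoveredText().equals("blah")) // now a plain Stream<Token>
                .skip(1)                                        // Stream method: drop the first match
                // .shifted(1) would not compile here: Stream has no such method
                .map(t -> t.begin)
                .collect(Collectors.toList());
    }
}
```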
Ok, but independently of the different personalities of select(), I understand
that you find it neither logical nor intuitive that limit() and shifted()
interact with each other. But you do support the idea of capping the shift at 0
and simply ignoring any negative values for bounded selections.
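A rough sketch of the capping proposal, under the same hypothetical index-based model (non-overlapping tokens, bounds [lo, hi), a start position). ShiftCapping is a made-up class and this is not the current UIMA behaviour: in a bounded selection, a negative shift is treated as 0, while an unbounded startAt-style selection applies the shift as given. The interaction with limit() is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposal, not actual UIMA behaviour.
class ShiftCapping {
    // Proposal: in a bounded selection, shifts below 0 are ignored (capped at 0).
    static int effectiveShift(int requested, boolean bounded) {
        return bounded ? Math.max(0, requested) : requested;
    }

    // Selects elements of doc from start + effective shift up to hi,
    // never going below the lower bound lo.
    static <T> List<T> select(List<T> doc, int lo, int hi, int start, int shift, boolean bounded) {
        int from = Math.max(lo, start + effectiveShift(shift, bounded));
        List<T> out = new ArrayList<>();
        for (int i = from; i < hi; i++) {
            out.add(doc.get(i));
        }
        return out;
    }
}
```

For example, with tokens t0..t4 and x = t2, following(x).shifted(-1) behaves like following(x), whereas startAt(x).shifted(-1) starts at t1.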
Cheers,
-- Richard