Hi Richard,

Sorry for the late reply, I've been busy with other things lately. Thanks for the extensive explanation, it gives me a much better understanding of the select API. Generally I don't object to any of your thoughts, but I don't always find the API intuitive (yet). There haven't been any issues for our limited use cases, except for the bug you fixed very quickly, thanks for that :)
Cheers
Mario

> On 20 Nov 2020, at 22.04, Richard Eckart de Castilho <[email protected]> wrote:
>
> Hi Mario,
>
> if I understand you correctly, you imagine select() to be a streaming
> operation. Actually, it is not - at least not immediately.
>
> When select() is invoked, it creates an object that is a hybrid between a
> builder, an Iterable and a Stream. If at any point you invoke an Iterable or
> Stream method on it, it loses the other personalities.
>
> Methods such as the following are part of the "builder" personality:
>
> - following(x)
> - coveredBy(x)
> - covering(x)
> - ...
>
> - shifted(y)
> - backwards()
> - nonOverlapping()
> - typePriority()
> - ...
>
> While operating on the "builder" personality, the order in which these
> methods are called has no effect. E.g. the following calls are all
> equivalent:
>
>   cas.select(Token.class).shifted(-1).following(t3).backwards()
>   cas.select(Token.class).following(t3).backwards().shifted(-1)
>   cas.select(Token.class).backwards().shifted(-1).following(t3)
>
> If you give conflicting instructions to the builder personality, the
> last instruction should win, e.g.
>
>   cas.select(Token.class).following(t3).shifted(-1).preceding(t4)
>
> should be equivalent to
>
>   cas.select(Token.class).preceding(t4).shifted(-1)
>
> (... or, if there are bugs, it might do something unexpected ...)
>
> Methods like coveredBy(x) or covering(x) set up bounds for the iterator
> that SelectFS creates internally.
> I think the initial idea for following(x)/preceding(x) was that they would
> not define bounds - but IMHO that doesn't make much sense. From my
> perspective they also define bounds: either from the beginning of the
> document to x (preceding) or from x to the end of the document (following).
> There is also the startAt(x) method - this does not define a boundary, it
> just moves the iterator to a given start position.
>
> So while the following operations are bounded:
>
>   cas.select(Token.class).following(x).asList()
>   cas.select(Token.class).preceding(x).asList()
>
> these operations are their respective unbounded counterparts:
>
>   cas.select(Token.class).startAt(x).asList()
>   cas.select(Token.class).startAt(x).backwards().asList()
>
> The unbounded versions behave a bit differently from the bounded ones. E.g.
> preceding(x) returns annotations in document order, while
> startAt(x).backwards() returns them in iteration order (i.e. in reverse
> document order, starting from x). Also, following(x) and preceding(x) never
> include x in their results, while startAt(x) should return x as the first
> entry in the result list. I do hope that I explained this correctly, that it
> makes sense, and that it mostly matches the implementation. I am still
> working on setting up a tighter test suite to ensure it does ;)
>
> select() only really becomes a stream if you invoke stream() or a method from
> the Stream interface (e.g. filter() or map()). It can also become a list, an
> array, or an iterator. So the following is actually *not* possible:
>
>   select(Token.class).filter(t -> t.getCoveredText().equals("blah")).shifted(1)
>
> because "shifted()" is a method from the builder personality of SelectFS,
> while "filter()" is a method of the Stream personality. However, this would
> work:
>
>   select(Token.class).filter(t -> t.getCoveredText().equals("blah")).skip(1)
>
> because "skip()" is a method on Stream.
>
> Ok, but independent of the different personalities of select(), I understand
> that you find it neither logical nor intuitive that limit and shifted
> interact with each other. But you do support the idea of capping the shift
> at 0 and simply ignoring any smaller values for bounded selections.
>
> Cheers,
>
> -- Richard
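To make the builder semantics described above easier to picture, here is a minimal toy model in Java. It is a sketch under stated assumptions, not UIMA's SelectFS: SelectSketch, Span and Select and all of their methods are hypothetical stand-ins. following(x) is assumed to keep spans with begin >= x.end and preceding(x) spans with end <= x.begin, a negative shift on a non-startAt selection is capped at 0 as proposed in the thread, and shifting a startAt(x) selection simply moves the start position in iteration direction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model of the select() behaviour described in the thread. This is NOT UIMA's
// SelectFS: Span, Select and all of their methods are hypothetical stand-ins.
public class SelectSketch {

    // Hypothetical stand-in for an annotation such as Token: begin/end character offsets.
    record Span(int begin, int end) {}

    // "Builder personality": the setters only record settings; asList()/stream() evaluate.
    static final class Select {
        private final List<Span> index; // assumed to be sorted in document order
        private Span following;         // bounded: keep spans with begin >= x.end
        private Span preceding;         // bounded: keep spans with end <= x.begin
        private Span startAt;           // unbounded: only positions the iterator at x
        private int shift;
        private boolean backwards;

        Select(List<Span> index) { this.index = index; }

        // A new positional instruction replaces any earlier one ("last instruction wins").
        Select following(Span x) { clearPosition(); following = x; return this; }
        Select preceding(Span x) { clearPosition(); preceding = x; return this; }
        Select startAt(Span x) { clearPosition(); startAt = x; return this; }
        Select shifted(int n) { shift = n; return this; }
        Select backwards() { backwards = true; return this; }

        private void clearPosition() { following = null; preceding = null; startAt = null; }

        List<Span> asList() {
            List<Span> result = new ArrayList<>();
            if (startAt != null) {
                // Unbounded: start at x (moved by the shift in iteration direction) and run
                // to the end of the index, or to its beginning when iterating backwards.
                int pos = index.indexOf(startAt);
                if (pos < 0) return result;
                pos += backwards ? -shift : shift;
                if (pos < 0 || pos >= index.size()) return result;
                if (backwards) {
                    for (int i = pos; i >= 0; i--) result.add(index.get(i));
                } else {
                    result.addAll(index.subList(pos, index.size()));
                }
                return result;
            }
            // Bounded (or unconstrained): collect matching spans in document order.
            for (Span s : index) {
                boolean inBounds = (following == null || s.begin() >= following.end())
                        && (preceding == null || s.end() <= preceding.begin());
                if (inBounds) result.add(s);
            }
            if (backwards) Collections.reverse(result);
            // The capping proposed in the thread: negative shifts count as 0 here.
            int skip = Math.min(Math.max(shift, 0), result.size());
            return new ArrayList<>(result.subList(skip, result.size()));
        }

        // Leaving the builder personality: after this call only Stream methods apply.
        Stream<Span> stream() { return asList().stream(); }
    }

    public static void main(String[] args) {
        List<Span> tokens = List.of(new Span(0, 3), new Span(4, 7), new Span(8, 11),
                new Span(12, 15), new Span(16, 19), new Span(20, 23));
        Span t2 = tokens.get(2), t3 = tokens.get(3), t4 = tokens.get(4);

        // Builder calls in a different order describe the same selection.
        System.out.println(new Select(tokens).shifted(-1).following(t3).backwards().asList());
        System.out.println(new Select(tokens).backwards().shifted(-1).following(t3).asList());
        // The later preceding(t4) replaces the earlier following(t3).
        System.out.println(new Select(tokens).following(t3).shifted(-1).preceding(t4).asList());
        // Bounded following(x) excludes x; unbounded startAt(x) starts with x.
        System.out.println(new Select(tokens).following(t2).asList());
        System.out.println(new Select(tokens).startAt(t2).asList());
        // Once it is a Stream, skip() is available but shifted() is not.
        System.out.println(new Select(tokens).stream()
                .filter(s -> s.begin() >= 8).skip(1).collect(Collectors.toList()));
    }
}
```

Because nothing is evaluated until asList() or stream() is called, the three orderings of shifted/following/backwards yield the same list, and the later preceding(t4) simply replaces the earlier following(t3). The real SelectFS may well differ in details, for example in how shifted() interacts with backwards().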
