OK. Re: getting the top 5 or 10 items: here's a technique you may find useful.

Put the items into a Java PriorityQueue. Keep track of the bottom item (the
lowest-ranked one among those you are keeping), and in your
insert-into-the-queue code, check whether the item to be inserted is below
that; if so, skip it. This gives a very efficient way to get the top 5 or 10
items.

HTH. -Marshall

On 9/9/2019 4:09 AM, Mario Juric wrote:
> Hi,
>
> Once again thanks for the response. It is really appreciated :)
>
> I tried the moveTo(fs) instead of just using an iterator constructed from the
> FS, and this appeared to give me all items of the specified type when I
> didn’t set any values on it, which was an accidental experiment, but when I
> set the key property to what I was searching for then I got zero items back.
> Not sure what I might be doing wrong here, but I have learned something maybe
> more important to our use case in the meantime: the cost of indexing
> exceeds by far the benefits of any expected lookup speed in our case.
>
> We are annotating a number of items with a lot of extracted feature
> information, and the hope was to be able to quickly get the top 5 or 10 or
> whatever of the items with this or that key, which is why it was sorted by
> key first in natural sort order and then by the value in reverse order,
> meaning higher value is better, so that we could quickly get to the first
> item with the right key and then start pulling the topmost items until we
> have those that we need.
>
> So even if I could get this to work optimally it would in our case not be
> beneficial given the cost of indexing. It seems we really need many of those
> queries before it pays off, since the amount of feature information is much
> larger than the items they are associated with, so I reached the
> preliminary conclusion not to have features in any index at all and to just
> use plain FS record structures instead.
> It appears in our case much cheaper to run through all target items, of
> which there are comparatively fewer, to find what we need than to index all
> associated features and find the relevant target items through feature
> lookup.
>
> Cheers,
> Mario
>
>> On 6 Sep 2019, at 16:50, Marshall Schor wrote:
>>
>> Please don't add to the indexes the FS you're temporarily using as the
>> argument for the moveTo operation. (And of course, if you don't add it, you
>> won't need to remove it...)
>>
>> If you describe your use case in a bit more detail, I can perhaps comment on
>> this more.
>>
>> -Marshall
>>
>> On 9/6/2019 2:50 AM, Mario Juric wrote:
>>> Hi,
>>>
>>> Thanks for responding.
>>>
>>> I tried with a temporary FS where the key value was set, but I got every
>>> annotation from the index, so that didn’t appear to change anything, and it
>>> also broke my unit tests immediately. I also stepped through the iterator
>>> implementation and found construction of the iterator with an FS quite
>>> complex, so that went over my head without spending time to get a deeper
>>> understanding of the underlying index implementation. Therefore I tried
>>> with an indexed FS and this seemed to return the correct items, but it
>>> would be awkward having to add some FS to the index in order to retrieve
>>> something else and then having to remove the FS from the index again. I am
>>> now also in doubt about the insertion costs, but I haven’t measured that
>>> yet.
>>>
>>> I am not sure how many use custom FSIndex, but currently the API doesn’t
>>> really support very well the type of use cases that we are working with, so
>>> this is a disappointment for us. Does UIMA 3 improve on this? We are still
>>> on 2.x since we are awaiting the next major DKPro release with UIMA 3
>>> because of dependencies.
>>>
>>> Thanks a lot and cheers,
>>> Mario
>>>
>>>> On 5 Sep 2019, at 23:42, Richard Eckart de Castilho wrote:
>>>>
>>>> On 5. Sep 2019, at 23:40, Marshall Schor wrote:
>>>>> The normal way to get the "binary search" kind of behavior is to get a
>>>>> plain iterator over the sorted index, and then use the moveTo method,
>>>>> specifying a target FS as the one to move to. The target FS can be a
>>>>> "temporary" FS, one that is never added to the indexes itself; it is just
>>>>> used to supply values used in the comparison.
>>>> Is there a way to do this using a "temporary" FS which does not take up
>>>> CAS heap space in UIMAv2?
>>>>
>>>> -- Richard
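
P.S. For reference, the PriorityQueue idea from the top of this message could be sketched roughly as follows. This is only a sketch under assumptions: the class name TopK, the method topK, and the use of plain Comparable values are made up for illustration (nothing here is UIMA API). It also evicts the current bottom item when a better one arrives, which the description implies but does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of UIMA: keeps only the k best items seen so far.
class TopK {

    /** Returns the k largest items, best first (fewer if the input is shorter). */
    static <T extends Comparable<T>> List<T> topK(Iterable<T> items, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: peek() is the weakest item currently kept, i.e. the "bottom item".
        PriorityQueue<T> heap = new PriorityQueue<>(k);
        for (T item : items) {
            if (heap.size() < k) {
                heap.add(item);                 // still filling up to k items
            } else if (item.compareTo(heap.peek()) > 0) {
                heap.poll();                    // drop the old bottom item
                heap.add(item);                 // keep the better one
            }
            // otherwise the item is not above the bottom item, so it is skipped
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder()); // best first
        return result;
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(0.2, 0.9, 0.5, 0.7, 0.1, 0.8);
        System.out.println(topK(scores, 3)); // prints [0.9, 0.8, 0.7]
    }
}
```

In the feature-value case one would presumably run this once per key over the plain FS records, comparing on the value; the queue never holds more than k items, so each item costs at most O(log k).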
