ok.

Re: getting the top 5 or 10 items: Here's a technique you may find of use:

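Put the items into a Java PriorityQueue that holds at most the number of items
you want.  Keep track of the bottom item (the lowest-ranked one currently in
the queue), and in your insert-into-the-queue code, check whether the item
to be inserted is below that.  If so, skip it; otherwise insert it and drop
the old bottom item.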

This gives a very efficient way to get the top 5 or 10 items.
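
A minimal sketch of that idea, assuming the items can be reduced to a numeric
score where higher is better (the class and method names here are made up for
illustration, and a real item type would need its own Comparator).  The queue
is a min-heap, so peek() is the bottom item of the current top k:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopK {
    /** Returns the k largest scores, best first (hypothetical helper). */
    static List<Double> topK(Iterable<Double> scores, int k) {
        List<Double> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap: the head is the lowest score among the current top k.
        PriorityQueue<Double> heap = new PriorityQueue<>(k);
        for (double s : scores) {
            if (heap.size() < k) {
                heap.add(s);              // queue not full yet, always keep
            } else if (s > heap.peek()) {
                heap.poll();              // evict the old bottom item
                heap.add(s);
            }
            // otherwise s is not above the bottom item: skip it
        }
        result.addAll(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(0.3, 0.9, 0.1, 0.7, 0.5, 0.8);
        System.out.println(topK(scores, 3)); // prints [0.9, 0.8, 0.7]
    }
}
```

Each item costs at most one comparison against the bottom item plus, when it
is kept, an O(log k) heap update, and only k items are held at any time.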

HTH. -Marshall

On 9/9/2019 4:09 AM, Mario Juric wrote:
> Hi,
>
> Once again thanks for the response. It is really appreciated :)
>
> I tried the moveTo(fs) instead of just using an iterator constructed from the 
> FS, and this appeared to give me all items of the specified type when I 
> didn’t set any values on it, which was an accidental experiment, but when I 
> set the key property to what I was searching for then I got zero items back. 
> Not sure what I might be doing wrong here, but in the meantime I have learned
> something perhaps more important for our use case: the cost of indexing
> far exceeds the benefit of any expected lookup speed-up in our case.
>
> We are annotating a number of items with a lot of extracted feature 
> information, and the hope was to be able to quickly get top 5 or 10 or 
> whatever of the items with this or that key, which is why it was sorted by 
> key first in natural sort order and then by the value in reverse order, 
> meaning higher value is better, so that we could quickly get to the first 
> item with the right key and then start pulling the top most items until we 
> have those that we need.
>
> So even if I could get this to work optimally, it would not be beneficial in
> our case given the cost of indexing. It seems we would need a great many of
> those queries before it pays off, since the amount of feature information is
> much larger than the items it is associated with. So I have reached the
> preliminary conclusion not to have features in any index at all and just to
> use plain FS record structures instead. In our case it appears much cheaper
> to run through all target items, of which there are comparatively few, to
> find what we need than to index all associated features and find the relevant
> target items through feature lookup.
>
> Cheers,
> Mario
>
>> On 6 Sep 2019, at 16:50 , Marshall Schor <[email protected]> wrote:
>>
>> Please don't add to the indexes the FS you're temporarily using as the
>> argument for the moveTo operation (and of course, if you don't add it, you
>> won't need to remove it...).
>>
>> If you describe your use case in a bit more detail, I can perhaps comment on
>> this more.
>>
>> -Marshall
>>
>> On 9/6/2019 2:50 AM, Mario Juric wrote:
>>> Hi,
>>>
>>> Thanks for responding.
>>>
>>> I tried with a temporary FS where the key value was set, but I got every 
>>> annotation from the index, so that didn’t appear to change anything, and it 
>>> also broke my unit tests immediately. I also stepped through the iterator
>>> implementation and found the construction of an iterator with an FS quite
>>> complex, so that went over my head without spending time to get a deeper
>>> understanding of the underlying index implementation. Therefore I tried
>>> with an indexed FS and this seemed to return the correct items, but it 
>>> would be awkward having to add some FS to the index in order to retrieve 
>>> something else and then having to remove the FS from the index again. I am 
>>> now also in doubt about the insertion costs, but I haven’t measured that 
>>> yet.
>>>
>>> I am not sure how many use custom FSIndex, but currently the API doesn’t 
>>> really support very well the type of use cases that we are working with, so 
>>> this is a disappointment for us. Does UIMA 3 improve on this? We are still 
>>> on 2.x since we are awaiting the next major DKPro release with UIMA 3 
>>> because of dependencies.
>>>
>>> Thanks a lot and cheers,
>>> Mario
>>>
>>>> On 5 Sep 2019, at 23:42 , Richard Eckart de Castilho <[email protected]> wrote:
>>>>
>>>> On 5. Sep 2019, at 23:40, Marshall Schor <[email protected]> wrote:
>>>>> The normal way to get the "binary search" kind of behavior is to get a
>>>>> plain iterator over the sorted index, and then use the moveTo method,
>>>>> specifying a target FS as the one to move to.  The target FS can be a
>>>>> "temporary" FS, one that is never added to the indexes, itself; it is
>>>>> just used to supply values used in the comparison.
>>>> Is there a way to do this using a "temporary" FS which does not take up
>>>> CAS heap space in UIMAv2?
>>>>
>>>> -- Richard
>