Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Mario Juric Wed, 11 Nov 2020 06:37:57 -0800

Thanks Richard,

Would it make sense to provide CAS files based on real documents for the 
benchmarks? I mean, we could run segmenters on some OpenAccess documents, map 
the annotations to those used by the tests, and store them somewhere as XMI or 
CAS binaries. The test could then load them during the initialization phase 
before benchmarking.


Cheers
Mario

> On 11 Nov 2020, at 15.26, Richard Eckart de Castilho <[email protected]> wrote:
>
> Hi Mario,
>
>> On 11. Nov 2020, at 09:11, Mario Juric <[email protected]> wrote:
>>
>> I ran the latest benchmarks, and they seem to confirm your initial 
>> conclusions, that the JCasUtil.selectCovered method performs better than the 
>> corresponding SelectFSs coveredBy method. I somehow picked up the idea that 
>> the new select API should improve performance, which it does for other 
>> select calls, but it is not the case for coveredBy. I was wondering whether 
>> you have some ideas as to why the new API performance isn't closer to the 
>> JCasUtil.selectCovered method?
>
> First, the benchmarks are not very representative at the moment. They 
> create annotation structures that are highly overlapping and very dense, 
> something that you wouldn't normally see in real life. So some operations 
> show up slower in the benchmarks than they would feel in real life. The 
> benchmarks should probably be adjusted to test against different types of 
> structures.
>
> The select methods of uimaFIT are very lightweight and much less 
> flexible/configurable than SelectFSs. If you look at cases where you have 
> small CASes and a very high frequency of method calls, the uimaFIT methods 
> are likely to be better because they have a lower setup cost.
>
> For larger CASes and lower call frequencies, SelectFSs can be better, in 
> particular if you can iterate over the results (and maybe stop before having 
> iterated over all of them) instead of retrieving them as a list. 
> SelectFSs tries hard not to calculate the full result list, while uimaFIT 
> will usually calculate and return the full result list.
>
> There may be more to it, but that's my insight/intuition for the moment.
>
> Cheers,
>
> -- Richard
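
The eager-versus-lazy distinction Richard describes can be illustrated with a
minimal plain-Java sketch. This is not UIMA or uimaFIT code: `Span`,
`coveredEager` and `coveredLazy` are hypothetical names, the "index" is just a
list scanned linearly, and the counter only exists to show how much work each
strategy does when the caller stops after the first result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

final class CoveredSelectionSketch {

    /** Hypothetical stand-in for an annotation: a [begin, end) character span. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        boolean coveredBy(Span outer) {
            return begin >= outer.begin && end <= outer.end;
        }
    }

    /** Eager strategy: examines every span and materializes the full result list. */
    static List<Span> coveredEager(List<Span> index, Span window, AtomicInteger examined) {
        List<Span> result = new ArrayList<>();
        for (Span candidate : index) {
            examined.incrementAndGet();
            if (candidate.coveredBy(window)) {
                result.add(candidate);
            }
        }
        return result;
    }

    /** Lazy strategy: examines spans only as the caller asks for more results. */
    static Iterator<Span> coveredLazy(final List<Span> index, final Span window,
                                      final AtomicInteger examined) {
        return new Iterator<Span>() {
            private int pos = 0;
            private Span next = null;

            @Override
            public boolean hasNext() {
                // Advance only until the next match is found, then pause.
                while (next == null && pos < index.size()) {
                    Span candidate = index.get(pos++);
                    examined.incrementAndGet();
                    if (candidate.coveredBy(window)) {
                        next = candidate;
                    }
                }
                return next != null;
            }

            @Override
            public Span next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Span result = next;
                next = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        List<Span> index = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            index.add(new Span(i, i + 1));
        }
        Span window = new Span(0, 1000);

        AtomicInteger eagerExamined = new AtomicInteger();
        Span firstEager = coveredEager(index, window, eagerExamined).get(0);

        AtomicInteger lazyExamined = new AtomicInteger();
        Span firstLazy = coveredLazy(index, window, lazyExamined).next();

        System.out.println("eager examined: " + eagerExamined.get()
                + ", first begin: " + firstEager.begin);
        System.out.println("lazy examined:  " + lazyExamined.get()
                + ", first begin: " + firstLazy.begin);
    }
}
```

In this toy setup the eager variant touches every span before the caller sees
anything, while the lazy variant stops as soon as the first match is handed
out. The sketch does not model the setup cost side of the trade-off, which is
where the lightweight uimaFIT methods are said to win for small CASes and
frequent calls.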
