Thanks Richard,

Would it make sense to provide CAS files based on real documents for the benchmarks? I mean, we could run segmenters on some open-access documents, map the annotations to those used by the tests, and store them somewhere as XMI or CAS binaries. The tests could then load them during the initialization phase before benchmarking.

Cheers
Mario

> On 11 Nov 2020, at 15.26, Richard Eckart de Castilho <[email protected]> wrote:
>
> Hi Mario,
>
>> On 11. Nov 2020, at 09:11, Mario Juric <[email protected]> wrote:
>>
>> I ran the latest benchmarks, and they seem to confirm your initial
>> conclusions, that the JCasUtil.selectCovered method performs better than the
>> corresponding SelectFSs coveredBy method. I somehow picked up the idea that
>> the new select API should improve performance, which it does for other
>> select calls, but it is not the case for coveredBy. I was wondering whether
>> you have some ideas as to why the new API performance isn't closer to the
>> JCasUtil.selectCovered method?
>
> First, the benchmarks are not really very representative at the moment. They
> create annotation structures that are highly overlapping and very dense -
> something that you wouldn't see in real life normally. So some operations
> show up slower in the benchmarks than they would feel in real life. The
> benchmarks should probably be adjusted to test against different types of
> structures.
>
> The select methods of uimaFIT are very lightweight and much less
> flexible/configurable than SelectFS. If you look at cases where you have
> small CASes and a very high frequency of method calls, the uimaFIT methods
> are likely to be better because they have a lower setup cost.
>
> For larger CASes and lower call frequencies, the SelectFS can be better. In
> particular if you can iterate over the results (and maybe stop before having
> iterated over all results) instead of retrieving the results as a list.
> SelectFS tries hard to not calculate the full result list while uimaFIT will
> usually calculate and return the full result list.
>
> There may be more to it, but that's my insight/intuition for the moment.
>
> Cheers,
>
> -- Richard
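
To make the point about iterating lazily versus building the full result list concrete, here is a minimal plain-Java sketch. It is only an analogy: it does not use the UIMA or uimaFIT APIs, it does not model setup cost, and `Span`, `tokens`, `selectCoveredEager`, and `firstCoveredLazy` are hypothetical names made up for this illustration. It simply counts how many candidates each approach inspects when the caller needs only the first covered element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyVsEagerSelect {

    /** Hypothetical stand-in for an annotation: just a begin/end offset pair. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Counts how many candidates have been checked against the covering span. */
    static final AtomicInteger inspected = new AtomicInteger();

    /** True if the candidate lies entirely within the covering span. */
    static boolean coveredBy(Span candidate, Span cover) {
        inspected.incrementAndGet();
        return cover.begin <= candidate.begin && candidate.end <= cover.end;
    }

    /** Eager style: always checks every candidate and returns the full list. */
    static List<Span> selectCoveredEager(List<Span> all, Span cover) {
        List<Span> result = new ArrayList<>();
        for (Span s : all) {
            if (coveredBy(s, cover)) {
                result.add(s);
            }
        }
        return result;
    }

    /** Lazy style: a sequential stream stops as soon as the first match is found. */
    static Optional<Span> firstCoveredLazy(List<Span> all, Span cover) {
        return all.stream().filter(s -> coveredBy(s, cover)).findFirst();
    }

    /** Builds n non-overlapping token-like spans: [0,4], [5,9], [10,14], ... */
    static List<Span> tokens(int n) {
        List<Span> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(new Span(i * 5, i * 5 + 4));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> all = tokens(1000);
        Span cover = new Span(0, 5000);

        inspected.set(0);
        selectCoveredEager(all, cover);
        System.out.println("eager inspected: " + inspected.get());

        inspected.set(0);
        firstCoveredLazy(all, cover);
        System.out.println("lazy inspected:  " + inspected.get());
    }
}
```

In this toy setup the eager variant touches every candidate, while the lazy variant stops at the first hit. Whether the same trade-off shows up between SelectFS and JCasUtil.selectCovered in a given benchmark depends on CAS size and call frequency, as discussed above.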
