Author: schor Date: Fri Nov 22 15:29:22 2019 New Revision: 1870165 URL: http://svn.apache.org/viewvc?rev=1870165&view=rev Log: no Jira, add a
Added: uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml Added: uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html?rev=1870165&view=auto ============================================================================== --- uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html (added) +++ uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html Fri Nov 22 15:29:22 2019 @@ -0,0 +1,536 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "https://www.w3.org/TR/html4/loose.dtd"> + + + <!-- ====================================================================== --> + <!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! --> + <!-- ====================================================================== --> + <html> + <head> + <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/> + <style type="text/css">@import "stylesheets/base.css";</style> + <meta name="author" value=" + Apache UIMA Team + "> + <meta name="email" value="d...@uima.apache.org"> + + + + <title>Apache UIMA - Cookbook: addressing some typical use-cases</title> + + <!-- Begin Cookie Consent plugin by Silktide - https://silktide.com/cookieconsent --> + <!-- Commented out because implied consent is not compatible with GDPR --> + <!-- + <script type="text/javascript"> + window.cookieconsent_options = {"message":"This website uses cookies to ensure you get the best experience on our website","dismiss":"Got it!","learnMore":"More info","link":"https://uima.apache.org/privacy-policy.html","theme":"dark-bottom"}; + </script> + + <script type="text/javascript" src="/cookieconsent2/cookieconsent.min.js"></script> + --> + <!-- End Cookie Consent plugin --> + + <!-- Begin Google Analytics --> + <!-- Commented out because GA requires consent according to GDPR --> + <!-- + <script> + 
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + + ga('create', 'UA-70846351-1', 'auto'); + ga('set', 'anonymizeIp', true); + ga('send', 'pageview'); + + </script> + --> + <!-- End Google Analytics --> + </head> + + <body> + <div class="topLogos"> + <table border="0" width="100%" cellspacing="0"> + <!-- TOP IMAGE --> + <tr> + <td align='LEFT'> + <a href="index.html"> + <img style="border: 1px solid black;" src="./images/UIMA_banner2tlpTm.png" alt="UIMA project logo" border="0"/> + </a> + </td> + <td align='CENTER'> + <div class="pageBanner">Cookbook: addressing some typical use-cases</div> + </td> + <td align='RIGHT'> + <a href="https://www.apache.org"> + <img src="./images/asf-logo-on-white-smallTm.png" alt="Apache UIMA" border="0"/> + </a> + </td> + </tr> + </table> + <hr noshade="" size="1"/> + </div> + <table border="0" width="100%" cellspacing="4"> + <tr> + <td align='RIGHT' colspan="2"> + <form method="get" action="https://www.google.com/search"> + Search the site + <input type="text" name="q" size="25" maxlength="255" value="" /> + <input type="hidden" name="sitesearch" value="https://uima.apache.org/" /> + <input name="Search" value="Search Site" type="submit"/> + </form> + </td> + </tr> + <tr> <!-- LEFT SIDE NAVIGATION --> + <td width="20%" valign="top"> + + + + + + + <!-- regular menu --> + <div class="navBar"> + <br/> + <div class="navBarItem"> <div class="navPartHeading">General</div> + </div> + <div class="navBar"> + <div class="navBarItem"> <a href="./index.html">Home</a> + </div> + <div class="navBarItem"> <a href="./downloads.cgi">Downloads</a> + </div> + <div class="navBarItem"> <a href="./documentation.html">Documentation</a> + </div> + <div class="navBarItem"> <a 
href="./news.html">News</a> + </div> + <div class="navBarItem"> <a href="./publications.html">Publications</a> + </div> + <br style="line-height: .5em"/> + <div class="navBarItem"> <a href="https://issues.apache.org/jira/browse/uima" target="_blank" rel="noopener">Issue tracker <img src="images/offsitelink.png"/></a> + </div> + <div class="navBarItem"> <a href="https://cwiki.apache.org/confluence/display/UIMA/" target="_blank" rel="noopener">Wiki <img src="images/offsitelink.png"/></a> + </div> + <br style="line-height: .5em"/> + <div class="navBarItem"> <a href="https://cwiki.apache.org/confluence/display/UIMA/Powered+by+Apache+UIMA" target="_blank" rel="noopener">Powered By UIMA <img src="images/offsitelink.png"/></a> + </div> + </div> + <br/> + <div class="navBarItem"> <div class="navPartHeading">Community</div> + </div> + <div class="navBar"> + <div class="navBarItem"> <a href="./get-involved.html">Get Involved</a> + </div> + <div class="navBarItem"> <a href="./mail-lists.html">Mailing Lists</a> + </div> + <div class="navBarItem"> <a href="./contribution-policy.html">Contribution Policies</a> + </div> + <div class="navBarItem"> <a href="./faq.html">FAQ</a> + </div> + <div class="navBarItem"> <a href="./project-guidelines.html">Project Guidelines</a> + </div> + </div> + <br/> + <div class="navBarItem"> <div class="navPartHeading">Scaleout Frameworks</div> + </div> + <div class="navBar"> + <div class="navBarItem"> <a href="./doc-uimaas-what.html">UIMA-AS</a> + </div> + <div class="navBarItem"> <a href="./doc-uimaducc-whatitam.html">UIMA-DUCC</a> + </div> + <div class="navBarItem"> <a href="./doc-uimaducc-demo.html">..Demo Page</a> + </div> + <div class="navBarItem"> <a href="http://uima-ducc-demo.apache.org:42133" target="_blank" rel="noopener">..Demo Live <img src="images/offsitelink.png"/></a> + </div> + </div> + <br/> + <div class="navBarItem"> <div class="navPartHeading">Components & Tools</div> + </div> + <div class="navBar"> + <div class="navBarItem"> <a 
href="./sandbox.html#uima-addons-annotators">Annotators</a> + </div> + <div class="navBarItem"> <a href="./toolsServers.html">Tools & Servers</a> + </div> + <div class="navBarItem"> <a href="./sandbox.html">Addons and Sandbox</a> + </div> + <div class="navBarItem"> <a href="./ruta.html">UIMA Ruta</a> + </div> + <div class="navBarItem"> <a href="./uimafit.html">uimaFIT</a> + </div> + <div class="navBarItem"> <a href="./external-resources.html">External Resources</a> + </div> + </div> + <br/> + <div class="navBarItem"> <div class="navPartHeading">Development</div> + </div> + <div class="navBar"> + <div class="navBarItem"> <a href="./dev-quick.html">Quick Start: building</a> + </div> + <div class="navBarItem"> <a href="./building-uima.html">Building from Source</a> + </div> + <div class="navBarItem"> <a href="./one-time-setup.html">One-time setups</a> + </div> + <div class="navBarItem"> <a href="./svn.html">Source Code</a> + </div> + <div class="navBarItem"> <a href="./distribution.html">Creating a Distribution</a> + </div> + <div class="navBarItem"> <a href="./release.html">Doing a UIMA release</a> + </div> + <div class="navBarItem"> <a href="https://www.apache.org/security/committers.html" target="_blank" rel="noopener">Doing a CVE (Apache) <img src="images/offsitelink.png"/></a> + </div> + <div class="navBarItem"> <a href="./eclipse-update-site.html">Eclipse Update Sites</a> + </div> + <div class="navBarItem"> <a href="./git.html">GIT</a> + </div> + <div class="navBarItem"> <a href="./codeConventions.html">Code Conventions</a> + </div> + <div class="navBarItem"> <a href="./uima-specification.html">UIMA Specification (OASIS)</a> + </div> + <div class="navBarItem"> <a href="./team-list.html">Project Team</a> + </div> + <div class="navBarItem"> <a href="./maven-design.html">Maven Use</a> + </div> + <div class="navBarItem"> <a href="./updating-website.html">Updating this Website</a> + </div> + </div> + <br/> + <div class="navBarItem"> <div class="navPartHeading">Events 
and Conferences</div> + </div> + <div class="navBar"> + <div class="navBarItem"> <a href="./coling14.html">COLING 2014</a> + </div> + <div class="navBarItem"> <a href="./gscl13.html">GSCL 2013</a> + </div> + <div class="navBarItem"> <a href="./iks09.html">IKS 2009</a> + </div> + <div class="navBarItem"> <a href="./gscl09.html">GSCL 2009</a> + </div> + <div class="navBarItem"> <a href="./lsm09.html">LSM 2009</a> + </div> + <div class="navBarItem"> <a href="./lrec08.html">LREC 2008</a> + </div> + <div class="navBarItem"> <a href="./gldv07.html">GLDV 2007</a> + </div> + </div> + <br/> + <div class="navBarItem"> <div class="navPartHeading">ASF</div> + </div> + <div class="navBar"> + <div class="navBarItem"> <a href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License <img src="images/offsitelink.png"/></a> + </div> + <div class="navBarItem"> <a href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">ASF Sponsors <img src="images/offsitelink.png"/></a> + </div> + <div class="navBarItem"> <a href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">ASF Sponsorship <img src="images/offsitelink.png"/></a> + </div> + <div class="navBarItem"> <a href="./security_report">Security</a> + </div> + </div> + </div> + </td> + <td width="80%" align="left" valign="top"> + <div class="sectionTable"> + <table class="sectionTable"> + <tr><td> + <a name="Working with Feature Structures"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/> Working with Feature Structures</h1></a> + </td></tr> + <tr><td> + <blockquote class="sectionBody"> + <p>These work with all kinds of Feature Structures, Annotations and non-Annotations, both.</p> + <table class="subsectionTable"> + <tr><td> + + + + <a name="Remove all Feature Structures of a particular type"> + <h2>Remove all Feature Structures of a particular type + </h2> + </a> + </td></tr> + <tr><td> + <blockquote class="subsectionBody"> + <p>There are built-in methods 
to do this, over all indexes in a particular view. There are 2 variations: + <ul><li>remove all including the subtypes of the type + <pre>myJCasView.removeAllIncludingSubtypes(Foo.type)</pre> + </li> + <li>remove all excluding the subtypes of the type + <pre>myJCasView.removeAllExcludingSubtypes(Foo.type)</pre></li></ul> + </p> + <p>Both of these are much faster than iterating over the Feature Structures; they directly clear the associated indexes.</p> + </blockquote> + </td></tr> + </table> + <table class="subsectionTable"> + <tr><td> + + + + <a name="General suggestions: working with iterators"> + <h2>General suggestions: working with iterators + </h2> + </a> + </td></tr> + <tr><td> + <blockquote class="subsectionBody"> + <p>Many times code will iterate over all instances of a type, and only do something with a subset. + Frequently, the iteration can be cut short, by starting near the spot of interest and stopping as soon + as it can be determined that no further iteration will find interesting Annotations.</p> + <p>Example: Let's say you have a "token" annotation, and want to find the "sentence" that contains it. + You could write an iterator over all sentences. + </p> + <h3>Stop early</h3> + <p> + When you find the first sentence that overlaps the token, you can use extra knowledge that you might have, + such as: there's only one sentence per token, to conclude that having found it, there's no need to do any + further iteration, so you can stop the iteration. + </p> + <p>Furthermore, if the token appears outside of any sentence, you can similarly stop the iteration, and return + an "empty" result, as soon as the test sentence begins after the token's "begin". + This is because, at that point, due to the sorting of the returned values, no future sentences could + start before or equal to the token's begin. 
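The two stopping rules above can be sketched in plain Java. This is only an illustration under stated assumptions: Span is a hypothetical stand-in for an annotation (it is not a UIMA class), the sentences are assumed to arrive sorted by begin as a sorted index would return them, and each token is assumed to lie in at most one sentence. Records need Java 16 or later.

```java
import java.util.List;
import java.util.Optional;

// Plain-Java sketch; Span and findCoveringSentence are hypothetical names, not UIMA API.
class StopEarlySketch {

  record Span(int begin, int end) {}

  // sentences must be sorted by begin (ascending).
  static Optional<Span> findCoveringSentence(List<Span> sentences, Span token) {
    for (Span s : sentences) {
      if (s.begin() > token.begin()) {
        // Sorted order: every later sentence also begins after the token's begin,
        // so none of them can contain the token. Stop with an empty result.
        return Optional.empty();
      }
      if (s.end() >= token.end()) {
        // s begins at or before the token and ends at or after it: found, stop here.
        return Optional.of(s);
      }
    }
    return Optional.empty();
  }
}
```

With sentences at 0-10, 11-20 and 25-40, a token at 12-15 is found in the second sentence, and a token at 21-24 ends the loop at the third sentence without a match.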
+ </p> + <h3>Begin closer to the right spot, maybe iterate backwards</h3> + <p>But you can do better.</p> + <p>You can start the iteration, instead of at the beginning, at the position of the token, and iterate backwards. + Iterators have a moveTo() method which takes a feature structure argument, so you can moveTo(the-token), + and then perhaps with some edge adjustment for equality, start iterating backwards, looking for the sentence at that + position that covers the token. + </p> + <p>If you are iterating backwards, and looking for a "covering" annotation, and know the largest span for that + covering type, then you can stop iterating as soon as the start position you reach, plus the largest span, is less than + the start of the annotation you're trying to cover.</p> + <p style="margin-left:1rem">This is used internally in version 3's + <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select.annot.subselect">select framework</a> + to speed up + the <code>covering</code> kind of iteration.</p> + <p>There are many other examples, but the principle is the same: start the iteration "close to" the right spot, + perhaps moving backwards instead of forwards, and end the iteration as soon as you can logically say that + no more suitable feature structures would be found. 
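As a rough sketch of that principle in plain Java (not the UIMA iterator API): a binary search plays the role of moveTo(), the loop walks backwards, and it gives up once the begin reached plus the largest known span falls before the target's begin. Span, findCovering and maxLen are hypothetical names introduced for illustration; the list is assumed sorted by begin, and maxLen is assumed to be a true upper bound on end minus begin.

```java
import java.util.List;
import java.util.Optional;

// Plain-Java sketch; Span, findCovering and maxLen are hypothetical, not UIMA API.
class BackwardScanSketch {

  record Span(int begin, int end) {}

  // spans: sorted by begin (ascending). maxLen: upper bound on (end - begin).
  static Optional<Span> findCovering(List<Span> spans, Span target, int maxLen) {
    // Binary search for the last span whose begin is <= target.begin
    // (stands in for positioning an iterator near the target).
    int lo = 0, hi = spans.size() - 1, start = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (spans.get(mid).begin() <= target.begin()) {
        start = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    // Walk backwards from that position.
    for (int i = start; i >= 0; i--) {
      Span s = spans.get(i);
      if (s.begin() + maxLen < target.begin()) {
        // Even the longest span starting here, or anywhere earlier,
        // ends before the target begins: nothing further can cover it.
        return Optional.empty();
      }
      if (s.end() >= target.end()) {
        return Optional.of(s); // begins at or before the target, ends at or after it
      }
    }
    return Optional.empty();
  }
}
```

The bound only pays off when maxLen is small relative to the document; with a very large maxLen the backward walk degrades to a full scan.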
</p> + <h3>Use UIMA Version 3's select framework</h3> + <p>The <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a> + incorporates many of the popular use cases for doing iterations that we've seen, into a Java friendly approach that + automatically uses optimized iterators and can produce Java Streams, as well.</p> + </blockquote> + </td></tr> + </table> + </blockquote> + </p> + </td></tr> + </table> + <div class="sectionTable"> + <table class="sectionTable"> + <tr><td> + <a name="Working with Annotations"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/> Working with Annotations</h1></a> + </td></tr> + <tr><td> + <blockquote class="sectionBody"> + <ul> + <li><a href='#Watch out for type-priorites'> + Watch out for type-priorites + + </a></li> + <li><a href='#Annotation containment'> + Annotation containment + + </a></li> + <li><a href='#Adjusting an existing annotation's begin and end'> + Adjusting an existing annotation's begin and end + + </a></li> + <li><a href='#Avoid where possible, copying sets of Feature Structures'> + Avoid where possible, copying sets of Feature Structures + + </a></li> + </ul> + <p> + The CAS holds Feature Structures (FSs). There is special support for FSs which are a subtype of Annotation; + these have an associated Subject of Analysis (Sofa) and <code>begin</code> and <code>end</code> offsets. + </p> + <h3>Annotations are not required in all cases</h3> + <p>If your application deals with a different kind of unstructured data, say, for instance, images, then + Annotations may not be the appropriate supertype for your types, because they're designed for + things having a linear begin / end meaningful demarcations. 
</p> + <p>You can have your feature structures inherit from TOP, or from some other appropriate supertype, other + than Annotation.</p> + <h3>Making use of the built-in Annotation index</h3> + <p>Annotations are special in UIMA in that there is a "built-in" index, the AnnotationIndex, which can be used + to rapidly access these in a sorted order. The ordering is by <code>begin</code> (ascending), then by + <code>end</code> (descending), and then by type-priorities.</p> + <p style="margin-left:1rem"><i>This is really a set of indexes, one for each subtype of Annotation.</i></p> + <p style="margin-left:1rem"><i>Although the index has type-priorities, in UIMA v3, the <code>select-framework</code> + by default ignores these; this behavior can be overridden.</i></p> + <table class="subsectionTable"> + <tr><td> + + + + <a name="Watch out for type-priorites"> + <h2>Watch out for type-priorites + </h2> + </a> + </td></tr> + <tr><td> + <blockquote class="subsectionBody"> + <p>When 2 annotations have the same start and end, but different types, then one comes before the other, + according to type priorities. This lets you declare that, when a Sentence annotation and a + Foo annotation both cover the same span, the Sentence logically contains the Foo, and not the + other way around.</p> + <p>To make this work, you need to specify the type priorities. This is a global setting for your application. + See + <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive"> + type priorities</a> (scroll down to find it) for how to specify this.</p> + <h3>Avoiding type priorities</h3> + <p>Often, the use of type priorities gets in the way. 
With UIMA Version 3, the + <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a> + by default ignores type priorities when doing its operations; but this can be overridden as needed.</p> + </blockquote> + </td></tr> + </table> + <table class="subsectionTable"> + <tr><td> + + + + <a name="Annotation containment"> + <h2>Annotation containment + </h2> + </a> + </td></tr> + <tr><td> + <blockquote class="subsectionBody"> + <h3>a contains b</h3> + <ul><li>Ignoring type priorities:</li></ul> + <pre>a != null && b != null && // null check +a.getBegin() <= b.getBegin() && // a starts before (or equal to) b +a.getEnd() >= b.getEnd() // a ends after (or equal to) b</pre> + <h3>a and b overlap (have at least one char in common)</h3> + <pre> + // (check for non-null omitted) +if (a.getBegin() <= b.getBegin()) { // if a starts before (or equal to) b + return a.getEnd() > b.getBegin(); // then it overlaps if a's end is after b's begin +} else { // otherwise, b's begin is before a's begin + return b.getEnd() > a.getBegin(); // so it overlaps if b's end is after a's begin. +} +</pre> + </blockquote> + </td></tr> + </table> + <table class="subsectionTable"> + <tr><td> + + + + <a name="Adjusting an existing annotation's begin and end"> + <h2>Adjusting an existing annotation's begin and end + </h2> + </a> + </td></tr> + <tr><td> + <blockquote class="subsectionBody"> + <p>Sometimes, your code may want to adjust an annotation's begin and end values. + If the annotation is not indexed, there's no issue - just change the value. + But if it is indexed, it's in index(es) in a position determined by its begin and end positions, so if you + change these, the item needs to be reindexed (in all the indexes holding it). 
Typically, only one index + (the Annotation Index for a particular CAS View) is involved, but in general, there could be multiple + indexes involved.</p> + <p>If you are using UIMA version 2.7.0 or later, the UIMA + <a target="_blank" rel="noopener" href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">framework</a> + detects updates that would need this re-indexing, and + automatically removes the Feature Structure from all involved index(es), updates the Feature, and then adds the Feature Structure back to the index(es). + </p> + <p>You can improve the efficiency of this, if you are updating, say, both the begin and end value of an annotation, by + doing this yourself, in your code: + <ul><li>Removing the item from the index(es)</li> + <li>Doing both updates</li> + <li>Adding the item back into the index(es)</li></ul> + More details <a target="_blank" rel="noopener" href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">here</a>. + </p> + <p>Example: if you know a particular annotation is only indexed in one view, + then you can update its begin and end features using + <pre>a.<b>removeFromIndexes</b>(); + + a.setBegin(new_value_begin); + a.setEnd(new_value_end); + +a.<b>addToIndexes</b>();</pre> +This is the most efficient way to do this. + </p> + <p>There are a couple of special forms you can use to protect indexes while you're updating features used as keys. + This is useful when you're not sure what feature values might be used as keys in some index. + <pre> +try (AutoCloseable ac = my_cas.<b>protectIndexes</b>()) { + // ... arbitrary user code which updates features + // which may be "keys" in one or more indexes, e.g. + + a.setBegin(new_value_begin); + a.setEnd(new_value_end); +}</pre> +or +<pre> +my_cas.<b>protectIndexes</b>(() -> { + // ... 
arbitrary user code updating "key" features, + // but no checked exceptions are permitted + // (because inside a lambda) + + a.setBegin(new_value_begin); + a.setEnd(new_value_end); + });</pre> + These use the framework's automatic detection mechanism, and remove Feature Structures from all involved indexes + if needed, but delay adding them back until the end of the protected section. + </p> + </blockquote> + </td></tr> + </table> + <table class="subsectionTable"> + <tr><td> + + + + <a name="Avoid where possible, copying sets of Feature Structures"> + <h2>Avoid where possible, copying sets of Feature Structures + </h2> + </a> + </td></tr> + <tr><td> + <blockquote class="subsectionBody"> + <p>Operations which iterate over Feature Structures, and put them into a Collection or List, and then + iterate over that list to do some other operations, can often be done directly on the Feature Structures in the CAS, + omitting the first copying of them into a list. + </p> + <p>A frequent speedup can happen when the particular logic can detect when no further items in a (sorted) index + are needed, and the iteration can be stopped early.</p> + <p>For example, you might have code which iterates over all feature structures of a particular type, and puts these into a list, + and then goes through the list, and picks out certain ones and puts those into another list, which is then returned. + </p> + <p>The first copying can be omitted, by moving the logic of what to include into the first iteration, and producing the second + list directly.</p> + <p>In UIMA Version 3, you can make use of the <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>. + It already accounts for many of the use cases where you might want to control where an iteration starts or exits. + You can also use its ability to produce streams, and combine that with Java's takeWhile method, to exit a stream early. 
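A minimal sketch of the takeWhile idea, using an ordinary Java list in place of a CAS index so that it runs on its own. Span and spansInside are hypothetical names, and the input is assumed to be sorted by begin; with UIMA v3 the stream would presumably come from the select framework instead of spans.stream(). takeWhile needs Java 9 or later, records Java 16 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java sketch; Span and spansInside are hypothetical names, not UIMA API.
class TakeWhileSketch {

  record Span(int begin, int end) {}

  // spans: sorted by begin (ascending). Returns the spans lying entirely inside window,
  // built in a single pass with no intermediate list.
  static List<Span> spansInside(List<Span> spans, Span window) {
    return spans.stream()
        // Sorted input: once a span begins past the window's end, so do all later ones,
        // so the stream can be cut off here instead of being read to the end.
        .takeWhile(s -> s.begin() <= window.end())
        // The selection logic runs in the same pass, producing the final list directly.
        .filter(s -> s.begin() >= window.begin() && s.end() <= window.end())
        .collect(Collectors.toList());
  }
}
```

Note that takeWhile is only safe because the cut-off condition follows the sort order; a condition on end, which is not sorted, belongs in filter.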
+ </p> + </blockquote> + </td></tr> + </table> + </blockquote> + </p> + </td></tr> + </table> + </td> + </tr> + <!-- FOOTER --> + <tr><td colspan="2"> + <hr noshade="" size="1"/> + </td></tr> + <tr><td colspan="2"> + <table class="pageFooter"> + <tr> + <td><a href="index.html">Home</a></td> + <td><a href="privacy-policy.html">Privacy Policy</a></td> + <td style="font-size:75%"> + Copyright © 2006-2013, The Apache Software Foundation.<br/> + Apache UIMA, UIMA, the Apache UIMA logo and the Apache Feather logo are trademarks of The Apache Software Foundation.<br/> + All other marks mentioned may be trademarks or registered trademarks of their respective owners. + </td> + <td><a href="mailto:d...@uima.apache.org">Contact us</a></td> + </tr> + </table> + </td></tr> + </table> + </body> + </html> + Added: uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml?rev=1870165&view=auto ============================================================================== --- uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml (added) +++ uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml Fri Nov 22 15:29:22 2019 @@ -0,0 +1,252 @@ +<?xml version="1.0" encoding="ISO-8859-1"?> + +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + https://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. 
See the License for the + specific language governing permissions and limitations + under the License. +--> + +<document> + + <properties> + <title>Cookbook: addressing some typical use-cases</title> + <author email="d...@uima.apache.org"> + Apache UIMA Team + </author> + </properties> + + <body> + + <section name="Working with Feature Structures"> + <p>These work with all kinds of Feature Structures, Annotations and non-Annotations, both.</p> + + <subsection name="Remove all Feature Structures of a particular type"> + <p>There are built-in methods to do this, over all indexes in a particular view. There are 2 variations: + <ul><li>remove all including the subtypes of the type + <pre>myJCasView.removeAllIncludingSubtypes(Foo.type)</pre> + </li> + <li>remove all excluding the subtypes of the type + <pre>myJCasView.removeAllExcludingSubtypes(Foo.type)</pre></li></ul> + </p> + <p>Both of these are much faster than iterating over the Feature Structures; they directly clear the associated indexes.</p> + </subsection> + + <subsection name="General suggestions: working with iterators"> + + + <p>Many times code will iterate over all instances of a type, and only do something with a subset. + Frequently, the iteration can be cut short, by starting near the spot of interest and stopping as soon + as it can be determined that no further iteration will find interesting Annotations.</p> + + <p>Example: Let's say you have a "token" annotation, and want to find the "sentence" that contains it. + You could write an iterator over all sentences. + </p> + <h3>Stop early</h3> + <p> + When you find the first sentence that overlaps the token, you can use extra knowledge that you might have, + such as: there's only one sentence per token, to conclude that having found it, there's no need to do any + further iteration, so you can stop the iteration. 
+ </p> + + <p>Furthermore, if the token appears outside of any sentence, you can similarly stop the iteration, and return + an "empty" result, as soon as the test sentence begins after the token's "begin". + This is because, at that point, due to the sorting of the returned values, no future sentences could + start before or equal to the token's begin. + </p> + + <h3>Begin closer to the right spot, maybe iterate backwards</h3> + + <p>But you can do better.</p> + <p>You can start the iteration, instead of at the beginning, at the position of the token, and iterate backwards. + Iterators have a moveTo() method which takes a feature structure argument, so you can moveTo(the-token), + and then perhaps with some edge adjustment for equality, start iterating backwards, looking for the sentence at that + position that covers the token. + </p> + + <p>If you are iterating backwards, and looking for a "covering" annotation, and know the largest span for that + covering type, then you can stop iterating as soon as the start position you reach, plus the largest span, is less than + the start of the annotation you're trying to cover.</p> + + <p style="margin-left:1rem">This is used internally in version 3's + <a target="_blank" rel="noopener" + href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select.annot.subselect">select framework</a> + to speed up + the <code>covering</code> kind of iteration.</p> + + <p>There are many other examples, but the principle is the same: start the iteration "close to" the right spot, + perhaps moving backwards instead of forwards, and end the iteration as soon as you can logically say that + no more suitable feature structures would be found. 
</p> + + <h3>Use UIMA Version 3's select framework</h3> + <p>The <a target="_blank" rel="noopener" + href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a> + incorporates many of the popular use cases for doing iterations that we've seen, into a Java friendly approach that + automatically uses optimized iterators and can produce Java Streams, as well.</p> + </subsection> + + </section> + + <section + name="Working with Annotations"> + + <subsectionToc/> + <p> + The CAS holds Feature Structures (FSs). There is special support for FSs which are a subtype of Annotation; + these have an associated Subject of Analysis (Sofa) and <code>begin</code> and <code>end</code> offsets. + </p> + + <h3>Annotations are not required in all cases</h3> + <p>If your application deals with a different kind of unstructured data, say, for instance, images, then + Annotations may not be the appropriate supertype for your types, because they're designed for + things having a linear begin / end meaningful demarcations. </p> + <p>You can have your feature structures inherit from TOP, or from some other appropriate supertype, other + than Annotation.</p> + + <h3>Making use of the built-in Annotation index</h3> + <p>Annotations are special in UIMA in that there is a "built-in" index, the AnnotationIndex, which can be used + to rapidly access these in a sorted order. 
The ordering is by <code>begin</code> (ascending), then by + <code>end</code> (descending), and then by type-priorities.</p> + <p style="margin-left:1rem"><i>This is really a set of indexes, one for each subtype of Annotation.</i></p> + <p style="margin-left:1rem"><i>Although the index has type-priorities, in UIMA v3, the <code>select-framework</code> + by default ignores these; this behavior can be overridden.</i></p> + + <subsection name="Watch out for type-priorites"> + <p>When 2 annotations have the same start and end, but different types, then one comes before the other, + according to type priorities. This lets you declare that, when a Sentence annotation and a + Foo annotation both cover the same span, the Sentence logically contains the Foo, and not the + other way around.</p> + + <p>To make this work, you need to specify the type priorities. This is a global setting for your application. + See + <a target="_blank" rel="noopener" + href="http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive"> + type priorities</a> (scroll down to find it) for how to specify this.</p> + + <h3>Avoiding type priorities</h3> + <p>Often, the use of type priorities gets in the way. 
With UIMA Version 3, the + <a target="_blank" rel="noopener" + href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a> + by default ignores type priorities when doing its operations; but this can be overridden as needed.</p> + </subsection> + + <subsection name="Annotation containment"> + <h3>a contains b</h3> + <ul><li>Ignoring type priorities:</li></ul> + <pre>a != null && b != null && // null check +a.getBegin() <= b.getBegin() && // a starts before (or equal to) b +a.getEnd() >= b.getEnd() // a ends after (or equal to) b</pre> + + <h3>a and b overlap (have at least one char in common)</h3> + <pre> + // (check for non-null omitted) +if (a.getBegin() <= b.getBegin()) { // if a starts before (or equal to) b + return a.getEnd() > b.getBegin(); // then it overlaps if a's end is after b's begin +} else { // otherwise, b's begin is before a's begin + return b.getEnd() > a.getBegin(); // so it overlaps if b's end is after a's begin. +} +</pre> + </subsection> + + + <subsection name="Adjusting an existing annotation's begin and end"> + <p>Sometimes, your code may want to adjust an annotation's begin and end values. + If the annotation is not indexed, there's no issue - just change the value. + But if it is indexed, it's in index(es) in a position determined by its begin and end positions, so if you + change these, the item needs to be reindexed (in all the indexes holding it). 
Typically, only one index + (the Annotation Index for a particular CAS View) is involved, but in general, there could be multiple + indexes involved.</p> + + <p>If you are using UIMA version 2.7.0 or later, the UIMA + <a target="_blank" rel="noopener" + href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">framework</a> + detects updates that would need this re-indexing, and + automatically removes the Feature Structure from all involved index(es), updates the Feature, and then adds the Feature Structure back to the index(es). + </p> + + <p>You can improve the efficiency of this, if you are updating, say, both the begin and end value of an annotation, by + doing this yourself, in your code: + <ul><li>Removing the item from the index(es)</li> + <li>Doing both updates</li> + <li>Adding the item back into the index(es)</li></ul> + More details <a target="_blank" rel="noopener" + href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">here</a>. + </p> + + <p>Example: if you know a particular annotation is only indexed in one view, + then you can update its begin and end features using + <pre>a.<b>removeFromIndexes</b>(); + + a.setBegin(new_value_begin); + a.setEnd(new_value_end); + +a.<b>addToIndexes</b>();</pre> +This is the most efficient way to do this. + </p> + + <p>There are a couple of special forms you can use to protect indexes while you're updating features used as keys. + This is useful when you're not sure what feature values might be used as keys in some index. + <pre> +try (AutoCloseable ac = my_cas.<b>protectIndexes</b>()) { + // ... arbitrary user code which updates features + // which may be "keys" in one or more indexes, e.g. + + a.setBegin(new_value_begin); + a.setEnd(new_value_end); +}</pre> +or +<pre> +my_cas.<b>protectIndexes</b>(() -> { + // ... 
arbitrary user code updating "key" features, + // but no checked exceptions are permitted + // (because inside a lambda) + + a.setBegin(new_value_begin); + a.setEnd(new_value_end); + });</pre> + These use the framework's automatic detection mechanism, and remove Feature Structures from all involved indexes + if needed, but delay adding them back until the end of the protected section. + </p> + + </subsection> + + <subsection name="Avoid where possible, copying sets of Feature Structures"> + + <p>Operations which iterate over Feature Structures, and put them into a Collection or List, and then + iterate over that list to do some other operations, can often be done directly on the Feature Structures in the CAS, + omitting the first copying of them into a list. + </p> + + <p>A frequent speedup can happen when the particular logic can detect when no further items in a (sorted) index + are needed, and the iteration can be stopped early.</p> + + <p>For example, you might have code which iterates over all feature structures of a particular type, and puts these into a list, + and then goes through the list, and picks out certain ones and puts those into another list, which is then returned. + </p> + + <p>The first copying can be omitted, by moving the logic of what to include into the first iteration, and producing the second + list directly.</p> + + <p>In UIMA Version 3, you can make use of the <a target="_blank" rel="noopener" + href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>. + It already accounts for many of the use cases where you might want to control where an iteration starts or exits. + You can also use its ability to produce streams, and combine that with Java's takeWhile method, to exit a stream early. + </p> + + </subsection> + </section> + </body> + +</document> +