Re: Result specification - update needed

Adam Lally Mon, 27 Nov 2006 07:38:03 -0800

On 11/25/06, Marshall Schor <[EMAIL PROTECTED]> wrote:

I need to write up the version 2 tutorial and user's guide for Results
Specification.  The current write up is inaccurate, I think.  I started
to change it to fit the new API where it is not passed in as a
parameter, but there are more things that need fixing.


Could Adam and/or Thilo take a look at this write up and fix it up?
(see below):
<snip/>


Yes, this needed an overhaul.  Result Specification handling in
aggregates no longer has anything to do with the type of flow.  Here's
my suggested documentation (note I used <code/> tags for monospace
font as in HTML; I have no idea if that's right for docbook):

<section id="ugr.tug.aae.result_specification_setting">
        <title>Result Specification Setting</title>
        
        <para>The Result Specification is passed to the annotator instance by
                calling its <code>setResultSpecification</code> method. When called,
                the default implementation saves the result specification in an
                instance variable of the Annotator instance, which the annotator
                can access using the protected
                <code>getResultSpecification()</code> method.</para>
        
        <para>A Result Specification is a list of output types and/or
                type:feature names, which are expected to be
                <quote>output</quote> from the annotator. Annotators may use this
                to optimize their operations, when possible, for those cases
                where only particular outputs are wanted. The interface to the
                Result Specification object (see the JavaDocs) allows querying
                both types and particular features of types.</para>
        
        <para>Sometimes you can specify the Result Specification; other times,
                you cannot (for instance, inside a Collection Processing Engine).
                When you cannot specify it, or choose not to (for example, by
                using the form of the <code>process(...)</code> call on an
                Analysis Engine that doesn&apos;t include the Result
                Specification), a <quote>Default</quote> Result Specification is
                used.</para>
        
</section>

<section>
        <title>Default Result Specification</title>
        
        <para>The default Result Specification is taken from the Engine&apos;s
                output Capability Specification. Remember that a Capability
                Specification has both inputs and outputs, can specify types
                and/or features, and there can be more than one Capability Set.
                If there is more than one set, the logical union of these sets is
                used. The default Result Specification is exactly what&apos;s
                included in the output Capability Specification.</para>
        
</section>

<section>
        <title>Passing Result Specifications to Analysis Engines</title>
        
        <para>If you are not using a Collection Processing Engine, you can
                specify a Result Specification for your AnalysisEngine(s) by
                calling the
                <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
                method.</para>
        <para>It is also possible to pass a Result Specification on each call to
                <code>AnalysisEngine.process(CAS, ResultSpecification)</code>.
                However, this is not recommended if your Result Specification
                will stay constant across multiple calls to
                <code>process</code>. In that case it will be more efficient to
                call
                <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
                only when the Result Specification changes.</para>
        <para>For primitive Analysis Engines, whatever Result Specification you
                pass in is passed along to the annotator's
                <code>setResultSpecification(ResultSpecification)</code>
                method.  For aggregate Analysis Engines, see below.</para>
</section>

<section>
        <title>Aggregates</title>
        
        <para>For aggregate engines, the Result Specification passed to the
                <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
                method is intended to specify the set of output types/features
                that the aggregate should produce.  This is not necessarily
                equivalent to the set of output types/features that each
                annotator should produce. For example, an annotator may need to
                produce an intermediate type that is then consumed by a
                downstream annotator, even though that intermediate type is not
                part of the Result Specification.</para>
        <para>To handle this situation, when
                <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
                is called on an aggregate, the framework computes the union of
                the passed Result Specification with the set of
                <emph>all</emph> input types and features of <emph>all</emph>
                component AnalysisEngines within that aggregate.  This forms the
                complete set of types and features that any component of the
                aggregate might need to produce.  This derived Result
                Specification is then passed to the
                <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
                method of each component AnalysisEngine. In the case of nested
                aggregates, this procedure is applied recursively.</para>
</section>

<section>
        <title>Collection Processing Engines</title>
        
        <para>The Default Result Specification is always used for all
components of a Collection
                Processing Engine.</para>
</section>

<!--
        This no longer belongs as part of the discussion of Result
        Specifications.
        The CapabilityLanguageFlow now skips annotators on the basis of their
        complete capabilities; it does not take the Result Specification into
        account. Result Specifications are no longer the concern of the Flow
        Controller, since this was deemed to be too great a complexity without
        enough benefit.
        
<section>
        <title>Special rule for skipping Analysis Engines</title>
        
        <para>When using the CapabilityLanguageFlow, an annotator will also be
                skipped if all of its outputs are in the output capability of
                some annotator(s) that has (have) executed previously in the
                flow. The concept here is that if all of an annotator&apos;s
                output types have already been produced, that annotator will not
                be called.</para>
        
        <para>For an Aggregate, each annotator is passed a Result Specification
                that is the intersection of the set of types mentioned in its
                output with the Result Specification passed to the aggregate. If
                this intersection is empty (the annotator does not produce any
                type included in the ResultSpecification), the annotator will
                not be called at all.</para>
        
        <para>Therefore, if using the CapabilityLanguageFlow, if you want to
                supply a custom ResultSpecification for the aggregate it must
                include any intermediate types that need to be produced, or else
                things will not work properly.</para>
</section>
-->
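
To make the aggregate rule in the "Aggregates" section concrete, here is a
minimal sketch that models it with plain Java sets. It does not use the real
UIMA classes: DerivedSpecSketch, Engine, and setResultSpec are hypothetical
names invented for this illustration, and types are represented as simple
strings. It only shows the "union with all component inputs, applied
recursively" step as I understand it from the text above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DerivedSpecSketch {

    /** Hypothetical stand-in for an analysis engine; NOT a UIMA class. */
    public static class Engine {
        public final String name;
        // Declared input types (from the engine's capabilities), as plain strings.
        public final Set<String> inputs = new TreeSet<>();
        // Component engines; empty for a primitive engine.
        public final List<Engine> components = new ArrayList<>();
        // The Result Specification this engine was last given.
        public Set<String> receivedSpec = new TreeSet<>();

        public Engine(String name, String... inputs) {
            this.name = name;
            this.inputs.addAll(Arrays.asList(inputs));
        }

        public Engine add(Engine component) {
            components.add(component);
            return this;
        }
    }

    /**
     * Models setting a Result Specification on an engine. A primitive simply
     * keeps the spec. An aggregate unions the spec with the inputs of all of
     * its components and hands that derived spec to each component, which
     * repeats the procedure if it is itself an aggregate.
     */
    public static void setResultSpec(Engine engine, Set<String> spec) {
        engine.receivedSpec = new TreeSet<>(spec);
        if (engine.components.isEmpty()) {
            return; // primitive: the spec is passed along unchanged
        }
        Set<String> derived = new TreeSet<>(spec);
        for (Engine component : engine.components) {
            derived.addAll(component.inputs);
        }
        for (Engine component : engine.components) {
            setResultSpec(component, derived);
        }
    }

    public static void main(String[] args) {
        Engine tokenizer = new Engine("Tokenizer");              // no inputs
        Engine person = new Engine("PersonAnnotator", "Token");  // consumes Token
        Engine aggregate = new Engine("Aggregate").add(tokenizer).add(person);

        // The caller only asks the aggregate for Person annotations.
        setResultSpec(aggregate, new TreeSet<>(Arrays.asList("Person")));

        // Token is added because a downstream component needs it as input.
        System.out.println(tokenizer.receivedSpec); // prints [Person, Token]
    }
}
```

In this toy example the caller asks only for Person, yet the tokenizer is
still told to produce Token, because the downstream component lists Token as
an input. That is the intermediate-type case the documentation describes.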
