Re: Clarifying language subsumption in Result Specifications

Marshall Schor Mon, 28 Jan 2008 15:02:41 -0800

I tried implementing this change, and 2 test cases fail. They look like they are failing exactly in the case where the result specification has a TypeOrFeature with a specified language other than "x-unspecified", and the containsTypeOrFeature method is being called using the form which doesn't pass in an explicit language, so it is being treated as if x-unspecified was passed in. As discussed below, this should give "false", but the test cases expect true.


Should I change the test cases?  The failing ones are:

ResultSpecification_implTest: It defines a result spec containing the type "FakeType" for languages "en", "de", "en-US", "en-GB", but not "x-unspecified". So the call rs.containsType("FakeType") returns false (because the set of languages for FakeType is missing x-unspecified), but the test says it should return true.
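To make the FakeType case concrete, here is a minimal Java sketch of the semantics described in this thread. It assumes that the no-language form of containsType is treated as if "x-unspecified" were passed, and that language codes compare case-insensitively (so "en-GB" and "en-gb" match). SketchResultSpec and its methods are hypothetical names for illustration, not the actual ResultSpecification_impl API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a result spec; NOT the UIMA ResultSpecification API.
class SketchResultSpec {
    static final String UNSPECIFIED = "x-unspecified";

    // type name -> languages (lower-cased) the type was requested for
    private final Map<String, Set<String>> langsByType = new HashMap<>();

    void addType(String type, String... languages) {
        Set<String> langs = langsByType.computeIfAbsent(type, k -> new HashSet<>());
        for (String lang : languages) {
            langs.add(lang.toLowerCase(Locale.ROOT));
        }
    }

    // Next broader language: en-gb -> en -> x-unspecified -> null (top of the hierarchy)
    static String parentOf(String lang) {
        if (lang.equals(UNSPECIFIED)) {
            return null;
        }
        int dash = lang.lastIndexOf('-');
        return dash < 0 ? UNSPECIFIED : lang.substring(0, dash);
    }

    // True if the type is listed for the requested language or any broader one.
    boolean containsType(String type, String language) {
        Set<String> langs = langsByType.get(type);
        if (langs == null) {
            return false;
        }
        for (String l = language.toLowerCase(Locale.ROOT); l != null; l = parentOf(l)) {
            if (langs.contains(l)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: the no-language form behaves as if "x-unspecified" were passed.
    boolean containsType(String type) {
        return containsType(type, UNSPECIFIED);
    }
}
```

Loaded with the test's languages ("en", "de", "en-US", "en-GB"), containsType("FakeType") returns false, since none of those entries covers x-unspecified, while containsType("FakeType", "en-gb") returns true.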


The other test is the PearRuntimeTest.
This test loads two Pears, runs them and then looks at the CAS result.

The descriptor for one of the tests, the TutorialDateTime descriptor, says it outputs 3 types, *but for language "en"* (only, and not for x-unspecified in particular).

The result spec built for the aggregate is empty (the test case has nothing specified here). When it is passed down to the delegates, the setResultSpecification for the Pear descriptor in PearAnalysisEngineWrapper is called. This is not implemented, so it inherits from its super, which is AnalysisEngineImplBase, and this impl does nothing (expecting to be overridden). I'll write this up as a Jira issue.

But even if this were "fixed", because the outer aggregate had nothing specified in its capability, the inner primitive analysis engine is set up initially with a "default" result spec, which is its own output capabilities. This spec says it should produce results just for "en", and in particular it should *not* produce output for x-unspecified. This annotator is written to respect the result spec, so it doesn't produce anything.
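The PEAR case can be sketched the same way. This assumes, as described above, that the delegate's default result spec is just its own output capabilities (every output type, for "en" only), and further assumes that the test document's language is x-unspecified. DefaultSpecDemo, its methods, and the type names TypeA/TypeB/TypeC are hypothetical placeholders, not the real TutorialDateTime types or UIMA classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an annotator that honors its result spec; not UIMA code.
class DefaultSpecDemo {
    static final String UNSPECIFIED = "x-unspecified";

    // Placeholder names for the three output types declared by the descriptor.
    static final List<String> OUTPUT_TYPES = List.of("TypeA", "TypeB", "TypeC");

    // Next broader language: en-us -> en -> x-unspecified -> null
    static String parentOf(String lang) {
        if (lang.equals(UNSPECIFIED)) {
            return null;
        }
        int dash = lang.lastIndexOf('-');
        return dash < 0 ? UNSPECIFIED : lang.substring(0, dash);
    }

    // True if specLangs (assumed lower-case) lists docLang or a broader language.
    static boolean covers(Set<String> specLangs, String docLang) {
        for (String l = docLang.toLowerCase(Locale.ROOT); l != null; l = parentOf(l)) {
            if (specLangs.contains(l)) {
                return true;
            }
        }
        return false;
    }

    // Default spec built from the annotator's own output capabilities: every type, "en" only.
    static Map<String, Set<String>> defaultSpec() {
        Map<String, Set<String>> spec = new HashMap<>();
        for (String type : OUTPUT_TYPES) {
            spec.put(type, Set.of("en"));
        }
        return spec;
    }

    // Returns the types the annotator would actually produce for a document in docLang.
    static List<String> process(Map<String, Set<String>> resultSpec, String docLang) {
        List<String> produced = new ArrayList<>();
        for (String type : OUTPUT_TYPES) {
            if (covers(resultSpec.getOrDefault(type, Set.of()), docLang)) {
                produced.add(type);
            }
        }
        return produced;
    }
}
```

Here process(defaultSpec(), "x-unspecified") returns an empty list, matching the empty CAS the test sees, while process(defaultSpec(), "en") returns all three placeholder types.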


Anyone object to my changing the test cases?

-Marshall

Marshall Schor wrote:

Language specifications are in a hierarchy. For example, from most inclusive to finer subsets, we have:
x-unspecified
  en
    en-us
A result spec's most common use is in a negative sense: annotators can check a result spec, and if it doesn't contain the type or feature, they can skip producing that type or feature.
For simplicity, let's assume we have only one type or feature, called TF.
If the annotator thinks it produces TF for language en-us only, and wants to check whether it should skip producing it, it calls containsType/Feature(TF, "en-us"). In the current impl this is defined to return true if the result spec has languages x-unspecified, en, or en-us.
Let's consider the opposite case. Suppose we have an annotator that can produce TF for "en". Suppose the result spec has an entry for TF only for the language "en-us". Should that annotator produce results? If it calls containsType/Feature(TF, "en"), it will get a "false" (current implementation).
After some thinking about this and some discussion (because I don't think I got it right just by myself :-) ),
it seems that this is correct.  Consider the following case:
The language of the document is "en", and the containing (top-most) aggregate specified explicitly it wanted output only for en-us. In that case, the annotator should not produce any results, because the language of this doc is not en-us, and the assembler put together things that they said should only output en-us results.
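The en / en-us reasoning above amounts to walking up the hierarchy from the requested language and checking whether the result spec lists that language or any broader one. A minimal sketch, assuming language codes are hyphen-separated tags whose parent is obtained by dropping the last subtag, with x-unspecified at the top; LanguageSubsumption is a hypothetical helper, not the UIMA implementation:

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper illustrating language subsumption; not the UIMA implementation.
final class LanguageSubsumption {
    static final String UNSPECIFIED = "x-unspecified";

    private LanguageSubsumption() {}

    // Next broader language: en-us -> en -> x-unspecified -> null (top of the hierarchy)
    static String parentOf(String lang) {
        if (lang.equals(UNSPECIFIED)) {
            return null;
        }
        int dash = lang.lastIndexOf('-');
        return dash < 0 ? UNSPECIFIED : lang.substring(0, dash);
    }

    // True if specLangs (assumed lower-case) lists the requested language
    // or any language that subsumes it.
    static boolean covers(Set<String> specLangs, String requested) {
        for (String l = requested.toLowerCase(Locale.ROOT); l != null; l = parentOf(l)) {
            if (specLangs.contains(l)) {
                return true;
            }
        }
        return false;
    }
}
```

Note the asymmetry: a broad entry in the spec covers narrower requests (covers({"en"}, "en-us") is true), but a narrow entry never covers a broader one (covers({"en-us"}, "en") is false).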
This same logic seems to apply to x-unspecified:
Suppose we have an annotator that can produce TF for "x-unspecified". Suppose the result spec has an entry for TF only for the language "en". Should that annotator produce results? If it calls containsType/Feature(TF, "x-unspecified"), it should get "false" (this is broken in the current implementation, but was true, I think, in the previous one).
Assume the language of the document is "x-unspecified", and the containing (top-most) aggregate specified explicitly it wanted output only for en. In that case, the annotator should not produce any results, because the language of this doc is not "en", and the assembler put together things that they said should only output "en" results.
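Treating x-unspecified as nothing more than the root of the hierarchy gives the "false" argued for above without any special case. The sketch below contrasts that with a wildcard rule in which a request for x-unspecified matches any listed language; the wildcard rule is only an assumption about how a "true" could arise, not a reconstruction of the earlier code. UnspecifiedRules and its methods are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical comparison of two rules for x-unspecified requests; not UIMA code.
final class UnspecifiedRules {
    static final String UNSPECIFIED = "x-unspecified";

    private UnspecifiedRules() {}

    // Next broader language: en-us -> en -> x-unspecified -> null
    static String parentOf(String lang) {
        if (lang.equals(UNSPECIFIED)) {
            return null;
        }
        int dash = lang.lastIndexOf('-');
        return dash < 0 ? UNSPECIFIED : lang.substring(0, dash);
    }

    // Proposed: x-unspecified is just the root of the hierarchy, so a request for it
    // is covered only if the spec itself lists x-unspecified (specLangs assumed lower-case).
    static boolean proposed(Set<String> specLangs, String requested) {
        for (String l = requested.toLowerCase(Locale.ROOT); l != null; l = parentOf(l)) {
            if (specLangs.contains(l)) {
                return true;
            }
        }
        return false;
    }

    // Assumed wildcard rule, for contrast only: a request for x-unspecified
    // matches if the type is listed for any language at all.
    static boolean wildcard(Set<String> specLangs, String requested) {
        if (requested.equalsIgnoreCase(UNSPECIFIED)) {
            return !specLangs.isEmpty();
        }
        return proposed(specLangs, requested);
    }
}
```

For a spec listing TF only for "en", proposed(..., "x-unspecified") is false and wildcard(..., "x-unspecified") is true; both return true for a request of "en-us".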
Do others agree with this?

-Marshall
