Re: Clarifying language subsumption in Result Specifications

Marshall Schor Tue, 29 Jan 2008 05:40:22 -0800

Michael Baessler wrote:

Marshall Schor wrote:
Michael Baessler wrote:
Marshall Schor wrote:
Language specifications are in a hierarchy. For example, from most inclusive to finer subsets, we have:
x-unspecified
  en
    en-us
A result spec's most common use is in a negative sense - annotators can check a result spec, and if it doesn't contain the type or feature, they can skip producing that type or feature.
For simplicity, let's consider we have only one type or feature, called TF.
If the annotator thinks it produces TF for language en-us only, and wants to check if it should skip producing this, it calls containsType/Feature(TF, "en-us"). This is defined in the current impl to return true if the result spec has languages x-unspecified, en, or en-us.
Let's consider the opposite case. Suppose we have an annotator that can produce TF for "en". Suppose the result-spec has an entry for TF only for the language "en-us". Should that annotator produce results? If it calls containsType/Feature(TF, "en"), it will get a "false" (current implementation).
After some thinking about this and some discussion (because I don't think I got it right, just by myself :-) ),
it seems that this is correct.  Consider the following case:
The language of the document is "en", and the containing (top-most) aggregate specified explicitly that it wanted output only for en-us. In that case, the annotator should not produce any results, because the language of this doc is not en-us, and the assembler put together things that they said should only output en-us results.
This same logic seems to apply to x-unspecified:
Suppose we have an annotator that can produce TF for "x-unspecified". Suppose the result-spec has an entry for TF only for the language "en". Should that annotator produce results? If it calls containsType/Feature(TF, "x-unspecified"), it should get a "false" (broken in the current implementation!, but was true I think in the previous one).
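
To make the three cases above concrete, here is a minimal Java sketch of the proposed lookup rule. It is only an illustration under stated assumptions: the class ResultSpecSketch and its add/contains methods are hypothetical stand-ins, not the UIMA ResultSpecification implementation, and the hierarchy is assumed to be x-unspecified > en > en-us as listed earlier. The query language matches if the spec holds that language or any more inclusive one; a narrower entry in the spec never matches a broader query.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a result specification; NOT the UIMA class. */
public class ResultSpecSketch {
    static final String UNSPECIFIED = "x-unspecified";

    // Type or feature name -> languages the result spec lists for it.
    private final Map<String, Set<String>> languagesByName = new HashMap<>();

    /** Records that the spec wants typeOrFeature for the given language. */
    public void add(String typeOrFeature, String language) {
        languagesByName
            .computeIfAbsent(typeOrFeature, k -> new HashSet<>())
            .add(language.toLowerCase(Locale.ROOT));
    }

    /**
     * Proposed rule: true if the spec lists the queried language or any
     * more inclusive one (en-us -> en -> x-unspecified). A spec entry that
     * is narrower than the query does not match.
     */
    public boolean contains(String typeOrFeature, String language) {
        Set<String> specLanguages = languagesByName.get(typeOrFeature);
        if (specLanguages == null) {
            return false;
        }
        String query = language.toLowerCase(Locale.ROOT);
        while (true) {
            if (specLanguages.contains(query)) {
                return true;
            }
            if (query.equals(UNSPECIFIED)) {
                return false; // reached the top of the hierarchy
            }
            // Drop the last subtag; a bare language moves up to x-unspecified.
            int dash = query.lastIndexOf('-');
            query = (dash > 0) ? query.substring(0, dash) : UNSPECIFIED;
        }
    }

    public static void main(String[] args) {
        ResultSpecSketch onlyEn = new ResultSpecSketch();
        onlyEn.add("TF", "en");
        ResultSpecSketch onlyEnUs = new ResultSpecSketch();
        onlyEnUs.add("TF", "en-us");

        System.out.println(onlyEn.contains("TF", "en-us"));         // true
        System.out.println(onlyEnUs.contains("TF", "en"));          // false
        System.out.println(onlyEn.contains("TF", "x-unspecified")); // false
    }
}
```

The last line of main is the disputed case: the spec has only "en" and the annotator queries with "x-unspecified", which under this proposal yields false.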
I'm not sure you are right here. I think if an annotator can produce TF for "x-unspecified" that means that it can produce TF for all languages. So if an "en" document comes in the annotator should produce a result.
hmmm, this seems to contradict your statement below, saying "That case is correct".
In the example below, the result-spec passed in to the annotator has only "en", not "x-unspecified". This is the case proposed in my paragraph. Below you say it is right for the annotator to *not* produce results, while above you say it should produce results. This is inconsistent, unless I've mangled something... Can you clarify?
-Marshall
Assume the language of the document is "x-unspecified", and the containing (top-most) aggregate specified explicitly that it wanted output only for en. In that case, the annotator should not produce any results, because the language of this doc is not "en", and the assembler put together things that they said should only output "en" results.
That case is correct.

-- Michael
Maybe the confusion comes from the different treatment of "x-unspecified". If "x-unspecified" is specified in the output spec of an annotator it means that it can produce results for all languages.

True - and that works. But that wasn't the case I was trying to describe - I was trying to describe the opposite case: The case where the *output spec* of an annotator is *missing* the "x-unspecified". To restate the case: The output spec has "en" (only), and the annotator, when running, queries the result spec with "x-unspecified". This proposal says in that case, containsType should return false. Do you agree this should be the result in this case? It seems you do above when you say "That case is correct", but disagree in the paragraph where you say "I'm not sure you are right here.".

Perhaps I have not clearly described the two cases, but I think they are the same case (and therefore need to have the same answer ;-) )

-Marshall


-- Michael
