Michael Baessler wrote:
Marshall Schor wrote:
Michael Baessler wrote:
Marshall Schor wrote:
Language specifications are in a hierarchy. For example, from most inclusive to finer subsets, we have:

x-unspecified
  en
    en-us
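
As a rough illustration of the hierarchy above, here is a minimal Python sketch that walks from a language up to its broader ancestors. The name `language_chain` is hypothetical (it is not part of any UIMA API), and the sketch assumes lowercase tags in which each "-" separates a narrower subtag from its broader parent.

```python
UNSPECIFIED = "x-unspecified"

def language_chain(lang):
    """Return lang followed by each broader language, ending at x-unspecified."""
    if lang == UNSPECIFIED:
        return [UNSPECIFIED]
    chain = [lang]
    # Strip one subtag at a time: "en-us" -> "en".
    while "-" in lang:
        lang = lang.rsplit("-", 1)[0]
        chain.append(lang)
    # Every language sits under the most inclusive entry.
    chain.append(UNSPECIFIED)
    return chain

print(language_chain("en-us"))
```

Under these assumptions, "en-us" resolves to itself, then "en", then "x-unspecified".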

A result spec's most common use is in a negative sense: an annotator can check a result spec and, if it doesn't contain the type or feature, skip producing that type or feature.

For simplicity, let's assume we have only one type or feature, called TF.

If the annotator thinks it produces TF for language en-us only, and wants to check whether it should skip producing this, it calls containsType/Feature(TF, "en-us"). In the current impl, this is defined to return true if the result spec has languages x-unspecified, en, or en-us.

Let's consider the opposite case. Suppose we have an annotator that can produce TF for "en". Suppose the result-spec has an entry for TF only for the language "en-us". Should that annotator produce results? If it calls containsType/Feature(TF, "en"), it will get a "false" (current implementation).
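
The two lookups described so far can be sketched as follows. This is only a model of the behavior described in this thread, not the actual UIMA code: `contains_type` and `broader_languages` are hypothetical names, the result spec is reduced to a plain set of languages for the single TF, and the rule assumed is "return true if the result spec lists the queried language or any broader one".

```python
UNSPECIFIED = "x-unspecified"

def broader_languages(lang):
    """Queried language plus everything above it in the hierarchy (assumed layout)."""
    chain = [lang]
    while lang != UNSPECIFIED and "-" in lang:
        lang = lang.rsplit("-", 1)[0]
        chain.append(lang)
    if chain[-1] != UNSPECIFIED:
        chain.append(UNSPECIFIED)
    return chain

def contains_type(result_spec_langs, query_lang):
    """True if the result spec lists the queried language or a broader one."""
    return any(lang in result_spec_langs for lang in broader_languages(query_lang))

# Annotator produces TF for en-us; any of these result specs says "produce it".
for spec in ({"x-unspecified"}, {"en"}, {"en-us"}):
    print(sorted(spec), contains_type(spec, "en-us"))

# Opposite case: annotator queries with "en", result spec lists only "en-us".
print(contains_type({"en-us"}, "en"))
```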

After some thinking about this and some discussion (because I don't think I got it right, just by myself :-) ), it seems that this is correct. Consider the following case:
The language of the document is "en", and the containing (top-most) aggregate explicitly specified that it wanted output only for en-us. In that case, the annotator should not produce any results, because the language of this doc is not en-us, and the assembler put together things that they said should only output en-us results.

This same logic seems to apply to x-unspecified:

Suppose we have an annotator that can produce TF for "x-unspecified". Suppose the result-spec has an entry for TF only for the language "en". Should that annotator produce results? If it calls containsType/Feature(TF, "x-unspecified"), it should get a "false" (broken in the current implementation, but I think it was true in the previous one).
I'm not sure you are right here. I think if an annotator can produce TF for "x-unspecified", that means it can produce TF for all languages. So if an "en" document comes in, the annotator should produce a result.
hmmm, this seems to contradict your statement below, saying "That case is correct".

In the example below, the result-spec passed in to the annotator has only "en", not "x-unspecified". This is the case proposed in my paragraph. Below you say it is right for the annotator to *not* produce results, while above you say it should produce results. This is inconsistent, unless I've mangled something... Can you clarify?

-Marshall

Assume the language of the document is "x-unspecified", and the containing (top-most) aggregate explicitly specified that it wanted output only for en. In that case, the annotator should not produce any results, because the language of this doc is not "en", and the assembler put together things that they said should only output "en" results.

That case is correct.

-- Michael



Maybe the confusion comes from the different treatment of "x-unspecified". If "x-unspecified" is specified in the output spec of an annotator it means that it can produce results for all languages.
True - and that works. But that wasn't the case I was trying to describe; I was trying to describe the opposite case, where the *output spec* of an annotator is *missing* the "x-unspecified". To restate the case: the output spec has "en" (only), and the annotator, when running, queries the result spec with "x-unspecified". This proposal says that in that case, containsType should return false. Do you agree this should be the result in this case? It seems you do above when you say "That case is correct", but you disagree in the paragraph where you say "I'm not sure you are right here." Perhaps I have not clearly described the two cases, but I think they are the same case (and therefore need to have the same answer ;-) )
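
To line the cases up side by side, here is a small Python sketch of the rule as restated in this proposal. It is an illustration only, not the UIMA implementation: `contains_type`, `language_chain`, and the dict-based result spec are hypothetical, and the assumed rule is that a query matches only when the result spec lists the queried language or a broader one.

```python
UNSPECIFIED = "x-unspecified"

def language_chain(lang):
    """Queried language, then each broader language up to x-unspecified."""
    chain = [lang]
    while lang != UNSPECIFIED and "-" in lang:
        lang = lang.rsplit("-", 1)[0]
        chain.append(lang)
    if chain[-1] != UNSPECIFIED:
        chain.append(UNSPECIFIED)
    return chain

def contains_type(result_spec, type_or_feature, query_lang):
    """result_spec maps a type/feature name to the set of languages requested for it."""
    requested = result_spec.get(type_or_feature, set())
    return any(lang in requested for lang in language_chain(query_lang))

cases = [
    ("spec has en-us only, query en", {"TF": {"en-us"}}, "en"),
    ("spec has en only, query x-unspecified", {"TF": {"en"}}, "x-unspecified"),
    ("spec has x-unspecified, query en", {"TF": {"x-unspecified"}}, "en"),
]
for label, spec, query in cases:
    print(label, "->", contains_type(spec, "TF", query))
```

With this rule the first two cases get the same answer (false), while a result spec that does list "x-unspecified" still matches a query for any language.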
-Marshall


-- Michael



