Michael Baessler wrote:
Marshall Schor wrote:
Language specifications are in a hierarchy. For example, from most
inclusive to finer subsets, we have:
x-unspecified
en
en-us
A result spec's most common use is in a negative sense: an annotator
can check a result spec and, if it doesn't contain the type or
feature, skip producing that type or feature.
For simplicity, let's assume we have only one type or feature,
called TF.
If the annotator thinks it produces TF for language en-us only, and
wants to check whether it should skip producing it, it calls
containsType/Feature(TF, "en-us"). In the current implementation this
is defined to return true if the result spec has TF for any of the
languages x-unspecified, en, or en-us.
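
The lookup just described can be sketched roughly as follows. This is
only an illustration of the rule as stated in this message, not the
actual implementation: the class and method names are made up, and it
assumes language names are already lower-case and hyphen-separated, so
that a more inclusive language is always a prefix of a finer one.

```java
import java.util.Set;

/** Hypothetical sketch of the lookup rule; not the real result-spec code. */
final class ResultSpecLookupSketch {

    static final String UNSPECIFIED = "x-unspecified";

    /** True if a single result-spec entry language covers the queried language. */
    static boolean covers(String specLanguage, String queried) {
        if (specLanguage.equals(queried)) {
            return true;                               // exact match, e.g. en-us / en-us
        }
        if (specLanguage.equals(UNSPECIFIED)) {
            return true;                               // most inclusive entry covers everything
        }
        return queried.startsWith(specLanguage + "-"); // "en" covers "en-us"
    }

    /** True if any language listed for TF in the result spec covers the query. */
    static boolean containsTF(Set<String> specLanguagesForTF, String queried) {
        return specLanguagesForTF.stream().anyMatch(l -> covers(l, queried));
    }
}
```

With these definitions, containsTF(Set.of("en"), "en-us") evaluates to
true, and so do the same calls with an entry of "x-unspecified" or
"en-us".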
Let's consider the opposite case. Suppose we have an annotator that
can produce TF for "en". Suppose the result-spec has an entry for TF
only for the language "en-us". Should that annotator produce
results? If it calls containsType/Feature(TF, "en"), it will get a
"false" (current implementation).
After some thinking about this and some discussion (because I don't
think I got it right, just by myself :-) ),
it seems that this is correct. Consider the following case:
The language of the document is "en", and the containing (top-most)
aggregate explicitly specified that it wanted output only for en-us.
In that case, the annotator should not produce any results, because
the language of this doc is not en-us, and the assembler put together
components that they said should only output en-us results.
This same logic seems to apply to x-unspecified:
Suppose we have an annotator that can produce TF for
"x-unspecified". Suppose the result-spec has an entry for TF only
for the language "en". Should that annotator produce results? If it
calls containsType/Feature(TF, "x-unspecified"), it should get a
"false" (broken in the current implementation, but was true, I think,
in the previous one).
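
One way to picture this proposed behaviour: walk from the queried
language up through every more inclusive language and see whether the
result spec lists any of them. Since nothing is more inclusive than
x-unspecified, a query for it could only match an x-unspecified entry.
The sketch below is an assumption-laden illustration (invented class
and method names, lower-case hyphen-separated language names), not the
real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed lookup; not the real result-spec code. */
final class LanguageChainSketch {

    static final String UNSPECIFIED = "x-unspecified";

    /** Returns the language followed by every more inclusive language. */
    static List<String> inclusiveChain(String language) {
        List<String> chain = new ArrayList<>();
        String current = language;
        while (!current.equals(UNSPECIFIED)) {
            chain.add(current);
            int dash = current.lastIndexOf('-');
            if (dash < 0) {
                break;                             // no finer part left to strip
            }
            current = current.substring(0, dash);  // "en-us" becomes "en"
        }
        chain.add(UNSPECIFIED);                    // top of the hierarchy
        return chain;
    }

    /** True if the result spec lists TF for the query or a more inclusive language. */
    static boolean containsTF(Set<String> specLanguagesForTF, String queried) {
        for (String candidate : inclusiveChain(queried)) {
            if (specLanguagesForTF.contains(candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

Here inclusiveChain("en-us") gives [en-us, en, x-unspecified], while
inclusiveChain("x-unspecified") gives just [x-unspecified], so
containsTF(Set.of("en"), "x-unspecified") comes out false.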
I'm not sure you are right here. I think if an annotator can produce
TF for "x-unspecified", that means it can produce TF for all
languages. So if an "en" document comes in, the annotator should
produce a result.
Hmmm, this seems to contradict your statement below, where you say
"That case is correct".
In the example below, the result-spec passed in to the annotator has
only "en", not "x-unspecified". This is the case proposed in my
paragraph. Below you say it is right for the annotator to *not* produce
results, while above you say it should produce results. This is
inconsistent, unless I've mangled something... Can you clarify?
-Marshall
Assume the language of the document is "x-unspecified", and the
containing (top-most) aggregate explicitly specified that it wanted
output only for en. In that case, the annotator should not produce
any results, because the language of this doc is not "en", and the
assembler put together components that they said should only output
"en" results.
That case is correct.
-- Michael