Benson: I'm in danger of trying to remember CPL's German decompounder and how
we used it. That would be a very unreliable memory.
However, at the link below, David and Rupert have a resoundingly informative
discussion about making similar work for synonyms. It might bear reading
through the kb
I don't think so ... Let me be specific:
First, consider the case of one 'analysis': an input token maps to a lemma
and a sequence of components.
So, we produce:
    surface form
    lemma     PI 0
    comp1     PI 0
    comp2     PI 1
    .
with PL (positionLength) set appropriately to cover the pieces. A
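To make the single-analysis layout concrete, here is a small plain-Java sketch (no Lucene dependency) that turns (term, PI, PL) triples into absolute start/end positions the way an increment-based consumer would. The record `Tok`, the method `layout`, the increment of 1 on the surface form, and the example word "Hausboot" (Haus + Boot) are all illustrative assumptions, not CPL's or Lucene's code.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleAnalysisDemo {
    // Hypothetical token: term text, position increment (PI), position length (PL).
    record Tok(String term, int posInc, int posLen) {}

    // Accumulate increments into absolute start positions; end = start + PL.
    static List<String> layout(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        int pos = -1; // a first increment of 1 lands on position 0
        for (Tok t : toks) {
            if (t.posInc() < 0) throw new IllegalArgumentException("negative increment: " + t);
            pos += t.posInc();
            out.add(t.term() + "@" + pos + "-" + (pos + t.posLen()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = List.of(
            new Tok("Hausboot", 1, 2), // surface form (increment 1 assumed), covers both pieces
            new Tok("hausboot", 0, 2), // lemma, stacked on the surface form
            new Tok("haus", 0, 1),     // comp1, same start position
            new Tok("boot", 1, 1));    // comp2, one position later
        System.out.println(layout(toks)); // [Hausboot@0-2, hausboot@0-2, haus@0-1, boot@1-2]
    }
}
```

The surface form and lemma get PL 2 so that they span exactly the positions occupied by comp1 and comp2.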
Hi Benson:
This is the case with n-gramming (though you have a more complicated start
chooser than most, I imagine). Does that help get your ideas unblocked?
Will
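Will's comparison to n-gramming can be illustrated with a rough sketch of word-bigram stacking: each word advances the position by 1, and a bigram starting at the same word is stacked on it with increment 0 and length 2. This is loosely modeled on the general idea of shingling; the names `BigramStackDemo`, `Tok`, and `unigramsAndBigrams` are hypothetical, and it is not Lucene's ShingleFilter implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramStackDemo {
    // Hypothetical token: term text, position increment, position length.
    record Tok(String term, int posInc, int posLen) {}

    // Each word advances by one position; a bigram starting at that word
    // is stacked on it (increment 0) and spans two positions (length 2).
    static List<Tok> unigramsAndBigrams(List<String> words) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(new Tok(words.get(i), 1, 1));
            if (i + 1 < words.size()) {
                out.add(new Tok(words.get(i) + " " + words.get(i + 1), 0, 2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : unigramsAndBigrams(List.of("please", "divide", "this"))) {
            System.out.println(t);
        }
    }
}
```

As in the decompounding case, the longer token shares a start position with the first piece it covers and uses its length to reach the end of the last one.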
-----Original Message-----
From: Benson Margulies [mailto:bimargul...@gmail.com]
Sent: Friday, October 24, 2014 4:43 PM
To: java-us
Consider a case where we have a token which can be subdivided in
several ways. This can happen in German. We'd like to represent this
with positionIncrement/positionLength, but it does not seem possible.
Once the position has moved out from one set of 'subtokens', we see no
way to move it back for
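One possible encoding of several decompositions, sketched here in plain Java and under the assumption that the consumer treats positionIncrement/positionLength as a token graph (nodes are positions, each token is an arc from its start to start + length), is to emit tokens sorted by start position and give each decomposition its own intermediate node. No negative increment is then needed. The German example "Wachstube" (wach + stube, or wachs + tube) and all class and method names are illustrative assumptions. The sketch only checks the shape of the token stream; it makes no claim about whether an indexer or query parser preserves positionLength.

```java
import java.util.ArrayList;
import java.util.List;

public class DecompoundGraphDemo {
    // Hypothetical token: term, position increment, position length.
    record Tok(String term, int posInc, int posLen) {}

    // One arc of the token graph, from node 'from' to node 'to'.
    record Arc(String term, int from, int to) {}

    // Increments are never negative, so tokens must already be sorted by start.
    static List<Arc> toArcs(List<Tok> toks) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            if (t.posInc() < 0) throw new IllegalArgumentException("negative increment: " + t);
            pos += t.posInc();
            arcs.add(new Arc(t.term(), pos, pos + t.posLen()));
        }
        return arcs;
    }

    // Depth-first enumeration of every path from 'node' to 'end'.
    static void paths(List<Arc> arcs, int node, int end, String prefix, List<String> out) {
        if (node == end) {
            out.add(prefix.trim());
            return;
        }
        for (Arc a : arcs) {
            if (a.from() == node) paths(arcs, a.to(), end, prefix + " " + a.term(), out);
        }
    }

    public static void main(String[] args) {
        List<Tok> toks = List.of(
            new Tok("wachstube", 1, 3), // whole word: node 0 -> 3
            new Tok("wach", 0, 1),      // node 0 -> 1
            new Tok("wachs", 0, 2),     // node 0 -> 2
            new Tok("stube", 1, 2),     // node 1 -> 3
            new Tok("tube", 1, 1));     // node 2 -> 3
        List<String> out = new ArrayList<>();
        paths(toArcs(toks), 0, 3, "", out);
        System.out.println(out); // [wachstube, wach stube, wachs tube]
    }
}
```

Because "wach" and "wachs" end on different nodes (1 and 2), spurious mixes such as "wach tube" never appear as paths.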