date:20141024

RE: A really hairy token graph case

2014-10-24 Thread Will Martin

Benson: I'm in danger of trying to remember CPL's German decompounder and how we used it. That would be a very unreliable memory. However, at the link below David and Rupert have a resoundingly informative discussion about making similar work for synonyms. It might bear reading through the kb

Re: A really hairy token graph case

2014-10-24 Thread Benson Margulies

I don't think so ... Let me be specific: First, consider the case of one 'analysis': an input token maps to a lemma and a sequence of components. So, we produce

surface form
lemma PI 0
comp1 PI 0
comp2 PI 1
...

with PL set appropriately to cover the pieces. A
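A minimal sketch of the single-analysis case, assuming PI and PL mean Lucene's positionIncrement and positionLength as used in the thread. It models tokens as plain Java records, not Lucene's TokenStream attributes; the class and method names (SingleAnalysisGraph, emit, toArcs) and the sample word are hypothetical. It also assumes the surface form gets PI 1 and that "PL set appropriately" means the surface form and lemma span all components. Replaying the increments shows where each token starts and ends in the position graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the PI/PL stream for one analysis; plain records, not the Lucene API.
public class SingleAnalysisGraph {

    // An emitted token: term text, position increment (PI), position length (PL).
    record Tok(String term, int posInc, int posLen) {}

    // The same token seen as an arc between two nodes of the position graph.
    record Arc(String term, int from, int to) {}

    // Assumption: the surface form starts a new position (PI 1) and, like the lemma,
    // spans every component (PL = number of components).
    static List<Tok> emit(String surface, String lemma, List<String> comps) {
        int n = comps.size();
        List<Tok> out = new ArrayList<>();
        out.add(new Tok(surface, 1, n));
        out.add(new Tok(lemma, 0, n)); // stacked on the surface form
        for (int i = 0; i < n; i++) {
            // comp1 stacks on the surface form (PI 0); each later component advances one position
            out.add(new Tok(comps.get(i), i == 0 ? 0 : 1, 1));
        }
        return out;
    }

    // Replays the increments the way a consumer would, starting from position -1.
    static List<Arc> toArcs(List<Tok> toks) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posInc();
            arcs.add(new Arc(t.term(), pos, pos + t.posLen()));
        }
        return arcs;
    }

    public static void main(String[] args) {
        List<Tok> toks = emit("Lebensversicherungen", "Lebensversicherung",
                List.of("Leben", "Versicherung"));
        for (Arc a : toArcs(toks)) {
            System.out.println(a.term() + " " + a.from() + "->" + a.to());
        }
    }
}
```

With one analysis this works: the surface form and lemma both run from node 0 to node 2, and the components run 0->1 and 1->2, so every path through the graph spells either the whole word or its pieces.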

RE: A really hairy token graph case

2014-10-24 Thread Will Martin

Hi Benson: This is the case with n-gramming (though you have a more complicated start chooser than most, I imagine). Does that help get your ideas unblocked? Will -----Original Message----- From: Benson Margulies [mailto:bimargul...@gmail.com] Sent: Friday, October 24, 2014 4:43 PM To: java-us

A really hairy token graph case

2014-10-24 Thread Benson Margulies

Consider a case where we have a token which can be subdivided in several ways. This can happen in German. We'd like to represent this with positionIncrement/positionLength, but it does not seem possible. Once the position has moved out from one set of 'subtokens', we see no way to move it back for
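A sketch of one possible workaround, offered as an assumption rather than something the thread settles: instead of emitting one analysis at a time, collect the pieces from every segmentation, number graph nodes by the distinct character offsets where pieces begin or end, and emit tokens sorted by start node. Because start nodes never decrease, the position only ever has to move forward, and PL carries each piece to its end node. The class name SegmentationLattice, the method flatten, and the "segmentation as a list of piece lengths" input are hypothetical; "Wachstube" (Wach|stube or Wachs|tube) is just an illustrative ambiguous German word.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical: flatten several alternative segmentations of one word into a single
// PI/PL token stream. Plain records, not the Lucene API.
public class SegmentationLattice {

    record Piece(String text, int startOffset, int endOffset) {}

    record Tok(String term, int posInc, int posLen) {}

    // Each segmentation is a list of piece lengths that must add up to word.length().
    static List<Tok> flatten(String word, List<List<Integer>> segmentations) {
        List<Piece> pieces = new ArrayList<>();
        TreeSet<Integer> boundaries = new TreeSet<>(); // distinct offsets, ascending
        for (List<Integer> seg : segmentations) {
            int off = 0;
            boundaries.add(0);
            for (int len : seg) {
                Piece p = new Piece(word.substring(off, off + len), off, off + len);
                if (!pieces.contains(p)) pieces.add(p); // a piece shared by two analyses is emitted once
                off += len;
                boundaries.add(off);
            }
            if (off != word.length()) {
                throw new IllegalArgumentException("segmentation does not cover the word");
            }
        }
        // Graph node i corresponds to the i-th distinct boundary offset.
        List<Integer> nodes = new ArrayList<>(boundaries);
        // Emit by start node, so the position never has to move backwards.
        pieces.sort(Comparator.comparingInt(Piece::startOffset).thenComparingInt(Piece::endOffset));
        List<Tok> out = new ArrayList<>();
        int prevStart = -1;
        for (Piece p : pieces) {
            int from = nodes.indexOf(p.startOffset());
            int to = nodes.indexOf(p.endOffset());
            out.add(new Tok(p.text(), from - prevStart, to - from));
            prevStart = from;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = flatten("Wachstube", List.of(List.of(4, 5), List.of(5, 4)));
        for (Tok t : toks) {
            System.out.println(t.term() + " PI " + t.posInc() + " PL " + t.posLen());
        }
    }
}
```

A caveat on this design: merging nodes by offset can create paths no analysis produced. For a word split as A|B|CD and as AB|C|D, the shared boundary after AB also lets the graph spell A|B|C|D and AB|CD, so this only stays faithful when alternative segmentations do not share interior boundaries in that way.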