Hello,

reading 4.1 carefully, I think it is more or less what our POS
Tagger already does. It uses Maxent by default and can be
configured to use the token-window features described in section 4.1. It
can also be trained to predict only two tags; in the case of 4.1 these
would be SPLIT and NO_SPLIT. With Maxent the exact choice of features is
often not that important; better features usually only gain a bit more
accuracy.

The POS Tagger can be trained and evaluated from the command line
rather quickly, which will tell you how well this approach works
for splitting.

The training format looks like this:
   ... he_NO_SPLIT thought_NO_SPLIT ,_SPLIT Jeff_NO_SPLIT signs_NO_SPLIT on_NO_SPLIT
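
To make the format concrete, here is a minimal Python sketch (not part of OpenNLP) that reads one line of token_TAG pairs and groups the tokens into segments, closing a segment after every token tagged SPLIT. The function names are made up for illustration. Because both tag names end in "SPLIT", the sketch checks the longer name first. If the tool you feed this format to separates token and tag at the last underscore (an assumption worth checking), tag names without an underscore would be the safer choice.

```python
# The longer tag name comes first because "_NO_SPLIT" also ends in "_SPLIT".
TAGS = ("NO_SPLIT", "SPLIT")


def parse_pair(pair):
    """Split 'token_TAG' into (token, tag) using the known tag names."""
    for tag in TAGS:
        suffix = "_" + tag
        if pair.endswith(suffix) and len(pair) > len(suffix):
            return pair[: -len(suffix)], tag
    raise ValueError(f"no known tag in {pair!r}")


def segments(tagged_line):
    """Group tokens into segments, closing one after each SPLIT token."""
    result, current = [], []
    for pair in tagged_line.split():
        token, tag = parse_pair(pair)
        current.append(token)
        if tag == "SPLIT":
            result.append(current)
            current = []
    if current:  # trailing tokens that were never closed by a SPLIT
        result.append(current)
    return result


line = ("he_NO_SPLIT thought_NO_SPLIT ,_SPLIT "
        "Jeff_NO_SPLIT signs_NO_SPLIT on_NO_SPLIT")
for segment in segments(line):
    print(" ".join(segment))
```

The same grouping can be applied to the tagger's output to recover the phrases that are then translated one by one.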

Have a look at our documentation [1], or ask here if you need more help.

The trickier part is probably compiling the training data for this.
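
If you already have sentences that were segmented by hand, turning them into training lines is mostly mechanical. The following is a rough Python sketch under several assumptions: the regex tokenizer is a simple stand-in (the tokenization should match whatever tokenizer is used at prediction time), the name training_line is invented for this example, and the last token of the final clause is tagged NO_SPLIT because nothing follows it.

```python
import re

# Simple stand-in tokenizer: words (with inner hyphens or apostrophes)
# or single punctuation marks.
TOKEN = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]")


def training_line(clauses):
    """Turn the hand-made clauses of one sentence into a token_TAG line.

    The last token of every clause except the final one is tagged SPLIT;
    all other tokens are tagged NO_SPLIT.
    """
    pairs = []
    for n, clause in enumerate(clauses):
        tokens = TOKEN.findall(clause)
        is_final_clause = n == len(clauses) - 1
        for i, token in enumerate(tokens):
            closes_clause = i == len(tokens) - 1 and not is_final_clause
            tag = "SPLIT" if closes_clause else "NO_SPLIT"
            pairs.append(f"{token}_{tag}")
    return " ".join(pairs)


print(training_line(["harder than he thought,", "Jeff signs on"]))
```

One such line per sentence, written to a text file, gives a data set in the format shown above.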

HTH,
Jörn

[1] https://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#tools.postagger.tagging

On Thu, Feb 8, 2018 at 3:19 PM, karthika nair <[email protected]> wrote:
> Hello there,
>
>
>
> We use Machine Translation as one of our components for translations. We
> call AWS Translate downstream for short sentences and it performs decently
> well. However, being a neural MT system, it fails on longer sentences. Our
> metadata assets – (long synopsis, short synopsis) are typically sentences
> of length ~40 words (or more!). AWS Translate often loses context, skips
> words and garbles meaning, resulting in poor translations.
>
>
>
> We are currently looking at segmenting sentences into phrases, getting
> those individual phrases translated and concatenating them back (i.e.
> implementing this paper
> <http://tcci.ccf.org.cn/conference/2016/papers/72.pdf>). However, the split
> model described is ambiguous about the feature defined (specifically
> Equation 11 in Section 4.1). Has anyone here come across this problem, or
> does anyone know of other approaches we could try for translating long
> sentences?
>
>
>
> Here’s an example of a long sentence –
>
> When returning to his old law practice proves harder than he thought, Jeff
> signs on to help his longtime nemesis Alan Connor represent Marvin
> Humphries, a Greendale Community College-trained engineer who designed a
> bridge that collapsed. To keep the school from shredding the evidence of
> his client’s shoddy education, Alan asks Jeff to steal his records so he
> can use them in court.
>
>
>
> We’d like this broken into
>
>    1. When returning to his old law practice proves harder than he thought,
>    2. Jeff signs on to help his longtime nemesis Alan Connor represent
>    Marvin Humphries,
>    3. a Greendale Community College-trained engineer who designed a bridge
>    that collapsed
>
> (We’ve verified that these clauses get translated correctly.)
>
>
>
> Thank you.
>
> Warm Regards,
>
> Karthika.
