Hello there,


We use machine translation as one component of our translation pipeline. We
call AWS Translate downstream for short sentences, and it performs reasonably
well. However, being a neural MT system, it degrades on longer sentences. Our
metadata assets (long synopsis, short synopsis) are typically sentences
of ~40 words or more. On these, AWS Translate often loses context, skips
words and garbles meaning, resulting in poor translations.



We are currently looking at segmenting long sentences into phrases,
translating the individual phrases, and concatenating the results (i.e.,
implementing this paper:
<http://tcci.ccf.org.cn/conference/2016/papers/72.pdf>). However, the split
model it describes is ambiguous about how its feature is defined
(specifically, Equation 11 in Section 4.1). Has anyone here come across this
problem, or does anyone know of other approaches we could try for
translating long sentences?



Here’s an example of a long sentence:

When returning to his old law practice proves harder than he thought, Jeff
signs on to help his longtime nemesis Alan Connor represent Marvin
Humphries, a Greendale Community College-trained engineer who designed a
bridge that collapsed. To keep the school from shredding the evidence of
his client’s shoddy education, Alan asks Jeff to steal his records so he
can use them in court.



We’d like the first sentence broken into:

   1. When returning to his old law practice proves harder than he thought,
   2. Jeff signs on to help his longtime nemesis Alan Connor represent
   Marvin Humphries,
   3. a Greendale Community College-trained engineer who designed a bridge
   that collapsed.

(We’ve verified that these clauses get translated correctly.)
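
To make the split-translate-concatenate idea concrete, here is a minimal Python sketch. It is not the split model from the paper: it swaps in a plain comma heuristic as a baseline, and the names `split_into_clauses`, `translate_long_sentence`, `min_words` and `translate_fn` are hypothetical, introduced only for illustration. `translate_fn` stands for whatever callable wraps the MT service. Concatenating in source order also assumes the target language keeps roughly the same clause order, which will not hold for every language pair.

```python
import re
from typing import Callable, List


def split_into_clauses(sentence: str, min_words: int = 3) -> List[str]:
    """Split a sentence at commas, keeping each comma with its clause.

    Assumption: a comma heuristic, not the paper's split model. Fragments
    shorter than `min_words` are merged into the following fragment so that
    short list items ("A, B, and C") are not translated in isolation.
    """
    # Break on the whitespace that follows a comma; the comma stays attached.
    pieces = re.split(r"(?<=,)\s+", sentence.strip())
    clauses: List[str] = []
    buffer = ""
    for piece in pieces:
        buffer = f"{buffer} {piece}".strip()
        if len(buffer.split()) >= min_words:
            clauses.append(buffer)
            buffer = ""
    if buffer:
        # A short trailing fragment is attached to the previous clause.
        if clauses:
            clauses[-1] = f"{clauses[-1]} {buffer}"
        else:
            clauses.append(buffer)
    return clauses


def translate_long_sentence(
    sentence: str,
    translate_fn: Callable[[str], str],
    min_words: int = 3,
) -> str:
    """Translate each clause separately and join the results in source order."""
    clauses = split_into_clauses(sentence, min_words)
    return " ".join(translate_fn(clause) for clause in clauses)


if __name__ == "__main__":
    example = (
        "When returning to his old law practice proves harder than he thought, "
        "Jeff signs on to help his longtime nemesis Alan Connor represent "
        "Marvin Humphries, a Greendale Community College-trained engineer who "
        "designed a bridge that collapsed."
    )
    for clause in split_into_clauses(example):
        print(clause)
```

On the example above, this heuristic yields the same three clauses listed earlier, but it will clearly mis-split sentences where commas do not mark clause boundaries, which is the gap a learned split model is meant to close.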



Thank you.

Warm Regards,

Karthika.
