Hello there,
We use machine translation as one component of our translation pipeline. We call AWS Translate downstream for short sentences, and it performs reasonably well. However, as is common with neural MT systems, its quality drops on longer sentences. Our metadata assets (long synopsis, short synopsis) are typically sentences of about 40 words (or more!). On these, AWS Translate often loses context, skips words, and garbles the meaning, resulting in poor translations.

We are currently looking at segmenting long sentences into phrases, translating each phrase individually, and concatenating the results, i.e. implementing this paper: <http://tcci.ccf.org.cn/conference/2016/papers/72.pdf>. However, the split model it describes is ambiguous about how the feature is defined (specifically Equation 11 in Section 4.1).

Has anyone here come across this problem, or does anyone know of other approaches we could try for translating long sentences?

Here’s an example of a long input:

When returning to his old law practice proves harder than he thought, Jeff signs on to help his longtime nemesis Alan Connor represent Marvin Humphries, a Greendale Community College-trained engineer who designed a bridge that collapsed. To keep the school from shredding the evidence of his client’s shoddy education, Alan asks Jeff to steal his records so he can use them in court.

We’d like this broken into clauses such as:
1. When returning to his old law practice proves harder than he thought,
2. Jeff signs on to help his longtime nemesis Alan Connor represent Marvin Humphries,
3. a Greendale Community College-trained engineer who designed a bridge that collapsed

(We’ve verified that these clauses get translated correctly.)

Thank you.

Warm Regards,
Karthika.
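
P.S. To make the split, translate, and concatenate idea concrete, here is a minimal sketch. It rests on several assumptions: the punctuation-based splitter is only a naive placeholder and is not the split model from the paper; the names `split_into_segments`, `translate_long_text`, and `aws_translate_segment` are hypothetical; and the language codes are illustrative. The translator is passed in as a function so the splitting logic can be tried without calling AWS.

```python
import re
from typing import Callable, List


def split_into_segments(text: str, min_words: int = 4) -> List[str]:
    """Naive placeholder splitter (an assumption, not the paper's model).

    Breaks after commas, semicolons, and sentence-ending punctuation, then
    merges very short fragments into the preceding segment.
    """
    pieces = [p.strip() for p in re.split(r"(?<=[,.;!?])\s+", text) if p.strip()]
    segments: List[str] = []
    for piece in pieces:
        if segments and len(piece.split()) < min_words:
            # Too short to translate on its own: attach to the previous segment.
            segments[-1] = f"{segments[-1]} {piece}"
        else:
            segments.append(piece)
    return segments


def translate_long_text(text: str, translate_segment: Callable[[str], str]) -> str:
    """Translate each segment independently and join the results with spaces."""
    return " ".join(translate_segment(s) for s in split_into_segments(text))


def aws_translate_segment(segment: str, source: str = "en", target: str = "es") -> str:
    """Hypothetical wrapper around AWS Translate; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the splitter can be tested without boto3

    client = boto3.client("translate")
    response = client.translate_text(
        Text=segment,
        SourceLanguageCode=source,
        TargetLanguageCode=target,
    )
    return response["TranslatedText"]


if __name__ == "__main__":
    example = (
        "When returning to his old law practice proves harder than he thought, "
        "Jeff signs on to help his longtime nemesis Alan Connor represent "
        "Marvin Humphries, a Greendale Community College-trained engineer who "
        "designed a bridge that collapsed."
    )
    for number, segment in enumerate(split_into_segments(example), start=1):
        print(number, segment)
```

On the first sentence of the example this yields the three clauses listed above, but a purely punctuation-based split will break badly on lists and appositives, and a word-for-word join ignores target-language word order, which is why we are interested in a learned split model instead.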
