Incidentally, you can split the fields more simply using the “unpaste” command:
cat file_de-en.tsv | unpaste file.{de,en}
Unpaste is available here:
https://github.com/mjpost/bin/blob/master/unpaste
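For comparison, both columns can also be written in a single pass over the file with awk. This is a minimal sketch, not how unpaste is implemented. The file names sample_de-en.tsv, sample.de and sample.en are hypothetical stand-ins for file_de-en.tsv, file.de and file.en. The sketch assumes German is in field 1 and English in field 2.

```shell
#!/bin/sh
# Hypothetical sample input standing in for file_de-en.tsv
# (assumption: German in field 1, English in field 2).
printf 'Guten Morgen.\tGood morning.\nDanke.\tThank you.\n' > sample_de-en.tsv

# One pass over the input: awk splits each line on the tab and writes
# field 1 to sample.de and field 2 to sample.en
# (each output file is truncated when awk first opens it).
awk -F'\t' '{ print $1 > "sample.de"; print $2 > "sample.en" }' sample_de-en.tsv
```

A line without a tab ends up whole in sample.de and as an empty line in sample.en, so the two outputs stay line-aligned. cut would copy such a line into both files instead. Any third field is dropped.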
matt (from my phone)
> On 30 March 2020, at 21:01, Artem Shevchenko wrote:
Hi,
v9 is mainly for other languages. For languages where multiple versions
exist, it is only slightly bigger than the earlier versions.
-phi
On Mon, Mar 30, 2020 at 9:01 PM Artem Shevchenko wrote:
I found out how to split the fields of the tab-separated de-en sentences.
In case anyone needs it, use cut -f 1 or cut -f 2:
cat file_de-en.tsv | cut -f 1 > file.de
cat file_de-en.tsv | cut -f 2 > file.en
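A quick way to confirm that the split lost nothing is to join the two files again with paste, which uses a tab as its default delimiter, and compare the result with the input. This is a sketch under assumptions: it uses a hypothetical two-line sample_de-en.tsv in place of file_de-en.tsv so that the real file is not overwritten.

```shell
#!/bin/sh
# Hypothetical two-line sample standing in for file_de-en.tsv.
printf 'Guten Morgen.\tGood morning.\nDanke.\tThank you.\n' > sample_de-en.tsv

# Split the columns with cut, as in the commands above.
cut -f 1 sample_de-en.tsv > sample.de
cut -f 2 sample_de-en.tsv > sample.en

# paste rejoins the two files line by line with a tab. If the result is
# byte-identical to the input, every line contained exactly one tab.
# A mismatch means some line had no tab (cut then copies the whole line
# into both files) or had more than two columns.
if paste sample.de sample.en | cmp -s - sample_de-en.tsv; then
    echo "round trip OK"
else
    echo "mismatch: check for lines without a tab or with extra columns" >&2
fi
```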
So the only remaining question is: is Europarl v9 better than v8 or v7?
On Tue, 31 Mar 2020 at 02:21, Artem Shevchen
Hello,
Thank you very much for your reply.
My goal is to rebuild the translation memory for the de-en pair while keeping
true case in the German phrase table.
In the models released with 4.0 for de-en, everything is lowercased, which makes it
impossible to distinguish between, e.g., a noun (das Wissen) and a verb zu
w
Hi,
You are free to use this data. v9 has only been generated for some
language pairs, since the amount of translations has not increased
significantly for a few years now.
-phi
On Mon, Mar 30, 2020 at 6:50 AM Artem Shevchenko wrote:
Hello,
I have found this:
http://www.statmt.org/europarl/v9/ dated 2019-02
Does it contain the v9 parallel corpus?
However, v9 is not mentioned anywhere else.
Is it released?
Can it be used?
Thank you!
Artem Shevchenko
___
Moses-support mailing list
Moses-suppo