Re: [Apertium-stuff] Released: nno-nob 1.4.0

2021-04-29 Thread Francis Tyers via Apertium-stuff
On 2021-04-29 11:54, Kevin Brubeck Unhammer wrote: Hi, I've tagged some new releases of nno, nob and apertium-nno-nob. Like before[0], the work has been funded by the Norwegian Ministry of Culture via Nynorsk pressekontor (NPK) and the Norwegian News Agency, now with direct commits from

Re: [Apertium-stuff] Cleaning Parallel Corpus

2021-04-29 Thread VIVEK VICKY
Awesome I will try it out. Thanks!! On Thu, 29 Apr, 2021, 11:31 pm Tanmai Khanna, wrote: > Since you have only about 5-8 such sentences for every 2000 lines, and it > seems like empty lines are a reliable marker for these kind of situations, > something I would do is to prune the corpus and

Re: [Apertium-stuff] Cleaning Parallel Corpus

2021-04-29 Thread Tanmai Khanna
Since you have only about 5-8 such sentences for every 2000 lines, and it seems like empty lines are a reliable marker for these kinds of situations, something I would do is to prune the corpus and remove any empty line along with two lines before and two lines after it from both the English and
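A minimal sketch of the pruning described above, assuming the two sides of the corpus are line-aligned files already read into lists of equal length. The snippet is cut off, so applying the pruning to both sides in lockstep is an assumption here, and the function name `prune_parallel` and its `window` parameter are hypothetical, not from the thread.

```python
def prune_parallel(src_lines, tgt_lines, window=2):
    """Drop every line pair where either side is empty, plus `window`
    pairs before and after it, so both sides stay aligned.

    Hypothetical helper: assumes src_lines[i] corresponds to tgt_lines[i].
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("both sides must have the same number of lines")

    drop = set()
    for i, (src, tgt) in enumerate(zip(src_lines, tgt_lines)):
        # An empty (or whitespace-only) line on either side marks a
        # possibly misaligned region.
        if not src.strip() or not tgt.strip():
            first = max(0, i - window)
            last = min(len(src_lines) - 1, i + window)
            drop.update(range(first, last + 1))

    kept_src = [line for i, line in enumerate(src_lines) if i not in drop]
    kept_tgt = [line for i, line in enumerate(tgt_lines) if i not in drop]
    return kept_src, kept_tgt
```

With roughly 5-8 affected sentences per 2000 lines, this discards at most a few dozen line pairs per 2000, which is a small price for removing the misaligned regions.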

Re: [Apertium-stuff] Cleaning Parallel Corpus

2021-04-29 Thread VIVEK VICKY
On Thu, Apr 29, 2021 at 3:35 PM Kevin Brubeck Unhammer wrote: > VIVEK VICKY > wrote: > > > Hello everyone, > > The eng-spa parallel corpora I am using (http://www.statmt.org/europarl/, > > http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz), have empty > lines > > in either language due to

[Apertium-stuff] Released: nno-nob 1.4.0

2021-04-29 Thread Kevin Brubeck Unhammer
Hi, I've tagged some new releases of nno, nob and apertium-nno-nob. Like before[0], the work has been funded by the Norwegian Ministry of Culture via Nynorsk pressekontor (NPK) and the Norwegian News Agency, now with direct commits from contributors Anja, Victoria and Hallvard of NPK :-) One

Re: [Apertium-stuff] Cleaning Parallel Corpus

2021-04-29 Thread Kevin Brubeck Unhammer
VIVEK VICKY wrote: > Hello everyone, > The eng-spa parallel corpora I am using (http://www.statmt.org/europarl/, > http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz), have empty lines > in either language due to splitting of a sentence into two or merging of > two sentences after the