Hi Jinyi,
When executing the script like you did, make sure that the files
"raw.chn" and "raw.eng" are located in the same directory as
"clean-corpus-n.perl". Otherwise indicate the path to the files:
./clean-corpus-n.perl path/to/raw chn eng output/path/raw.clean 1 100
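Before running the command, it can help to confirm that both input files exist where the arguments point. The following is only a minimal sketch in Python, assuming the script builds its two input names as "<stem>.<ext>" (as "raw.chn" and "raw.eng" above suggest); the function name missing_inputs is hypothetical and not part of the Moses scripts.

```python
from pathlib import Path


def missing_inputs(stem, lang1, lang2):
    """Return the expected input files that do not exist.

    Assumption: the inputs are named "<stem>.<lang1>" and "<stem>.<lang2>",
    e.g. stem "path/to/raw" with "chn" and "eng" gives path/to/raw.chn
    and path/to/raw.eng.
    """
    candidates = [Path(f"{stem}.{lang1}"), Path(f"{stem}.{lang2}")]
    return [str(path) for path in candidates if not path.is_file()]
```

For example, missing_inputs("path/to/raw", "chn", "eng") returns an empty list when both files are present, and otherwise lists the paths that could not be found.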
it might also help to rename
When I executed the command "./clean-corpus-n.perl raw chn eng clean 1 100",
it showed the following:
clean-corpus.perl: processing raw.chn & .eng to clean, cutoff 1-100
Use of uninitialized value in open at ./clean-corpus-n.perl line 46.
Use of uninitialized value in concatenation (.) or string at
Yep. I had that kind of error once. You have to sentence align first. The
problem is the lack of documentation and not the programs themselves.
Quoting Barry Haddow <[EMAIL PROTECTED]>:
> Hi Iain
>
> You should check that your es and en files both have the same number of
> lines.
> I think this
Hi Iain
You should check that your es and en files both have the same number of lines.
I think this error message is telling you that there's a length mismatch,
perhaps from the concatenation script?
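
One way to run that check is sketched below in Python. It is only an illustration: the helper names count_lines and line_counts_match are made up here, and it assumes one sentence per line in each file.

```python
def count_lines(path):
    """Count the lines in a file, reading in binary to avoid encoding issues."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def line_counts_match(path_a, path_b):
    """Return (match, count_a, count_b) for two files of a parallel corpus.

    Assumption: each file holds one sentence per line, so aligned files
    must contain exactly the same number of lines.
    """
    count_a = count_lines(path_a)
    count_b = count_lines(path_b)
    return count_a == count_b, count_a, count_b
```

Calling line_counts_match on the tokenized es and en files (for instance "corpus.tok.es" and "corpus.tok.en", where the es file name is an assumption) returns False as the first element if the two sides differ in length, along with both counts.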
regards
Barry
On Friday 14 March 2008 14:24:37 Iain Adams wrote:
> Dear Mailing List,
>
> We a
Dear Mailing List,
We are trying to train SRILM on Europarl (en-es) but are running into problems.
After tokenizing the data we run clean-corpus-n.perl; however, the message
/home/aca04iba/en-es/corpus/corpus.tok.en is too long! at
/home/aca04iba/en-es/bin/moses-scripts/scripts-20080306-1457/trainin