Hi Tomasz,
The mosesserver is just the decoder, so it doesn't do any of the pre- and
post-processing steps that you also need. In particular, it does not do
tokenisation. You need to send it tokenised text and then de-tokenise
the output.
cheers - Barry
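
To illustrate the workflow Barry describes, here is a minimal Python sketch of a client that tokenises a segment, sends it to the decoder, and de-tokenises the result. Several things here are assumptions rather than part of the thread: the regex tokeniser and de-tokeniser are crude, hypothetical stand-ins for the Moses Perl scripts (they do not handle apostrophes, abbreviations, or language-specific rules), and the URL and port are placeholders for wherever your mosesserver is listening. The XML-RPC `translate` call with a `text` parameter follows the sample client shipped with Moses.

```python
import re
import xmlrpc.client

# Hypothetical stand-in for the Moses tokenizer script:
# words and single punctuation marks become separate tokens.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]", re.UNICODE)


def tokenize(text):
    """Split punctuation from words and join tokens with single spaces."""
    return " ".join(_TOKEN_RE.findall(text))


def detokenize(text):
    """Re-attach closing punctuation and opening brackets (very rough)."""
    text = re.sub(r"\s+([.,!?;:)\]])", r"\1", text)
    text = re.sub(r"([(\[])\s+", r"\1", text)
    return text


def moses_decode(tokenized, url="http://localhost:8080/RPC2"):
    """Send already-tokenised text to mosesserver (URL is an assumption)."""
    proxy = xmlrpc.client.ServerProxy(url)
    return proxy.translate({"text": tokenized})["text"].strip()


def translate(segment, decode=moses_decode):
    """Tokenise, decode, then de-tokenise a single segment."""
    return detokenize(decode(tokenize(segment)))
```

Passing `decode` as a parameter keeps the pre- and post-processing testable without a running server, for example `translate("Hello (world).", decode=lambda s: s)`. In practice you would replace the two helper functions with calls to the real tokeniser and de-tokeniser for your language pair.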
On 12/11/15 13:40, Tomasz Gawryl wrote:
Hi Ulrich,
I have a question about Moses server too. I'm testing it as a wrapper for
Across server to check pre-translation possibilities. It generally works but
there is one problem. Input segments are translated without tokenization, so
every word close to special character (for example `this
Hi Barry,
Have there ever been any thoughts about implementing
tokenization/detokenization directly in Moses? I suppose this is some
work, as Moses would have to become language-aware, but I can only see
advantages from this. Besides, Moses is a language tool so these
concepts shouldn't be so
There have been thoughts. There is a C++ tokenizer in
contrib/c++tokenizer
It compiles into a library file, ready for integration.
The last time I checked, it gave a slightly worse BLEU score. Not by much,
but consistently.
If anyone wants to carry on with it, they're welcome to.
Hieu Hoang
Thanks for the info, Hieu, I didn't know that :) I'll try it sometime.
Best,
Panos
Hi Dingyuan,
I was actually thinking about implementing the logic and rules from the
Perl scripts, which seem to do the job, in a separate library (which
thankfully exists, as Hieu informed us :)).
Asking an end user to add a programming layer themselves to a seemingly
straightforward process is
Hi,
There are a lot of different pre- and post-processing
steps that you may want to apply for any given
language pair, so it makes sense to keep them
out of the decoder.
If you are interested in a server implementation
that integrates tokenization, truecasing, etc., check
out Christian Buck's