what's working best
for your use case.
Also, I'm sure there are a couple of other ways of harnessing
a dictionary in Moses.
Cheers,
Matthias
> > On Fri, Jul 7, 2017 at 4:26 PM, Matthias Huck <mh...@cis.lmu.de> wrote:
>
> >
> > Hi,
> >
Hi,
A simple solution would be to just append your dictionary to the
parallel training data. Or create a second phrase table from the
dictionary and do phrase table fillup or something similar.
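The first suggestion can be sketched in plain shell (file names and contents here are made up for illustration):

```shell
# Toy example: a two-sentence parallel corpus and a one-entry dictionary.
printf 'ein haus\nein auto\n' > corpus.src
printf 'a house\na car\n' > corpus.tgt
printf 'gitarre\n' > dict.src
printf 'guitar\n' > dict.tgt

# Append the dictionary to both sides, keeping the corpora line-aligned
# (one dictionary entry per line, same order on both sides).
cat dict.src >> corpus.src
cat dict.tgt >> corpus.tgt

# Sanity check: both sides must still have the same number of lines.
test "$(wc -l < corpus.src)" = "$(wc -l < corpus.tgt)"
```

Word alignment training will then see each dictionary entry as a (very short) sentence pair.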
Cheers,
Matthias
On Fri, 2017-07-07 at 15:02 +0530, Sanjanashree Palanivel wrote:
> HI all,
>
>
Hi,
Philipp Koehn's textbook is a nice introduction to SMT:
http://www.cambridge.org/catalogue/catalogue.asp?isbn=0521874157
http://www.statmt.org/book/
For advanced topics, it's best to read the primary literature (i.e.,
research papers published in conference proceedings and scientific
Hi,
It might be better to do phrase table fill-up.
You would add entries from a second phrase table ("background phrase
table") to your first phrase table ("foreground phrase table") only if
they're not present yet. You end up with a single table without
duplicates. Added background phrases can
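The core of the fill-up idea can be sketched with awk, assuming plain-text phrase tables in Moses' `src ||| tgt ||| scores` format (real fill-up implementations additionally mark added entries with a provenance feature; the file names and entries below are invented):

```shell
# Toy foreground and background phrase tables (contents made up).
printf 'ein haus ||| a house ||| 0.5\n' > foreground.pt
printf 'ein haus ||| a house ||| 0.9\nein auto ||| a car ||| 0.7\n' > background.pt

# Keep every foreground entry; add a background entry only if its
# source/target pair is not already present in the foreground table.
awk -F' \\|\\|\\| ' '
  NR == FNR { seen[$1 SUBSEP $2] = 1; print; next }
  !(($1 SUBSEP $2) in seen)
' foreground.pt background.pt > filled-up.pt
```

The result keeps the foreground score for "ein haus" and adds only the unseen "ein auto" entry from the background table.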
Hi Marcin,
If a sentence-level BLEU does the job for you (rather than
corpus-level), then check out the `sentence-bleu-nbest` tool in Moses. This
tool worked for me a couple of months ago, and I hope that nobody broke
it in the meantime.
Once you have sentence-level BLEU scores for all the
Hi,
mgiza can be configured to write a Model 1 file to disk.
Use the configuration option "model1dumpfrequency".
https://web.archive.org/web/20150919195919/http://www.kyloo.net/software/doku.php/mgiza:configure
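For illustration, the relevant line in the mgiza configuration file might look like this (the value 1 is just an example; the option controls after which training iterations the Model 1 table is written):

```
model1dumpfrequency 1
```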
Cheers,
Matthias
On Mon, 2017-02-13 at 16:50 +, Hieu Hoang wrote:
> the slide
Hi,
Maybe your moses.ini lets the decoder expect five input factors,
whereas there are only four present in the data?
I see this in your log file:
input-factors: 0 1 2 3 4
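For illustration (tokens and factor values invented): with five input factors (indices 0-4), the decoder expects every input token to carry five pipe-separated fields, e.g.:

```
das|das|ART|d|x haus|haus|NN|h|x
```

A corpus with only four fields per token would trigger exactly this kind of mismatch.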
Cheers,
Matthias
On Tue, 2016-12-06 at 11:18 +0200, Hasan Sait ARSLAN wrote:
> Hi,
>
> I have a factored
Hi,
In the EMS configuration file, you can specify
decoder-settings = "..."
under both [TUNING] and [EVALUATION]. Maybe that's all you need?
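For example (the flag values are placeholders; use whatever decoder options you need):

```
[TUNING]
decoder-settings = "-threads 8 -search-algorithm 1"

[EVALUATION]
decoder-settings = "-threads 8 -search-algorithm 1"
```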
Cheers,
Matthias
On Tue, 2016-08-23 at 00:40 +0100, Hieu Hoang wrote:
> not really sure what you mean. Shouldn't have to dig around mert
>
> during
> the tuning process that only the forms appear.
>
> Best regards,
>
> Carlos
>
> 2016-04-28 20:14 GMT+02:00 Matthias Huck <mh...@cis.lmu.de>:
>
> > Hi,
> >
> > Moses can be configured to output the target-side factors of your
> > c
Hi,
Moses can be configured to output the target-side factors of your choice.
Add something like this to your moses.ini:
[output-factors]
0
1
2
Cheers,
Matthias
On Thu, 2016-04-28 at 18:16 +0200, Carlos Escolano wrote:
> Hi,
>
> Thank you for your answer.
>
> You are right. While the
Hi Despina,
It seems to me that bjam doesn't use the boost build in your home
directory, but some other boost version installed on the system.
Maybe you should try
./bjam --with-boost=/home/despina/boost_1_55_0 -j4 -a
Cheers,
Matthias
On Fri, 2016-03-11 at 16:29 +, Hieu Hoang
Hi,
We once empirically compared two different recombination schemes in a
hierarchical phrase-based system (without any kind of neural network
language model):
Recombination T. The T recombination scheme recombines derivations that
produce identical translations. (I.e., hypotheses with the same
Hi Jasneet,
Why don't you use a proper profiling tool, e.g. the one in valgrind [1]?
If you visualize its output [2], you'll see quickly where the program
spends all the computing time.
Cheers,
Matthias
[1] http://valgrind.org/docs/manual/cl-manual.html
[2]
Hi,
You can set a local verbosity level for your feature function, e.g.:
CoarseBiLM name=CoarseBiLM100 verbosity=
If you use the macros FEATUREVERBOSE(level,str),
FEATUREVERBOSE2(level,str), or IFFEATUREVERBOSE(level) in your feature
function code, the verbose output will only be
igured, it should tell
you about it. But maybe not with a segmentation fault. :-)
> On 29 Jan 2016 9:15 pm, "Matthias Huck" <mh...@inf.ed.ac.uk> wrote:
> > Hi,
> >
> > It seems to me that this toy string-to-tree setup is either
> > outdated,
> > or it a
Hi,
It seems to me that this toy string-to-tree setup is either outdated,
or it always had issues. It should be replaced.
Under real-world conditions, the decoder should always be able to
produce some hypothesis. We would therefore usually extract a whole set
of glue rules. And we would
Hi,
I believe that the "~" might be the culprit. Try:
./bjam
--with-irstlm=/home/mty2015/Public/MTEngine/Moseshome/mosesdecoder/irstlm
(If this is the correct absolute path to your IRSTLM installation.)
Cheers,
Matthias
On Wed, 2016-01-20 at 00:32 +, Hieu Hoang wrote:
> it's
Hi Liang,
mteval-v13a.pl does some internal tokenization and probably splits those
"~~" words into " ~ ~ ". If this is happening,
it explains your difference in the calculated BLEU scores.
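That kind of tokenization can be approximated for this particular case with sed (a rough sketch of the behaviour only, not the exact rule set of mteval-v13a.pl):

```shell
# Split tilde characters off as separate tokens, then normalize spaces.
echo 'a~~b' | sed -e 's/~/ ~ /g' -e 's/  */ /g'
```

The input `a~~b` comes out as `a ~ ~ b`, i.e. two extra tokens per "~~" word.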
Cheers,
Matthias
On Mon, 2016-01-18 at 17:01 +0800, 姚亮 wrote:
> Dear Moses Support Team,
>
>I
Hi,
Have you tried to use an absolute path?
Cheers,
Matthias
On Mon, 2016-01-18 at 02:52 +0100, Ouafa Benterki wrote:
> Hello,
>
> I installed IRSTLM but when i used the command
> ./bjam --with-irstlm=/path to irstlm/ the installation failed
> can you advise
>
> Best
--
The University of
Hi,
If you don't need all score components of a phrase table, the easiest
way to get rid of them is to set the scaling factors for the undesired
phrase table feature function components to 0 before tuning, and ask
the optimizer to ignore them. The feature function configuration
parameter
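Zeroing out individual phrase table scores in the moses.ini [weight] section could look like this (illustrative values; here the second and third of four phrase table score components are disabled):

```
[weight]
TranslationModel0= 0.2 0 0 0.2
```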
So, what has been the proper solution?
On Fri, 2016-01-08 at 13:20 -0500, Nicholas Ruiz wrote:
> Thanks everyone, it's working now.
>
> zınɹ ʞɔıu
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Hi Nick,
What you're attempting to do should generally be no problem. There's
most likely some issue with your EMS configuration file. Doesn't it tell
you something like:
BUGGY CONFIG LINE (474): wrapping-frame = $tokenized-input
I get this when I put two spaces between
Hmm, maybe it can also cause trouble with the reuse of parts from
previous steps if the user doesn't proceed with care.
You could overwrite steps/1/config.1 on a call of experiment.perl
-config config.1 -continue 1 -exec .
On Fri, 2016-01-08 at 20:56 +, Matthias Huck wrote:
> Hi Phil
d behaviour
> down the road...
>
>
> -phi
>
> On Fri, Jan 8, 2016 at 1:24 PM, Matthias Huck <mh...@inf.ed.ac.uk>
> wrote:
> Hi Philipp,
>
> On Fri, 2016-01-08 at 13:17 -0500, Philipp Koehn wrote:
> > the comma
Hi Tom,
There used to be a freely available Chinese word segmenter provided by
the LDC as well. Unfortunately, things keep disappearing from the web.
https://web.archive.org/web/20130907032401/http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm
For Arabic, I think that many academic research groups
Hi,
It's a problem that apparently occurs very rarely, and as Guy mentioned,
we were so far assuming that it's caused by a zlib bug.
However, the zlib bug was (to my knowledge) fixed in zlib v1.2.8.
This seems to be the bug fix:
Hi,
As an addendum:
You can try a manual workaround. Run gunzip on extract.o.sorted.gz and
do lexical-reordering-score on the resulting plain text file.
It might be inconvenient but would hopefully solve the issue.
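The workaround amounts to something like the following (a toy stand-in replaces the real extract file here; the lexical-reordering-score call is commented out because its exact arguments depend on your setup):

```shell
# Toy stand-in for the real compressed extract file (contents made up).
printf 'ein haus ||| a house ||| 0-0 1-1\n' | gzip > extract.o.sorted.gz

# Decompress to a plain text file, keeping the .gz around.
gunzip -c extract.o.sorted.gz > extract.o.sorted

# Then point lexical-reordering-score at the plain-text file, e.g.:
# lexical-reordering-score extract.o.sorted 0.5 reordering-table ...
```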
Cheers,
Matthias
On Thu, 2015-12-17 at 17:44 +, Matthias Huck wrote
Hi Lane,
Well, you can find excellent descriptions of phrase-based decoding
algorithms in the literature, though possibly not all details of this
specific implementation.
I like this description:
R. Zens, and H. Ney. Improvements in Dynamic Programming Beam Search for
Phrase-based Statistical
> using this option, which has now been fixed
>
> https://github.com/moses-smt/mosesdecoder/commit/72bef00781de9821f2cff227ca7417939041d4e1
>
>
> On 04/10/2015 23:25, Matthias Huck wrote
Hi Yuqi,
You can build a debug compile by calling bjam with:
--variant=debug
Cheers,
Matthias
On Sun, 2015-10-04 at 23:05 +0200, Yuqi Zhang wrote:
> Hello,
>
>
> How can I debug the decoder?
>
>
> Must I turn off the pre-compile signal "WITH_THREADS"?
> Can it be turned off?
Hi,
The Hindi-English language pair was part of the WMT shared translation
task in 2014. See the following website for download links of training
data and dev/test sets:
http://www.statmt.org/wmt14/translation-task.html
Cheers,
Matthias
On Sun, 2015-09-27 at 20:15 +0530, nakul sharma wrote:
>
Hi Vincent,
Pruning the phrase table will discard many bad entries.
The decoder is typically configured to load no more than a maximum
number of translation options per distinct source side. Use
table-limit=20 as a parameter to your translation model feature to limit
the amount of candidates to
I think it works. The decoder does this, not the phrase table binarizer.
You could run a simple experiment in order to verify. Add
-feature-overwrite 'TranslationModel0 table-limit=20' (or equivalent) to
your decoder call.
Cheers,
Matthias
> Le 24/09/2015 15:21, Matthias Huck a écrit :
> > Hi Vi
Hi Vincent,
This is a different topic, and I'm not completely clear about what
exactly you did here. Did you decode the source side of the parallel
training data, conduct sentence selection by applying a threshold on the
decoder score, and extract a new phrase table from the selected fraction
of
Hi Vincent,
On Thu, 2015-09-24 at 22:37 +0200, Vincent Nguyen wrote:
> Thanks Matthias for the detailed explanation.
> I think I have most of it in mind except not really understanding how
> this one works :
>
> "Difficult sentences generally have worse model score than easy ones but
> may
Hi Asad,
You can try Hunalign or the Microsoft Bilingual Sentence Aligner (if
it's for non-commercial purposes).
Cheers,
Matthias
On Sun, 2015-09-06 at 10:24 +, Asad A.Malik wrote:
> Hi All,
>
>
> I am currently trying to develop the parallel corpus. I wanted to know
> is there any tool
Hi Liling,
This tool calculates sentence-level BLEU scores (smoothed via
incrementing the n-gram counts by 1):
bin/sentence-bleu
Make sure that you provide the hypothesis and reference files in an
appropriately processed way. The tool doesn't apply any tokenization or
remove any markup
Hi,
I found this older tutorial to be very useful as well:
Practical Domain Adaptation by Marcello Federico and Nicola Bertoldi
http://www.mt-archive.info/10/AMTA-2012-Bertoldi-ppt.pdf
(The document formatting is unfortunately slightly messed up.)
SMT research survey wiki:
On Fri, 2015-07-17 at 09:08 +0400, Hieu Hoang wrote:
the OnDisk pt can do everything - sparse features, properties, hiero
models. it's just slow and big
i think the old Binary pt did sparse features but not properties, the
Compact pt does neither
Ah, I guess that explains why it didn't
contains sparse features, then this needs to be flagged in the
configuration file by adding the word sparse after the phrase table
file name. Did I miss anything?
Regards,
Jian
On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck mh...@inf.ed.ac.uk
wrote:
Hi Jian
functions,
I'd like to know are there any difference between these two options,
for example, tuning, compute sentence translation scores ...
Regards,
Jian
On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck mh...@inf.ed.ac.uk
wrote:
Hi,
Are you planning
Hi,
Are you planning to use binary domain indicator features? I'm not sure
whether a sparse feature function for this is currently implemented. If
you're working with a small set of domains, you can employ dense
indicators instead (domain-features = indicator in EMS). You'll have
to re-extract
Hi Hieu,
That should be no problem. Pretty sure I did that a couple of times
already. No need to add another [INTERPOLATED-LM] section. Just try!
Cheers,
Matthias
On Sun, 2015-06-28 at 10:55 +0400, Hieu Hoang wrote:
in the EMS, is it possible to create interpolated LM for different
factors?
Hi James,
Irrespective of the fact that you need to tune the weights of the
log-linear model:
Let me provide more references in order to shed light on how well
established simple pruning techniques are in our field as well as in
related fields (namely, automatic speech recognition).
This list
like to know which terminals (non terminals) are
corresponded to which source word's index in the source. Could you
guide me how to obtain that?
Thanks again
On Thu, Jun 18, 2015 at 9:48 PM, Matthias Huck mh...@inf.ed.ac.uk
wrote:
Hi,
You can calculate
From: Matthias Huck mh...@inf.ed.ac.uk
Sent: Friday, June 19, 2015 5:08 PM
To: Read, James C
Cc: Hieu Hoang; moses-support@mit.edu; Arnold, Doug
Subject: Re: [Moses-support] Major bug found in Moses
Hi James,
Yes, he just said that.
The decoder's job is to find
Hi Evgeny,
If setting TRAINING:config won't help, then it might get a bit tricky.
Another thing you can try is setting filtered-config or filtered-dir in
the [TUNING] section.
The next workaround I can think of is pointing to existing files in all
the [CORPUS:*] sections by setting
Hi,
If your system output is lowercase, you could try SRILM's `disambig`
tool for predicting the correct casing in a postprocessing step.
http://www.speech.sri.com/projects/srilm/manpages/disambig.1.html
Cheers,
Matthias
On Fri, 2015-05-22 at 11:20 +0200, Ondrej Bojar wrote:
Hi,
we also
Oh, are there two ways of doing this?
I use config-with-reused-weights rather than weight-config.
On Wed, 2015-05-20 at 15:11 -0400, Philipp Koehn wrote:
Hi,
you can point to the previous configuration file with the old weights:
[TUNING]
### instead of tuning with this setting, old
Hi Graham,
Did you have a look at the tarballs that were distributed last year?
http://www.statmt.org/wmt14/translation-task.html
There are three different versions:
- Test sets (5.2 MB) These are the source sgm files with extra filler
sentences. They were the actual files released for the
Hi,
Is there possibly a problem when continuing interrupted tuning runs with
sparse features?
It seems to me that mert-moses.pl doesn't add the [weight-file] section
to the run*.moses.ini it creates right after resuming the tuning. That
would imply that no sparse weights are used in the next
2015 at 01:26, Matthias Huck mh...@inf.ed.ac.uk wrote:
I think your remark in the mail from January was correct, it
has to be
ePos-sPos+1 backoff
but currently still is
ePos-sPos+1 = backoff
Are you able to somehow
Hi Lexi,
The feature most likely won't be particularly important.
But this might be a completely different issue than you think. You
should debug this. Can you print the phrase pair that is applied when
the error occurs?
I recently came across a segfault that seemed to be caused by the OSM
Hi Hieu,
It seems that [decoding-graph-backoff] doesn't quite behave like last
year any more. Can you briefly explain how its behaviour has changed,
i.e. what it did before and what it does now? Can you please also let me
know whether there's a way to reproduce the old behaviour via
configuration.
On Thu, 2015-04-16 at 21:34 +0400, Hieu Hoang wrote:
Didn't know it has changed. How should it behave and how does it
actually behave?
On 16 Apr 2015 21:04, Matthias Huck mh...@inf.ed.ac.uk wrote:
Hi Hieu,
It seems that [decoding-graph-backoff] doesn't quite behave
:
Matthias Huck, Joern Wuebker, Felix Rietig, and Hermann Ney.
A Phrase Orientation Model for Hierarchical Machine Translation.
In ACL 2013 Eighth Workshop on Statistical Machine Translation (WMT 2013),
pages 452-463, Sofia, Bulgaria, August 2013.
I don't know if the usage of the feature
Hi,
I'm looking for a tool to rerank n-best lists in Moses' current format,
including sparse features. The CSLM toolkit has quite a nice re-ranker
implementation, but apparently it doesn't know sparse features yet.
If anyone already has an extended version of the existing re-ranker from
the CSLM
and Moses on-disk
phrase tables (and obviously neural networks).
Why not adding more functionality ...
- Holger
On 03/27/2015 11:42 PM, Matthias Huck wrote:
Hi,
I'm looking for a tool to rerank n-best lists in Moses' current format,
including sparse features. The CSLM toolkit has
Hi,
I've recently been using these sparse feature functions without any
issues in multi-threaded chart-based decoding. There might be a problem
with thread safety, but I currently can't tell why you got the
segmentation fault. You should investigate this in more detail.
Cheers,
Matthias
On
Hi,
Some pre-trained models for Moses Release 3.0 have been made publicly
available anyway:
http://www.statmt.org/moses/RELEASE-3.0/models/
http://www.statmt.org/moses/?n=moses.releases
http://www.statmt.org/mosescore/uploads/Internal/D1.4_Moses_v3_Release_Notes.pdf
I can't tell whether you're
On Sat, 2015-02-28 at 16:45 +, Hieu Hoang wrote:
i've never seen the phrase-based n-best extraction explicitly
described. There was a paper on directed graph enumeration (I forget
which) that was helpful to me when I was implementing it.
Maybe this?
Hart, P., Nilsson, N., and Raphael,
On Sat, 2015-02-28 at 17:11 +, Matthias Huck wrote:
On Sat, 2015-02-28 at 16:45 +, Hieu Hoang wrote:
i've never seen the phrase-based n-best extraction explicitly
described. There was a paper on directed graph enumeration (I forget
which) that was helpful to me when I
somewhere between 2.1 and 3.0, the keyword 'distinct' was
Oops, that was me. And it wasn't intended. I'm using this for my own
setups and apparently copied it to master when I added some other stuff.
Hope I didn't mess up other people's experiments. It's been in master
since 7 August 2014
Set a higher weight for UnknownWordPenalty? Maybe the default is not
adequate if you do strange things like this.
On Tue, 2015-02-24 at 23:49 +0100, Marcin Junczys-Dowmunt wrote:
Hi,
I have a problem with a single score phrase table. All scores have been
combined into one score as a linear
Hi Hayo,
Can you please do two things:
1.) Send me the file filtered/moses.ini so that I can have a look at the
feature functions and scaling factors in there.
2.) Tell me the Git commit ID of the Moses version you're working with.
A bug was put into master with commit 70e8eb5. It's been fixed
-23 at 18:37 +, Matthias Huck wrote:
On Fri, 2015-01-23 at 18:18 +, Hieu Hoang wrote:
True, but that complicates the framework, and doesn't deal with sparse
features.
Why does it complicate the framework? Isn't the trick about tuneable
mostly that you don't write those scores
Hi,
As Rico pointed out before: the glue rules are missing.
Cheers,
Matthias
On Sun, 2015-01-25 at 03:25 +0800, hxshi wrote:
I can't get any output with my syntactic baseline. Does anybody know
what may be wrong?
I trained a string2tree baseline. Got a rule-table such like this:
%
, then write another (tuneable)
ff which grabs whatever scores it wants from the pt
On 23 January 2015 16:55:56 GMT+00:00, Matthias Huck
mh...@inf.ed.ac.uk wrote:
Hi,
Is there any existing functionality to set only specific score
components of a feature function
Hi,
Is there any existing functionality to set only specific score
components of a feature function as untuneable?
Feature functions have a boolean tuneable parameter, but it affects
all the scores produced by it. It doesn't help in case I want to switch
off individual scores from a phrase
for. And maybe somebody on the mailing
list has implemented this and never put it into master?
I want it for MIRA, btw.
I think it should be added if it doesn't exist somewhere yet. Unless
someone has strong objections.
On 23 January 2015 18:09:11 GMT+00:00, Matthias Huck
mh...@inf.ed.ac.uk wrote
PlusEquals() before already, then you don't have to modify
anything.
Cheers,
Matthias
Regards
On Friday, January 23, 2015 12:11 AM, Matthias Huck
mh...@inf.ed.ac.uk wrote:
Hi Arefeh,
Can you try to run the setup from the cluster on your local desktop
system? With the same input
Hi Arefeh,
Can you try to run the setup from the cluster on your local desktop
system? With the same input, a Moses binary compiled from the same
sources, and the same command to produce the n-best lists? Normally it
should give you the same output.
Why would the feature never produce an overall
Hi Roee,
I would be very surprised if each and every Moses feature is described
somewhere. But Moses is generally very well documented, and you find all
the information you need for building a state-of-the-art baseline system
on the website [http://www.statmt.org/moses/] and in the manual
Hi,
The data is sentence-segmented.
Assume you train your model with a training corpus which contains a
single parallel sentence pair. Your training sentence has length L on
both source and target side, and it's aligned along the diagonal.
If n > L, you cannot extract any phrase of length n
We typically try to increase the tuning set in order to obtain more
reliable sparse feature weights. But in your case it's rather the test
set that seems a bit small for trusting the BLEU scores.
Do the sparse features give you any large improvement on the tuning set?
On Thu, 2015-01-15 at
On Thu, 2015-01-15 at 13:54 +0800, HOANG Cong Duy Vu wrote:
- tune test
(based on source)
size of overlap set = 624
(based on target)
size of overlap set = 386
(tune test have high overlapping parts based on source sentences,
but half of them have different target sentences)
Does
Hi,
I've just pushed a commit to Moses that brings about a slight change
wrt. the way the chart decoder deals with feature scores.
The chart decoder now stores deltas of individual feature scores instead
of constantly summing everything up. This behaviour is similar to what
we have been doing
Hi Marcin,
I don't quite understand why this is a problem. But if you're looking
for alternative implementations for word alignment symmetrization:
The Jane toolkit includes a program called `mergeAlignment`.
It should be able to read Moses format alignments.
Hi Amir,
The input is passed to the feature functions via
InitializeForInput(InputType const& source).
This method is called before search and collecting of translation
options (cf. moses/FF/FeatureFunction.h). You can set a member variable
to have access to the input in your scoring method.
$ your-moses-directory/scripts/ems/experiment.perl -config config.toy -exec
On Sat, 2014-11-29 at 15:00 +, Asad A.Malik wrote:
Hi All,
How can I Run experiment.perl -config config.toy -exec.
When I type following command:
$ run experiment.perl -config
Hi,
If this happens in scripts/training/clean-corpus-n.perl then you should
check whether a parallel corpus with the same number of lines on source
and target side is passed to that script. Maybe there's an issue with
your training data or something went wrong in a previous step of the
From: Vera Aleksic, Linguatec GmbH
Sent: Thursday, 27 November 2014 09:42
To: 'Matthias Huck'; Raj Dabre
Subject: RE: [Moses-support] Unknown single words that are part of phrases
Hi,
Thank you for your answers.
@Raj, one-word-translations do not exist, I have searched for them
, Matthias Huck wrote:
Hi Vera,
It's odd that the lexical translation model contains such an entry if
the pair is always unaligned. Maybe you used a different word alignment
when you extracted the lexicon model?
You should manually have a look at your word alignment in order to check
Hi,
Supposedly your phrase table does not contain an entry Gitarre |||
guitar because this word pair is always unaligned in your training
data. You could try to improve your word alignment quality.
Alternatively, you could implement a procedure in the manner of the
forced single word heuristic
Hi Arefeh,
Have you been able to resolve that issue? Maybe one of your GIZA
alignments is flawed, for instance because the GIZA process was
terminated before it finished. Did you check that both the standard and
the inverse alignment files have the same number of lines?
Check it like this:
$
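The truncated check presumably compares line counts; a minimal version could look like this (the file names are guesses at the usual GIZA++ output names, and toy stand-ins are created so the commands run as shown):

```shell
# Toy stand-ins for the two alignment files (real ones are GIZA++ *.A3.final).
printf '1\n2\n3\n' > src-tgt.A3.final
printf '1\n2\n3\n' > tgt-src.A3.final

# Both alignment directions must have the same number of lines.
test "$(wc -l < src-tgt.A3.final)" = "$(wc -l < tgt-src.A3.final)"
```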
distinct n-best lists with at most
100 items and they seem to be a little bit better filled than with -sd 0.
With cube pruning, -cbd some_number does not seem to do anything, I
also tried to increase the pop limit with no success.
Best,
Marcin
W dniu 05.06.2014 19:17, Matthias Huck pisze
Hi Gideon,
I still tend to believe that there's some issue with your preprocessing.
Or maybe there's a mismatch in the way you preprocessed your training
and test data? The OOV rates on MT06 and MT08 are very low in the
systems built by us at RWTH (cf. the numbers I sent you as a reply to
your
:08, Matthias Huck wrote:
Hi Lars,
The instructions you're looking for are here:
http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel
You can also create a KenLM binary file instead and use it in the
decoder with the KENLM line in the [feature] section of your
Hi Mauro,
The weights for LM1 and LM2 are missing in your config file. You need to
add them in the [weight] section.
# core weights
[weight]
Distortion0= 0.3
UnknownWordPenalty0= 1
WordPenalty0= -1
TranslationModel0= 0.2 0.2 0.2 0.2
PhrasePenalty0= 0.2
LexicalReordering0= 0.3 0.3 0.3 0.3 0.3
Hi Lars,
The instructions you're looking for are here:
http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel
You can also create a KenLM binary file instead and use it in the
decoder with the KENLM line in the [feature] section of your moses.ini.
$ kenlm/build_binary
Hi Arefeh,
You could intersect the standard and inverse alignments from GIZA
instead of applying the grow-diag-final-and heuristic. This will
typically impair translation quality, though.
Cheers,
Matthias
On Sat, 2014-02-15 at 14:18 -0800, Arefeh Kazemi wrote:
Hello all,
I am using Moses
Hi Per,
The standard workflow is to run a postprocessing step on the output,
e.g. with scripts/tokenizer/detokenizer.perl in Moses.
Usage: ./detokenizer.perl (-l [en|fr|it|cs|...]) < tokenizedfile > detokenizedfile
Options:
-u ... uppercase the first char in the final sentence.
-q ...
It's called Cairo:
Cairo: An Alignment Visualization Tool. Noah A. Smith and Michael E.
Jahr. In Proceedings of the Language Resources and Evaluation Conference
(LREC 2000), pages 549–552, Athens, Greece, May/June 2000.
http://www.cs.cmu.edu/~nasmith/papers/smith+jahr.lrec00.pdf
Hi James,
There has been a vast literature on adaptation techniques for SMT in
recent years.
Some reading suggestions:
http://www.statmt.org/wmt07/pdf/WMT17.pdf
http://www.statmt.org/wmt09/pdf/WMT-0932.pdf
http://dl.acm.org/citation.cfm?id=1870702
Hi James,
the Phrasal toolkit is freely available as well
[http://nlp.stanford.edu/phrasal/],
so why don't you consider extracting discontinuous phrases using Stanford's
original implementation?
Cheers,
Matthias
On Tue, 2013-11-05 at 07:29 +, Read, James C wrote:
Interesting.
This
...@mit.edu [moses-support-boun...@mit.edu] on behalf
of Matthias Huck [mh...@inf.ed.ac.uk]
Sent: 05 November 2013 13:15
To: moses-support@mit.edu
Subject: Re: [Moses-support] gappy phrases
Hi James,
the Phrasal toolkit is freely available as well
[http://nlp.stanford.edu/phrasal/],
so
Hi James,
I tried Phrasal Beta2 and Beta3 a couple of months ago. Both worked for
me with some minor hassle. You should follow the instructions from
http://www-nlp.stanford.edu/wiki/Software/Phrasal#Phrasal in order to
set up your environment.
I'm also sure that the Phrasal developers are able
Hi,
RWTH Aachen University implemented extraction of discontinuous phrases
and decoding with source-side gaps in the Jane toolkit
[www.hltpr.rwth-aachen.de/jane/].
We did not see any clear improvements over standard phrase-based setups
in our experiments, though.
Some results were published in