Re: [Moses-support] liboolm.a: could not read symbols: File format not recognized - Moses compile error

2007-11-28 Thread Chris Dyer
Hi Brianna, Sorry you're having troubles. This issue is almost certainly due to the fact that you built SRILM on another architecture. If you rebuild it on the same machine as you're building moses on, this should solve the problem. Chris On 29 Nov 2007 14:17:31 +1100, Brianna Laugher <[EMAIL PR

Re: [Moses-support] NaN into the alignment phase

2007-12-15 Thread Chris Dyer
Hi Marco, I happen to be up late tonight debugging this very same problem. What are the odds? Here's what I know so far: 1) Once you hit this problem, you're never going to recover, so it's good to put in an exit(1) in GIZA when you've detected it. 2) I think this has to do with a numerical und

Re: [Moses-support] learning separate model with new data

2008-01-11 Thread Chris Dyer
Hi Jaakko- GIZA computes word alignments by starting with an essentially random model and then successively making changes so that it more closely predicts the patterns in the training data you've provided by using the old model to predict alignments of the training data. It would be very easy to

Re: [Moses-support] GIZA error

2008-01-17 Thread Chris Dyer
You'll see this error if you build GIZA++ without -DBINARY_SEARCH_FOR_TTABLE . It's harmless, although it means GIZA will use a bit more memory than it would otherwise. Chris 2008/1/17 menor bangget <[EMAIL PROTECTED]>: > Hi all, > > I ran GIZA, and I found error about "parameter 'coocurrencefil

Re: [Moses-support] How to train faster?

2008-01-19 Thread Chris Dyer
Hi, > 1. Recently I've been trying to train my corpus on my dual core processor. > But it seemed that my system only used half the resource. On the process > manager, the CPU only loaded 50% of its full capacity. > > So, can anybody tell me how to maximize my system during training the > GIZA, ru

Re: [Moses-support] GIZA error

2008-01-22 Thread Chris Dyer
r*, const std::vector std::allocator >&, const std::vector std::allocator >&, bool) const' must be introduced by > 'template <>' > TTables.cc:40: error: template-id 'printProbTable<>' for 'void > tmodel::printProbTable(const char*, const >

Re: [Moses-support] GIZA error

2008-01-22 Thread Chris Dyer
that a more recent version was made > available in October. Would it be worth me getting the newer version, and > if so, are the installation steps the same as described in > www.statmt.org/wmt07/baseline.html? > Thank you, > Llio > > > On Jan 22, 2008 5:38 PM, Chris D

Re: [Moses-support] GIZA error

2008-01-25 Thread Chris Dyer
.org/wmt07/baseline.html, and I have followed the file > > hierarchy in those instructions. I'm experiencing problems with the > > following command: > > ./regenerate-makefiles.sh > > The final message is 'automake failed'. My automake version is automake &

Re: [Moses-support] ERROR2: nan in HMM training

2008-01-28 Thread Chris Dyer
I'm reasonably confident that this is an underflow error caused by multiplying a bunch of small probabilities that eventually end up at zero when the forward/backward probs are computed for each sentence pair. There is a "real" solution to this, which I might? implement eventually, but for now, al

Re: [Moses-support] GIZA++ nan errors

2008-01-29 Thread Chris Dyer
What phase of GIZA training is this occurring in? GIZA runs several iterations in several stages, Model 1, HMM, Model 3, Model 4, etc. And what is the very first sign of trouble you see? Are there any errors before this? This problem generally means that you're trying to model something that GIZA

Re: [Moses-support] Minor GIZA bug

2008-02-15 Thread Chris Dyer
Hi Qin- Thanks for letting me know about this problem. I'll submit your recommended fix. I'm not completely familiar with the GIZA implementation of the HMM model, but this seems reasonable enough. Chris On Fri, Feb 15, 2008 at 4:40 PM, Qin Gao <[EMAIL PROTECTED]> wrote: > Hi All, > > I found

Re: [Moses-support] [Fwd: Run mert-moses.pl with confusion network]

2008-02-20 Thread Chris Dyer
Also, if you are using general lattices (as opposed to regular confusion networks) as input, you should update to the latest version of the decoder from Subversion, since I checked in a fairly crucial bug fix yesterday. Chris On Wed, Feb 20, 2008 at 4:37 PM, Chris Dyer <[EMAIL PROTECTED]>

Re: [Moses-support] [Fwd: Run mert-moses.pl with confusion network]

2008-02-20 Thread Chris Dyer
The lattice format isn't documented yet on the webpage, but you can see some examples of it in the lattice-distortion test directory Hieu mentions. It should be fairly straightforward to decipher. Since this format encodes a single lattice/CN per line of text, it can be used easily with MER train

Re: [Moses-support] [Fwd: Run mert-moses.pl with confusion network]

2008-02-25 Thread Chris Dyer
-log(p) -- you can change this in WordLattice.cpp if you want to deal with more conventional costs, but the rest of the inputs to the decoder are given as probabilities so I wanted to be consistent). If you want a null transition, set the arc label to '*eps*' and the decoder will treat this as

Re: [Moses-support] [Fwd: Run mert-moses.pl with confusion network]

2008-02-27 Thread Chris Dyer
input-output object : [0.000] seconds > read confusion net with format 0 > End. : [0.000] seconds > confusion net statistics: > created: 1 > destroyed: 1 > succ. read:0 > columns: 0 > words: 0 > avg. word/column: nan > avg. col

Re: [Moses-support] Giza HMM errors - NAN

2008-02-28 Thread Chris Dyer
I haven't looked into what's causing the particular problem on this corpus, but another known problem with the GIZA HMM model is that it doesn't do a fairly standard kind of normalization in the forward-backward training, which causes underflow errors in some sentences (especially quite long ones),
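
The underflow described above can be illustrated with a small sketch. This is not GIZA++ code; it assumes the "fairly standard kind of normalization" means rescaling the forward probabilities at every position and accumulating the log of the scale factors. The function names and the toy HMM layout (lists of lists) are hypothetical.

```python
import math


def _advance(alpha, trans, emit, obs):
    """One forward step: propagate alpha through trans, then apply emissions."""
    n_states = len(alpha)
    return [
        sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs]
        for s in range(n_states)
    ]


def forward_likelihood_unscaled(init, trans, emit, observations):
    """Plain forward pass; the result underflows to 0.0 on long sequences."""
    alpha = [init[s] * emit[s][observations[0]] for s in range(len(init))]
    for obs in observations[1:]:
        alpha = _advance(alpha, trans, emit, obs)
    return sum(alpha)


def forward_log_likelihood(init, trans, emit, observations):
    """Scaled forward pass; returns log P(observations) without underflow."""
    alpha = [init[s] * emit[s][observations[0]] for s in range(len(init))]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            alpha = _advance(alpha, trans, emit, observations[t])
        scale = sum(alpha)
        if scale == 0.0:
            raise ValueError("observation sequence has zero probability")
        # Renormalize so alpha sums to 1, and keep the scale in log space.
        alpha = [a / scale for a in alpha]
        log_likelihood += math.log(scale)
    return log_likelihood
```

On a short sequence the two functions agree (after taking the log of the unscaled value); on a sequence of a few thousand observations the unscaled version returns 0.0 while the scaled version still returns a finite log-likelihood.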

Re: [Moses-support] lowercasing/recasing

2008-03-05 Thread Chris Dyer
There have been some advocates of preserving case information as you describe, although I've only seen them discussed in the context of small-coverage systems, such as in the IWSLT task. See, for example, the system description of the Carnegie Mellon Univ system from 2006's IWSLT entry: http://w

Re: [Moses-support] lowercasing/recasing

2008-03-05 Thread Chris Dyer
> Faced with improper input, would it not make more sense to try and "fix > it" in the source language before translation, rather than distorting > the translation with the induced errors, then trying to fix the > translation ? That could be an interesting experiment. The transformation one p

Re: [Moses-support] Required number of sentences for experimental system

2008-03-14 Thread Chris Dyer
Hi Christine-- The answer to "How much data?" always tends to be "More". If the genres line up, I've seen very good systems built from as few as 20k parallel sentences, for some languages at least. On the other hand, some language pairs require much more training data to attain reasonable perform

Re: [Moses-support] Giza HMM errors - NAN

2008-03-25 Thread Chris Dyer
:02 AM, John D. Burger <[EMAIL PROTECTED]> wrote: > Chris Dyer wrote: > > > I haven't looked into what's causing the particular problem on this > > corpus, but another known problem with the GIZA HMM model is that it > > doesn't do a fairly stand

Re: [Moses-support] Learning weight for a new feature in moses MERT training

2008-04-10 Thread Chris Dyer
Hi Jason- This sounds like an interesting feature. I'm not too familiar with Moses's MERT architecture, but you may just need to update IOStream::OutputNBestList to write the (unweighted) feature value for each hypothesis in the n-best list at the appropriate position (i.e., after g, and before tm),

Re: [Moses-support] maximum Phrase length

2008-04-28 Thread Chris Dyer
Yes, the maximum phrase length is the maximum number of tokens in the foreign language side of an entry in the translation model. (There is no limit to the number of tokens that the target phrase may contain.) Chris On Mon, Apr 28, 2008 at 8:26 PM, marco turchi <[EMAIL PROTECTED]> wrote: > Dear e

Re: [Moses-support] Training Language model SRILM europarl v3

2008-05-01 Thread Chris Dyer
It sounds like you may be exceeding your machine's physical memory, so the OS is using virtual memory, which can make things quite slow. You can use the 'top' command to watch the process size of ngram-count and make sure you don't get into that situation. Chris On Thu, May 1, 2008 at 5:59 PM, Sa

Re: [Moses-support] confusion network vs. 1-best

2008-05-12 Thread Chris Dyer
Hi Hu- What tool are you using to generate the confusion networks? If you use SRILM's lattice-tool, you should make sure to set the LM weight and AM weights to something appropriate. Chris On Sun, May 11, 2008 at 11:33 PM, Hu Xiaoguang <[EMAIL PROTECTED]> wrote: > hi, all > > I did some experime

Re: [Moses-support] confusion network vs. 1-best

2008-05-12 Thread Chris Dyer
you can use the following options: -htk-acscale ASCALE -htk-lmscale LMSCALE -htk-wdpenalty WDPENALTY On Mon, May 12, 2008 at 10:00 AM, Hu Xiaoguang <[EMAIL PROTECTED]> wrote: > Hi Chris, > > Yes, I use the SRILM's lattice-tool to convert the word lattice into > confusion network, > I'm not ver

Re: [Moses-support] GIZA options - IBM Models

2008-05-16 Thread Chris Dyer
You can use the --giza-option to tell GIZA to use a different model, but I don't believe the moses scripts can be told to look for the nonstandard files that GIZA will generate in this case (they are named differently depending on the kind of model used). But it wouldn't be hard to make this configurable or

Re: [Moses-support] GIZA options - IBM Models

2008-05-17 Thread Chris Dyer
> > You can use the --giza-option to tell GIZA to use a different model, > > but I don't believe the moses scripts can be told to look for the > > nonstandard file that GIZA will generate in this case (they are named > > differently for the kind of model used). But, this wouldn't be hard > > to

Re: [Moses-support] Cannot make GIZA++ executable

2008-07-07 Thread Chris Dyer
Some versions of the g++ compiler (3.4.x I think) have a bug that prevents this code from compiling properly. Working around the bug is not straightforward, so trying a different version of the compiler is probably your best bet. Chris On Mon, Jul 7, 2008 at 10:00 PM, Vineet Kashyap <[EMAIL PROT

Re: [Moses-support] giza question

2008-07-16 Thread Chris Dyer
> > This is more than one obviously... > > Regards, > Sanne > > > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Dyer > Sent: Wednesday 9 July 2008 18:40 > To: [EMAIL PROTECTED] > Subject: Re: [Moses-support] g

Re: [Moses-support] Non-deterministic GIZA?

2008-07-16 Thread Chris Dyer
There's been a recent release of GIZA (July 8) that fixes some potential sources of non-determinism, specifically relating to how distortion models (model 2 or the HMM) get initialized. When did you download it from http://code.google.com/p/giza-pp/ ? --Chris On Wed, Jul 16, 2008 at 6:35 PM, Joh

Re: [Moses-support] Giza + Phrase extraction

2008-07-23 Thread Chris Dyer
> There are also various post hoc approaches to removing noise from > phrases tables and alignments. Some recent examples: > http://aclweb.org/anthology-new/D/D07/D07-1103.pdf > http://aclweb.org/anthology-new/W/W08/W08-0306.pdf > > Although there's nothing like this included in Moses, it would be

Re: [Moses-support] Giza + Phrase extraction

2008-07-23 Thread Chris Dyer
son et al. 2007) a default technique in Moses? > how is the threshold set? > > thanks a lot > Marco > > > On Wed, Jul 23, 2008 at 6:35 PM, Chris Dyer <[EMAIL PROTECTED]> wrote: >> >> > There are also various post hoc approaches to removing noise from >> &

Re: [Moses-support] Seemingly buggy behavior when specifying

2008-08-06 Thread Chris Dyer
Usually when -inf shows up in moses, it's because someone took the log of a probability that was zero and then didn't floor the score. It looks like the lexRO code does both properly, but do any of the entries in your lex RO table have zeros? 2008/8/7 Jason Katz-Brown <[EMAIL PROTECTED]>: > Hi ag
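
To illustrate the flooring mentioned above, here is a minimal sketch. It is not the Moses implementation; the floor value and the function name are hypothetical choices made for the example.

```python
import math

# Hypothetical floor; any sufficiently negative finite value serves the purpose.
LOG_FLOOR = -100.0


def floored_log(prob, floor=LOG_FLOOR):
    """Return log(prob), but never anything below `floor`.

    A zero (or negative) probability maps directly to the floor, so the
    score stays finite and can still be multiplied by a feature weight.
    """
    if prob <= 0.0:
        return floor
    return max(math.log(prob), floor)
```

Without the floor, a single zero entry in a table would contribute negative infinity to the weighted sum, and a weight of zero multiplied by negative infinity is not a number in IEEE arithmetic, so the problem does not disappear even when the feature is nominally switched off.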

Re: [Moses-support] Seemingly buggy behavior when specifying

2008-08-07 Thread Chris Dyer
e of those slots. 2008/8/7 Jason Katz-Brown <[EMAIL PROTECTED]>: > By the way, I should have noted that most of my confusion is how any > feature could have nonzero value in a translation hypothesis if its > weight is zero. Or is this possible? > > Thanks again, > Jason > &g

Re: [Moses-support] giza model 1 uniform initialization

2008-08-10 Thread Chris Dyer
normalizeTable > > I use a more straightforward method (below) which yields in different > output, could someone elaborate? > > > > Mycode: > > Probability(f|e) = Count(e,f) / count (e) > The code that normalizes the counts to probabilities is in TTable::normalizeTable. But, for the initializati

Re: [Moses-support] train-factored-phrase-model.perl Couldn't find factor 0 in token

2008-08-19 Thread Chris Dyer
That usually means that you have the '|' symbol in your input data. One way of avoiding this is using the --factor-delimiter option and setting it to some random string like +++! . You can also remove the | from your training data. Chris On Tue, Aug 19, 2008 at 8:27 AM, jide Otuyelu <[EMAIL PROT
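
A small sketch of the second suggestion (removing the '|' from the training data). The helper names are hypothetical, and replacing the bar with whitespace is just one possible policy; mapping it to a placeholder token would work equally well.

```python
def find_lines_with_delimiter(lines, delimiter="|"):
    """Return the 1-based numbers of the lines that contain the delimiter."""
    return [i for i, line in enumerate(lines, start=1) if delimiter in line]


def strip_delimiter(line, delimiter="|", replacement=" "):
    """Replace every delimiter occurrence and collapse repeated whitespace."""
    return " ".join(line.replace(delimiter, replacement).split())
```

Running `find_lines_with_delimiter` over both sides of the corpus first shows how widespread the problem is before anything is rewritten.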

Re: [Moses-support] How to solve? make GIZA++ TTables.cc:39 error: too few templ

2008-10-17 Thread Chris Dyer
Hi, There's a known bug with certain versions of g++ that GIZA++ hits. If feasible, you might switch to a different version (>4.0 works for sure). If not, send your machine architecture/OS type and perhaps someone will have a binary they can provide. Thanks, Chris On Fri, Oct 17, 2008 at 1:43 A

Re: [Moses-support] mert algorithm

2008-11-20 Thread Chris Dyer
The algorithm was first described in Och 2003, Minimum Error Rate Training in Statistical MT: http://acl.ldc.upenn.edu/acl2003/main/pdfs/Och.pdf There's a newer paper that goes into some more detail as well: W. Macherey et al. (2008) Lattice-based minimum error rate training for stat. mt http://w

Re: [Moses-support] 'proper' conditioning in phrase extract

2008-11-21 Thread Chris Dyer
Hi Ondrej, See below. > And one additional question: when extracting phrases, phrase-extract actually > extracts all phrases that *are not incompatible* with the alignment. I'm > thinking about a different method: just phrases that *are 'strictly' > compatible*, which means I would extract: > > a=

Re: [Moses-support] Beam thresholding

2009-02-06 Thread Chris Dyer
One way to do it is to just set a really high number for the threshold. The maximum "ceiling" used by moses for a feature value is 100, and then pick the largest total sum that your feature weights can have, double it (since you may have negative feature values), and set that... There may be an e

Re: [Moses-support] Parallelising Giza++ for supercomputers

2009-02-20 Thread Chris Dyer
Another architecture to consider is storing/distributing the ttable from a single central repository. Most of the ttable is full of crap, and for each sentence, you know exactly what parameters will be required in advance of running your E step. However, by not distributing stuff that you don't n

Re: [Moses-support] Giza++ input tokens (templates)

2009-02-26 Thread Chris Dyer
> Do you think this is possible? Would Giza++ require massive > modifications to be able to align these kind of tokens? My gut feeling > was that a n-gram with a gap in (a template) is to all intents and > purposes just the same as an n-gram and so the algorithm should > perform with similar accura

Re: [Moses-support] Error in running moses with randlm

2009-03-04 Thread Chris Dyer
Yeah, sorry about this- I broke moses, at least for certain compilers. I'll fix it shortly. -Chris On Wed, Mar 4, 2009 at 12:17 PM, Miles Osborne wrote: > ok, it seems that the most recent version of Moses had a bad commit > and broke the language model interface.  so, this is not really > anyth

[Moses-support] New release of Giza++

2009-03-20 Thread Chris Dyer
Hi all- There's a new release of GIZA++ available from http://code.google.com/p/giza-pp/ . The changes address build issues on a variety of platforms and compilers, including: - better adherence to c++ header naming conventions (fixes build problems on gcc > 3.4) - autodetection of MacOSX (bas

Re: [Moses-support] Questions about GIZA++

2009-04-01 Thread Chris Dyer
> I am using GIZA++ to train my model. I have two questions regarding it: > > 1) How to set the Maximum sentence length? by default it is 100 but I have > sentences longer than that. How I can set it? Giza doesn't really support sentences longer than 100 words- you can change the limit in the code,

Re: [Moses-support] Error when run moses with lattices format as input

2009-04-16 Thread Chris Dyer
You need to add a -weight-i flag to the command line which specifies how much weighting to apply to the arc feature. e.g.: moses ... -weight-i 0.5 -Chris On Thu, Apr 16, 2009 at 9:58 AM, Nguyen Manh Hung wrote: > Hi, > > I'm using Moses to decode with lattice format as input. Also I make > la

Re: [Moses-support] Error when run moses with lattices format as input

2009-04-16 Thread Chris Dyer
Can you send me a stack trace for where the SEGV is happening? Once the phrase table has been binarized, there's no need to have any special temporary space. On Tue, Apr 28, 2009 at 10:46 AM, Nguyen Manh Hung wrote: > Chris Dyer wrote: >> >> You need to add a -weight-i

Re: [Moses-support] Error when run moses with lattices format as input

2009-04-16 Thread Chris Dyer
Hung > > On 2009-04-16 (Thu) at 11:34 -0400, Chris Dyer wrote: >> Can you send me a stack trace for where the SEGV is happening? Once >> the phrase table has been binarized, there's no need to have any >> special temporary space. >> >> On Tue, Apr 28, 2009 at

Re: [Moses-support] GIZA++ Configuration for HMM Alignment

2009-04-18 Thread Chris Dyer
There's actually an option for this in train-factored-phrase-model.perl. Just specify "--hmm" and it will automatically set the Giza++ options appropriately. -Chris 2009/4/18 Manoj C (మనోజ్ చిన్నకోట్ల) : > Hi All, > > I am training a standard translation model using moses (using the > train-facto

Re: [Moses-support] giza-pp in train-factored-phrase-model.perl script

2009-05-14 Thread Chris Dyer
These warnings aren't too serious. The alignments should be fine. Chris On Thu, May 14, 2009 at 8:56 PM, Tom Hoar wrote: > During the train-factored-phrase-model.perl script of my > character-by-character alignment, giza-- reported the errors below. > > Will this trained model be usable or do the

Re: [Moses-support] MERT - optimal weights out of specified ranges?

2009-06-04 Thread Chris Dyer
Hi Thang, The ranges that are specified for moses are just suggestions for random starting points for the MERT algorithm. However, it may (and often does) find weights that end up outside of these ranges. -Chris On Thu, Jun 4, 2009 at 2:32 AM, Thang Luong Minh wrote: > Dear experts, > > I notice

Re: [Moses-support] Fwd: alignment problem

2009-06-18 Thread Chris Dyer
The alignment models are going to struggle quite a bit when the source to target length ratio is so skewed. I would recommend finding a way to retokenize/resegment the source and/or target language so as to induce a more even ratio. If this isn't possible, you may need to look into custom alignme

Re: [Moses-support] Word lattice distortion cost

2009-07-20 Thread Chris Dyer
This is probably a problem with the regression test. The two conditions ought to be identical, as you expect. However, keep in mind that the distortion model is incredibly weak, and the heuristic distance definition used in lattice decoding is also just an approximation, so an off-by-one error isn

Re: [Moses-support] EM Model 1 Pseudocode

2009-07-23 Thread Chris Dyer
Yes, you should think of total_s(e) as total_s(j) where j is the index of the current e you're looking at. On Thu, Jul 23, 2009 at 7:23 PM, James Read wrote: > Hi, > > I've been looking at the pseudocode for Model 1 as provided in Koehn's > lecture notes online. I can't help noticing what seems to
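
The reading of total_s suggested above can be sketched as follows. This is a minimal Model 1 EM loop written for illustration, not GIZA++ code: it omits the NULL word, stores t(e|f) in a plain dictionary, and uses a hypothetical function name. The per-sentence normalizer is kept in a list indexed by the position j of the current word e.

```python
from collections import defaultdict


def train_model1(sentence_pairs, iterations=5):
    """Estimate t(e|f) with EM.

    sentence_pairs: list of (e_words, f_words) tuples of token lists.
    Returns a dict-like object mapping (e, f) to t(e|f).
    """
    e_vocab = {e for e_s, _ in sentence_pairs for e in e_s}
    uniform = 1.0 / len(e_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for e_s, f_s in sentence_pairs:
            # total_s[j] is the normalizer for the j-th word of e_s.
            total_s = [sum(t[(e, f)] for f in f_s) for e in e_s]
            for j, e in enumerate(e_s):
                for f in f_s:
                    fraction = t[(e, f)] / total_s[j]
                    count[(e, f)] += fraction
                    total[f] += fraction
        # M step: renormalize the expected counts for each f.
        t = defaultdict(
            float, {(e, f): c / total[f] for (e, f), c in count.items()}
        )
    return t
```

Indexing by position rather than by word type gives the same numbers, since a repeated word receives the same normalizer at each of its positions; the list simply makes the scope of the per-sentence quantity explicit.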

Re: [Moses-support] output word graph format?

2009-08-23 Thread Chris Dyer
The output word graph is in HTK Standard Lattice Format (SLF). A number of tools (for example, SRILM's lattice-tool) read word graphs in this format and can do things with it. You ought to be able to find the path in the lattice corresponding to the decoder's best path. Chris On Sun, Aug 23, 20

Re: [Moses-support] Looking for text corpora

2009-09-06 Thread Chris Dyer
This was recently announced on the corpora list: http://www.uncorpora.org/ -Chris On Sun, Sep 6, 2009 at 1:36 PM, Catalin Braescu wrote: > Thanks, Miles! From your link I got http://www.statmt.org/europarl/ > > Any other such goodies? > > > Catalin > > -- > Omlulu.com > > > On Sun, Sep 6, 2009

Re: [Moses-support] Giza++ segv

2009-09-13 Thread Chris Dyer
Is it possible that you have a sentence of length zero? size2 is, I believe, one of the dimensions of the trellis, which in one direction is the source sentence length and in the other is the target sentence length. On Mon, Sep 14, 2009 at 1:28 AM, John Kolen wrote: > I'm trying to run the exampl
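
A quick way to test the zero-length hypothesis is to scan both sides of the corpus for empty lines before running the aligner. The sketch below assumes the corpus has already been read into two parallel lists of strings; the function name is hypothetical.

```python
def find_empty_pairs(source_lines, target_lines):
    """Return 1-based indices of pairs where either side has no tokens."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    return [
        i
        for i, (src, tgt) in enumerate(zip(source_lines, target_lines), start=1)
        if not src.split() or not tgt.split()
    ]
```

Lines that contain only whitespace count as empty here, since they also tokenize to zero words.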

Re: [Moses-support] modelling reordering in word alignment

2009-10-31 Thread Chris Dyer
Modeling reordering is usually helpful, even during alignment. This is especially true for lexical translation models (where words are generated by other words, rather than phrases being generated from other phrases). The reordering models don't have to be particularly complicated to achieve quit

Re: [Moses-support] modelling reordering in word alignment

2009-11-04 Thread Chris Dyer
the papers below, contains substantial >> discussion and comprehensive experimental results on the benefits of >> modeling reordering. >> http://aclweb.org/anthology-new/J/J03/J03-1002.pdf >> >> >> On Sat, Oct 31, 2009 at 7:56 PM, Chris Dyer wrote: >>> Mo

Re: [Moses-support] The flag -early-discarding-threshold in moses

2009-11-09 Thread Chris Dyer
This functionality is broken in the tip of the trunk. There was a project last January to change the way hypothesis scoring was done, to make it more flexible, and that broke this. It needs to be fixed. One alternative is to roll back to the version of the code that was at the tip of the trunk in Dec

Re: [Moses-support] Format of phrase reordering file extract.o.gz

2009-11-11 Thread Chris Dyer
Hi John- The first label is the orientation of the phrase pair with respect to its left context (on the source side), and the second is the orientation with respect to its right context. That's why you have to have "swap other" or "other swap", since a phrase can only be inverted on one side. Hope

Re: [Moses-support] How to run giza++ with a dictionary?

2009-12-20 Thread Chris Dyer
I'm not sure what the command line options are off the top of my head, but I seem to recall that Giza just boosts the counts of pairs from the dictionary by some fixed amount (5 or something). You can get the same effect by adding the entries from your dictionary to the end of your corpus. 2009/12/20
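
The workaround of appending dictionary entries to the corpus might look like the sketch below. The function name and the `repeat` parameter are hypothetical; repeating each entry is only a rough stand-in for the fixed count boost recalled in the message, not a reproduction of what GIZA does internally.

```python
def append_dictionary(source_lines, target_lines, dictionary, repeat=1):
    """Return new corpus sides with each dictionary pair appended `repeat` times.

    dictionary: iterable of (source_entry, target_entry) string pairs.
    The input lists are left unmodified.
    """
    new_source = list(source_lines)
    new_target = list(target_lines)
    for src_entry, tgt_entry in dictionary:
        for _ in range(repeat):
            new_source.append(src_entry)
            new_target.append(tgt_entry)
    return new_source, new_target
```

Because the two sides are extended in lockstep, the result stays line-aligned and can be written out as an ordinary parallel corpus.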

Re: [Moses-support] ConfusionNet::GetSubString error when using lattice with UTF8 input

2009-12-31 Thread Chris Dyer
Confusion network input causes this error when verbose=3. You can fix this by using a lower level verbosity. On Thu, Dec 31, 2009 at 10:09 PM, liu chang wrote: > Hi, > > I'm having a strange problem that moses crashes when fed with a > lattice that has non-ASCII characters in it. If the input is

Re: [Moses-support] Stack smashing detected, died with signal 6

2010-01-21 Thread Chris Dyer
Stack smashing- that's a new one for Giza! Are you using the version of giza++ from google code (http://code.google.com/p/giza-pp/)? Some older versions had a few uninitialized variables that could conceivably cause crashes on some architectures. 2010/1/21 Guillem Massó Sanabre : > Hi, > > I am

Re: [Moses-support] word lattice + multiple translation tables optimization problem

2010-01-22 Thread Chris Dyer
I think the issue here has to do with the MERT configuration. When you use lattice input, the weights on the lattice edges are included in the translation model as a feature that MERT can optimize. I'm not familiar with the MERT optimizers that are included with Moses, but it sounds like it is ex

Re: [Moses-support] moses_chart: tuning with mert-moses-new.pl doesn't change the moses.ini

2010-02-02 Thread Chris Dyer
For French-English translation, hierarchical models will probably not do any better than phrase-based models. The rule-of-thumb seems to be that language pairs with large-scale reordering fare better with hierarchical models, but language pairs with only relatively local reordering are better with phra

Re: [Moses-support] search graph to word lattice

2010-03-01 Thread Chris Dyer
I don't have such a tool, but it wouldn't be too difficult to write one. I think the difference between word graph and search graph is the search graph has full phrases on the edges, whereas the word graph has single words on the edges. For the input, you need single word edges. -Chris 2010/3/1

Re: [Moses-support] search graph to word lattice

2010-03-01 Thread Chris Dyer
999896  l=-13.695 > r=-20, 0, -1.60944, 0, 0, 0     w=bill clinton , pC=0.0613498, c=-3.23392 > ... > > I'm not sure if I'm using the command line argument correctly: > echo 'who is bill clinton ?' | \ > moses -f moses.ini -output-word-graph test.graph 0 > > Jör

Re: [Moses-support] search graph to word lattice

2010-03-01 Thread Chris Dyer
ch part of the split ? > Maybe it has not any real impact in the end, or has it ? > Loïc > > 2010/3/1 Chris Dyer >> >> I guess word-graph doesn't split phrases either (I was just guessing). >>  It appears to be in SLF format, which is used by a number of tools >>

Re: [Moses-support] Reordering in moses

2010-03-02 Thread Chris Dyer
You can train a reordering model using train-factored-phrase-model.perl that learns reordering patterns from a parallel corpus. There's also been a lot of work done on doing reordering with hand-written rules that apply before translation so as to make the source language have a structure that is m

Re: [Moses-support] search graph to word lattice

2010-03-04 Thread Chris Dyer
(4) >>      | >>      |--->(5)->(6) >>      how  |   is bill ? >>           | >>           |>(7)->(8) >>            is the   bill >> >>     where (6) is a recombined hypo pointing to (4) and covering tokens 1-3 &

Re: [Moses-support] segmentation fault with lattice decoding

2010-03-04 Thread Chris Dyer
allocator.h: > No such file or directory. >         in > /usr/lib/gcc/x86_64-linux-gnu/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h > > Indeed, the header file does not exist on my system. > Do I need to install some additional packages and re-compile Moses in a &g

Re: [Moses-support] segmentation fault with lattice decoding

2010-03-04 Thread Chris Dyer
>> CXXFLAGS = -g -O2 >> to >> CXXFLAGS = -g >> do a 'make clean all', then rerun, you should get a more readable stacktrace. >> Try rerunning just on the sentence that gave you the problems to see if you >> can reproduce the problem. >> &g

Re: [Moses-support] segmentation fault with lattice decoding

2010-03-04 Thread Chris Dyer
4e9867 in > Moses::TranslationOptionCollection::CreateTranslationOptions > (this=0x12f0670, decodestep...@0x7a0600) >     at TranslationOptionCollection.cpp:389 > ---Type to continue, or q to quit--- > #8  0x0044657f in Moses::Manager::ProcessSentence > (this=0x7fffd1188160) >

Re: [Moses-support] segmentation fault with lattice decoding

2010-03-14 Thread Chris Dyer
Oh right, I had completely forgotten about this. With non-lattice input, there is some logic that looks for phrases only up to a certain max phrase size. However, this does not work with lattices and must be disabled. I usually set the max phrase size to be 10 or something like that, which h

Re: [Moses-support] Adding sentence-level flag features

2010-03-25 Thread Chris Dyer
Moses uses features to discriminate between alternative translations of individual sentences, so if the value is constant for all possible translations (for example, because it is a function of the input), the model won't be able to take advantage of it. It sounds like you might be proposing somet

Re: [Moses-support] Adding sentence-level flag features

2010-03-25 Thread Chris Dyer
e only way to find out would be to try. I appreciate any > suggestions you have. > > Suzy > > On 26/03/2010, at 11:32 AM, Chris Dyer wrote: > >> Moses uses features to discriminate between alternative translations >> of individual sentences, so if the value is constant for

Re: [Moses-support] Adding sentence-level flag features

2010-03-26 Thread Chris Dyer
incorporate the feature. > Also, what weight is used, weight-i? > > Suzy > > On 26/03/2010, at 12:30 PM, Chris Dyer wrote: > >> That sounds reasonable.  And, I don't think you'll need to add an >> extra feature to moses to do this.  The lattice input format l

Re: [Moses-support] Assertion weightAll.size() >= weightAllOffset + numScoreComponent failed

2010-05-16 Thread Chris Dyer
That doesn't happen too often...Can you send along your moses.ini file? On Sun, May 16, 2010 at 7:17 PM, David Edelstein wrote: > Hello, > > I have trained, tuned, and prepared an eval set, trying to get Moses > to decode Arabic to English. However, trying with either filtered or > unfiltered phr

Re: [Moses-support] Assertion weightAll.size() >= weightAllOffset + numScoreComponent failed

2010-05-16 Thread Chris Dyer
Jie's assessment is correct, your moses.ini is missing a value under the [weight-t] section. There should be 5 values there, but instead there are 4. I'm not familiar with the MERT implementation that is bundled with moses these days, so I can't really tell you where to look, but it should defini
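
A check along these lines can be scripted. The sketch below assumes the older moses.ini layout in which a bracketed section header is followed by one value per line; that layout, the function names, and the default of 5 expected values (taken from the message above) are assumptions for illustration only.

```python
def read_ini_sections(text):
    """Map each [section] name to the list of non-empty, non-comment lines under it."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def has_expected_weights(text, section="weight-t", expected=5):
    """True if the section exists and holds exactly `expected` values."""
    values = read_ini_sections(text).get(section, [])
    return len(values) == expected
```

The expected count depends on how many scores each phrase-table entry carries, so `expected` should be set to match the table actually in use.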

Re: [Moses-support] converting lattice from HTK to PLF

2010-08-02 Thread Chris Dyer
If you do put together a script to convert to PLF (using the Moses documentation), this would be a valuable contribution to the moses code base. I'll be happy to answer questions about the lattice format as they come up, although I'm starting a new job this week and may be delayed in responding to

Re: [Moses-support] SLF to PLF: how to remove links

2010-08-06 Thread Chris Dyer
Moses interprets the string *EPS* as an epsilon transition in the lattice, which means it can take the transition (and use any associated features), but the translation model will ignore the transition. -Chris On Fri, Aug 6, 2010 at 11:02 AM, Sylvain Raybaud wrote: > There may be others but I see

Re: [Moses-support] SLF to PLF - tests - moses crashing

2010-08-09 Thread Chris Dyer
I'm sorry I haven't had an opportunity to look into this yet (hopefully later this evening). But, one thing that you need to do is make sure that your config file has an entry setting the maximum phrase size to a very large number, which prevents some bad pruning from taking place that can lead to

Re: [Moses-support] SLF to PLF - tests - moses crashing

2010-08-09 Thread Chris Dyer
Yes, that will do it. On Mon, Aug 9, 2010 at 4:13 PM, Sylvain Raybaud wrote: > On Monday 09 August 2010 20:43:53 Chris Dyer wrote: >> I'm sorry I haven't had an opportunity to look into this yet >> (hopefully later this evening). But, one thing that you need to do

[Moses-support] Lattice (PLF) verifier

2010-08-15 Thread Chris Dyer
Hi moses users, This message is only of interest if you use Moses's word lattice translation features. Since Moses is hardly graceful when it encounters malformed lattice input, I've added a simple binary (moses-cmd/src/checkplf) that you can use to verify that your lattice inputs are both syntact

Re: [Moses-support] Decoding lattice with moses_chart?

2010-08-24 Thread Chris Dyer
I don't know if moses's chart decoder supports lattices, but two other chart decoders, Joshua and cdec, do. On Tue, Aug 24, 2010 at 8:27 PM, Hwidong Na wrote: > Hi all, > > I want to decode an input lattice with moses_chart. When I switch the > decoder from moses to moses_chart, it results as fol

Re: [Moses-support] Word lattice representation for Moses (PLF)

2010-10-18 Thread Chris Dyer
Hi Mehmet, The following lattice will do what you are asking for, I think: ((('x',0.5,1),('xy',0.5,2)),(('yz',1,2),),(('z',1,1),),) The trick is to use the last element of the tuples to indicate what node the edge ends up in. The first two nodes have single edges leaving, but the edges don't le
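A minimal Python sketch of how the lattice above can be unpacked into explicit edges, assuming the last element of each tuple is the number of nodes the edge advances from its start node. PLF is valid Python literal syntax, so `ast.literal_eval` can read it; the helper name `plf_edges` is hypothetical and not part of Moses.

```python
import ast


def plf_edges(plf):
    """Return (start_node, end_node, label, score) for every edge in a PLF string.

    Assumption: the third tuple element is a distance relative to the
    start node, so the end node is start + distance.
    """
    lattice = ast.literal_eval(plf)
    edges = []
    for start, column in enumerate(lattice):
        for label, score, distance in column:
            edges.append((start, start + distance, label, score))
    return edges


example = "((('x',0.5,1),('xy',0.5,2)),(('yz',1,2),),(('z',1,1),),)"
edges = plf_edges(example)
for edge in edges:
    print(edge)
```

Under that assumption, 'x' leads from node 0 to node 1 and 'xy' from node 0 to node 2, while both 'yz' and 'z' end in node 3, the final node.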

Re: [Moses-support] Proposal to replace vertical bar as factor delimeter

2010-11-15 Thread Chris Dyer
> --factorDelimiter=| There is such a flag. I implemented this about 4 years ago, but AFAIK I'm the only one who ever uses it (and I still use it). -C > > etc. > > Miles > > On 15 November 2010 21:30, Hieu Hoang wrote: >> That's a good idea. In the decoder, there's 4 places that has to be >> change

Re: [Moses-support] Lower scores with Word Lattice

2010-11-16 Thread Chris Dyer
> I had a query with regard to use of lattice input in moses. > There is a little difference in the translations generated when I run moses > using the 'normal' input format and when I run it with 'lattice input' > format. > The translations weren't radically different - only a few phrases were > d

Re: [Moses-support] compound spiltting for German

2010-11-16 Thread Chris Dyer
I have some software that will generate splits from German language compounds: https://github.com/redpony/cdec/tree/master/compound-split/ It can produce either lattices of high probability splits or just a 1-best split. The model used is a conditional random field trained on a small amount of

Re: [Moses-support] Use of qsub array in moses-parallel.pl

2010-12-16 Thread Chris Dyer
Would it be possible to have some kind of flag that turns this on or off? For a variety of reasons I've been working with the same software in a bunch of different environments that are similar (but just different enough) that I found it useful to make the parts that deal with the cluster sort of c

Re: [Moses-support] ' character in word lattices

2010-12-18 Thread Chris Dyer
You can escape it with a backslash: ((('\'',1,1),),) On Sat, Dec 18, 2010 at 7:47 AM, Mehmet Tatlıcıoğlu wrote: > Hi, > How can I put ' character as a label on an edge in word lattices? > eg. if the label is "test", then the lattice component is the form > of ((('test', 1.0, 1), ), ) . what about
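A minimal sketch of applying the backslash escape when generating lattices from a script. The helper names `plf_label` and `single_edge_plf` are hypothetical; only the single-quote escape comes from the reply above, and no other special characters are handled here.

```python
import ast


def plf_label(label):
    """Quote a label for PLF, escaping single quotes with a backslash."""
    return "'" + label.replace("'", "\\'") + "'"


def single_edge_plf(label, score=1):
    """Build a one-edge lattice of the form ((('label',score,1),),)."""
    return "(((" + plf_label(label) + "," + str(score) + ",1),),)"


plf = single_edge_plf("'")
# Round-trip check: the escaped string parses back to a bare quote character.
label_back = ast.literal_eval(plf)[0][0][0]
```

For a label consisting of just the quote character, this produces the same string as the lattice shown in the reply.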

Re: [Moses-support] the insides of lattice decoding

2011-01-25 Thread Chris Dyer
Hi Sylvain, I've gone ahead and added the relevant function to WordLattice.h/cpp that should make it a bit easier to construct lattices programmatically. You'll need to encode them in the data type defined in PCNTools.h, which is basically a programmatic representation of the PLF format described i

Re: [Moses-support] German-English-German Moses

2011-03-10 Thread Chris Dyer
There's a German compound splitting tool that's tuned for MT that's released as part of cdec (https://github.com/redpony/cdec). You'll have to build the decoder, but then you should be able to run the script in cdec/compound-split/compound-split.pl -Chris On Thu, Mar 10, 2011 at 1:50 PM, Tom

Re: [Moses-support] German-English-German Moses

2011-03-11 Thread Chris Dyer
It was trained using segmentations that seemed (to my intuition) to be "sensible" for MT. On Fri, Mar 11, 2011 at 3:05 AM, Joerg Tiedemann wrote: > On Thu, Mar 10, 2011 at 8:09 PM, Chris Dyer wrote: >> There's a German compound splitting tool that's tuned for MT that's

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-19 Thread Chris Dyer
I've started using an OOV feature (fires for each LM-OOV) together with an open-vocabulary LM, and found that this improves the BLEU score. Typically, the weight learned on the OOV feature (by MERT) is quite a bit more negative than the default amount estimated during LM training, but it is still f

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-20 Thread Chris Dyer
>>> >>>>> An LM-OOV feature sounds like a good solution to me. Chris, have you >>>>> tried pegging the LM-OOV feature weight at an extremely high value? I >>>>> suspect the gains you are getting are due to the use of in LM >>>>> co

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-21 Thread Chris Dyer
On Mon, Mar 21, 2011 at 3:19 AM, Alex Fraser wrote: >> 2) there seems to be some evidence that some translations in the >> phrase table are so bad that leaving some words untranslated >> is "better" than using what's in the phrase table. I can see an >> argument that says that you should us

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-21 Thread Chris Dyer
>> I allow pass through of all words, with a penalty that is also learned >> by MERT. > Interesting stuff. Do you have results published on this? This was easiest to implement when I wrote cdec, and the results seemed good enough, so I never did a proper comparison. I will describe the newer innova

Re: [Moses-support] Implementation of Lattice MERT

2011-03-25 Thread Chris Dyer
cdec (https://github.com/redpony/cdec) includes an implementation, called vest. But someone needs to write code that will cause moses to export its search lattices in the right format (which is a funny crappy json-based encoding). On Fri, Mar 25, 2011 at 2:59 PM, Lane Schwartz wrote: > Does anyon
