[jira] [Commented] (JOSHUA-332) Merge 7 branch into master

2017-10-26 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220971#comment-16220971
 ] 

Matt Post commented on JOSHUA-332:
--

I am not sure if it's worth merging 7. I'm not saying it's not; I honestly
don't know. There was a lot of work there, but it also included a number of
ideas that were never fully implemented. We did quite a bit of work on it, but
it may be very far from being mergeable. You might consider just abandoning
those changes unless you have a clear idea of how to pull them in.

>  Merge 7 branch into master
> ---
>
> Key: JOSHUA-332
> URL: https://issues.apache.org/jira/browse/JOSHUA-332
> Project: Joshua
>  Issue Type: Task
>  Components: core
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 7
>
>
> As discussed on the mailing list, let's branch _master_ into a _6x_ branch 
> and merge branch _7_ into _master_ in order to keep developing on top of the 
> latest in the main branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (JOSHUA-277) UnsatisfiedLinkError: no ken in java.library.path

2017-08-27 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143361#comment-16143361
 ] 

Matt Post commented on JOSHUA-277:
--

If you want to tar up your whole run directory and make it available somewhere 
I can take a closer look.





> UnsatisfiedLinkError: no ken in java.library.path
> -
>
> Key: JOSHUA-277
> URL: https://issues.apache.org/jira/browse/JOSHUA-277
> Project: Joshua
>  Issue Type: Bug
>Reporter: Thamme Gowda
>
> I followed this guide http://joshua.incubator.apache.org/6.0/quick-start.html 
> to test the latest build.
> Assuming a few things are broken due to the newer Maven build system, I tried
> to fix pipeline.pl to get the quick start guide working.
> Which files from the KenLM build should I add to the JNI path? (I am unable to
> locate the library file in the KenLM build output.)
> Here is the full log:
> {code}
> $JOSHUA/bin/pipeline.pl --source bn --target en --type hiero 
> --no-prepare --aligner berkeley --corpus input/bn-en/tok/training.bn-en   
>   --tune input/bn-en/tok/dev.bn-en --test input/bn-en/tok/devtest.bn-en
> [train-copy-and-filter] cached, skipping...
> [train-vocab-bn] cached, skipping...
> [train-vocab-en] cached, skipping...
> [tune-copy-and-filter] cached, skipping...
> [tune-vocab-bn] cached, skipping...
> [tune-vocab-en.0] cached, skipping...
> [tune-vocab-en.1] cached, skipping...
> [tune-vocab-en.2] cached, skipping...
> [tune-vocab-en.3] cached, skipping...
> [test-copy-and-filter] cached, skipping...
> [test-vocab-bn] cached, skipping...
> [test-vocab-en.0] cached, skipping...
> [test-vocab-en.1] cached, skipping...
> [test-vocab-en.2] cached, skipping...
> [test-vocab-en.3] cached, skipping...
> [source-numlines] cached, skipping...
> [source-numlines] retrieved cached result =>20788
> [berkeley-aligner-chunk-0] cached, skipping...
> [aligner-combine] cached, skipping...
> [pack-grammar] cached, skipping...
> [lm-sort-uniq] cached, skipping...
> [kenlm] cached, skipping...
> [compile-kenlm] cached, skipping...
> [glue-tune] cached, skipping...
> Error: Could not find or load main class 
> joshua.util.encoding.EncoderConfiguration
> [tune-bundle] cached, skipping...
> [mert-1] rebuilding...
>   
> dep=/Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/data/tune/corpus.bn
>  [CHANGED]
>   
> dep=/Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/tune/joshua.config
>  [CHANGED]
>   dep=tune/model/grammar.packed/slice_0.source [CHANGED]
>   
> dep=/Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/tune/joshua.config.final
>  [NOT FOUND]
>   
> cmd=/Users/thammegr/work/projects/apache/incubator-joshua/scripts/training/run_tuner.py
>  
> /Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/data/tune/corpus.bn
>  
> /Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/data/tune/corpus.en
>  --tunedir 
> /Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/tune --tuner 
> mert --decoder 
> /Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/tune/decoder_command
>  --decoder-config 
> /Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/tune/joshua.config
>  --decoder-output-file 
> /Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/tune/output.nbest
>  --decoder-log-file 
> /Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/tune/joshua.log
>  --iterations 10 --metric 'BLEU 4 closest'
>   JOB FAILED (return code 1)
> Exception in thread "main" java.lang.RuntimeException: Unable to instantiate 
> feature function 'StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /Users/thammegr/work/projects/apache/incubator-joshua/data/bn-en/tune/model/lm.kenlm'!
>   at 
> org.apache.joshua.decoder.Decoder.initializeFeatureFunctions(Decoder.java:761)
>   at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:514)
>   at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:122)
>   at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.joshua.decoder.Decoder.initializeFeatureFunctions(Decoder.java:757)
>   ... 3 more
> Caused by: java.lang.ExceptionInInitializerError
>   at 
> org.apache.joshua.decoder.ff.lm.StateMinimizingLanguageModel.initializeLM(StateMinimizingLanguageModel.java:75)
>   at 
> 

[jira] [Commented] (JOSHUA-277) UnsatisfiedLinkError: no ken in java.library.path

2017-08-22 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137288#comment-16137288
 ] 

Matt Post commented on JOSHUA-277:
--

Is there a file lib/libken.so? And bin/lmplz?
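A small standalone check can answer both questions and also show which file name the JVM searches for when it reports "no ken in java.library.path". This is a sketch under assumptions: the class name is hypothetical, it expects the `JOSHUA` environment variable to point at the Joshua checkout (as in the pipeline command in the quoted report), and it checks the two paths named in the question. `System.mapLibraryName("ken")` returns the platform-specific name that `System.loadLibrary("ken")` looks for; on macOS, where the reporter's `/Users/...` paths suggest the run happened, current JDKs map it to `libken.dylib` rather than `libken.so`.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class KenLmCheck {
    /** Platform-specific file name that System.loadLibrary("ken") searches for. */
    static String expectedLibraryFileName() {
        return System.mapLibraryName("ken");
    }

    /** Entries of a java.library.path-style string that contain that file. */
    static List<Path> libraryPathHits(String libraryPath) {
        List<Path> hits = new ArrayList<>();
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, expectedLibraryFileName());
            if (Files.isRegularFile(candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String joshua = System.getenv("JOSHUA");
        if (joshua == null) {
            System.err.println("JOSHUA is not set");
            return;
        }
        // The two files asked about in this comment, relative to $JOSHUA.
        for (String rel : new String[] {"lib/libken.so", "bin/lmplz"}) {
            Path p = Paths.get(joshua, rel);
            System.out.println(p + (Files.exists(p) ? ": found" : ": MISSING"));
        }
        String libPath = System.getProperty("java.library.path", "");
        System.out.println("System.loadLibrary(\"ken\") looks for: " + expectedLibraryFileName());
        System.out.println("java.library.path = " + libPath);
        System.out.println("matching entries: " + libraryPathHits(libPath));
    }
}
```

If the library file exists but the decoder still fails, the last line of output shows whether its directory is actually on `java.library.path`.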


[jira] [Commented] (JOSHUA-277) UnsatisfiedLinkError: no ken in java.library.path

2017-08-22 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137271#comment-16137271
 ] 

Matt Post commented on JOSHUA-277:
--

KenLM is not getting built. Did you check out the Getting Started page? 
download-deps.sh downloads and builds KenLM. Likely it is failing.

https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started


[jira] [Commented] (JOSHUA-331) Address Apache Joshua 6.1 RC#3 Issues

2017-03-06 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898102#comment-15898102
 ] 

Matt Post commented on JOSHUA-331:
--

All four files should be variations of each other. Alas, they are not.

lm.berkeleylm.gz uncompressed is lm.berkeleylm, so that's good.

lm.gz, unfortunately, when uncompressed, is missing a line from "lm". However, 
recompressing "lm" (with its additional line) into lm.gz results in no changes 
to the tests. 

Nevertheless, I regenerated all the files from the "lm" file (including its
additional line, which is crucial for one of the tests), as follows:

cat lm | gzip -9n > lm.gz
$JOSHUA/scripts/lm/compile_berkeleylm.py lm lm.berkeleylm
cat lm.berkeleylm | gzip -9n > lm.berkeleylm.gz

Running "mvn test" then succeeds, so I have committed and pushed the regenerated
files.

All of these tests are important because they exercise BerkeleyLM's ability to 
read and properly recognize its different supported files.
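The property described above (each compressed file should decompress to exactly its uncompressed counterpart) is easy to check mechanically. A minimal sketch using the JDK's `GZIPInputStream`: the class name is hypothetical, the directory holding the four files is passed as the first argument, and it only compares the two gzip pairs (lm.gz against lm, lm.berkeleylm.gz against lm.berkeleylm), since lm and lm.berkeleylm are different formats. A missing line like the one described above would be reported as a MISMATCH.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;

public class LmFixtureCheck {
    /** True if decompressing {@code gz} yields exactly the bytes of {@code plain}. */
    static boolean gunzipMatches(Path gz, Path plain) throws IOException {
        byte[] expected = Files.readAllBytes(plain);
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return Arrays.equals(expected, out.toByteArray());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Pairs that should be compressed/uncompressed variants of each other.
        String[][] pairs = {{"lm.gz", "lm"}, {"lm.berkeleylm.gz", "lm.berkeleylm"}};
        for (String[] pair : pairs) {
            boolean ok = gunzipMatches(dir.resolve(pair[0]), dir.resolve(pair[1]));
            System.out.println(pair[0] + " -> " + pair[1] + ": " + (ok ? "match" : "MISMATCH"));
        }
    }
}
```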

Now, compile_berkeleylm.py is a fairly simple wrapper around a Java call, so it
would not be difficult to modify the code so that only the human-readable "lm"
file needs to be distributed.

Thoughts?

> Address Apache Joshua 6.1 RC#3 Issues
> -
>
> Key: JOSHUA-331
> URL: https://issues.apache.org/jira/browse/JOSHUA-331
> Project: Joshua
>  Issue Type: Task
>  Components: release
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 6.1
>
>
> Address the following issues:
> {quote}
> Every ASF release MUST contain one or more source packages, which MUST be
> sufficient for a user to build and test the release provided they have
> access to the appropriate platform and tools. - NO
> - Not building due to a failing test (BerkeleyLM failure). I'm digging a
> bit more into this.
> {quote}
> {quote}
> Every artifact distributed to the public through Apache channels MUST be
> accompanied by one file containing an OpenPGP compatible ASCII armored
> detached signature and another file containing an MD5 checksum.
> - .asc - NO
> I get warning:
> "gpg --verify joshua-incubating-6.1-src.tar.gz.asc
> joshua-incubating-6.1-src.tar.gz
> gpg: Signature made Thu Feb 23 09:15:17 2017 CET using RSA key ID
> 891768A5
> gpg: Good signature from "Tommaso Teofili "
> [unknown]
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:  There is no indication that the signature belongs to the
> owner."
> - .md5 - NO
> My md5 of joshua-incubating-6.1-src.tar.gz is
> 504976876b01294811293aa45b5400f5, the joshua-incubating-6.1-src.tar.gz.md5
> indicates it should be 22b738eeae45757715080702a5bd2789
> - .sha - NO
> My sha of joshua-incubating-6.1-src.tar.gz is
> 4AB5BA24301590F36AE6452DACC3F21CBD8B3FEC, the
> joshua-incubating-6.1-src.tar.gz.md5 indicates it should be
> 2a55b6d341dddc5369b22a4802a86ec40accd0a1
> - KEYS - YES
> {quote}
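The .md5 and .sha comparisons in the report can be reproduced with the JDK's `MessageDigest`. This sketch (hypothetical class name) prints lower-case hex MD5 and SHA-1 digests for the file given as the first argument. SHA-1 is an assumption based on the 40-hex-digit values quoted above; since the reporter's SHA is printed in upper case, compare the digests case-insensitively.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumCheck {
    /** Lower-case hex digest of a stream using the named algorithm, e.g. "MD5" or "SHA-1". */
    static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b)); // %x on a byte prints its unsigned value
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ChecksumCheck <file>");
            return;
        }
        Path artifact = Paths.get(args[0]); // e.g. joshua-incubating-6.1-src.tar.gz
        for (String alg : new String[] {"MD5", "SHA-1"}) {
            try (InputStream in = Files.newInputStream(artifact)) {
                System.out.println(alg + "  " + hexDigest(in, alg));
            }
        }
    }
}
```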





[jira] [Commented] (JOSHUA-329) A suspicious use of incrementer in for statement

2017-01-31 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847740#comment-15847740
 ] 

Matt Post commented on JOSHUA-329:
--

I think you are correct. Thanks for pointing this out!

> A suspicious use of incrementer in for statement
> 
>
> Key: JOSHUA-329
> URL: https://issues.apache.org/jira/browse/JOSHUA-329
> Project: Joshua
>  Issue Type: Bug
>Reporter: Jaechang Nam
>Priority: Trivial
>
> In a recent snapshot of the github mirror, I've found a suspicious 
> incrementer in 
> src/main/java/org/apache/joshua/decoder/ff/lm/LanguageModelFF.java.
> {code:java}
> 269   for (int i = 0; i < tokens.length; i++) {
> 270     if (tokens[i] > 0) { // skip nonterminals
> 271       for (int j = 0; j < alignments.length; j += 2) {
> 272         if (alignments[j] == i) {
> 273           String annotation = sentence.getAnnotation((int)alignments[i] + begin, "class");
> 274           if (annotation != null) {
> 275             //System.err.println(String.format("  word %d source %d abs %d annotation %d/%s",
> 276             //    i, alignments[i], alignments[i] + begin, annotation, Vocabulary.word(annotation)));
> 277             tokens[i] = Vocabulary.id(annotation);
> 278             break;
> 279           }
> 280         }
> 281       }
> 282     }
> 283   }
> {code}
> In line 273, should alignments[i] be alignments[j] if tokens.length is not the
> same as alignments.length? Since I don't have domain knowledge, this may not
> be correct, but I just wanted to report it in case.
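The concern can be made concrete with a standalone demo of just the index arithmetic in lines 271 to 273. The data layout is an assumption for illustration: `alignments` is treated as a flat array of pairs, as the `j += 2` stride suggests, and the values and class name are made up. The demo shows that `alignments[i]` is not tied to the pair at `j` that passed the test, and that it runs past the end of the array whenever the matched position `i` is at least `alignments.length`. It deliberately does not decide which element of the matched pair the lookup should use.

```java
public class AlignmentIndexDemo {
    /**
     * Mirrors lines 271-273: find the first pair whose first element equals i,
     * then read alignments[i] (indexed by i, not by the matching j).
     */
    static int quotedLookup(byte[] alignments, int i) {
        for (int j = 0; j < alignments.length; j += 2) {
            if (alignments[j] == i) {
                return alignments[i];
            }
        }
        return -1; // no pair mentions position i
    }

    /** Start index j of the pair that actually matched position i, or -1. */
    static int matchedPairIndex(byte[] alignments, int i) {
        for (int j = 0; j < alignments.length; j += 2) {
            if (alignments[j] == i) {
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical pairs (0,5) and (1,7); position 1 matches the pair starting at j = 2.
        byte[] alignments = {0, 5, 1, 7};
        System.out.println("matched pair starts at j = " + matchedPairIndex(alignments, 1));
        System.out.println("quoted code reads alignments[1] = " + quotedLookup(alignments, 1));
        // A single pair (2,0): position 2 matches, but alignments[2] does not exist.
        try {
            quotedLookup(new byte[] {2, 0}, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("quoted code throws " + e.getClass().getSimpleName());
        }
    }
}
```

With the sample data, position 1 matches the pair starting at j = 2, but the quoted expression reads alignments[1], which is 5, the second element of the first pair.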





[jira] [Resolved] (JOSHUA-327) travis build fails

2017-01-25 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-327.
--
Resolution: Fixed

> travis build fails
> --
>
> Key: JOSHUA-327
> URL: https://issues.apache.org/jira/browse/JOSHUA-327
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>
> Travis builds with KenLM fail because the "download-deps.sh" script pauses
> indefinitely for a response.





[jira] [Commented] (JOSHUA-327) travis build fails

2017-01-25 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838962#comment-15838962
 ] 

Matt Post commented on JOSHUA-327:
--

Fixed in commit f581881631ffa8f68d9f4e864909da5002b6067f.






[jira] [Created] (JOSHUA-328) failure when glue grammar is listed first

2017-01-25 Thread Matt Post (JIRA)
Matt Post created JOSHUA-328:


 Summary: failure when glue grammar is listed first
 Key: JOSHUA-328
 URL: https://issues.apache.org/jira/browse/JOSHUA-328
 Project: Joshua
  Issue Type: Bug
Affects Versions: 6.1
Reporter: Matt Post
 Fix For: 6.1


When doing CKY decoding (-search cky), listing the glue grammar before the packed
grammar results in a parsing failure. For example, the following lines in the config
file:

tm = thrax -maxspan -1 -owner glue -path model/glue.grammar
tm = thrax -maxspan 20 -path model/grammar.packed -owner pt

will cause decoding to fail every time, printing the following error message:

ERROR - the goal_bin does not have exactly one item
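Until the ordering bug is fixed, a config file can be screened for this layout before decoding. A minimal sketch, assuming `tm` lines follow the `tm = <type> <flags...>` form shown above and that the glue grammar is the entry carrying `-owner glue`; the class and method names are hypothetical, and the check only matters when decoding with `-search cky`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class GlueOrderCheck {
    /** True if the first "tm = ..." line in the config is the one with "-owner glue". */
    static boolean glueListedFirst(List<String> configLines) {
        for (String line : configLines) {
            String[] kv = line.trim().split("=", 2);
            if (kv.length != 2 || !kv[0].trim().equals("tm")) {
                continue; // not a grammar line
            }
            List<String> tokens = Arrays.asList(kv[1].trim().split("\\s+"));
            int owner = tokens.indexOf("-owner");
            // Only the first tm line matters for this check.
            return owner >= 0 && owner + 1 < tokens.size() && tokens.get(owner + 1).equals("glue");
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: GlueOrderCheck <joshua.config>");
            return;
        }
        if (glueListedFirst(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println("warning: the glue grammar is the first tm entry (the ordering reported in JOSHUA-328)");
        }
    }
}
```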






[jira] [Commented] (JOSHUA-325) Warning about non-ASF licensed downloads

2017-01-25 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838650#comment-15838650
 ] 

Matt Post commented on JOSHUA-325:
--

This was fixed with commit c65f70fc427945a2b18a9b2ee77b8614be7fc051, and 
further addressed with commit f581881631ffa8f68d9f4e864909da5002b6067f.

> Warning about non-ASF licensed downloads
> 
>
> Key: JOSHUA-325
> URL: https://issues.apache.org/jira/browse/JOSHUA-325
> Project: Joshua
>  Issue Type: Task
>Affects Versions: 6.1
>Reporter: Matt Post
>Assignee: Matt Post
>Priority: Minor
>
> Via Tom Barber:
> The download-deps.sh file obviously downloads and builds software with non-ASF
> licenses. I realise this is for model training purposes only, and 99.9% won't
> care, but should we consider putting a prompt into that script to warn
> people? I ask because a company might add in the training modules blindly,
> assuming that because the script is distributed by the ASF, the modules are
> also ASL2.0.





[jira] [Resolved] (JOSHUA-325) Warning about non-ASF licensed downloads

2017-01-25 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-325.
--
Resolution: Fixed






[jira] [Created] (JOSHUA-327) travis build fails

2017-01-25 Thread Matt Post (JIRA)
Matt Post created JOSHUA-327:


 Summary: travis build fails
 Key: JOSHUA-327
 URL: https://issues.apache.org/jira/browse/JOSHUA-327
 Project: Joshua
  Issue Type: Bug
Reporter: Matt Post


Travis builds with KenLM fail because the "download-deps.sh" script pauses
indefinitely for a response.
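The hang comes from a script waiting on interactive input (JOSHUA-325 asked for a warning prompt in the same script). download-deps.sh is a shell script, and the fix in the commit referenced for this issue is not described here, so the Java sketch below only illustrates one general pattern: an explicit assume-yes switch for unattended runs, and treating end of input as "no" rather than waiting. The class name, the `--yes` argument, and the prompt text are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;

public class LicensePrompt {
    /**
     * With assumeYes set it returns immediately without reading input. Otherwise it
     * reads one line; end of input (readLine() returning null) counts as "no".
     */
    static boolean confirm(BufferedReader in, PrintStream out, String message, boolean assumeYes)
            throws IOException {
        if (assumeYes) {
            return true;
        }
        out.print(message + " [y/N] ");
        out.flush();
        String answer = in.readLine();
        return answer != null && answer.trim().equalsIgnoreCase("y");
    }

    public static void main(String[] args) throws IOException {
        boolean assumeYes = args.length > 0 && args[0].equals("--yes");
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        boolean ok = confirm(stdin, System.out,
                "Some dependencies are not ASF-licensed. Continue?", assumeYes);
        System.out.println(ok ? "continuing" : "aborting");
    }
}
```

Treating end of input as "no" helps when stdin is redirected from /dev/null, but a job whose stdin stays open without data would still block on `readLine()`; the explicit switch is what keeps an unattended build moving.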





[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2017-01-25 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837799#comment-15837799
 ] 

Matt Post commented on JOSHUA-324:
--

Lewis — Does this mean we're good to go? Is there something I can do?

> Address Apache Joshua 6.1 RC#2 Issues
> -
>
> Key: JOSHUA-324
> URL: https://issues.apache.org/jira/browse/JOSHUA-324
> Project: Joshua
>  Issue Type: Task
>Affects Versions: 6.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> Feedback from [~jmclean] (thank you Justin) on our RC#2 is as follows
> {code}
> ==
> - You're missing "incubating" in the release artifact names. [1]
> - There are a number of binary files in the source release that look to be
> compiled source code.
> I checked:
> - name doesn’t include incubating
> - signatures and hashes correct
> - DISCLAIMER exists
> - LICENSE is missing a few things (see below)
> - a source file is missing an Apache header [7]
> - Several unexpected binary files are contained in the source release
> [8][9][10][11]
> - Can compile from source
> License is missing:
> - MIT licensed normalize.css v3.0.3 bundled in [5]
> - glyph icon fonts [6]
> Not an issue, but it's a little odd to have LICENSE and NOTICE.txt; usually
> both are bare or both have the .txt extension.
> Also, while looking at your site I noticed that the download links of your
> incubating site [2] point to GitHub; please change them to point to the official
> release area.
> Also, the 6.1 release has already been tagged and is available for public
> download on GitHub [4] before this vote is finished. This is IMO against
> Apache release policy [3]; please remove it.
> I also notice you recently released the language packs (18th Nov) but there
> doesn’t seem to have been a vote for that? Any reason for this?
> ===
> [1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases
> [2] 
> https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home
> [3] http://www.apache.org/dev/release.html#what
> [4] https://github.com/apache/incubator-joshua/releases
> [5] ./demo/bootstrap/css/bootstrap.min.css
> [6] apache-joshua-6.1/demo/bootstrap/fonts/*
> [7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java
> [8] ./bin/GIZA++
> [9] ./bin/mkcls
> [10] ./bin/snt2cooc.out
> [11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz
> [12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html
> [13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html
> {code}
> This is a blocking issue and until addressed we cannot release 6.1-incubating
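Items [8] to [11] flag unexpected binary files in the source release. A quick way to list candidates in an unpacked release is a NUL-byte heuristic: treat a file as binary if its first few kilobytes contain a zero byte. This is a sketch with a hypothetical class name and an 8000-byte window chosen for illustration; it will typically flag compressed files such as lm.berkeleylm.gz as well as compiled executables, and it is a heuristic, not a substitute for the release review itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BinaryFileScan {
    private static final int SNIFF_BYTES = 8000;

    /** Heuristic: a file counts as binary if its first SNIFF_BYTES bytes contain a NUL byte. */
    static boolean looksBinary(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[SNIFF_BYTES];
            int total = 0;
            int n;
            while (total < buf.length && (n = in.read(buf, total, buf.length - total)) != -1) {
                total += n;
            }
            for (int i = 0; i < total; i++) {
                if (buf[i] == 0) {
                    return true;
                }
            }
            return false;
        }
    }

    /** All regular files under root that look binary, sorted; unreadable files are skipped. */
    static List<Path> findBinaries(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> {
                        try {
                            return looksBinary(p);
                        } catch (IOException e) {
                            return false;
                        }
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findBinaries(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```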





[jira] [Commented] (JOSHUA-326) Make preprocessing phase pluggable

2016-12-22 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770280#comment-15770280
 ] 

Matt Post commented on JOSHUA-326:
--

+1

> Make preprocessing phase pluggable
> --
>
> Key: JOSHUA-326
> URL: https://issues.apache.org/jira/browse/JOSHUA-326
> Project: Joshua
>  Issue Type: Improvement
>  Components: pipeline
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 6.2
>
>
> It'd be nice to have the data preprocessing phase pluggable, with a default 
> simple Java implementation and eventually other more advanced ones based on 
> external tools like Apache OpenNLP.
> That should replace our script-based preprocessing:
> - tokenization: 
> https://github.com/apache/incubator-joshua/blob/master/scripts/preparation/tokenize.pl
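The pluggable design described above could take the shape of a small interface with a default pure-Java implementation. A sketch under assumptions: the `Tokenizer` interface, `SimpleTokenizer`, and the regular expression are hypothetical and much cruder than tokenize.pl; an implementation backed by an external tool such as Apache OpenNLP would implement the same interface and be selected in its place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical extension point for the tokenization step of preprocessing. */
interface Tokenizer {
    List<String> tokenize(String text);
}

/** Hypothetical default implementation: words, digit runs, and single other characters. */
class SimpleTokenizer implements Tokenizer {
    private static final Pattern TOKEN = Pattern.compile("\\p{L}[\\p{L}\\p{N}]*|\\p{N}+|\\S");

    @Override
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}

public class PreprocessingDemo {
    public static void main(String[] args) {
        // An OpenNLP-backed Tokenizer implementation could be swapped in here.
        Tokenizer tokenizer = new SimpleTokenizer();
        System.out.println(String.join(" ", tokenizer.tokenize("Hello, world! 42 apples.")));
    }
}
```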





[jira] [Created] (JOSHUA-325) Warning about non-ASF licensed downloads

2016-12-12 Thread Matt Post (JIRA)
Matt Post created JOSHUA-325:


 Summary: Warning about non-ASF licensed downloads
 Key: JOSHUA-325
 URL: https://issues.apache.org/jira/browse/JOSHUA-325
 Project: Joshua
  Issue Type: Task
Affects Versions: 6.1
Reporter: Matt Post
Assignee: Matt Post
Priority: Minor


Via Tom Barber:
The download-deps.sh file obviously downloads and builds software with non-ASF
licenses. I realise this is for model training purposes only, and 99.9% won't
care, but should we consider putting a prompt into that script to warn
people? I ask because a company might add in the training modules blindly,
assuming that because the script is distributed by the ASF, the modules are
also ASL2.0.





[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2016-11-29 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706660#comment-15706660
 ] 

Matt Post commented on JOSHUA-324:
--

I don't see any of the binaries [8] [9] [10]: 
https://github.com/apache/incubator-joshua/tree/master/bin

We discussed the language packs on the mailing list, but I didn't call a vote — 
it didn't cross my mind.

> Address Apache Joshua 6.1 RC#2 Issues
> -
>
> Key: JOSHUA-324
> URL: https://issues.apache.org/jira/browse/JOSHUA-324
> Project: Joshua
>  Issue Type: Task
>Affects Versions: 6.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> Feedback from [~jmclean] (thank you Justin) on our RC#2 is as follows
> {code}
> ==
> - Your missing incubating in the release artifacts name. [1]
> - There are a number of binary files in the source release that look to be
> compiled source code.
> I checked:
> - name doesn’t include incubating
> - signatures and hashes correct
> - DISCLAIMER exists
> - LICENSE is missing a few things (see below)
> - a source file is missing an Apache header [7]
> - Several unexpected binary files are contained in the source release
> [8][9][10][11]
> - Can compile from source
> License is missing:
> - MIT licensed normalize.css v3.0.3 bundled in [5]
> - glyph icon fonts [6]
> Not an issue but it's a little odd to have LICENSE and NOTICE.txt - usually
> both are bare or both have .txt extension.
> Also while looking at your site I noticed that the download links of you
> incubating site [2] points to github, please change to point to the offical
> release area.
> Also the 6.1 release has already been tagged and it available for public
> download on github [4]  before this vote is finished. This is IMO against
> Apache release policy [3] please remove.
> I also notice you recently released the language packs (18th Nov) but there
> doesn’t seem to have been a vote for that? Any reason for this?
> ===
> [1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases
> [2] 
> https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home
> [3] http://www.apache.org/dev/release.html#what
> [4] https://github.com/apache/incubator-joshua/releases
> [5] ./demo/bootstrap/css/bootstrap.min.css
> [6] apache-joshua-6.1/demo/bootstrap/fonts/*
> [7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java
> [8] ./bin/GIZA++
> [9] ./bin/mkcls
> [10] ./bin/snt2cooc.out
> [11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz
> [12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html
> [13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html
> {code}
> This is a blocking issue and until addressed we cannot release 6.1-incubating
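Two of the points above (stray binaries and a missing Apache header) can be spot-checked mechanically before cutting the next RC. A minimal Python sketch, assuming that a NUL byte in the first 8 KiB marks a file as binary and using the phrase "Licensed to the Apache Software Foundation" as a stand-in for the full ASF header. Both heuristics, the extension list, and the function names are hypothetical, not part of Joshua's release tooling.

```python
import os

# Hypothetical stand-in for the ASF source header; check the real header text
# against the release policy the reviewers cite.
HEADER_MARKER = "Licensed to the Apache Software Foundation"


def looks_binary(path, blocksize=8192):
    """Heuristic: treat a file as binary if its first block contains a NUL byte."""
    with open(path, "rb") as f:
        return b"\0" in f.read(blocksize)


def audit_release(root, source_exts=(".java", ".py", ".pl", ".sh")):
    """Return (binaries, missing_headers), as sorted paths relative to root."""
    binaries, missing_headers = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if looks_binary(path):
                binaries.append(rel)
            elif name.endswith(source_exts):
                # Only the top of the file is inspected for the header marker.
                with open(path, encoding="utf-8", errors="replace") as f:
                    head = f.read(2048)
                if HEADER_MARKER not in head:
                    missing_headers.append(rel)
    return sorted(binaries), sorted(missing_headers)
```

Run over an unpacked source tarball, compiled tools such as ./bin/mkcls would be expected to show up as binaries. The sketch does not check LICENSE or NOTICE contents, which still need a manual review.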



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JOSHUA-315) Thrax keeps all rules

2016-11-17 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15673452#comment-15673452
 ] 

Matt Post commented on JOSHUA-315:
--

Yeah, I had expected bigger savings, too. I should quantify it in terms of 
runtime, as well.

> Thrax keeps all rules
> -
>
> Key: JOSHUA-315
> URL: https://issues.apache.org/jira/browse/JOSHUA-315
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
> Fix For: 6.2
>
>
> When extracting rules, Thrax keeps *all* options for each target side. For 
> large bitexts and common source sides (e.g., "de" for Spanish–English), there 
> can be tens of thousands of translations, due to errors in the alignments and 
> phenomena like garbage collection. The decoder throws out all but the top 
> num_translation_options of these (default 20), but before doing so, it has to 
> score all the target side options with all feature functions, including the 
> language model. This slows down "warming up" of the model and means that the 
> first sentences to use these items are very slow to translate.
> I have updated scripts/training/filter-rules.pl to filter rules using Thrax's 
> rarity penalty field, but it would be much better if Thrax were to keep only 
> the 100 most frequent translation options for each source side.





[jira] [Commented] (JOSHUA-315) Thrax keeps all rules

2016-11-14 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664649#comment-15664649
 ] 

Matt Post commented on JOSHUA-315:
--

This has been addressed in commit 885389d513b5d0f3f68b59c3b17a776584b3a208. If 
you add the word "count" to the list of thrax features in the thrax config 
file, a sixth field will be extracted with the rule count, e.g.,

[X] ||| de ||| of ||| 0.72572 0.29124 1 0 0.39357 0.17023 ||| 0-0 ||| 2565758
[X] ||| de ||| to ||| 2.89509 2.10811 1 0 2.87285 2.08282 ||| 0-0 ||| 215020
[X] ||| de ||| in ||| 3.11663 2.17583 1 0 2.91081 2.34837 ||| 0-0 ||| 207011
...

This is then used by the filter-rules.pl script (with the flag -t 100) to 
remove all rules except the 100 most frequent for each source side. This 
has been added to the pipeline. The grammars seem to be about 5% smaller and 
should have only a positive effect on running time.
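The per-source-side cut described above can be sketched as follows. This is a Python illustration, not filter-rules.pl itself: it assumes the count is the last `|||`-separated field (as in the example rules), groups by source side only, and holds the whole grammar in memory, whereas a filter over a large grammar would more likely stream rules sorted by source side. The function name is hypothetical.

```python
import heapq
from collections import defaultdict


def top_n_per_source(lines, n=100):
    """Keep at most n rules per source side, ranked by the trailing count field.

    Assumes rule lines of the form
        LHS ||| source ||| target ||| features ||| alignment ||| count
    """
    groups = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        source, count = fields[1], int(fields[-1])
        groups[source].append((count, line))
    kept = []
    for source, rules in groups.items():
        # nlargest orders the (count, line) tuples; ties fall back to the line text.
        kept.extend(line for _count, line in heapq.nlargest(n, rules))
    return kept
```

With `n=2`, the three "de" rules shown above would be cut to the "of" and "to" entries, since those have the two highest counts.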

> Thrax keeps all rules
> -
>
> Key: JOSHUA-315
> URL: https://issues.apache.org/jira/browse/JOSHUA-315
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
> Fix For: 6.2
>
>
> When extracting rules, Thrax keeps *all* options for each target side. For 
> large bitexts and common source sides (e.g., "de" for Spanish–English), there 
> can be tens of thousands of translations, due to errors in the alignments and 
> phenomena like garbage collection. The decoder throws out all but the top 
> num_translation_options of these (default 20), but before doing so, it has to 
> score all the target side options with all feature functions, including the 
> language model. This slows down "warming up" of the model and means that the 
> first sentences to use these items are very slow to translate.
> I have updated scripts/training/filter-rules.pl to filter rules using Thrax's 
> rarity penalty field, but it would be much better if Thrax were to keep only 
> the 100 most frequent translation options for each source side.





[jira] [Resolved] (JOSHUA-315) Thrax keeps all rules

2016-11-14 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-315.
--
Resolution: Fixed

> Thrax keeps all rules
> -
>
> Key: JOSHUA-315
> URL: https://issues.apache.org/jira/browse/JOSHUA-315
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
> Fix For: 6.2
>
>
> When extracting rules, Thrax keeps *all* options for each target side. For 
> large bitexts and common source sides (e.g., "de" for Spanish–English), there 
> can be tens of thousands of translations, due to errors in the alignments and 
> phenomena like garbage collection. The decoder throws out all but the top 
> num_translation_options of these (default 20), but before doing so, it has to 
> score all the target side options with all feature functions, including the 
> language model. This slows down "warming up" of the model and means that the 
> first sentences to use these items are very slow to translate.
> I have updated scripts/training/filter-rules.pl to filter rules using Thrax's 
> rarity penalty field, but it would be much better if Thrax were to keep only 
> the 100 most frequent translation options for each source side.





[jira] [Commented] (JOSHUA-318) scripts/training/run_tuner.py should enable configurable memory usage when invoking joshua-decoder

2016-10-30 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15620936#comment-15620936
 ] 

Matt Post commented on JOSHUA-318:
--

Ah, I see what happens here. To be compatible with Moses, we query the decoder 
for its list of dense features. But that loads all the models and so on, so you 
might need lots of memory. This will be rewritten in 7 since we no longer have 
dense features there. I don't think I'll fix it before the release.

> scripts/training/run_tuner.py should enable configurable memory usage when 
> invoking joshua-decoder
> ---
>
> Key: JOSHUA-318
> URL: https://issues.apache.org/jira/browse/JOSHUA-318
> Project: Joshua
>  Issue Type: Improvement
>  Components: tuner
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
> Fix For: 6.2
>
>
> When I run the run_tuner.py script I can easily run into the following
> {code}
> [mert-1] rebuilding...
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> [CHANGED]
>   dep=tune/model/grammar.gz.packed/slice_0.source [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru 
> --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner 
> mert --decoder 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command 
> --decoder-config 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> --decoder-output-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest 
> --decoder-log-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log 
> --iterations 10 --metric 'BLEU 4 closest'
>   JOB FAILED (return code 1)
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.initializeFeatureStructures(PackedGrammar.java:385)
>   at 
> org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.<init>(PackedGrammar.java:368)
>   at 
> org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.<init>(PackedGrammar.java:153)
>   at 
> org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:458)
>   at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389)
>   at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:128)
>   at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69)
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 553, 
> in <module>
> main(sys.argv)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 536, 
> in main
> run_zmert(opts.tunedir, opts.source, opts.target, opts.decoder, 
> opts.decoder_config, opts.decoder_output_file, opts)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 417, 
> in run_zmert
> opts.metric, opts.iterations or 10)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 399, 
> in setup_configs
> for feature,weight in get_features(config):
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 351, 
> in get_features
> output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % 
> (JOSHUA, config_file), shell=True)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 626, in 
> check_output
> **kwargs).stdout
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 708, in 
> run
> output=stdout, stderr=stderr)
> subprocess.CalledProcessError: Command 
> '/usr/local/incubator-joshua/bin/joshua-decoder -c 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> -show-weights -v 0' returned non-zero exit status 1
> {code}
> This is because, by default, the joshua-decoder script runs with 4g of memory. 
> The run_tuner.py script should be flexible enough to reuse the memory 
> allocation provided when the pipeline was initially invoked. This value should 
> then be passed to the joshua-decoder script.
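One way to implement the proposal is to build the show-weights command as an argument list that forwards a memory setting. This is a minimal sketch. The `-m` flag is an assumption about how bin/joshua-decoder accepts a heap size and should be checked against that script. `show_weights_command` and the `mem` parameter are hypothetical names.

```python
import os


def show_weights_command(joshua_home, config_file, mem=None):
    """Build the argument list that asks the decoder to print its feature weights.

    `mem` (e.g. "16g") is forwarded with "-m"; that flag name is an assumption
    to verify against bin/joshua-decoder. It is placed first, directly after
    the program name.
    """
    cmd = [os.path.join(joshua_home, "bin", "joshua-decoder")]
    if mem is not None:
        cmd += ["-m", mem]
    cmd += ["-c", config_file, "-show-weights", "-v", "0"]
    return cmd
```

In get_features() this could replace the shell string, for example `check_output(show_weights_command(JOSHUA, config_file, opts.mem))`, where `opts.mem` would be a new, hypothetical run_tuner.py option. Passing a list also removes the need for `shell=True`.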





[jira] [Commented] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391

2016-10-30 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15620904#comment-15620904
 ] 

Matt Post commented on JOSHUA-317:
--

I don't have any trouble running this. What version of Python are you using?
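The traceback in the issue points at backtick syntax on line 391. Backticks were Python 2 shorthand for repr() and were removed in Python 3, which would explain the failure under the Python 3.5 environment listed in the issue. A sketch of the Python 3 compatible form; the helper names here are hypothetical, and for an integer `str(iterations)` would work equally well:

```python
# Python 2 only (backticks were shorthand for repr()):
#     'ITERATIONS': `iterations`,
# Python 3 form, also valid in Python 2:
def template_vars(iterations):
    """Hypothetical helper showing the replacement for the dict entry."""
    return {'ITERATIONS': repr(iterations)}


def backticks_rejected():
    """Return True if this interpreter rejects the Python 2 backtick syntax."""
    try:
        compile("x = `1`", "<py2-example>", "exec")
    except SyntaxError:
        return True
    return False
```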

> SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
> 
>
> Key: JOSHUA-317
> URL: https://issues.apache.org/jira/browse/JOSHUA-317
> Project: Joshua
>  Issue Type: Bug
>  Components: tuner
>Affects Versions: 6.0.5
> Environment: Python 3.5
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 6.1
>
>
> {code}
> [tune-bundle] rebuilding...
>   
> dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source
>  [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force 
> --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir 
> /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/model 
> --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" 
> -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 
> tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function 
> "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm 
> /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue
>   took 0 seconds (0s)
> [mert-1] rebuilding...
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
> [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> [CHANGED]
>   dep=tune/model/grammar.packed/slice_0.source [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru 
> --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner 
> mert --decoder 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command 
> --decoder-config 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> --decoder-output-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest 
> --decoder-log-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log 
> --iterations 10 --metric 'BLEU 4 closest'
>   JOB FAILED (return code 1)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391
> 'ITERATIONS': `iterations`,
>   ^
> SyntaxError: invalid syntax
> {code}





[jira] [Commented] (JOSHUA-312) Even though alignment is cached, it is always re-done in pipeline re-execution

2016-10-18 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586063#comment-15586063
 ] 

Matt Post commented on JOSHUA-312:
--

This is fixed with commit 301f301cdcad5ab49c8465506791e5f117e1c944 (just 
pushed). The problem was that I changed the structure of the alignment splits, 
and did not update the paths for Berkeley aligner. Sorry about the trouble!
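The [CHANGED] and [NOT FOUND] markers in the pipeline logs reflect a dependency check of roughly the following shape. This Python sketch is not the pipeline's actual caching code. It assumes file contents are compared by SHA-1 and that a step reruns when any dependency is missing or differs from the signature recorded on the last run. It also illustrates how a path change alone can force a rerun: a dependency looked up at a new location is NOT FOUND even if the old file still exists.

```python
import hashlib
import os


def file_signature(path):
    """SHA-1 of a file's contents, or None if the file does not exist."""
    if not os.path.exists(path):
        return None
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_rebuild(deps, recorded):
    """Return (rebuild, status); `recorded` maps each dep path to its last signature."""
    status = {}
    for dep in deps:
        sig = file_signature(dep)
        if sig is None:
            status[dep] = "NOT FOUND"
        elif recorded.get(dep) != sig:
            status[dep] = "CHANGED"
        else:
            status[dep] = "OK"
    return any(s != "OK" for s in status.values()), status
```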

> Even though alignment is cached, it is always re-done in pipeline re-execution
> --
>
> Key: JOSHUA-312
> URL: https://issues.apache.org/jira/browse/JOSHUA-312
> Project: Joshua
>  Issue Type: Improvement
>  Components: alignment
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 6.2
>
>
> Say a pipeline fails after alignment. The alignment result is never cached, 
> and it becomes necessary to undertake alignment... again!
> We should investigate the process for caching alignments as it would really 
> speed up rerunning end-to-end pipelines for large input datasets.





[jira] [Resolved] (JOSHUA-312) Even though alignment is cached, it is always re-done in pipeline re-execution

2016-10-18 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-312.
--
Resolution: Fixed

> Even though alignment is cached, it is always re-done in pipeline re-execution
> --
>
> Key: JOSHUA-312
> URL: https://issues.apache.org/jira/browse/JOSHUA-312
> Project: Joshua
>  Issue Type: Improvement
>  Components: alignment
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 6.2
>
>
> Say a pipeline fails after alignment. The alignment result is never cached, 
> and it becomes necessary to undertake alignment... again!
> We should investigate the process for caching alignments as it would really 
> speed up rerunning end-to-end pipelines for large input datasets.





[jira] [Created] (JOSHUA-315) Thrax keeps all rules

2016-10-13 Thread Matt Post (JIRA)
Matt Post created JOSHUA-315:


 Summary: Thrax keeps all rules
 Key: JOSHUA-315
 URL: https://issues.apache.org/jira/browse/JOSHUA-315
 Project: Joshua
  Issue Type: Bug
Reporter: Matt Post
 Fix For: 6.2


When extracting rules, Thrax keeps *all* options for each target side. For 
large bitexts and common source sides (e.g., "de" for Spanish–English), there 
can be tens of thousands of translations, due to errors in the alignments and 
phenomena like garbage collection. The decoder throws out all but the top 
num_translation_options of these (default 20), but before doing so, it has to 
score all the target side options with all feature functions, including the 
language model. This slows down "warming up" of the model and means that the 
first sentences to use these items are very slow to translate.

I have updated scripts/training/filter-rules.pl to filter rules using Thrax's 
rarity penalty field, but it would be much better if Thrax were to keep only 
the 100 most frequent translation options for each source side.
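The slowdown described above comes from ranking after scoring: the decoder can only keep the top num_translation_options once every candidate has a score. A minimal Python sketch of that selection step, where `score` is a hypothetical callable standing in for the full feature-function evaluation (it is not Joshua's API):

```python
import heapq


def prune_translation_options(options, score, k=20):
    """Return the k highest-scoring target options for one source side.

    heapq.nlargest calls `score` once per option, so every candidate is scored
    before the top k are known; the work grows with len(options), not with k.
    """
    return heapq.nlargest(k, options, key=score)
```

With tens of thousands of options for a source side like "de", that means tens of thousands of scoring calls to keep 20, which is why pruning the grammar before decoding helps.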





[jira] [Commented] (JOSHUA-311) Improve pipeline logging to indicate location on berkeley alignment log(s)

2016-10-13 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572497#comment-15572497
 ] 

Matt Post commented on JOSHUA-311:
--

In any case, I'm going to move this to 6.2.

> Improve pipeline logging to indicate location on berkeley alignment log(s)
> --
>
> Key: JOSHUA-311
> URL: https://issues.apache.org/jira/browse/JOSHUA-311
> Project: Joshua
>  Issue Type: Improvement
>  Components: alignment, logging, pipeline
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
> Fix For: 6.2
>
>
> When one runs a pipeline using --aligner berkeley, no log location is 
> provided for the user to follow the progress of alignment.
> {code}
> [berkeley-aligner-chunk-0] rebuilding...
>   dep=alignments/0/word-align.conf [CHANGED]
>   
> dep=/usr/local/jpl/xdata/joshua_experiments/russian_experiments/0/data/train/splits/corpus.en.0
>  [NOT FOUND]
>   
> dep=/usr/local/jpl/xdata/joshua_experiments/russian_experiments/0/data/train/splits/corpus.ru.0
>  [NOT FOUND]
>   dep=alignments/0/training.align [NOT FOUND]
>   cmd=java -d64 -Xmx10g -jar 
> /usr/local/jpl/xdata/joshua_experiments/incubator-joshua/ext/berkeleyaligner/distribution/berkeleyaligner.jar
>  ++alignments/0/word-align.conf
> {code}
> We could add something like
> {code}
> [berkeley-aligner-chunk-0] rebuilding...
>   dep=alignments/0/word-align.conf [CHANGED]
>   
> dep=/usr/local/jpl/xdata/joshua_experiments/russian_experiments/0/data/train/splits/corpus.en.0
>  [NOT FOUND]
>   
> dep=/usr/local/jpl/xdata/joshua_experiments/russian_experiments/0/data/train/splits/corpus.ru.0
>  [NOT FOUND]
>   dep=alignments/0/training.align [NOT FOUND]
>   cmd=java -d64 -Xmx10g -jar 
> /usr/local/jpl/xdata/joshua_experiments/incubator-joshua/ext/berkeleyaligner/distribution/berkeleyaligner.jar
>  ++alignments/0/word-align.conf logs being written to /path/to/log
> {code}





[jira] [Resolved] (JOSHUA-280) Existing Language packs not compatible with Joshua master

2016-10-13 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-280.
--
Resolution: Fixed

> Existing Language packs not compatible with Joshua master
> -
>
> Key: JOSHUA-280
> URL: https://issues.apache.org/jira/browse/JOSHUA-280
> Project: Joshua
>  Issue Type: Bug
>  Components: language packs
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Matt Post
>Priority: Critical
> Fix For: 6.1
>
>
> When I work with the existing Spanish --> English language pack at 
> http://cs.jhu.edu/~post/language-packs/language-pack-es-en-phrase-2015-03-06.tgz,
>  I get the following error
> {code}
> lmcgibbn@LMC-032857 
> /usr/local/Cellar/joshua/HEAD/libexec/language-pack-es-en-phrase-2015-03-06(NUTCH-2089)
>  $ ./run-joshua-server.sh
> INFO - Parameters read from configuration file: joshua.config
> INFO - tm = 'moses -owner pt -maxspan 0 -path phrase-table.packed 
> -max-source-len 5'
> INFO - defaultnonterminal = 'X'
> INFO - goalsymbol = 'GOAL'
> INFO - featurefunction = 'StateMinimizingLanguageModel -lm_type kenlm 
> -lm_order 5 -lm_file lm.kenlm'
> INFO - markoovs = 'false'
> INFO - search = 'stack'
> INFO - pop-limit: 100
> INFO - poplimit = '100'
> INFO - topn = '0'
> INFO - useuniquenbest = 'true'
> INFO - outputformat = '%s'
> INFO - includealignindex = 'false'
> INFO - featurefunction = 'OOVPenalty'
> INFO - featurefunction = 'WordPenalty'
> INFO - featurefunction = 'Distortion'
> INFO - featurefunction = 'PhrasePenalty'
> INFO - c = 'joshua.config'
> INFO - server-port: 5674
> INFO - serverport = '5674'
> INFO - Read 9 weights (0 of them dense)
> INFO - Reading vocabulary: phrase-table.packed/vocabulary
> INFO - Read 191983 entries from the vocabulary
> INFO - Reading packed config: phrase-table.packed/config
> 102030405060708090.100%
> Exception in thread "main" java.lang.RuntimeException: The grammar at 
> phrase-table.packed was packed with packer version 0, but the earliest 
> supported version is 3
>   at 
> org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.readConfig(PackedGrammar.java:1061)
>   at 
> org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.<init>(PackedGrammar.java:143)
>   at 
> org.apache.joshua.decoder.phrase.PhraseTable.<init>(PhraseTable.java:65)
>   at 
> org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:603)
>   at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:514)
>   at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:126)
>   at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69)
> {code}
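The error above comes from a version check performed when the packed grammar's config is read. Below is a Python sketch of that kind of check. It assumes a `key = value` config with a `version` entry and treats a config without one as version 0. That key name and default are assumptions, not taken from PackedGrammar.readConfig. The minimum of 3 and the message wording are copied from the error above.

```python
MIN_SUPPORTED_PACKER_VERSION = 3  # value taken from the error message above


def read_packer_version(config_text):
    """Return the packer version declared in a packed-grammar config.

    Assumes 'key = value' lines and a hypothetical 'version' key; a config
    without that key is treated as version 0.
    """
    for line in config_text.splitlines():
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "version":
                return int(value)
    return 0


def check_packer_version(config_text, path):
    """Raise if the grammar was packed by a packer older than the supported minimum."""
    version = read_packer_version(config_text)
    if version < MIN_SUPPORTED_PACKER_VERSION:
        raise RuntimeError(
            "The grammar at %s was packed with packer version %d, but the earliest "
            "supported version is %d" % (path, version, MIN_SUPPORTED_PACKER_VERSION))
    return version
```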





[jira] [Commented] (JOSHUA-280) Existing Language packs not compatible with Joshua master

2016-10-13 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572431#comment-15572431
 ] 

Matt Post commented on JOSHUA-280:
--

This is all fixed with the new language packer. Language packs will now include 
the runtime and have no external dependencies (including on Joshua or $JOSHUA).

> Existing Language packs not compatible with Joshua master
> -
>
> Key: JOSHUA-280
> URL: https://issues.apache.org/jira/browse/JOSHUA-280
> Project: Joshua
>  Issue Type: Bug
>  Components: language packs
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Matt Post
>Priority: Critical
> Fix For: 6.1
>
>
> When I work with the existing Spanish --> English language pack at 
> http://cs.jhu.edu/~post/language-packs/language-pack-es-en-phrase-2015-03-06.tgz,
>  I get the following error
> {code}
> lmcgibbn@LMC-032857 
> /usr/local/Cellar/joshua/HEAD/libexec/language-pack-es-en-phrase-2015-03-06(NUTCH-2089)
>  $ ./run-joshua-server.sh
> INFO - Parameters read from configuration file: joshua.config
> INFO - tm = 'moses -owner pt -maxspan 0 -path phrase-table.packed 
> -max-source-len 5'
> INFO - defaultnonterminal = 'X'
> INFO - goalsymbol = 'GOAL'
> INFO - featurefunction = 'StateMinimizingLanguageModel -lm_type kenlm 
> -lm_order 5 -lm_file lm.kenlm'
> INFO - markoovs = 'false'
> INFO - search = 'stack'
> INFO - pop-limit: 100
> INFO - poplimit = '100'
> INFO - topn = '0'
> INFO - useuniquenbest = 'true'
> INFO - outputformat = '%s'
> INFO - includealignindex = 'false'
> INFO - featurefunction = 'OOVPenalty'
> INFO - featurefunction = 'WordPenalty'
> INFO - featurefunction = 'Distortion'
> INFO - featurefunction = 'PhrasePenalty'
> INFO - c = 'joshua.config'
> INFO - server-port: 5674
> INFO - serverport = '5674'
> INFO - Read 9 weights (0 of them dense)
> INFO - Reading vocabulary: phrase-table.packed/vocabulary
> INFO - Read 191983 entries from the vocabulary
> INFO - Reading packed config: phrase-table.packed/config
> 102030405060708090.100%
> Exception in thread "main" java.lang.RuntimeException: The grammar at 
> phrase-table.packed was packed with packer version 0, but the earliest 
> supported version is 3
>   at 
> org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.readConfig(PackedGrammar.java:1061)
>   at 
> org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.<init>(PackedGrammar.java:143)
>   at 
> org.apache.joshua.decoder.phrase.PhraseTable.<init>(PhraseTable.java:65)
>   at 
> org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:603)
>   at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:514)
>   at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:126)
>   at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69)
> {code}





[jira] [Commented] (JOSHUA-299) Move regression tests to proper unit tests

2016-10-13 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572425#comment-15572425
 ] 

Matt Post commented on JOSHUA-299:
--

This was almost entirely completed, and we are marking it done. It has been 
completed on the 7 branch.

> Move regression tests to proper unit tests
> --
>
> Key: JOSHUA-299
> URL: https://issues.apache.org/jira/browse/JOSHUA-299
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Lewis John McGibbney
> Fix For: 6.1
>
>  Time Spent: 2m
>  Remaining Estimate: 0h
>
> Many of the regression tests (test*.sh under src/test/resources) have been 
> moved to proper unit tests, but this move should be completed, and the 
> regression tests should be deleted. This should be done for 6.1





[jira] [Resolved] (JOSHUA-299) Move regression tests to proper unit tests

2016-10-13 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-299.
--
Resolution: Fixed

> Move regression tests to proper unit tests
> --
>
> Key: JOSHUA-299
> URL: https://issues.apache.org/jira/browse/JOSHUA-299
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Lewis John McGibbney
> Fix For: 6.1
>
>  Time Spent: 2m
>  Remaining Estimate: 0h
>
> Many of the regression tests (test*.sh under src/test/resources) have been 
> moved to proper unit tests, but this move should be completed, and the 
> regression tests should be deleted. This should be done for 6.1





[jira] [Commented] (JOSHUA-313) Provide a language model based on OpenNLP

2016-10-12 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568747#comment-15568747
 ] 

Matt Post commented on JOSHUA-313:
--

What specifically did you have in mind? From a quick look it seems that the 
abilities there are fairly limited, and in fact they might benefit from 
importing BerkeleyLM (which is not as good as KenLM for training LMs or 
representing them, but is good, and is Apache-licensed).

> Provide a language model based on OpenNLP
> -
>
> Key: JOSHUA-313
> URL: https://issues.apache.org/jira/browse/JOSHUA-313
> Project: Joshua
>  Issue Type: Improvement
>  Components: core
>Reporter: Tommaso Teofili
>Priority: Minor
> Fix For: 7
>
>
> Since OPENNLP-659, OpenNLP has language modelling capabilities, so we could 
> evaluate it within Joshua.





[jira] [Commented] (JOSHUA-314) Enable set structured-output from config file

2016-10-12 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568730#comment-15568730
 ] 

Matt Post commented on JOSHUA-314:
--

It really bothers me (JOSHUA-289 in particular), but I was just going to put it 
off to 6.2 (which will be 7) because it's not really crucial and would only 
delay the 6.1 release further. It's not much of an advertised feature, and we 
have lots of plans for the API in 7, so it makes sense to me to just ignore it 
for now. Does that sound okay? 


> Enable set structured-output from config file
> -
>
> Key: JOSHUA-314
> URL: https://issues.apache.org/jira/browse/JOSHUA-314
> Project: Joshua
>  Issue Type: Improvement
>  Components: core
>Reporter: Tommaso Teofili
>
> Currently, if one sets _use-structured-output = true_ in joshua.config, that 
> results in an error when parsing the config, as it's not explicitly handled by 
> {{JoshuaConfiguration#readConfig}} (it can only be set programmatically). I 
> think it'd be nice to be able to configure it from the config file too.
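A sketch of what handling the option in a config reader could look like. Decoder logs elsewhere in this archive show keys being normalized (`pop-limit` is echoed as `poplimit`), so this Python illustration lowercases keys and drops `-` and `_`. The function names, the `#` comment convention, and the accepted boolean spellings are assumptions. This is not JoshuaConfiguration's actual code.

```python
def normalize_key(key):
    """Lowercase and drop '-' and '_', e.g. 'use-structured-output' -> 'usestructuredoutput'."""
    return key.replace("-", "").replace("_", "").lower()


def parse_bool(value):
    """Accept a few common boolean spellings (an assumption, not Joshua's rule)."""
    v = value.strip().lower()
    if v in ("true", "1", "yes"):
        return True
    if v in ("false", "0", "no"):
        return False
    raise ValueError("not a boolean: %r" % value)


def read_config(text):
    """Parse 'key = value' lines, skipping blanks and lines starting with '#'."""
    settings = {"usestructuredoutput": False}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        key = normalize_key(key)
        settings[key] = parse_bool(value) if key == "usestructuredoutput" else value
    return settings
```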





[jira] [Commented] (JOSHUA-314) Enable set structured-output from config file

2016-10-12 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568694#comment-15568694
 ] 

Matt Post commented on JOSHUA-314:
--

Agreed that this is a problem, but the plan for 7 is to get rid of the 
structured / non-structured distinction entirely, so that structured output is 
*always* what is returned. The output formatting is currently a huge mess, with 
redundant options all over the place, and we are going to clean that up (see 
JOSHUA-289).

https://issues.apache.org/jira/browse/JOSHUA-289


> Enable set structured-output from config file
> -
>
> Key: JOSHUA-314
> URL: https://issues.apache.org/jira/browse/JOSHUA-314
> Project: Joshua
>  Issue Type: Improvement
>  Components: core
>Reporter: Tommaso Teofili
>
> Currently, if one sets _use-structured-output = true_ in joshua.config, that 
> results in an error when parsing the config, as it's not explicitly handled by 
> {{JoshuaConfiguration#readConfig}} (it can only be set programmatically). I 
> think it'd be nice to be able to configure it from the config file too.





[jira] [Commented] (JOSHUA-288) Port fast_align to java

2016-10-07 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556244#comment-15556244
 ] 

Matt Post commented on JOSHUA-288:
--

For AER, French is almost certainly the Hansards: 
http://www.isi.edu/natural-language/download/hansard/. I'm not sure about 
Chinese. It doesn't seem that Table 2 is described in the text. I think you 
could benchmark against just the Hansards, or manually against whatever 
fast_align produces.
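When benchmarking, the standard alignment error rate (Och and Ney, 2003) compares a predicted alignment A against sure links S and possible links P, with S ⊆ P: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal Python sketch. The function name and the convention of returning 0 for two empty alignments are illustrative choices.

```python
def aer(alignment, sure, possible):
    """Alignment error rate for sets of (source_index, target_index) links.

    Sure links are also counted as possible, so P is taken as possible | sure.
    """
    a, s = set(alignment), set(sure)
    p = set(possible) | s
    if not a and not s:
        return 0.0  # nothing predicted and nothing annotated
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
```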

> Port fast_align to java
> ---
>
> Key: JOSHUA-288
> URL: https://issues.apache.org/jira/browse/JOSHUA-288
> Project: Joshua
>  Issue Type: New Feature
>Reporter: Matt Post
>Assignee: John Hewitt
>Priority: Minor
> Fix For: 6.2
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> It would be great to have a Java port of fast_align, so that we don't have to 
> worry about compiling it, and could distribute it via Maven.
> https://github.com/clab/fast_align





[jira] [Commented] (JOSHUA-312) Even though alignment is cached, it is always re-done in pipeline re-execution

2016-09-28 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529750#comment-15529750
 ] 

Matt Post commented on JOSHUA-312:
--

I checked on my end, and I see alignments being cached just fine. Please post 
the output of the pipeline script.

> Even though alignment is cached, it is always re-done in pipeline re-execution
> --
>
> Key: JOSHUA-312
> URL: https://issues.apache.org/jira/browse/JOSHUA-312
> Project: Joshua
>  Issue Type: Improvement
>  Components: alignment
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Priority: Critical
> Fix For: 6.2
>
>
> Say a pipeline fails after alignment. The alignment result is never cached, 
> so alignment has to be run... again!
> We should investigate the process for caching alignments as it would really 
> speed up rerunning end-to-end pipelines for large input datasets.
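One common way to make a step like this restartable is Make-style freshness checking: skip the step if its output exists and is newer than all of its inputs. The Java sketch below illustrates only that idea; it is not how pipeline.pl implements its caching, and `CachedStep`, `isUpToDate` and `runIfNeeded` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;

/** Hypothetical Make-style cache check for a pipeline step such as alignment. */
class CachedStep {

    /** True if the output exists and no input was modified after it. */
    static boolean isUpToDate(Path output, List<Path> inputs) throws IOException {
        if (!Files.exists(output)) {
            return false;
        }
        FileTime outputTime = Files.getLastModifiedTime(output);
        for (Path input : inputs) {
            if (Files.getLastModifiedTime(input).compareTo(outputTime) > 0) {
                return false; // an input changed after the output was written
            }
        }
        return true;
    }

    /** Runs the step only when its output is missing or stale; returns true if it ran. */
    static boolean runIfNeeded(Path output, List<Path> inputs, Runnable step) throws IOException {
        if (isUpToDate(output, inputs)) {
            return false;
        }
        step.run();
        return true;
    }
}
```

A caveat with this approach: if a step crashes after writing a partial output file, the timestamp check would wrongly treat it as finished. Writing to a temporary file and renaming it into place only on success avoids that.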





[jira] [Commented] (JOSHUA-268) Phrase-based model error (NullPointerException)

2016-09-28 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529713#comment-15529713
 ] 

Matt Post commented on JOSHUA-268:
--

I am marking this as closed for the moment. If the problem persists, please 
re-open this request.

> Phrase-based model error (NullPointerException)
> ---
>
> Key: JOSHUA-268
> URL: https://issues.apache.org/jira/browse/JOSHUA-268
> Project: Joshua
>  Issue Type: Bug
>  Components: decoders
>Affects Versions: 6.0.5
> Environment: fedora 23
>Reporter: Kyle Richardson
>Assignee: Matt Post
>Priority: Minor
> Fix For: 6.1
>
>
> I'm trying to run the phrase.sh example script (the only modification I made 
> was to take out the --optimizer-runs option, because the system says that 
> this is an "Unknown option"). 
> The error comes at the tuning stage (specifically, it fails at some point in 
> the tuning then complains that it cannot find the "joshua.config.final" 
> file). 
> Looking into the log file (tune/joshua.log), it seems to translate and tune a 
> number of sentences, then it raises the following NullPointerException: 
> Memory used after sentence 7 is 42.5 MB
> Translation 7: -30.617 good how is fine
> Input 2: Collecting options took 0.000 seconds
> Input 8: Collecting options took 0.000 seconds
> Input 2: FATAL UNCAUGHT EXCEPTION: null
> java.lang.NullPointerException
> at joshua.decoder.phrase.Candidate.score(Candidate.java:214)
> at joshua.decoder.phrase.Candidate.compareTo(Candidate.java:136)
> at joshua.decoder.phrase.Candidate.compareTo(Candidate.java:19)
> at java.util.HashMap.compareComparables(HashMap.java:371)
> at java.util.HashMap$TreeNode.treeify(HashMap.java:1920)
> at java.util.HashMap.treeifyBin(HashMap.java:771)
> at java.util.HashMap.putVal(HashMap.java:643)
> at java.util.HashMap.put(HashMap.java:611)
> at java.util.HashSet.add(HashSet.java:219)
> at joshua.decoder.phrase.Stack.addCandidate(Stack.java:125)
> at joshua.decoder.phrase.Stacks.search(Stacks.java:166)
> at joshua.decoder.DecoderThread.translate(DecoderThread.java:113)
> at joshua.decoder.Decoder$DecoderThreadRunner.run(Decoder.java:218)
> There's nothing informative in the tune/mert.log, it just says that it exited 
> prematurely. The other processes seem to work as expected (although in the 
> giza.log, there are a number of "Sentence mismatch error! Line " warnings). 
> I'm running this on Fedora 23 with Moses. I had no problems training the 
> hiero model.
> note---
> There appears to be an open ticket for more or less the same problem 
> (JOSHUA-267); the difference, however, is that in that ticket the tuner 
> appears to fail on the first input, whereas here it decodes/tunes several 
> inputs before failing (see above).
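The stack trace shows `HashSet.add` reaching `Candidate.compareTo` through `HashMap.treeifyBin`. When enough keys land in the same bucket of a sufficiently large table, `java.util.HashMap` converts that bucket into a balanced tree and orders `Comparable` keys with `compareTo`. Anything `compareTo` (here via `score()`) dereferences must therefore be non-null, even though the code never sorts the set explicitly. The self-contained Java sketch below reproduces that mechanism with a hypothetical `Key` class; it is not a claim about what exactly is null inside Joshua's `Candidate`.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;

/** Reproduces "NPE in compareTo during HashSet.add" with a hypothetical key class. */
class TreeifyNpeDemo {

    /** Stand-in for a search candidate whose score may not have been computed (null). */
    static final class Key implements Comparable<Key> {
        private static final Comparator<Double> NULL_SAFE =
                Comparator.nullsFirst(Comparator.<Double>naturalOrder());

        final int id;
        final Double score;
        final boolean nullSafe;

        Key(int id, Double score, boolean nullSafe) {
            this.id = id;
            this.score = score;
            this.nullSafe = nullSafe;
        }

        @Override
        public int hashCode() {
            return 42; // constant on purpose: every key lands in the same bucket
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int compareTo(Key other) {
            if (nullSafe) {
                return NULL_SAFE.compare(score, other.score); // null scores sort first
            }
            return score.compareTo(other.score); // throws NullPointerException if score is null
        }
    }

    /** Adds n colliding keys with null scores; returns false if HashSet.add threw an NPE. */
    static boolean fill(int n, boolean nullSafe) {
        // Initial capacity >= 64 so a crowded bucket is treeified instead of the table being resized
        Set<Key> set = new HashSet<>(128);
        try {
            for (int i = 0; i < n; i++) {
                set.add(new Key(i, null, nullSafe));
            }
            return set.size() == n;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fragile compareTo:   " + (fill(20, false) ? "ok" : "NullPointerException"));
        System.out.println("null-safe compareTo: " + (fill(20, true) ? "ok" : "NullPointerException"));
    }
}
```

With only a handful of colliding keys the bucket stays a linked list and only `equals` is called, so the fragile version can survive for a while before failing. That is consistent with the decoder processing several inputs first, though it does not prove that this is the cause here.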





[jira] [Closed] (JOSHUA-268) Phrase-based model error (NullPointerException)

2016-09-28 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post closed JOSHUA-268.

Resolution: Fixed



[jira] [Commented] (JOSHUA-312) Alignment is never cached

2016-09-22 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513677#comment-15513677
 ] 

Matt Post commented on JOSHUA-312:
--

Hi Lewis, I don't understand. The individual steps of computing the alignment 
are all cached. If alignment completed, the pipeline will skip those steps 
when re-run. I use this all the time. Can you be more specific?



[jira] [Closed] (JOSHUA-297) List supported versions of Hadoop

2016-09-14 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post closed JOSHUA-297.

Resolution: Cannot Reproduce

> List supported versions of Hadoop
> -
>
> Key: JOSHUA-297
> URL: https://issues.apache.org/jira/browse/JOSHUA-297
> Project: Joshua
>  Issue Type: Task
>Reporter: Bob Paulin
>Assignee: Matt Post
>Priority: Minor
> Fix For: 6.1
>
> Attachments: thrax-hadoop0.20.2.log, thrax-hadoop2.6.4.log
>
>
> When working through the training tutorial, I noticed that no version of 
> Hadoop was listed, so I tried the latest Hadoop, 2.6.4. The Thrax job failed 
> on this version. It did, however, work with 0.20.2, which I found on 
> http://joshua.incubator.apache.org/6.0/pipeline.html by hovering over a link 
> in the Hadoop section.





[jira] [Created] (JOSHUA-310) Derivation visualizer

2016-09-08 Thread Matt Post (JIRA)
Matt Post created JOSHUA-310:


 Summary: Derivation visualizer
 Key: JOSHUA-310
 URL: https://issues.apache.org/jira/browse/JOSHUA-310
 Project: Joshua
  Issue Type: Improvement
Reporter: Matt Post
Priority: Minor
 Fix For: 6.2


The tree visualizer under examples/tree_visualizer should be pulled out into a 
top-level module, maybe joshua-gui or joshua-visualization.





[jira] [Commented] (JOSHUA-308) Apply consistent formatting to project and remove trailing whitespace

2016-09-08 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475742#comment-15475742
 ] 

Matt Post commented on JOSHUA-308:
--

And whoever does this, don't forget $JOSHUA/scripts/training/pipeline.pl, which 
has a huge mix of tabs and spaces.

> Apply consistent formatting to project and remove trailing whitespace
> -
>
> Key: JOSHUA-308
> URL: https://issues.apache.org/jira/browse/JOSHUA-308
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Max Thomas
>Priority: Minor
>
> I suggest that the checked-in code format be applied to all files, with the 
> following addition: remove trailing whitespace. Trailing whitespace makes 
> working with the code base unnecessarily difficult.
> I thought that this was part of the format file, but it must be a 
> setting I have enabled separately in Eclipse.
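Independent of any Eclipse formatter settings, stripping trailing whitespace in bulk is straightforward. Here is a minimal Java sketch; the class and method names are hypothetical, and it only removes trailing spaces and tabs. It does not touch leading indentation or convert tabs to spaces.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical utility that removes trailing spaces and tabs from source files. */
class StripTrailingWhitespace {

    /** Removes trailing spaces and tabs from a single line (leading whitespace is kept). */
    static String stripLine(String line) {
        return line.replaceAll("[ \\t]+$", "");
    }

    /** Rewrites the file in place if any line changed; returns true if it was modified. */
    static boolean stripFile(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<String> cleaned = lines.stream()
                .map(StripTrailingWhitespace::stripLine)
                .collect(Collectors.toList());
        if (cleaned.equals(lines)) {
            return false;
        }
        // Note: Files.write ends every line with the platform line separator
        Files.write(file, cleaned, StandardCharsets.UTF_8);
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            if (stripFile(Paths.get(arg))) {
                System.out.println("cleaned " + arg);
            }
        }
    }
}
```

Because the file is rewritten line by line, a missing final newline is added, and on Unix CRLF line endings become LF. For a formatting pass over the whole tree that is usually desirable, but it does enlarge the diff.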





[jira] [Created] (JOSHUA-309) Update CHANGELOG

2016-09-08 Thread Matt Post (JIRA)
Matt Post created JOSHUA-309:


 Summary: Update CHANGELOG
 Key: JOSHUA-309
 URL: https://issues.apache.org/jira/browse/JOSHUA-309
 Project: Joshua
  Issue Type: Improvement
Reporter: Matt Post
Priority: Minor
 Fix For: 6.1


The CHANGELOG file in Joshua root needs to be updated with a list of high-level 
things that have changed.





[jira] [Commented] (JOSHUA-299) Move regression tests to proper unit tests

2016-09-08 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15473794#comment-15473794
 ] 

Matt Post commented on JOSHUA-299:
--

[~maxthomas], here's some information on how to convert these, in case you have 
some time to do it today and tomorrow.

1. Change to $JOSHUA/src/test/resources, and run "bash run-all-tests.sh". This 
searches for all files test*.sh under the current directory, runs those tests, 
and reports success for ones that return 0 to the shell.

2. It should be clear what each test is doing, from the test script itself, and 
from an optional README. If not, you can leave those alone and I'll handle them.

3. As you convert each, commit the new test, and remove the shell script and 
associated extra files. If you could do each test conversion (creation of Java 
file, removal of no-longer-needed files) as its own git commit, it would help 
in tracking what's been done. This will all have to be merged into the 7 branch.

matt
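The discovery-and-run behaviour described in step 1 can be restated roughly as follows. This is a hypothetical Java rendering for illustration, not the contents of run-all-tests.sh. It finds files named test*.sh under a directory, runs each with bash from the script's own directory (an assumption about how the scripts expect to be invoked), and counts exit status 0 as success.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical Java equivalent of "run every test*.sh and report which exit with 0". */
class RunAllTests {

    /** Finds regular files named test*.sh anywhere under root, in sorted order. */
    static List<Path> findTests(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith("test") && name.endsWith(".sh");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Runs each script with bash from its own directory; exit status 0 counts as success. */
    static Map<Path, Boolean> runAll(Path root) throws IOException, InterruptedException {
        Map<Path, Boolean> results = new LinkedHashMap<>();
        for (Path script : findTests(root)) {
            Process process = new ProcessBuilder("bash", script.getFileName().toString())
                    .directory(script.getParent().toFile())
                    .inheritIO() // let each test's own output show up on the console
                    .start();
            results.put(root.relativize(script), process.waitFor() == 0);
        }
        return results;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        runAll(root).forEach((test, ok) -> System.out.println((ok ? "PASSED " : "FAILED ") + test));
    }
}
```

Each converted unit test can then be checked against the pass/fail result the corresponding script produced before the script is removed.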

> Move regression tests to proper unit tests
> --
>
> Key: JOSHUA-299
> URL: https://issues.apache.org/jira/browse/JOSHUA-299
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Lewis John McGibbney
> Fix For: 6.1
>
>
> Many of the regression tests (test*.sh under src/test/resources) have been 
> moved to proper unit tests, but this move should be completed, and the 
> regression tests should be deleted. This should be done for 6.1.





[jira] [Commented] (JOSHUA-268) Phrase-based model error (NullPointerException)

2016-09-07 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471398#comment-15471398
 ] 

Matt Post commented on JOSHUA-268:
--

Update: I have removed the (deprecated) --optimizer-runs flags from the example 
runs. I suspect that this issue will be gone with the changes that have taken 
place on the phrase-based decoder, but I am testing now and will know soon.

Do you have any updates on your end?



[jira] [Commented] (JOSHUA-297) List supported versions of Hadoop

2016-09-06 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468580#comment-15468580
 ] 

Matt Post commented on JOSHUA-297:
--

These errors (from 2.6.4) don't look to me like Thrax errors, but errors with 
the underlying Hadoop installation. If Thrax fails, you will see FAILURE 
notices on the Thrax steps. Are you perhaps running standalone Hadoop over NFS? 
This can be problematic. 



[jira] [Commented] (JOSHUA-297) List supported versions of Hadoop

2016-09-06 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468577#comment-15468577
 ] 

Matt Post commented on JOSHUA-297:
--

[~bobpaulin], do you have any updates on this? Is it still broken for you?



[jira] [Commented] (JOSHUA-308) Apply consistent formatting to project and remove trailing whitespace

2016-08-30 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449700#comment-15449700
 ] 

Matt Post commented on JOSHUA-308:
--

I'd prefer not to do this for the 6.1 branch, but once 6.1 is released, we 
could apply this to the 7 branch after merging it back into master.



[jira] [Closed] (JOSHUA-302) Remove concurrent package and replace with builtins

2016-08-29 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post closed JOSHUA-302.

Resolution: Fixed

This was fixed in an earlier commit. The inclusion of the specified libraries 
was likely done inadvertently through clumsy suggestions by Eclipse.

> Remove concurrent package and replace with builtins
> ---
>
> Key: JOSHUA-302
> URL: https://issues.apache.org/jira/browse/JOSHUA-302
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Max Thomas
>Priority: Minor
>
> According to this site:
> http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html
> this package has essentially been superseded by built-ins in JDK 5.0+. 
> Is there a reason this is still a dependency and cannot be replaced with 
> default library code (which, according to the site, "includes improved, more 
> efficient, standardized versions of the main components in this package")? 
> It seems to be used in only 2 places, one of which is about 3000 lines of 
> copy/pasted code.





[jira] [Updated] (JOSHUA-295) Revamp dependency organization in Joshua

2016-08-29 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-295:
-
Fix Version/s: 6.2

> Revamp dependency organization in Joshua
> 
>
> Key: JOSHUA-295
> URL: https://issues.apache.org/jira/browse/JOSHUA-295
> Project: Joshua
>  Issue Type: Improvement
>Affects Versions: 6.2
>Reporter: Kellen Sunderland
> Fix For: 6.2
>
>
> We would like to separate dependencies in Joshua by creating a multi-module 
> Maven project. This will allow us to decouple our codebase and make it more 
> modular. It also means that consumers of Joshua who are only interested in the 
> core library do not have to pull in dependencies for things like HTTP servers or 
> database clients.





[jira] [Updated] (JOSHUA-306) Translations.java consumes potentially infinite resources

2016-08-29 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-306:
-
Fix Version/s: 6.2

> Translations.java consumes potentially infinite resources
> -
>
> Key: JOSHUA-306
> URL: https://issues.apache.org/jira/browse/JOSHUA-306
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
> Fix For: 6.2
>
>
> Translations is used to asynchronously provide a caller with access to 
> translations as they are produced. However, it is implemented as an 
> underlying synchronized list that grows without bound. In the presence of an 
> infinite stream (STDIN?), this will eventually consume all resources. This 
> might not be much to worry about, but maybe old translations should 
> be expunged as they are consumed.
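One way to get the "expunge as consumed" behaviour is a bounded blocking queue. The consumer removes each translation as it reads it, and producers block when the buffer is full, so memory stays bounded even for an endless STDIN stream. Below is a minimal sketch using `java.util.concurrent.ArrayBlockingQueue`. The class name is hypothetical, and the sketch assumes translations are offered in input order. Real decoder threads may finish out of order, and re-ordering by sentence ID is not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical bounded replacement for an ever-growing list of translations. */
class BoundedTranslations {
    private final BlockingQueue<String> queue;

    BoundedTranslations(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: blocks while the buffer is full instead of growing it. */
    void put(String translation) throws InterruptedException {
        queue.put(translation);
    }

    /** Consumer side: removes the translation, so the queue no longer references it. */
    String take() throws InterruptedException {
        return queue.take();
    }

    /** Demo: one producer thread pushes n translations through a small buffer. */
    static List<String> transfer(int n, int capacity) throws InterruptedException {
        BoundedTranslations translations = new BoundedTranslations(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    translations.put("translation " + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        List<String> received = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            received.add(translations.take());
        }
        producer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = transfer(1000, 16);
        System.out.println(out.size() + " translations, last: " + out.get(out.size() - 1));
    }
}
```

The trade-off is back-pressure: if the consumer stops reading, decoding stalls once the buffer fills, rather than memory growing without limit.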





[jira] [Updated] (JOSHUA-291) Improve code quality via static analysis

2016-08-29 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-291:
-
Fix Version/s: 6.2

> Improve code quality via static analysis
> 
>
> Key: JOSHUA-291
> URL: https://issues.apache.org/jira/browse/JOSHUA-291
> Project: Joshua
>  Issue Type: Improvement
>  Components: core
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 6.2
>
>
> We can improve code quality / readability leveraging code analysis from tools 
> like FindBugs and others integrated in IDEs.





[jira] [Commented] (JOSHUA-291) Improve code quality via static analysis

2016-08-29 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447853#comment-15447853
 ] 

Matt Post commented on JOSHUA-291:
--

[~maxthomas], do you want to take a look at the 7 branch? This was merged into 
master but may have gotten dropped when I merged a number of changes on master 
into 7.



[jira] [Created] (JOSHUA-307) Java-based tokenization and normalization

2016-08-29 Thread Matt Post (JIRA)
Matt Post created JOSHUA-307:


 Summary: Java-based tokenization and normalization
 Key: JOSHUA-307
 URL: https://issues.apache.org/jira/browse/JOSHUA-307
 Project: Joshua
  Issue Type: Improvement
Reporter: Matt Post
Priority: Minor
 Fix For: 6.2


Currently, Joshua expects data to be lowercased, normalized, and tokenized 
consistent with the way the training data was prepared before being passed in. 
This requires calling Perl scripts on the input data. It would be nice if these 
Perl scripts (located under $JOSHUA/scripts/preparation) were rewritten in Java 
(under org.apache.joshua.util) so that Joshua could do this normalization 
itself. This would be particularly useful for the language packs.
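As a starting point, the JDK already provides Unicode normalization (`java.text.Normalizer`) and locale-independent lowercasing. The sketch below combines them with a deliberately naive punctuation split. It is not a port of the Perl scripts under $JOSHUA/scripts/preparation and does not reproduce their rules (for example for apostrophes, abbreviations, or numbers), and the class name is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical, simplified input preparation: normalize, lowercase, tokenize. */
class SimplePreprocessor {

    static String prepare(String text) {
        // NFKC folds compatibility characters, e.g. full-width letters to ASCII
        String s = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // Locale.ROOT avoids locale-specific rules such as the Turkish dotless i
        s = s.toLowerCase(Locale.ROOT);
        // Naive tokenization: put spaces around every ASCII punctuation character
        s = s.replaceAll("(\\p{Punct})", " $1 ");
        // Collapse runs of whitespace into single spaces
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(prepare("Hello, World!")); // prints: hello , world !
    }
}
```

Whatever rules are finally adopted, they have to match how the training data was prepared exactly, so a real port would need to be tested against the output of the existing Perl scripts.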





[jira] [Resolved] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner

2016-08-29 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-304.
--
Resolution: Fixed

> word-align.conf alignment template file not compatible with berkeley aligner
> 
>
> Key: JOSHUA-304
> URL: https://issues.apache.org/jira/browse/JOSHUA-304
> Project: Joshua
>  Issue Type: Bug
>  Components: alignment, berkeley, templates
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> It took me quite some time to debug what was going on and why pipelines 
> were failing when using the Berkeley aligner.
> It turns out that the word-align.conf template provided at
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf
> is not compatible with the Berkeley aligner. 
> In particular, the following lines are incompatible:
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15
> Evidence of this is provided below
> {code}
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'JOINT JOINT'; valid choices: FORWARD|REVERSE|BOTH_INDEP|JOINT
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Exception in thread "main" java.lang.NumberFormatException: For input string: "5 5"
>   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:143)
>   at edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:240)
>   at edu.berkeley.nlp.fig.basic.OptInfo.set(OptionsParser.java:294)
>   at edu.berkeley.nlp.fig.basic.OptionsParser.readOptionsFile(OptionsParser.java:555)
>   at edu.berkeley.nlp.fig.basic.OptionsParser.doParse(OptionsParser.java:604)
>   at edu.berkeley.nlp.fig.exec.Execution.init(Execution.java:293)
>   at edu.berkeley.nlp.wordAlignment.Main.main(Main.java:149)
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Cannot create directory: alignments/0
> {code}





[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner

2016-08-29 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446657#comment-15446657
 ] 

Matt Post commented on JOSHUA-304:
--

Sorry, you also have to install the Berkeley aligner jar. See the last two 
lines of download-deps.sh. Then it should work.



[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner

2016-08-29 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446555#comment-15446555
 ] 

Matt Post commented on JOSHUA-304:
--

It's easiest to just wipe everything, but you could remove only alignments/ 
and data/train/.

> word-align.conf alignment template file not compatible with berkeley aligner
> 
>
> Key: JOSHUA-304
> URL: https://issues.apache.org/jira/browse/JOSHUA-304
> Project: Joshua
>  Issue Type: Bug
>  Components: alignment, berkeley, templates
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> It took me quite some time to debug what was going on and why pipelines 
> were failing when using the Berkeley aligner.
> It turns out that the word-align.conf template provided at
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf
> is not compatible with the Berkeley aligner. 
> In particular, the following lines are not compatible:
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15
> Evidence of this is provided below:
> {code}
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar 
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar 
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar 
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar 
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'JOINT JOINT'; valid choices: FORWARD|REVERSE|BOTH_INDEP|JOINT
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar 
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Exception in thread "main" java.lang.NumberFormatException: For input string: 
> "5 5"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:143)
>   at 
> edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:240)
>   at edu.berkeley.nlp.fig.basic.OptInfo.set(OptionsParser.java:294)
>   at 
> edu.berkeley.nlp.fig.basic.OptionsParser.readOptionsFile(OptionsParser.java:555)
>   at 
> edu.berkeley.nlp.fig.basic.OptionsParser.doParse(OptionsParser.java:604)
>   at edu.berkeley.nlp.fig.exec.Execution.init(Execution.java:293)
>   at edu.berkeley.nlp.wordAlignment.Main.main(Main.java:149)
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar 
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Cannot create directory: alignments/0
> {code}
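The enum and number errors above share one shape: the parser received a space-separated list (`MODEL1 HMM`, `JOINT JOINT`, `5 5`) for an option it treats as a single value, and `Integer.parseInt` accepts only one integer. The Java sketch below reproduces the `NumberFormatException` and shows the alternative of splitting a list-valued option on whitespace before parsing each token. It is an illustration under that assumption, not the Berkeley aligner's `OptionsParser`; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the Berkeley aligner's OptionsParser.
final class OptionValueDemo {

    // Scalar parsing, as for a single-valued int option: the whole string
    // must be one integer, so "5 5" throws NumberFormatException.
    static int parseScalar(String value) {
        return Integer.parseInt(value.trim());
    }

    // List parsing: split on whitespace first, then parse each token.
    static List<Integer> parseList(String value) {
        List<Integer> result = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            result.add(Integer.parseInt(token));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseList("5 5")); // [5, 5]
        try {
            parseScalar("5 5");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "5 5"
        }
    }
}
```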





[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner

2016-08-29 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446470#comment-15446470
 ] 

Matt Post commented on JOSHUA-304:
--

I emailed days ago but don't see that it posted here. You need to wipe out your 
old run and re-run. I can see that the new versions of the commands were not 
run.



[jira] [Resolved] (JOSHUA-272) Simplify the packing and usage of phrase-based grammars

2016-08-24 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-272.
--
Resolution: Fixed

> Simplify the packing and usage of phrase-based grammars
> ---
>
> Key: JOSHUA-272
> URL: https://issues.apache.org/jira/browse/JOSHUA-272
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>
> For historical reasons, phrase-based grammars add some complexity to 
> decoding. The complete tree under each top-level trie node in packed grammars 
> has to fit within a single packed grammar slice, which is limited to 2 GB 
> due to constraints on the size of Java byte[] arrays. We used to sort on just 
> the first item in the trie, which was a problem for phrase-based decoding, 
> since phrase-based rules are implemented as left-branching hierarchical 
> rules. In order to pack large grammars, we packed them without the leading 
> [X,1], and then added it when loading the grammars, both for the packed and 
> memory-based grammars. This was a real mess.
> This was all fixed with a commit a while ago that packs and reads packed 
> grammars based on the first two symbols on the source side. So we should 
> remove all the complexity associated with phrases. They should just be 
> regular rules. There is also a lot of redundancy across the codebase in 
> parsing rules, converting them to different formats, and so on.
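The 2 GB figure follows from Java array indexing: arrays are indexed by `int`, so a single `byte[]` holds at most about 2^31 - 1 bytes. The sketch below illustrates why keying slices on the first source symbol alone hurt left-branching phrase rules: every source side begins with `[X,1]`, so all rules land under one top-level key, while keying on the first two symbols spreads them out. The class name and example rules are hypothetical; this is not Joshua's packer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of top-level trie keys; not Joshua's grammar packer.
final class SliceKeyDemo {

    // Counts rules per key, where the key is the first `width` source symbols.
    // All rules sharing a key fall under the same top-level node, and so
    // must fit in the same packed slice.
    static Map<String, Integer> bucket(List<String> sourceSides, int width) {
        Map<String, Integer> counts = new HashMap<>();
        for (String source : sourceSides) {
            String[] symbols = source.trim().split("\\s+");
            int n = Math.min(width, symbols.length);
            String key = String.join(" ", Arrays.copyOfRange(symbols, 0, n));
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative left-branching phrase rules (source sides only).
        List<String> sources = Arrays.asList(
                "[X,1] yo quiero", "[X,1] el gato", "[X,1] la casa", "[X,1] yo tengo");
        System.out.println(bucket(sources, 1));        // {[X,1]=4}
        System.out.println(bucket(sources, 2).size()); // 3
    }
}
```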





[jira] [Commented] (JOSHUA-272) Simplify the packing and usage of phrase-based grammars

2016-08-24 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435616#comment-15435616
 ] 

Matt Post commented on JOSHUA-272:
--

Phrase-based decoding has been changed to no longer use left-branching rules, 
so this no longer applies.



[jira] [Updated] (JOSHUA-272) Simplify the packing and usage of phrase-based grammars

2016-08-24 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-272:
-
Fix Version/s: (was: 6.2)
   6.1



[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner

2016-08-24 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435601#comment-15435601
 ] 

Matt Post commented on JOSHUA-304:
--

I just pushed up some changes that should fix this. Give it a look? It's on the 
JOSHUA-309 branch. It passes my tests.



[jira] [Resolved] (JOSHUA-278) Alignments printed incorrectly for phrase-based decoder

2016-08-24 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-278.
--
Resolution: Fixed

> Alignments printed incorrectly for phrase-based decoder
> ---
>
> Key: JOSHUA-278
> URL: https://issues.apache.org/jira/browse/JOSHUA-278
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>
> Type this to see the bug:
> echo YUP | $JOSHUA/bin/joshua -lowercase -search stack -project-case -output-format "%s ||| %f ||| %a"





[jira] [Commented] (JOSHUA-278) Alignments printed incorrectly for phrase-based decoder

2016-08-24 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434883#comment-15434883
 ] 

Matt Post commented on JOSHUA-278:
--

This has been fixed in master by removing the alignment points in the 
BEGIN_RULE and END_RULE.



[jira] [Resolved] (JOSHUA-284) Phrase-based decoding changes

2016-08-22 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-284.
--
Resolution: Fixed

> Phrase-based decoding changes
> -
>
> Key: JOSHUA-284
> URL: https://issues.apache.org/jira/browse/JOSHUA-284
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>
> Joshua's phrase-based decoding creates a lot of complications in the pipeline.
> Currently, phrase-based rules are simply left-branching Hiero rules. This 
> means that, prior to packing or loading, rules have to have a nonterminal 
> prepended to them. For example, Thrax will extract
> [X] ||| yo quiero ||| i want ||| ...
> This has to be changed to
> [X] ||| [X,1] yo quiero ||| [X,1] i want ||| ...
> This means, for one, that phrase tables share a format but are specific to 
> either the hiero or phrase-based decoder.
> Another problem is that the alignments have to be adjusted when packing 
> grammars from Moses or Thrax format, since a symbol is being added. 
> Basically, this choice introduces a host of incompatibilities that require 
> special handling.
> A better idea would be to change the phrase-based decoder a bit so that, 
> instead of using left-branching phrase rules, it made use of proper glue 
> rules, the same way Hiero does. The advantages are:
> - both formalisms would use the same format
> - both formalisms would have a glue grammar
> - there should be no impact in running time
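A minimal sketch of the conversion described above, assuming a Thrax-style rule whose fifth field holds 0-indexed `i-j` alignment points, and assuming no alignment point is added for the new nonterminal. Prepending `[X,1]` to both sides shifts every word one position to the right, which is why the alignments must be rewritten. The class name, the feature value, and the alignment field are illustrative, not Joshua's actual packing code.

```java
import java.util.StringJoiner;

// Hypothetical sketch of the conversion described in JOSHUA-284; not Joshua's code.
// Assumes "LHS ||| source ||| target ||| features ||| alignment", with 0-indexed
// "i-j" alignment points (source word i aligned to target word j).
final class LeftBranchingDemo {

    static String prependNonterminal(String rule) {
        String[] fields = rule.split("\\s*\\|\\|\\|\\s*", -1);
        if (fields.length < 5) {
            throw new IllegalArgumentException("expected at least 5 fields: " + rule);
        }
        fields[1] = "[X,1] " + fields[1];
        fields[2] = "[X,1] " + fields[2];

        // Every word now sits one position further right on both sides,
        // so each alignment point must be shifted by one.
        StringJoiner shifted = new StringJoiner(" ");
        for (String point : fields[4].trim().split("\\s+")) {
            if (point.isEmpty()) {
                continue; // rule with no alignment points
            }
            String[] ij = point.split("-");
            int i = Integer.parseInt(ij[0]) + 1;
            int j = Integer.parseInt(ij[1]) + 1;
            shifted.add(i + "-" + j);
        }
        fields[4] = shifted.toString();
        return String.join(" ||| ", fields);
    }

    public static void main(String[] args) {
        String thrax = "[X] ||| yo quiero ||| i want ||| 0.5 ||| 0-0 1-1";
        System.out.println(prependNonterminal(thrax));
        // [X] ||| [X,1] yo quiero ||| [X,1] i want ||| 0.5 ||| 1-1 2-2
    }
}
```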





[jira] [Commented] (JOSHUA-282) %S output format doesn't remove

2016-08-22 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431656#comment-15431656
 ] 

Matt Post commented on JOSHUA-282:
--

This is fixed with commit bf12adc8b8e130c9f9addc69f47e9cf7e0774f72, which will 
be merged into master for 6.1.

> %S output format doesn't remove 
> ---
>
> Key: JOSHUA-282
> URL: https://issues.apache.org/jira/browse/JOSHUA-282
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>
> Using -output-format %S with the phrase-based decoder prevents removal of the 
> <s> and </s> tags.
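A minimal sketch of the intended `%S` behavior, assuming the two tags missing from the archived text are the sentence-boundary markers `<s>` and `</s>`, and that the hypothesis is a space-separated token string. The class and method names are hypothetical; this is not Joshua's output formatter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not Joshua's output formatter.
final class BoundaryMarkerDemo {

    // Drops a leading "<s>" and a trailing "</s>" token, if present.
    static String stripMarkers(String hypothesis) {
        List<String> tokens = new ArrayList<>(Arrays.asList(hypothesis.trim().split("\\s+")));
        if (!tokens.isEmpty() && tokens.get(0).equals("<s>")) {
            tokens.remove(0);
        }
        if (!tokens.isEmpty() && tokens.get(tokens.size() - 1).equals("</s>")) {
            tokens.remove(tokens.size() - 1);
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(stripMarkers("<s> yup </s>")); // yup
    }
}
```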





[jira] [Resolved] (JOSHUA-282) %S output format doesn't remove

2016-08-22 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-282.
--
Resolution: Fixed



[jira] [Commented] (JOSHUA-221) ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main

2016-08-19 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428114#comment-15428114
 ] 

Matt Post commented on JOSHUA-221:
--

That's done so that the same framework can be used to parse both the config 
file and the command line options. The goal is that there are hard-coded 
defaults, which can be overridden by parameters in the config file, which in 
turn can be overridden by the command line. 

I haven't looked into different frameworks, but I do want to be able to 
override any config-file parameter from the command line using the same names.
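The precedence described above (hard-coded defaults, then the config file, then the command line, all using the same option names) can be sketched as successive map merges in which later layers overwrite earlier ones. This is a hypothetical illustration, not Joshua's `ArgsParser`; the `-name value` command-line convention is taken from the commands in this issue, while treating a name with no following value as `true` and the example values are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of layered option precedence; not Joshua's ArgsParser.
final class LayeredConfigDemo {

    // Merges layers in order, so later layers override earlier ones key by key:
    // call as merge(defaults, configFile, commandLine).
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... layers) {
        Map<String, String> result = new HashMap<>();
        for (Map<String, String> layer : layers) {
            result.putAll(layer);
        }
        return result;
    }

    // Parses "-name value" pairs so command-line names match config-file names.
    // A name with no following value (e.g. "-lowercase") is stored as "true".
    // Limitation: a value that itself starts with '-' (e.g. "-0.5") is not supported.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("-")) {
                throw new IllegalArgumentException("expected -name, got: " + args[i]);
            }
            String name = args[i].substring(1);
            if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                options.put(name, args[++i]);
            } else {
                options.put(name, "true");
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("top-n", "1");          // illustrative values only
        defaults.put("lowercase", "false");
        Map<String, String> configFile = new HashMap<>();
        configFile.put("top-n", "300");
        Map<String, String> commandLine =
                parseArgs(new String[] {"-lowercase", "-search", "stack"});
        Map<String, String> effective = merge(defaults, configFile, commandLine);
        System.out.println(effective.get("top-n"));     // 300
        System.out.println(effective.get("lowercase")); // true
        System.out.println(effective.get("search"));    // stack
    }
}
```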

> ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main
> ---
>
> Key: JOSHUA-221
> URL: https://issues.apache.org/jira/browse/JOSHUA-221
> Project: Joshua
>  Issue Type: Bug
>Reporter: Lewis John McGibbney
> Fix For: 6.2
>
>
> {code}
> lmcgibbn@LMC-032857 /usr/local/joshua(master) $ java -jar class/joshua.jar 
> -version
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
>   at joshua.decoder.ArgsParser.<init>(ArgsParser.java:43)
>   at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:30)
> lmcgibbn@LMC-032857 /usr/local/joshua(master) $ java -jar class/joshua.jar 
> -version -v
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
>   at joshua.decoder.ArgsParser.<init>(ArgsParser.java:43)
>   at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:30)
> {code}





[jira] [Commented] (JOSHUA-221) ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main

2016-08-18 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426484#comment-15426484
 ] 

Matt Post commented on JOSHUA-221:
--

That sounds like a nice general way to solve this problem.



[jira] [Resolved] (JOSHUA-292) Add travis CI build status badge to README.md

2016-08-18 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-292.
--
Resolution: Fixed

> Add travis CI build status badge to README.md
> -
>
> Key: JOSHUA-292
> URL: https://issues.apache.org/jira/browse/JOSHUA-292
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Max Thomas
>Priority: Trivial
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> Would be nice to see the status of the latest master branch build from the 
> README - many projects do this already. 





[jira] [Created] (JOSHUA-300) Lexicalized reordering

2016-08-17 Thread Matt Post (JIRA)
Matt Post created JOSHUA-300:


 Summary: Lexicalized reordering
 Key: JOSHUA-300
 URL: https://issues.apache.org/jira/browse/JOSHUA-300
 Project: Joshua
  Issue Type: New Feature
Reporter: Matt Post
 Fix For: 6.2


Joshua's phrase-based system should have a lexicalized reordering model.
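For context, Moses-style lexicalized reordering models condition on the orientation of each phrase relative to the previously translated one, usually with three classes: monotone, swap, and discontinuous. The sketch below shows only that classification step, assuming inclusive source-side word spans; it is a hypothetical illustration of the general technique, not a proposed design for Joshua, and the lexicalized probability estimation is omitted.

```java
// Hypothetical sketch of monotone/swap/discontinuous (MSD) orientation
// classification; not Joshua code.
final class OrientationDemo {

    enum Orientation { MONOTONE, SWAP, DISCONTINUOUS }

    // Spans are inclusive source word indices of the previous and current
    // phrases, in the order the phrases are placed on the target side.
    static Orientation classify(int prevStart, int prevEnd, int curStart, int curEnd) {
        if (curStart == prevEnd + 1) {
            return Orientation.MONOTONE;   // current phrase directly follows the previous one
        }
        if (curEnd == prevStart - 1) {
            return Orientation.SWAP;       // current phrase directly precedes the previous one
        }
        return Orientation.DISCONTINUOUS;  // any other placement
    }

    public static void main(String[] args) {
        System.out.println(classify(0, 1, 2, 3)); // MONOTONE
        System.out.println(classify(2, 3, 0, 1)); // SWAP
        System.out.println(classify(0, 0, 3, 4)); // DISCONTINUOUS
    }
}
```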





[jira] [Created] (JOSHUA-299) Move regression tests to proper unit tests

2016-08-17 Thread Matt Post (JIRA)
Matt Post created JOSHUA-299:


 Summary: Move regression tests to proper unit tests
 Key: JOSHUA-299
 URL: https://issues.apache.org/jira/browse/JOSHUA-299
 Project: Joshua
  Issue Type: Bug
Reporter: Matt Post
 Fix For: 6.1


Many of the regression tests (test*.sh under src/test/resources) have been 
moved to proper unit tests, but this move should be completed, and the 
regression tests should be deleted. This should be done for 6.1.





[jira] [Commented] (JOSHUA-291) Improve code quality via static analysis

2016-08-17 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424280#comment-15424280
 ] 

Matt Post commented on JOSHUA-291:
--

I think either is fine. If you go the sub-ticket route, it would be good to 
reference this ticket in the discussion.

One thing to be wary of: we are working on issues for the impending 6.1 
release, but are going to close that out soon. The 7 release (I just created a 
branch) involves a substantial multi-module support change to the codebase. So 
to avoid complex merges in the near future, you should (a) convince us these 
changes should go into 6.1, or (b) start basing your changes off the 7 branch. 
Once 6.1 is released in early September, we'll merge 7 back into master.

> Improve code quality via static analysis
> 
>
> Key: JOSHUA-291
> URL: https://issues.apache.org/jira/browse/JOSHUA-291
> Project: Joshua
>  Issue Type: Improvement
>  Components: core
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>
> We can improve code quality / readability leveraging code analysis from tools 
> like FindBugs and others integrated in IDEs.





[jira] [Updated] (JOSHUA-297) List supported versions of Hadoop

2016-08-17 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-297:
-
Fix Version/s: 6.1

> List supported versions of Hadoop
> -
>
> Key: JOSHUA-297
> URL: https://issues.apache.org/jira/browse/JOSHUA-297
> Project: Joshua
>  Issue Type: Task
>Reporter: Bob Paulin
>Assignee: Matt Post
>Priority: Minor
> Fix For: 6.1
>
>
> When working through the training tutorial I noticed that no version of 
> Hadoop was listed, so I tried the latest, Hadoop 2.6.4. The Thrax job failed 
> on this version. It did, however, work with 0.20.2, which I found on 
> http://joshua.incubator.apache.org/6.0/pipeline.html by hovering over a link 
> in the Hadoop section.





[jira] [Assigned] (JOSHUA-297) List supported versions of Hadoop

2016-08-17 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post reassigned JOSHUA-297:


Assignee: Matt Post



[jira] [Commented] (JOSHUA-297) List supported versions of Hadoop

2016-08-17 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424262#comment-15424262
 ] 

Matt Post commented on JOSHUA-297:
--

Hmm, this is strange; we are using 2.6.4 just fine. Can you elaborate on what 
happens? Also, how do you have two different Hadoop versions? Are you setting 
up in standalone mode?

I wonder if the Java version you were using was the problem. I have run into 
issues where Thrax was compiled with Java 8 (our default) but Hadoop was 
running on Java 7.
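One way to check for the Java 7/Java 8 mismatch described above is to read the major version from a compiled class file: the header is the magic number `0xCAFEBABE`, then a two-byte minor version, then a two-byte major version (51 for Java 7, 52 for Java 8). A Java 7 runtime refuses a class with major version 52 by throwing `UnsupportedClassVersionError`. The sketch below is a hypothetical diagnostic, not part of Joshua or Thrax; pass it the path of a `.class` file extracted from the Thrax jar.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic; not part of Joshua or Thrax.
final class ClassVersionDemo {

    // Class-file header: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version.
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();        // minor_version, ignored here
        return data.readUnsignedShort(); // major_version
    }

    static String describe(int major) {
        switch (major) {
            case 51: return "Java 7";
            case 52: return "Java 8";
            default: return "class-file major version " + major;
        }
    }

    // Usage: java ClassVersionDemo path/to/SomeClass.class
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(describe(majorVersion(in)));
        }
    }
}
```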

> List supported versions of Hadoop
> -
>
> Key: JOSHUA-297
> URL: https://issues.apache.org/jira/browse/JOSHUA-297
> Project: Joshua
>  Issue Type: Task
>Reporter: Bob Paulin
>Priority: Minor
>
> When working through the training tutorial I noticed that no version of 
> Hadoop was listed so I tried the latest Hadoop 2.6.4.  The Thrax Job failed 
> on this version.  It worked however with 0.20.2 .  I found this on 
> http://joshua.incubator.apache.org/6.0/pipeline.html by hovering over a link 
> on the Hadoop section.  





[jira] [Commented] (JOSHUA-298) Joshua Tutoral command contains a typo

2016-08-17 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424258#comment-15424258
 ] 

Matt Post commented on JOSHUA-298:
--

Fixed, thanks!

> Joshua Tutoral command contains a typo
> --
>
> Key: JOSHUA-298
> URL: https://issues.apache.org/jira/browse/JOSHUA-298
> Project: Joshua
>  Issue Type: Task
>Reporter: Bob Paulin
>Priority: Minor
>
> On the following page: 
> https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Tutorial
> There are 3 typos.
> 1) 
> {code} 
> $JOSHUA/bin/pipeline.pl \
>   --rundir 2 \
>   --readme "Baseline phrase run" \
>   --source es \
>   --target en \
>   --type phrase \
>   --corpus $FISHER/corpus/ldc/fisher_train \
>   --tune $FISHER/corpus/ldc/fisher_dev \
>   --test $FISHER/corpus/ldc/fisher_dev2 \
>   --maxlen 11 \
>   --maxlen-tune 11 \
>   --maxlent-test 11 \
>   --tuner-iterations 1 \
>   --lm-order 3
> {code}
> There is an extra "t" in --maxlent-test. It should be:
> {code}
> $JOSHUA/bin/pipeline.pl \
>   --rundir 2 \
>   --readme "Baseline phrase run" \
>   --source es \
>   --target en \
>   --type phrase \
>   --corpus $FISHER/corpus/ldc/fisher_train \
>   --tune $FISHER/corpus/ldc/fisher_dev \
>   --test $FISHER/corpus/ldc/fisher_dev2 \
>   --maxlen 11 \
>   --maxlen-tune 11 \
>   --maxlen-test 11 \
>   --tuner-iterations 1 \
>   --lm-order 3
> {code}
> Also, within the third run there is the extra "t" again, plus a missing hyphen in --first-step:
> {code}
> $JOSHUA/bin/pipeline.pl \
>   --rundir 3 \
>   --readme "Baseline phrase run, picking up from run 1" \
>   --source es \
>   --target en \
>   --type hiero \
>   --first step model --no-prepare \
>   --alignment 1/alignments/training.align \
>   --corpus 1/data/train/corpus \
>   --tune 1/data/tune/corpus  \
>   --test 1/data/test/corpus \
>   --maxlen 11 \
>   --maxlen-tune 11 \
>   --maxlent-test 11 \
>   --tuner-iterations 1 \
>   --lm-order 3
> {code}
> Should be:
> {code}
> $JOSHUA/bin/pipeline.pl \
>   --rundir 3 \
>   --readme "Baseline phrase run, picking up from run 1" \
>   --source es \
>   --target en \
>   --type hiero \
>   --first-step model --no-prepare \
>   --alignment 1/alignments/training.align \
>   --corpus 1/data/train/corpus \
>   --tune 1/data/tune/corpus  \
>   --test 1/data/test/corpus \
>   --maxlen 11 \
>   --maxlen-tune 11 \
>   --maxlen-test 11 \
>   --tuner-iterations 1 \
>   --lm-order 3
> {code}




[jira] [Updated] (JOSHUA-273) Joshua API

2016-08-15 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-273:
-
Fix Version/s: (was: 7)
   6.2

> Joshua API
> --
>
> Key: JOSHUA-273
> URL: https://issues.apache.org/jira/browse/JOSHUA-273
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Matt Post
> Fix For: 6.2
>
>
> We have a lot of work to do to clean up the decoder's internal object 
> pipeline in order to create a nice, clean API.
> (This is just a stub for this issue; I will return soon with a better 
> description and roadmap. Others feel free to edit, as well).




[jira] [Updated] (JOSHUA-261) Remove ext directory from source tree

2016-08-14 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-261:
-
Fix Version/s: (was: 6.2)
   6.1

> Remove ext directory from source tree
> -
>
> Key: JOSHUA-261
> URL: https://issues.apache.org/jira/browse/JOSHUA-261
> Project: Joshua
>  Issue Type: Task
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Matt Post
>Priority: Blocker
> Fix For: 6.1
>
>
> Right now we have a bunch of code bundled into the 
> [ext|https://github.com/apache/incubator-joshua/tree/master/ext] directory. I 
> don't think any of this code can be shipped with an Apache Joshua 
> (Incubating) release so we need to think about a mechanism for removing it 
> and making Joshua work in other ways.
> Here is a partial roadmap:
> [X] remove GIZA++ and symal
> [ ] update [the developer 
> documentation|https://cwiki.apache.org/confluence/display/JOSHUA/Development] 
> to describe how to install them and put them in the path
> [ ] update the pipeline scripts to not be hard-coded to $JOSHUA/bin
> [X] update the build files to not try to build them




[jira] [Assigned] (JOSHUA-261) Remove ext directory from source tree

2016-08-14 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post reassigned JOSHUA-261:


Assignee: Matt Post

> Remove ext directory from source tree
> -
>
> Key: JOSHUA-261
> URL: https://issues.apache.org/jira/browse/JOSHUA-261
> Project: Joshua
>  Issue Type: Task
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Matt Post
>Priority: Blocker
> Fix For: 6.2
>
>
> Right now we have a bunch of code bundled into the 
> [ext|https://github.com/apache/incubator-joshua/tree/master/ext] directory. I 
> don't think any of this code can be shipped with an Apache Joshua 
> (Incubating) release so we need to think about a mechanism for removing it 
> and making Joshua work in other ways.
> Here is a partial roadmap:
> [X] remove GIZA++ and symal
> [ ] update [the developer 
> documentation|https://cwiki.apache.org/confluence/display/JOSHUA/Development] 
> to describe how to install them and put them in the path
> [ ] update the pipeline scripts to not be hard-coded to $JOSHUA/bin
> [X] update the build files to not try to build them




[jira] [Resolved] (JOSHUA-249) Joshua Logo

2016-08-14 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-249.
--
Resolution: Fixed

> Joshua Logo
> ---
>
> Key: JOSHUA-249
> URL: https://issues.apache.org/jira/browse/JOSHUA-249
> Project: Joshua
>  Issue Type: Task
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 6.1
>
> Attachments: apache_joshua_logo.png, apache_joshua_logo.xcf
>
>
> As we discussed on the mailing lists, this issue should gather all proposed 
> Joshua logos so we can VOTE on one or more of them.




[jira] [Commented] (JOSHUA-249) Joshua Logo

2016-08-13 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420089#comment-15420089
 ] 

Matt Post commented on JOSHUA-249:
--

I don't have any better proposals, and have been using the attached ones, so 
perhaps we can consider this resolved, and maybe revisit it at some later date 
if we want to change it.

> Joshua Logo
> ---
>
> Key: JOSHUA-249
> URL: https://issues.apache.org/jira/browse/JOSHUA-249
> Project: Joshua
>  Issue Type: Task
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 6.1
>
> Attachments: apache_joshua_logo.png, apache_joshua_logo.xcf
>
>
> As we discussed on the mailing lists, this issue should gather all proposed 
> Joshua logos so we can VOTE on one or more of them.




[jira] [Updated] (JOSHUA-283) Implement fast_align as one of the available alignment options

2016-08-13 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-283:
-
Fix Version/s: (was: 6.1)
   6.2

> Implement fast_align as one of the available alignment options
> --
>
> Key: JOSHUA-283
> URL: https://issues.apache.org/jira/browse/JOSHUA-283
> Project: Joshua
>  Issue Type: Bug
>  Components: alignment, pipeline
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 6.2
>
>
> For some time now, I've been having issues using GIZA++ for alignment whilst 
> running a Joshua pipeline.
> Whilst looking for an alternative [~post] and [~kellen.sunderland] mentioned 
> the Berkeley Aligner and fast_align, respectively.
> Because 1) the Berkeley Aligner has not been touched in ~9 years, 
> and 2) no artifact currently exists on Maven Central, I am taking the advice 
> and attempting to use fast_align.
> This issue will augment the alignment code in Joshua to permit use of 
> fast_align which is ALv2.0 licensed.
> https://github.com/clab/fast_align 




[jira] [Resolved] (JOSHUA-286) Remove presence of all joshua-decoder.org links in codebase

2016-08-13 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-286.
--
Resolution: Fixed

> Remove presence of all joshua-decoder.org links in codebase
> ---
>
> Key: JOSHUA-286
> URL: https://issues.apache.org/jira/browse/JOSHUA-286
> Project: Joshua
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
> Fix For: 6.1
>
>
> Right now, joshua-decoder.org appears in the following files; we should remove 
> it and replace it with joshua.apache.org:
> {code}
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua(JOSHUA-283) $ grep -lr "joshua-decoder.org" .
> ./.gitignore
> ./CHANGELOG
> ./doc/mainpage.md
> ./scripts/support/make-release.sh
> ./src/main/java/org/apache/joshua/decoder/Decoder.java
> ./src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java
> {code}
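A minimal sketch of the replacement step, assuming bash with grep and sed. It uses `sed -i.bak` because that spelling of in-place editing is accepted by both GNU sed and the BSD sed shipped with OS X, then deletes the backup; `--exclude-dir` is a GNU grep option. The function name `replace_domain` is hypothetical. A blind substitution can also rewrite URLs whose paths do not exist on joshua.apache.org, so the result is worth reviewing with `git diff`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite joshua-decoder.org to joshua.apache.org in
# every file under a directory (default: the current one), skipping .git.
replace_domain() {
  local dir="${1:-.}"
  grep -lr --exclude-dir=.git 'joshua-decoder\.org' "$dir" |
    while IFS= read -r f; do
      # -i.bak works with both GNU and BSD sed; remove the backup afterwards.
      sed -i.bak 's/joshua-decoder\.org/joshua.apache.org/g' "$f" && rm -f "$f.bak"
    done
}
```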




[jira] [Commented] (JOSHUA-292) Add travis CI build status badge to README.md

2016-08-11 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418164#comment-15418164
 ] 

Matt Post commented on JOSHUA-292:
--

Sounds cool; can you point me to one such project so I can see how it's done?

> Add travis CI build status badge to README.md
> -
>
> Key: JOSHUA-292
> URL: https://issues.apache.org/jira/browse/JOSHUA-292
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Max Thomas
>Priority: Trivial
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> Would be nice to see the status of the latest master branch build from the 
> README - many projects do this already. 




[jira] [Assigned] (JOSHUA-288) Port fast_align to java

2016-08-04 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post reassigned JOSHUA-288:


Assignee: Matt Post

> Port fast_align to java
> ---
>
> Key: JOSHUA-288
> URL: https://issues.apache.org/jira/browse/JOSHUA-288
> Project: Joshua
>  Issue Type: New Feature
>Reporter: Matt Post
>Assignee: Matt Post
>Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> It would be great to have a Java port of fast_align, so that we don't have to 
> worry about compiling it, and could distribute it via Maven.
> https://github.com/clab/fast_align




[jira] [Resolved] (JOSHUA-107) Verbosity levels

2016-08-04 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-107.
--
Resolution: Fixed

Completed a while ago.

> Verbosity levels
> 
>
> Key: JOSHUA-107
> URL: https://issues.apache.org/jira/browse/JOSHUA-107
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.2
>
>
> Joshua should support verbosity levels with a command-line switch, so it's 
> easy to shut it up with something like {{-v 0}} or {{-q}}.




[jira] [Closed] (JOSHUA-22) Parallelize MBR computation

2016-08-04 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post closed JOSHUA-22.
---
Resolution: Fixed

Closing this issue as something unlikely to be done any time soon, and also as 
something that is more properly done inside the decoder over the hypergraph.

> Parallelize MBR computation
> ---
>
> Key: JOSHUA-22
> URL: https://issues.apache.org/jira/browse/JOSHUA-22
> Project: Joshua
>  Issue Type: Bug
>Reporter: Joshua Decoder
> Fix For: 6.2
>
>
> MBR should be multithreaded.  This would be easy to add following the model 
> used in the InputManager.




[jira] [Closed] (JOSHUA-95) Vocabulary locking

2016-08-04 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post closed JOSHUA-95.
---
Resolution: Fixed

This is a very old issue that I believe has been fixed.

> Vocabulary locking
> --
>
> Key: JOSHUA-95
> URL: https://issues.apache.org/jira/browse/JOSHUA-95
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Juri Ganitkevitch
> Fix For: 6.2
>
>
> Vocabulary::id() is still synchronized and a potential point of contention. 
> It would be nice to resolve this.




[jira] [Closed] (JOSHUA-100) Add Shen et al. (2008) dependency LM

2016-08-04 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post closed JOSHUA-100.

Resolution: Fixed

> Add Shen et al. (2008) dependency LM
> 
>
> Key: JOSHUA-100
> URL: https://issues.apache.org/jira/browse/JOSHUA-100
> Project: Joshua
>  Issue Type: New Feature
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.2
>
>





[jira] [Created] (JOSHUA-288) Port fast_align to java

2016-07-28 Thread Matt Post (JIRA)
Matt Post created JOSHUA-288:


 Summary: Port fast_align to java
 Key: JOSHUA-288
 URL: https://issues.apache.org/jira/browse/JOSHUA-288
 Project: Joshua
  Issue Type: New Feature
Reporter: Matt Post
Priority: Minor


It would be great to have a Java port of fast_align, so that we don't have to 
worry about compiling it, and could distribute it via Maven.

https://github.com/clab/fast_align




[jira] [Created] (JOSHUA-285) Not all RuntimeExceptions are caught

2016-07-27 Thread Matt Post (JIRA)
Matt Post created JOSHUA-285:


 Summary: Not all RuntimeExceptions are caught
 Key: JOSHUA-285
 URL: https://issues.apache.org/jira/browse/JOSHUA-285
 Project: Joshua
  Issue Type: Bug
Reporter: Matt Post
 Fix For: 6.1


In many instances Joshua threads will throw a RuntimeException that is not 
caught, causing the decoder to hang indefinitely. These should be caught and, 
if serious enough, cause the decoder to die. An example of an error that is 
caught is running out of memory.




[jira] [Created] (JOSHUA-284) Phrase-based decoding changes

2016-07-27 Thread Matt Post (JIRA)
Matt Post created JOSHUA-284:


 Summary: Phrase-based decoding changes
 Key: JOSHUA-284
 URL: https://issues.apache.org/jira/browse/JOSHUA-284
 Project: Joshua
  Issue Type: Bug
Reporter: Matt Post
 Fix For: 6.1


Joshua's phrase-based decoding creates a lot of complications in the pipeline.

Currently, phrase-based rules are simply left-branching Hiero rules. This means 
that, prior to packing or loading, rules have to have a nonterminal prepended 
to them. For example, Thrax will extract

[X] ||| yo quiero ||| i want ||| ...

This has to be changed to

[X] ||| [X,1] yo quiero ||| [X,1] i want ||| ...

This means, for one, that phrase tables share a format but are specific to 
either the hiero or phrase-based decoder.
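Under the current scheme, the extra step amounts to prepending the nonterminal [X,1] to both the source and target sides of every extracted phrase rule. The awk sketch below illustrates that transformation only; the function name `to_left_branching` is hypothetical and is not the conversion code the Joshua pipeline actually uses. It assumes fields are separated by ` ||| `, with the left-hand side first, then source and target, and passes any remaining fields (features, alignments) through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read phrase rules on stdin and prepend [X,1] to the
# source and target sides, turning them into left-branching Hiero rules.
to_left_branching() {
  awk -F ' [|][|][|] ' '
    {
      # $1 = LHS, $2 = source, $3 = target, $4.. = features etc.
      out = $1 " ||| [X,1] " $2 " ||| [X,1] " $3
      for (i = 4; i <= NF; i++) out = out " ||| " $i
      print out
    }'
}
```

With proper glue rules, as proposed below, this rewriting step would no longer be needed.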

A better idea would be to change the phrase-based decoder a bit so that, 
instead of using left-branching phrase rules, it made use of proper glue rules, 
the same way Hiero does. The advantages are:

- both formalisms would use the same format
- both formalisms would have a glue grammar
- there should be no impact in running time




[jira] [Created] (JOSHUA-282) %S output format doesn't remove

2016-07-18 Thread Matt Post (JIRA)
Matt Post created JOSHUA-282:


 Summary: %S output format doesn't remove 
 Key: JOSHUA-282
 URL: https://issues.apache.org/jira/browse/JOSHUA-282
 Project: Joshua
  Issue Type: New Feature
Reporter: Matt Post
Assignee: Matt Post
 Fix For: 6.1


Using -output-format %S with the phrase-based decoder prevents removal of the 
 and  tags.




[jira] [Closed] (JOSHUA-71) OS X installation depends on coreutils to run thrax test

2016-07-11 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post closed JOSHUA-71.
---
Resolution: Fixed

> OS X installation depends on coreutils to run thrax test
> 
>
> Key: JOSHUA-71
> URL: https://issues.apache.org/jira/browse/JOSHUA-71
> Project: Joshua
>  Issue Type: Bug
>Reporter: Luke Orland
> Fix For: 6.2
>
>
> The {{gstat}} command from coreutils is not installed on Darwin by default. 
> One must resolve that dependency via Homebrew, MacPorts, etc.
> The {{test/thrax/test.sh}} test will fail on an OS X system that does not 
> have coreutils installed. We should either change the test so that it does 
> not require coreutils in Darwin or make it clear in the (developer) 
> installation/setup instructions that coreutils are required for this test, 
> check for coreutils when running the thrax test, and output a helpful message 
> instructing the developer to go install coreutils if {{gstat}} is not found.
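The last option (checking for coreutils and printing a helpful message) could look roughly like the guard below, placed near the top of {{test/thrax/test.sh}}. This is a sketch: the function name `check_gstat` is hypothetical, and it takes the OS name as an optional argument only so the check can be exercised on non-Darwin machines.

```shell
#!/usr/bin/env bash
# Hypothetical guard: on Darwin, fail early with a clear message if GNU
# stat (installed as gstat by coreutils) is not on the PATH.
# $1 overrides the OS name; it defaults to the output of uname -s.
check_gstat() {
  local os="${1:-$(uname -s)}"
  if [ "$os" = "Darwin" ] && ! command -v gstat >/dev/null 2>&1; then
    echo "test/thrax/test.sh needs gstat from GNU coreutils;" \
         "install it via Homebrew or MacPorts and re-run." >&2
    return 1
  fi
  return 0
}
```

The test script would then call `check_gstat || exit 1` before its first use of gstat.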




[jira] [Commented] (JOSHUA-71) OS X installation depends on coreutils to run thrax test

2016-07-11 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371386#comment-15371386
 ] 

Matt Post commented on JOSHUA-71:
-

I don't think this issue applies any more.

> OS X installation depends on coreutils to run thrax test
> 
>
> Key: JOSHUA-71
> URL: https://issues.apache.org/jira/browse/JOSHUA-71
> Project: Joshua
>  Issue Type: Bug
>Reporter: Luke Orland
> Fix For: 6.2
>
>
> The {{gstat}} command from coreutils is not installed on Darwin by default. 
> One must resolve that dependency via Homebrew, MacPorts, etc.
> The {{test/thrax/test.sh}} test will fail on an OS X system that does not 
> have coreutils installed. We should either change the test so that it does 
> not require coreutils in Darwin or make it clear in the (developer) 
> installation/setup instructions that coreutils are required for this test, 
> check for coreutils when running the thrax test, and output a helpful message 
> instructing the developer to go install coreutils if {{gstat}} is not found.




[jira] [Resolved] (JOSHUA-251) Address Website Branding Issues

2016-07-11 Thread Matt Post (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post resolved JOSHUA-251.
--
   Resolution: Fixed
Fix Version/s: (was: 6.2)
   6.1

> Address Website Branding Issues
> ---
>
> Key: JOSHUA-251
> URL: https://issues.apache.org/jira/browse/JOSHUA-251
> Project: Joshua
>  Issue Type: Task
>Reporter: Lewis John McGibbney
>Priority: Critical
> Fix For: 6.1
>
>
> We have a number of Website branding issues which we need to address.
> http://www.apache.org/foundation/marks/pmcs.html#introduction
> Let's work through them here. Please create child issues if appropriate.
> Thanks




[jira] [Commented] (JOSHUA-251) Address Website Branding Issues

2016-07-11 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371380#comment-15371380
 ] 

Matt Post commented on JOSHUA-251:
--

I believe this is resolved with the following recent changes:

- Changed prominent mentions of "Joshua (Incubating)" to "Apache Joshua 
(Incubating)"
- Added the incubator disclaimer to the main page
- Added the incubator logo to the main page

> Address Website Branding Issues
> ---
>
> Key: JOSHUA-251
> URL: https://issues.apache.org/jira/browse/JOSHUA-251
> Project: Joshua
>  Issue Type: Task
>Reporter: Lewis John McGibbney
>Priority: Critical
> Fix For: 6.1
>
>
> We have a number of Website branding issues which we need to address.
> http://www.apache.org/foundation/marks/pmcs.html#introduction
> Let's work through them here. Please create child issues if appropriate.
> Thanks




[jira] [Commented] (JOSHUA-4) Quasi-synchronous grammar

2016-07-11 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371381#comment-15371381
 ] 

Matt Post commented on JOSHUA-4:


This is outside scope.

> Quasi-synchronous grammar
> -
>
> Key: JOSHUA-4
> URL: https://issues.apache.org/jira/browse/JOSHUA-4
> Project: Joshua
>  Issue Type: New Feature
>Reporter: Courtney Napoles
> Fix For: 6.2
>
>
> In the more long term, I think it would be worth looking into 
> quasi-synchronous grammar support in the decoder.




[jira] [Commented] (JOSHUA-259) Integration tests are failing

2016-06-23 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347196#comment-15347196
 ] 

Matt Post commented on JOSHUA-259:
--

I agree. I think I'll close this.

> Integration tests are failing
> -
>
> Key: JOSHUA-259
> URL: https://issues.apache.org/jira/browse/JOSHUA-259
> Project: Joshua
>  Issue Type: Bug
>Reporter: Kellen Sunderland
> Fix For: 6.2
>
>
> Several integration tests are currently failing with Joshua.  I have a quick 
> fix coming for one of the tests but just in case we need more discussion 
> around the failures I'll open a bug.
> The currently failing tests for me:
> test/decoder/too-long
> test/server/http
> test/server/tcp-text
> test/thrax/extraction
> and 
> test/decoder/moses-compat (but this is easy to fix, simple extra space in the 
> expected file)
> These are failing under OS X 10.11. If they pass in other environments, feel 
> free to post a 'works for me'.




[jira] [Created] (JOSHUA-278) Alignments printed incorrectly for phrase-based decoder

2016-06-23 Thread Matt Post (JIRA)
Matt Post created JOSHUA-278:


 Summary: Alignments printed incorrectly for phrase-based decoder
 Key: JOSHUA-278
 URL: https://issues.apache.org/jira/browse/JOSHUA-278
 Project: Joshua
  Issue Type: Bug
Reporter: Matt Post
Assignee: Matt Post
 Fix For: 6.1


Type this to see the bug:

echo YUP | $JOSHUA/bin/joshua -lowercase -search stack -project-case \
    -output-format "%s ||| %f ||| %a"




[jira] [Commented] (JOSHUA-273) Joshua API

2016-06-23 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346780#comment-15346780
 ] 

Matt Post commented on JOSHUA-273:
--

Okay, so in the near future, I plan to make some small changes here in the 
direction of a better API. My plan is to:

- Extract the output writer code from KBestExtractor, moving it all to 
StructuredTranslationFactory
- Turn KBestExtractor into an iterator

This will likely trigger other changes as well, but I thought I would share 
these plans in case anyone has comments.

> Joshua API
> --
>
> Key: JOSHUA-273
> URL: https://issues.apache.org/jira/browse/JOSHUA-273
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Matt Post
> Fix For: 7
>
>
> We have a lot of work to do to clean up the decoder's internal object 
> pipeline in order to create a nice, clean API.
> (This is just a stub for this issue; I will return soon with a better 
> description and roadmap. Others feel free to edit, as well).



