Re: modernmt

2017-07-02 Thread John Hewitt
I found the reference for that 1,000,000 number a bit too late -- according to this more recent paper from Koehn, it's more like 15,000,000 tokens for NMT to meet phrase-based MT, and they omit syntax-based. https://arxiv.org/pdf/1706.03872.pdf -John On Sun, Jul 2, 2017 at 12:38 PM, John H

Re: modernmt

2017-07-02 Thread John Hewitt
I think we can identify with the NMT danger. I still think there is a big niche that deep learning approaches won't reach for a few years, until GPUs become super prevalent. Which is why I like ModernMT's approaches, whi

Re: [ANNOUNCE] - Apache Joshua 6.1 incubating release

2017-06-22 Thread John Hewitt
Related note: I've begun to announce to the Penn NLP communities; I can talk to Mark Liberman at the LDC about getting a note in there as well. -John On Thu, Jun 22, 2017 at 10:11 AM, lewis john mcgibbney wrote: > Hi Tommaso, > EXCELLENT :) > @Matt are you able to Tweet this out and make some t

Re: [VOTE] Release Apache Joshua 6.1 (Incubating) RC4

2017-04-26 Thread John Hewitt
is still in my queue so let's keep this open. > > > > matt > > > > > >> On Mar 16, 2017, at 8:56 PM, John Hewitt > wrote: > >> > >> Lewis is right about the week. Sorry, everyone. This week had a DARPA > >> meeting in Atlanta. I

Re: [VOTE] Release Apache Joshua 6.1 (Incubating) RC4

2017-03-16 Thread John Hewitt
Lewis is right about the week. Sorry, everyone. This week had a DARPA meeting in Atlanta. I'll get my +/-1 out tomorrow. -John On Thu, Mar 16, 2017 at 8:53 PM, Michael A. Hedderich < m...@michael-hedderich.de> wrote: > Hi, > > Thanks Tommaso for putting the release together! > > I was traveling

Re: [VOTE] Release Apache Joshua 6.1 (Incubating)

2017-03-01 Thread John Hewitt
Tommaso, thanks for the RC. Kellen, thanks for checking for the -1. -John On Wed, Mar 1, 2017 at 1:03 PM, kellen sunderland < kellen.sunderl...@gmail.com> wrote: > For a short term fix for the unit test we can delete lines 48 and 50 from > LMGrammarBerkeleyTest.java. > > A bit of a longer term s

Re: Any symal experts?

2017-01-09 Thread John Hewitt
ually clear). I think adding atools to your > port is the way to go, and that it's written in C++ should facilitate that. > > > > > > > > > >> On Nov 23, 2016, at 12:25 PM, John Hewitt > wrote: > >> > >> It'll be a headache beca

Re: modernmt

2016-12-01 Thread John Hewitt
I had a few good conversations over dinner with this team at AMTA in Austin in October. They seem to be in the interesting position where their work is good, but is in danger of being superseded by neural MT as they come out of the gate. Clearly, it has benefits over NMT, and is easier to adopt, bu

Re: [VOTE] Release Apache Joshua 6.1 RC#2

2016-11-25 Thread John Hewitt
+1 On Nov 24, 2016 02:04, "Tommaso Teofili" wrote: > +1 > > Tommaso > > Il giorno mer 23 nov 2016 alle ore 15:25 kellen sunderland < > kellen.sunderl...@gmail.com> ha scritto: > >> +1, many thanks Lewis. >> >> On Wed, Nov 23, 2016 at 2:34 PM, Matt Post wrote: >> >> > +1 Thanks, Lewis! >> > >> >

Re: Any symal experts?

2016-11-23 Thread John Hewitt
ed, Nov 23, 2016 at 12:18 PM, Matt Post wrote: > John — I suggest trying to ditch those GIZA++ tools entirely. fast_align > indeed replaced them with "atools"; how much work would it be to port that? > > > > On Nov 23, 2016, at 12:11 PM, John Hewitt > wrote: >

Any symal experts?

2016-11-23 Thread John Hewitt
Hey everyone, I'm packaging up a Java port of Fast Align for Joshua and integrating it into the pipeline. Fast Align does not produce symmetrical alignments -- it relies on a tool that I haven't ported to Java. We package symal (which symmetrizes alignments) with Joshua right now for GIZA++, so I'm

Re: Updating Incubator summary

2016-11-17 Thread John Hewitt
@Matt, that sounds like an interesting goal. What's the hook? @Henri, that sounds good. I like the idea of showing people snippets, as MT isn't necessarily intuitive to the average Linux.com reader. On Thu, Nov 17, 2016 at 5:44 AM, Matt Post wrote: > My thinking on that roadmap was a comment Le

Re: [VOTE] Release Apache Joshua (Incubating) 6.1

2016-11-14 Thread John Hewitt
+1 Let's do it. -John On Mon, Nov 14, 2016 at 1:13 PM, kellen sunderland < kellen.sunderl...@gmail.com> wrote: > +1 . Thanks to Lewis and Matt for all the recent work. > > On Nov 14, 2016 7:11 PM, "Matt Post" wrote: > > +1 > > Thanks for starting this off, Lewis! > > > > On Nov 14, 2016, at 12

Re: Pipeline Mystery

2016-10-26 Thread John Hewitt
It seems like MERT isn't writing its final config file (which is typical of MERT, in my experience). I recall giving up and using kbmira. This final config file is the one used in test, so I can see why skipping to test ends up failing pretty quickly. To answer your question, though, I haven't trie

Re: Being realistic about memory usage

2016-10-26 Thread John Hewitt
+1 I've never used Joshua successfully without twiddling around with memory allowances. We'll put a nice warning up about the default memory usage, and an advisory about how to set the maximum lower if the user's box can't handle it. -John On Wed, Oct 26, 2016 at 10:52 PM, lewis john mcgibbney w

Re: openjdk 8 incompatibility

2016-10-25 Thread John Hewitt
in. > > > > On Oct 25, 2016, at 1:11 PM, John Hewitt wrote: > > > > Hi all, > > > > Has anyone been able to compile Joshua with openjdk? I get this message: > > > > /home/john/java/incubator-joshua/src/main/java/org/ > apache/joshua/decoder/ff/lm/

openjdk 8 incompatibility

2016-10-25 Thread John Hewitt
Hi all, Has anyone been able to compile Joshua with openjdk? I get this message: /home/john/java/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java:[21,19] error: package javafx.scene does not exist And the following link seems to confirm that javafx is not a part of openj

[jira] [Commented] (JOSHUA-288) Port fast_align to java

2016-10-25 Thread John Hewitt (JIRA)
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605667#comment-15605667 ] John Hewitt commented on JOSHUA-288: Replaced gnu-getopt (not Apache lic

[jira] [Updated] (JOSHUA-288) Port fast_align to java

2016-10-25 Thread John Hewitt (JIRA)
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewitt updated JOSHUA-288: --- Description: It would be great to have a Java port of fast_align, so that we don't have to

Re: language pack #1

2016-10-05 Thread John Hewitt
if there's something we could do in the name of making it even clearer. (Potentially checking whether $JOSHUA is the same as $PWD after the directory change in prepare.sh, and printing a warning if it's not?) -John On Wed, Oct 5, 2016 at 11:32 PM, John Hewitt wrote: > Thanks, Matt! > >

Re: language pack #1

2016-10-05 Thread John Hewitt
Thanks, Matt! Some notes: When piping input into prepare.sh, I get the following output: WARNING: No known abbreviations for language 'es', attempting fall-back to English version... ERROR: No abbreviations files found in /nlp/users/johnhew/apache-joshua-es-en-2016-10-05/scripts/preparation/nonb

[jira] [Commented] (JOSHUA-288) Port fast_align to java

2016-10-02 Thread John Hewitt (JIRA)
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15541397#comment-15541397 ] John Hewitt commented on JOSHUA-288: I'm moving to benchmark the port ag

[jira] [Updated] (JOSHUA-288) Port fast_align to java

2016-09-26 Thread John Hewitt (JIRA)
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewitt updated JOSHUA-288: --- Assignee: John Hewitt > Port fast_align to java > --- > >

[jira] [Commented] (JOSHUA-288) Port fast_align to java

2016-09-08 Thread John Hewitt (JIRA)
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474276#comment-15474276 ] John Hewitt commented on JOSHUA-288: I've found what is possibly a b

[jira] [Commented] (JOSHUA-221) ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main

2016-08-19 Thread John Hewitt (JIRA)
[ https://issues.apache.org/jira/browse/JOSHUA-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428097#comment-15428097 ] John Hewitt commented on JOSHUA-221: The current command line parsing scheme wr

[jira] [Comment Edited] (JOSHUA-288) Port fast_align to java

2016-08-13 Thread John Hewitt (JIRA)
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420046#comment-15420046 ] John Hewitt edited comment on JOSHUA-288 at 8/13/16 7:3

[jira] [Commented] (JOSHUA-288) Port fast_align to java

2016-08-13 Thread John Hewitt (JIRA)
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420046#comment-15420046 ] John Hewitt commented on JOSHUA-288: Existing direct port of fast_align to Java f

[GitHub] incubator-joshua issue #32: JOSHUA-286 - Replace old joshua-decoder.org link...

2016-07-28 Thread john-hewitt
Github user john-hewitt commented on the issue: https://github.com/apache/incubator-joshua/pull/32 @lewismc Improvements addressed. Happy to help.

[GitHub] incubator-joshua pull request #32: JOSHUA-286 - Replace old joshua-decoder.o...

2016-07-27 Thread john-hewitt
GitHub user john-hewitt opened a pull request: https://github.com/apache/incubator-joshua/pull/32 JOSHUA-286 - Replace old joshua-decoder.org links with joshua.apache.org - Update links to documentation and support to reflect the move to Apache. - keep Gitignore entry for