I found the reference for that 1,000,000 number a bit too late -- according
to this more recent paper from Koehn, it's more like 15,000,000 tokens for
NMT to meet phrase-based MT, and they omit syntax-based.
https://arxiv.org/pdf/1706.03872.pdf
-John
On Sun, Jul 2, 2017 at 12:38 PM, John H
I think we can identify with the NMT danger. I still think there is a
big niche that deep learning approaches won't reach for a few years, until
GPUs become super prevalent. Which is why I like ModernMT's approaches,
whi
Related note: I've begun to announce to the Penn NLP communities; I can
talk to Mark Liberman at the LDC about getting a note in there as well.
-John
On Thu, Jun 22, 2017 at 10:11 AM, lewis john mcgibbney
wrote:
> Hi Tommaso,
> EXCELLENT :)
> @Matt are you able to Tweet this out and make some t
is still in my queue so let's keep this open.
> >
> > matt
> >
> >
> >> On Mar 16, 2017, at 8:56 PM, John Hewitt
> wrote:
> >>
> >> Lewis is right about the week. Sorry, everyone. This week had a DARPA
> >> meeting in Atlanta. I
Lewis is right about the week. Sorry, everyone. This week had a DARPA
meeting in Atlanta. I'll get my +/-1 out tomorrow.
-John
On Thu, Mar 16, 2017 at 8:53 PM, Michael A. Hedderich <
m...@michael-hedderich.de> wrote:
> Hi,
>
> Thanks Tommaso for putting the release together!
>
> I was traveling
Tommaso, thanks for the RC.
Kellen, thanks for checking for the -1.
-John
On Wed, Mar 1, 2017 at 1:03 PM, kellen sunderland <
kellen.sunderl...@gmail.com> wrote:
> For a short term fix for the unit test we can delete lines 48 and 50 from
> LMGrammarBerkeleyTest.java.
>
> A bit of a longer term s
ually clear). I think adding atools to your
> port is the way to go, and that it's written in C++ should facilitate that.
> >
> >
> >
> >
> >> On Nov 23, 2016, at 12:25 PM, John Hewitt
> wrote:
> >>
> >> It'll be a headache beca
I had a few good conversations over dinner with this team at AMTA in Austin
in October.
They seem to be in the interesting position where their work is good, but
is in danger of being superseded by neural MT as they come out of the gate.
Clearly, it has benefits over NMT, and is easier to adopt, bu
+1
On Nov 24, 2016 02:04, "Tommaso Teofili" wrote:
> +1
>
> Tommaso
>
> Il giorno mer 23 nov 2016 alle ore 15:25 kellen sunderland <
> kellen.sunderl...@gmail.com> ha scritto:
>
>> +1, many thanks Lewis.
>>
>> On Wed, Nov 23, 2016 at 2:34 PM, Matt Post wrote:
>>
>> > +1 Thanks, Lewis!
>> >
>> >
ed, Nov 23, 2016 at 12:18 PM, Matt Post wrote:
> John — I suggest trying to ditch those GIZA++ tools entirely. fast_align
> indeed replaced them with "atools"; how much work would it be to port that?
>
>
> > On Nov 23, 2016, at 12:11 PM, John Hewitt
> wrote:
> >
Hey everyone,
I'm packaging up a Java port of Fast Align for Joshua and integrating it into
the pipeline.
Fast Align does not produce symmetrical alignments -- it relies on a tool
that I haven't ported to Java.
We package symal (which symmetricizes alignments) with Joshua right now for
GIZA++, so I'm
@Matt, that sounds like an interesting goal. What's the hook?
@Henri, that sounds good. I like the idea of showing people snippets, as MT
isn't necessarily intuitive to the average Linux.com reader.
On Thu, Nov 17, 2016 at 5:44 AM, Matt Post wrote:
> My thinking on that roadmap was a comment Le
+1 Let's do it.
-John
On Mon, Nov 14, 2016 at 1:13 PM, kellen sunderland <
kellen.sunderl...@gmail.com> wrote:
> +1 . Thanks to Lewis and Matt for all the recent work.
>
> On Nov 14, 2016 7:11 PM, "Matt Post" wrote:
>
> +1
>
> Thanks for starting this off, Lewis!
>
>
> > On Nov 14, 2016, at 12
It seems like MERT isn't writing its final config file (which is typical
of MERT, in my experience). I recall giving up and using kbmira. This final
config file is the one used in test, so I can see why skipping to test ends
up failing pretty quickly.
To answer your question, though, I haven't trie
+1 I've never used Joshua successfully without twiddling around with memory
allowances. We'll put a nice warning up about the default memory usage, and
an advisory about how to set the maximum lower if the user's box can't
handle it.
-John
On Wed, Oct 26, 2016 at 10:52 PM, lewis john mcgibbney
w
in.
>
>
> > On Oct 25, 2016, at 1:11 PM, John Hewitt wrote:
> >
> > Hi all,
> >
> > Has anyone been able to compile Joshua with openjdk? I get this message:
> >
> > /home/john/java/incubator-joshua/src/main/java/org/
> apache/joshua/decoder/ff/lm/
Hi all,
Has anyone been able to compile Joshua with openjdk? I get this message:
/home/john/java/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java:[21,19]
error: package javafx.scene does not exist
And the following link seems to confirm that javafx is not a part of
openj
[
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605667#comment-15605667
]
John Hewitt commented on JOSHUA-288:
Replaced gnu-getopt (not Apache lic
[
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Hewitt updated JOSHUA-288:
---
Description:
It would be great to have a Java port of fast_align, so that we don't have to
if there's something we could do in the name of making it even
clearer. (Potentially checking whether $JOSHUA is the same as $PWD after
the directory change in prepare.sh, and printing a warning if it's not?)
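A minimal sketch of what such a check might look like, assuming it would run right after the directory change. The function name and the warning text are hypothetical and are not taken from the actual prepare.sh:

```shell
# Hypothetical helper; not part of the real prepare.sh.
# Prints a warning to stderr and returns 1 when $JOSHUA does not
# resolve to the current working directory, and returns 0 otherwise.
warn_if_joshua_mismatch() {
    # An unset, empty, or non-directory $JOSHUA can never match.
    if [ -z "${JOSHUA:-}" ] || [ ! -d "$JOSHUA" ]; then
        echo "WARNING: \$JOSHUA is unset or is not a directory" >&2
        return 1
    fi

    # Resolve both paths physically so that symlinked locations compare equal.
    # The cd runs inside a command substitution, so the caller's directory
    # is left unchanged.
    joshua_dir=$(CDPATH= cd -- "$JOSHUA" && pwd -P)
    current_dir=$(pwd -P)

    if [ "$joshua_dir" != "$current_dir" ]; then
        echo "WARNING: \$JOSHUA ($joshua_dir) differs from the current directory ($current_dir)" >&2
        return 1
    fi
    return 0
}
```

Since the idea is a warning rather than a hard failure, the caller would presumably ignore the return status and carry on with the rest of the script.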
-John
On Wed, Oct 5, 2016 at 11:32 PM, John Hewitt wrote:
> Thanks, Matt!
>
>
Thanks, Matt!
Some notes:
When piping input into prepare.sh, I get the following output:
WARNING: No known abbreviations for language 'es', attempting fall-back to
English version...
ERROR: No abbreviations files found in
/nlp/users/johnhew/apache-joshua-es-en-2016-10-05/scripts/preparation/nonb
[
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15541397#comment-15541397
]
John Hewitt commented on JOSHUA-288:
I'm moving to benchmark the port ag
[
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Hewitt updated JOSHUA-288:
---
Assignee: John Hewitt
> Port fast_align to java
> ---
>
>
[
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474276#comment-15474276
]
John Hewitt commented on JOSHUA-288:
I've found what is possibly a b
[
https://issues.apache.org/jira/browse/JOSHUA-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428097#comment-15428097
]
John Hewitt commented on JOSHUA-221:
The current command line parsing scheme wr
[
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420046#comment-15420046
]
John Hewitt edited comment on JOSHUA-288 at 8/13/16 7:3
[
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420046#comment-15420046
]
John Hewitt commented on JOSHUA-288:
Existing direct port of fast_align to Java f
Github user john-hewitt commented on the issue:
https://github.com/apache/incubator-joshua/pull/32
@lewismc Improvements addressed. Happy to help.
GitHub user john-hewitt opened a pull request:
https://github.com/apache/incubator-joshua/pull/32
JOSHUA-286 - Replace old joshua-decoder.org links with joshua.apache.org
- Update links to documentation and support to reflect the
move to Apache.
- keep Gitignore entry for