date:20220411

Re: Training of MaxEnt Model with large corpora fails with java.io.UTFDataFormatException

2022-04-11 Thread Jeff Zemerick

Great, thanks. I was able to reproduce the problem. I'll take a look and keep this thread updated. Thanks, Jeff On Mon, Apr 11, 2022 at 10:22 AM Zowalla, Richard <richard.zowa...@hs-heilbronn.de> wrote: > Hi Jeff, > > thanks for the quick reply. Here it is: > https://issues.apache.org/jira/brow

[GitHub] [opennlp] jzonthemtn opened a new pull request, #411: OPENNLP-1354: Fixing javadoc generation on Java 11.

2022-04-11 Thread GitBox

jzonthemtn opened a new pull request, #411: URL: https://github.com/apache/opennlp/pull/411 I ran into this issue while cutting the 2.0 release. Javadocs on JDK 11 would not build without these changes. The source 8 looks weird since we're on Java 11 but is a recommended workaround for the

Re: Training of MaxEnt Model with large corpora fails with java.io.UTFDataFormatException

2022-04-11 Thread Zowalla, Richard

Hi Jeff, thanks for the quick reply. Here it is: https://issues.apache.org/jira/browse/OPENNLP-1366 Using the treebank from Tübingen might not be feasible, as it consumes around 2 TB of RAM ;) - the link mentioned in the ticket points to a smaller dataset, which should reproduce the issue with a fea

Re: Training of MaxEnt Model with large corpora fails with java.io.UTFDataFormatException

2022-04-11 Thread Jeff Zemerick

Hi Richard, Thanks for reporting this. A Jira issue with steps to reproduce it would be fantastic. https://issues.apache.org/jira/projects/OPENNLP Please create one and reply back here with its ID once you do. I can take a look and see what can be done. Thanks, Jeff On Mon, Apr 11, 2022 at 8:47

Training of MaxEnt Model with large corpora fails with java.io.UTFDataFormatException

2022-04-11 Thread Zowalla, Richard

Hi all, we are working on training a large OpenNLP MaxEnt model for lemmatizing German texts. We use a Wikipedia treebank from Tübingen. This works fine for mid-size corpora (it just needs a little bit of RAM and time). However, we are running into the exception mentioned in [1]. Debugging into the
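
For background on the exception named in the subject line: the snippet above is cut off before any root cause is stated, so the following is only a general illustration and not a diagnosis of this report. In plain Java, one well-known way to trigger `java.io.UTFDataFormatException` on the writing side is `DataOutputStream.writeUTF`, which stores the encoded length in two bytes and therefore rejects any string whose modified UTF-8 encoding exceeds 65535 bytes. The class and method names below (`WriteUtfLimitDemo`, `fitsInWriteUtf`) are hypothetical and chosen only for this sketch; nothing here is taken from the OpenNLP code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

// Hypothetical demo class: shows the 65535-byte limit of DataOutputStream.writeUTF.
class WriteUtfLimitDemo {

    // Returns true if writeUTF accepts a string of the given number of ASCII
    // characters, false if it throws UTFDataFormatException.
    static boolean fitsInWriteUtf(int asciiChars) throws IOException {
        // Each ASCII character takes exactly one byte in modified UTF-8.
        String s = "a".repeat(asciiChars);
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded form is longer than 65535 bytes.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("65535 bytes accepted: " + fitsInWriteUtf(65535));
        System.out.println("65536 bytes accepted: " + fitsInWriteUtf(65536));
    }
}
```

Whether this limit is what the training run in this thread actually hits is an assumption; the Jira issue linked in the replies is the place to check.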

[GitHub] [opennlp] jzonthemtn merged pull request #410: OPENNLP-1351: Moving onnx models for testing. Fixing expected value.

2022-04-11 Thread GitBox

jzonthemtn merged PR #410: URL: https://github.com/apache/opennlp/pull/410

[GitHub] [opennlp] jzonthemtn commented on a diff in pull request #410: OPENNLP-1351: Moving onnx models for testing. Fixing expected value.

2022-04-11 Thread GitBox

jzonthemtn commented on code in PR #410: URL: https://github.com/apache/opennlp/pull/410#discussion_r847234944 ## opennlp-dl/src/test/java/opennlp/dl/namefinder/NameFinderDLEval.java: ## @@ -54,7 +54,7 @@ public void tokenNameFinder1Test() throws Exception { Assert.assertEq
