[GitHub] [opennlp] kinow commented on a change in pull request #337: OPENNLP-1185: Tokenizers should be able to output a new line token

2022-03-29 Thread GitBox
kinow commented on a change in pull request #337: URL: https://github.com/apache/opennlp/pull/337#discussion_r837801128 ## File path: opennlp-tools/src/main/java/opennlp/tools/tokenize/SimpleTokenizer.java ## @@ -101,4 +107,12 @@ else if (Character.isDigit(c)) { } re

OPENNLP-1185: Tokenizers should be able to output a new line token

2022-03-29 Thread Jeff Zemerick
There is a JIRA task [1], written by Jörn a few years ago, that calls for allowing the tokenizers to output new line tokens, and there is a PR [2] for it. The PR does not change the interfaces; it just adds a keepNewLines boolean to the tokenizers. It doesn't look like this change would affect any ex

[GitHub] [opennlp] jzonthemtn commented on pull request #337: OPENNLP-1185 Implementation of an option to emit tokens for new line characters fo…

2022-03-29 Thread GitBox
jzonthemtn commented on pull request #337: URL: https://github.com/apache/opennlp/pull/337#issuecomment-1081922737 I built and tested this branch without problems. The changes don't affect the current behavior of the tokenizers nor change the interfaces. A very long time getting to this bu

[GitHub] [opennlp] jzonthemtn commented on pull request #353: OPENNLP-1264: Trivial fixes for building on Windows

2022-03-29 Thread GitBox
jzonthemtn commented on pull request #353: URL: https://github.com/apache/opennlp/pull/353#issuecomment-1081879805 I believe these issues have since been resolved for building on Windows. This branch fails the 4 BRAT tests mentioned in [OPENNLP-1358](https://issues.apache.org/jira/browse/O

[GitHub] [opennlp] jzonthemtn closed pull request #353: OPENNLP-1264: Trivial fixes for building on Windows

2022-03-29 Thread GitBox
jzonthemtn closed pull request #353: URL: https://github.com/apache/opennlp/pull/353 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr.

[GitHub] [opennlp] jzonthemtn commented on pull request #348: OPENNLP-1257 fixed splitting in LemmaSampleStream

2022-03-29 Thread GitBox
jzonthemtn commented on pull request #348: URL: https://github.com/apache/opennlp/pull/348#issuecomment-1081857893 I believe splitting the tags by spaces is the desired input. A task will be created to determine if the documentation needs to be updated. I am closing this pull request. Please com

[GitHub] [opennlp] jzonthemtn closed pull request #348: OPENNLP-1257 fixed splitting in LemmaSampleStream

2022-03-29 Thread GitBox
jzonthemtn closed pull request #348: URL: https://github.com/apache/opennlp/pull/348

[GitHub] [opennlp] jzonthemtn closed pull request #355: OPENNLP-1266 -- Limit regexes in UrlCharSequenceNormalizer

2022-03-29 Thread GitBox
jzonthemtn closed pull request #355: URL: https://github.com/apache/opennlp/pull/355

[GitHub] [opennlp] jzonthemtn commented on pull request #355: OPENNLP-1266 -- Limit regexes in UrlCharSequenceNormalizer

2022-03-29 Thread GitBox
jzonthemtn commented on pull request #355: URL: https://github.com/apache/opennlp/pull/355#issuecomment-1081851392 The previously referenced PR #399 was merged. Since this PR has been open for some time without updates, I am going to close it. If there is a need to reopen it, please feel fre