[GitHub] [opennlp] jzonthemtn merged pull request #337: OPENNLP-1185: Tokenizers should be able to output a new line token

2022-04-02 Thread GitBox
jzonthemtn merged pull request #337: URL: https://github.com/apache/opennlp/pull/337

[GitHub] [opennlp] jzonthemtn commented on a change in pull request #337: OPENNLP-1185: Tokenizers should be able to output a new line token

2022-04-02 Thread GitBox
jzonthemtn commented on a change in pull request #337:
URL: https://github.com/apache/opennlp/pull/337#discussion_r841110488
File path: opennlp-tools/src/main/java/opennlp/tools/tokenize/SimpleTokenizer.java
@@ -101,4 +107,12 @@ else if (Character.isDigit(c)) { }

[GitHub] [opennlp] kinow commented on a change in pull request #337: OPENNLP-1185: Tokenizers should be able to output a new line token

2022-03-29 Thread GitBox
kinow commented on a change in pull request #337:
URL: https://github.com/apache/opennlp/pull/337#discussion_r837801128
File path: opennlp-tools/src/main/java/opennlp/tools/tokenize/SimpleTokenizer.java
@@ -101,4 +107,12 @@ else if (Character.isDigit(c)) { } re

OPENNLP-1185: Tokenizers should be able to output a new line token

2022-03-29 Thread Jeff Zemerick
A few years ago Jörn wrote a JIRA task [1] that calls for allowing the tokenizers to output new line tokens, and there is a PR [2] for it. The PR does not change the interfaces; it just adds a keepNewLines boolean to the tokenizers. It doesn't look like this change would affect any ex
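
To make the idea concrete, here is a minimal sketch of a tokenizer with a keepNewLines boolean. It is not the code from the PR and it does not use OpenNLP: the class NewlineAwareTokenizer, its constructor, and the choice to emit each line feed as a standalone "\n" token are all assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for illustration only; this is NOT OpenNLP's
 * SimpleTokenizer or WhitespaceTokenizer, and not the code from the PR.
 */
public class NewlineAwareTokenizer {

    // Assumed flag name, mirroring the keepNewLines boolean mentioned above.
    private final boolean keepNewLines;

    public NewlineAwareTokenizer(boolean keepNewLines) {
        this.keepNewLines = keepNewLines;
    }

    /** Splits on whitespace; optionally emits "\n" as its own token. */
    public String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Any whitespace character ends the token being built.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                // Assumption: only '\n' is kept; other whitespace is dropped.
                if (keepNewLines && c == '\n') {
                    tokens.add("\n");
                }
            } else {
                current.append(c);
            }
        }
        // Flush the last token if the text does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";
        // With the flag off, the line break is treated like any other whitespace.
        System.out.println(new NewlineAwareTokenizer(false).tokenize(text).length);
        // With the flag on, the line break shows up as one extra token.
        System.out.println(new NewlineAwareTokenizer(true).tokenize(text).length);
    }
}
```

Because the flag defaults to whatever the caller passes and the tokenize signature is unchanged, callers that never set it would see the same tokens as before, which is the kind of backwards compatibility the message describes.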