jzonthemtn merged pull request #337:
URL: https://github.com/apache/opennlp/pull/337
--
This is an automated message from the Apache Git Service.
jzonthemtn commented on a change in pull request #337:
URL: https://github.com/apache/opennlp/pull/337#discussion_r841110488
##
File path:
opennlp-tools/src/main/java/opennlp/tools/tokenize/SimpleTokenizer.java
##
@@ -101,4 +107,12 @@ else if (Character.isDigit(c)) {
}
kinow commented on a change in pull request #337:
URL: https://github.com/apache/opennlp/pull/337#discussion_r837801128
##
File path:
opennlp-tools/src/main/java/opennlp/tools/tokenize/SimpleTokenizer.java
##
@@ -101,4 +107,12 @@ else if (Character.isDigit(c)) {
}
There is a JIRA task [1], written by Jörn a few years ago, that calls for
allowing the tokenizers to output newline tokens, and there is a PR [2] for
it.
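To make the idea of newline tokens concrete, here is a minimal sketch of a whitespace tokenizer that can optionally emit "\n" as its own token. It is not the code from the PR and does not use the OpenNLP Tokenizer interface; the class name NewlineAwareTokenizer and its constructor flag are assumptions made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not OpenNLP's SimpleTokenizer and not the PR's code.
class NewlineAwareTokenizer {

    // When true, each '\n' in the input is emitted as its own token.
    private final boolean keepNewLines;

    NewlineAwareTokenizer(boolean keepNewLines) {
        this.keepNewLines = keepNewLines;
    }

    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the token being built, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                // Other whitespace (spaces, tabs, '\r') is always dropped.
                if (keepNewLines && c == '\n') {
                    tokens.add("\n");
                }
            } else {
                current.append(c);
            }
        }
        // Flush the last token when the input does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With the flag off, the sketch behaves like a plain whitespace tokenizer, so callers that never set the flag would see no difference in output.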
The PR does not change the interfaces and just adds a keepNewLines boolean
to the tokenizers. It doesn't look like this change would affect any
ex