Joseph K. Bradley created SPARK-11069:
-----------------------------------------

             Summary: Add RegexTokenizer option to convert to lowercase
                 Key: SPARK-11069
                 URL: https://issues.apache.org/jira/browse/SPARK-11069
             Project: Spark
          Issue Type: New Feature
          Components: ML
            Reporter: Joseph K. Bradley
            Priority: Minor

Tokenizer converts strings to lowercase automatically, but RegexTokenizer does not. It would be nice to add an option to RegexTokenizer to convert to lowercase.

Proposal:
* call the Boolean Param "toLowercase"
* set default to false (so behavior does not change)

*Q*: Should conversion to lowercase happen before or after regex matching?
* Before: This is simpler.
* After: This gives the user full control since they can have the regex treat upper/lower case differently.
--> I'd vote for conversion before matching. If a user needs full control, they can convert to lowercase manually.
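
To make the before/after question concrete, here is a minimal plain-Python sketch. It is not Spark code: `regex_tokenize`, `to_lowercase`, and `lowercase_first` are hypothetical names used only for illustration, and the pattern is assumed to match tokens rather than gaps.

```python
import re

def regex_tokenize(text, pattern, to_lowercase=False, lowercase_first=True):
    """Hypothetical stand-in for a regex tokenizer (not Spark's API).

    The pattern is assumed to match tokens, not the gaps between them.
    """
    if to_lowercase and lowercase_first:
        # "Before": the regex only ever sees lowercase text.
        text = text.lower()
    tokens = re.findall(pattern, text)
    if to_lowercase and not lowercase_first:
        # "After": the regex sees the original case; tokens are lowercased last.
        tokens = [t.lower() for t in tokens]
    return tokens

text = "Hello world Spark ML"
pattern = r"[A-Z]\w*"  # a case-sensitive pattern: words starting with a capital

default_tokens = regex_tokenize(text, pattern)
before_tokens = regex_tokenize(text, pattern, to_lowercase=True, lowercase_first=True)
after_tokens = regex_tokenize(text, pattern, to_lowercase=True, lowercase_first=False)

print(default_tokens)  # ['Hello', 'Spark', 'ML']
print(before_tokens)   # []
print(after_tokens)    # ['hello', 'spark', 'ml']
```

With a case-sensitive pattern the two orderings give different results: converting first leaves nothing for the pattern to match, while converting afterwards keeps the matches and lowercases them. For patterns that do not depend on case, such as whitespace splitting, both orderings should give the same tokens, which is the situation where the simpler "before" option costs nothing.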