[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9092 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enab

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-155248366 merging with master, branch-1.6 Thank you for the PR!

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-155243776 **[Test build #45439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45439/consoleFull)** for PR 9092 at commit [`2663cbf`](https://git

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-155244180 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-155239682 LGTM pending tests

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-155231560 **[Test build #45439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45439/consoleFull)** for PR 9092 at commit [`2663cbf`](https://gith

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-155230865 Merged build started.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-155230847 Merged build triggered.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/9092#discussion_r44311675 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala --- @@ -100,10 +100,25 @@ class RegexTokenizer(override val uid: String) /

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-09 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-155150009 Looks good except that one outdated doc line

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154907032 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154906966 **[Test build #45328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45328/consoleFull)** for PR 9092 at commit [`43fd8e9`](https://git

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154903263 **[Test build #45328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45328/consoleFull)** for PR 9092 at commit [`43fd8e9`](https://gith

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154902998 Merged build started.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154902986 Merged build triggered.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154861500 **[Test build #45314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45314/consoleFull)** for PR 9092 at commit [`0c07366`](https://git

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154861510 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154859228 **[Test build #45314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45314/consoleFull)** for PR 9092 at commit [`0c07366`](https://gith

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154858828 Merged build started.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154858819 Merged build triggered.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-08 Thread hhbyyh
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154832573 Yes, I agree. 1. Tokenizer and RegexTokenizer should have consistent behavior. 2. Whether to set toLower to true is a matter of preference. I assume for ML applica

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-07 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-154743585 I'm wondering now if we should set it to convert to lowercase by default. I know it breaks behavior, but otherwise, we'll introduce an inconsistency in the API (betwe

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/9092#discussion_r44216481 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala --- @@ -69,6 +69,18 @@ class RegexTokenizerSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-11-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/9092#discussion_r44216479 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala --- @@ -100,10 +100,25 @@ class RegexTokenizer(override val uid: String) /

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-147640440 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-147640443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-147639962 [Test build #43630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43630/console) for PR 9092 at commit [`ce09ef5`](https://github.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-147628224 [Test build #43630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43630/consoleFull) for PR 9092 at commit [`ce09ef5`](https://gith

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-147627765 Merged build started.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9092#issuecomment-147627734 Merged build triggered.

[GitHub] spark pull request: [SPARK-11069] [ML] Add RegexTokenizer option t...

2015-10-13 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/9092 [SPARK-11069] [ML] Add RegexTokenizer option to convert to lowercase jira: https://issues.apache.org/jira/browse/SPARK-11069 quotes from jira: Tokenizer converts strings to lowercase automat
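
For readers skimming the thread, here is a rough sketch of the behavior under discussion: a regex-based tokenizer with a switch for lowercasing. It is plain Python and does not use Spark; the function `regex_tokenize` and all of its parameter names are hypothetical and only loosely modeled on the options mentioned in the PR. The default of lowercasing follows the suggestion raised in the review comments, and lowercasing before the pattern is applied is an assumption of this sketch, not a quote from the patch.

```python
import re
from typing import List


def regex_tokenize(
    text: str,
    pattern: str = r"\s+",
    gaps: bool = True,
    min_token_length: int = 1,
    to_lowercase: bool = True,  # assumed default, per the review discussion
) -> List[str]:
    """Hypothetical stand-in for a regex tokenizer with a lowercase option."""
    # Lowercase first (assumption of this sketch), then apply the pattern.
    s = text.lower() if to_lowercase else text
    if gaps:
        # The pattern describes the separators between tokens.
        tokens = re.split(pattern, s)
    else:
        # The pattern describes the tokens themselves.
        tokens = re.findall(pattern, s)
    # Drop empty strings and tokens shorter than the minimum length.
    return [t for t in tokens if len(t) >= min_token_length]


if __name__ == "__main__":
    print(regex_tokenize("Hello Spark ML"))
    print(regex_tokenize("Hello Spark ML", to_lowercase=False))
```

With the switch on, the output matches what a tokenizer that always lowercases would produce on whitespace-separated input, which is the consistency argument made in the comments; with it off, the original casing is preserved.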