[GitHub] spark pull request: [SPARK-2316] Avoid O(blocks) operations in lis...

2014-08-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1679#issuecomment-50955955 I've merged this, thanks. It could be worth back porting into branch-1.0 as well, but I didn't do that yet. --- If your project is set up for it, you can reply to this

[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...

2014-08-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-50955929 I accidentally merged this in lieu of another patch. The merge has been reverted.

[GitHub] spark pull request: [SPARK-2316] Avoid O(blocks) operations in lis...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1679#issuecomment-50955867 QA results for PR 1679: This patch PASSES unit tests. This patch merges cleanly. This patch adds the following public classes (experimental): class StorageStatus(val blo

[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

2014-08-01 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-50955875 Discussed with @pwendell offline; recording a summary here: 1. We leave the current option passing mode as is to keep backward compatibility. 2. Introducing `--

[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...

2014-08-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-50955760 Hey on this one - it's helpful to see what this looks like - but my instinct is actually to move away from scala-logging entirely. We can upgrade ourselves, but all that

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread cfregly
Github user cfregly commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15727693 --- Diff: dev/audit-release/audit_release.py --- @@ -105,7 +105,7 @@ def get_url(url): "spark-core", "spark-bagel", "spark-mllib", "spark-streaming",

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread cfregly
Github user cfregly commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15727694 --- Diff: dev/audit-release/sbt_app_kinesis/build.sbt --- @@ -0,0 +1,30 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread cfregly
Github user cfregly commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15727688 --- Diff: examples/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-08-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1723#discussion_r15727686 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -112,6 +112,7 @@ class StreamingContext private[streaming] (

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50955658 QA results for PR 1623: This patch FAILED unit tests. This patch merges cleanly. This patch adds the following public classes (experimental): class AutoSerializer(Pickle

[GitHub] spark pull request: [SPARK-2478] [mllib] DecisionTree Python API

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1727#issuecomment-50955594 QA tests have started for PR 1727. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17758/consoleFull

[GitHub] spark pull request: [SPARK-2454] Do not ship spark home to Workers

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1734#issuecomment-50955592 QA tests have started for PR 1734. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17757/consoleFull

[GitHub] spark pull request: [SPARK-2454] Do not ship spark home to Workers

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1734#issuecomment-50955567 QA results for PR 1734: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2478] [mllib] DecisionTree Python API

2014-08-01 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1727#issuecomment-50955507 @mengxr Hopefully good to go if Jenkins agrees.

[GitHub] spark pull request: [SPARK-1740] [PySpark] kill the python worker

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1643#issuecomment-50955448 QA results for PR 1643: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15727602 --- Diff: extras/kinesis-asl/pom.xml --- @@ -0,0 +1,99 @@ + + +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instan

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15727599 --- Diff: extras/kinesis-asl/pom.xml --- @@ -0,0 +1,99 @@ + + +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instan

[GitHub] spark pull request: [SPARK-2454] Do not ship spark home to Workers

2014-08-01 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1734#discussion_r15727592 --- Diff: python/pyspark/context.py --- @@ -126,8 +126,6 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50955318 Because of the GIL, Python threads will not run concurrently in most cases. This patch will replace first, then patch the classes, so the process can be interrupted without

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread cfregly
Github user cfregly commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15727587 --- Diff: extras/kinesis-asl/pom.xml --- @@ -0,0 +1,99 @@ + + +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-ins

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread cfregly
Github user cfregly commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15727580 --- Diff: extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread cfregly
Github user cfregly commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15727578 --- Diff: bin/run-example --- @@ -29,7 +29,9 @@ if [ -n "$1" ]; then else echo "Usage: ./bin/run-example [example-args]" 1>&2 echo " - set

[GitHub] spark pull request: [SPARK-2454] Do not ship spark home to Workers

2014-08-01 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1734#discussion_r15727575 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -81,7 +81,8 @@ private[spark] class Worker( @volatile var registere

[GitHub] spark pull request: [SPARK-2801][MLlib]: DistributionGenerator ren...

2014-08-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1732

[GitHub] spark pull request: StatCounter on NumPy arrays [PYSPARK][SPARK-20...

2014-08-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1725

[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-50955230 QA results for PR 1369: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50955204 Just to cover all possible cases, are there any thread-safety issues here? Will we be in trouble if a user creates a new `namedtuple` instance while `_hack_namedtuple(

[GitHub] spark pull request: [SPARK-2454] Do not ship spark home to Workers

2014-08-01 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1734#discussion_r15727555 --- Diff: python/pyspark/context.py --- @@ -126,8 +126,6 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,

[GitHub] spark pull request: [SPARK-2454] Do not ship spark home to Workers

2014-08-01 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1734#discussion_r15727539 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -81,7 +81,8 @@ private[spark] class Worker( @volatile var registered

[GitHub] spark pull request: [SPARK-2316] Avoid O(blocks) operations in lis...

2014-08-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1679#issuecomment-50955077 LGTM pending tests.

[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-50955037 QA tests have started for PR 1719. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17756/consoleFull

[GitHub] spark pull request: [SQL] Set outputPartitioning of BroadcastHashJ...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1735#issuecomment-50954832 QA tests have started for PR 1735. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17754/consoleFull

[GitHub] spark pull request: [SPARK-2316] Avoid O(blocks) operations in lis...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1679#issuecomment-50954833 QA tests have started for PR 1679. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17755/consoleFull

[GitHub] spark pull request: [SQL] Set outputPartitioning of BroadcastHashJ...

2014-08-01 Thread yhuai
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/1735 [SQL] Set outputPartitioning of BroadcastHashJoin correctly. I think we will not generate the plan triggering this bug at this moment. But, let me explain it... Right now, we are using `left.

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50954747 QA tests have started for PR 1623. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17753/consoleFull

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50954746 @JoshRosen Good point. I have managed to replace all references to namedtuple with the new one, so this hijack only needs to happen once. Because it's only related to pickle seri

[GitHub] spark pull request: [SPARK-2316] Avoid O(blocks) operations in lis...

2014-08-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1679#issuecomment-50954735 Jenkins, test this please.

[GitHub] spark pull request: [WIP] SPARK-2157 Ability to write tight firewa...

2014-08-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1107#issuecomment-50954725 Yeah good call, we need to cover that one as well.

[GitHub] spark pull request: [SPARK-2316] Avoid O(blocks) operations in lis...

2014-08-01 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1679#issuecomment-50954718 Great to know @shivaram. Thanks for testing.

[GitHub] spark pull request: StatCounter on NumPy arrays [PYSPARK][SPARK-20...

2014-08-01 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/1725#issuecomment-50954666 @JoshRosen @davies great, thanks guys!

[GitHub] spark pull request: [WIP] SPARK-2157 Ability to write tight firewa...

2014-08-01 Thread ash211
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/1107#issuecomment-50954618 Looking at netstat more closely, we realized that there is still a port that's not configurable: the port that the driver connects to the executor on with Akka. The worke

[GitHub] spark pull request: [SPARK-2801][MLlib]: DistributionGenerator ren...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1732#issuecomment-50954562 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: StatCounter on NumPy arrays [PYSPARK][SPARK-20...

2014-08-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1725#issuecomment-50954571 This looks good. At first, I was concerned that element-wise operations might change behavior for calling `stats()` on an RDD of Python lists of numbers (`sc.paralleli

[GitHub] spark pull request: [SPARK-2454] Do not ship spark home to Workers

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1734#issuecomment-50954492 QA tests have started for PR 1734. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17752/consoleFull

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50954423 QA results for PR 1623: This patch PASSES unit tests. This patch merges cleanly. This patch adds the following public classes (experimental): class AutoSerializer(Pickle

[GitHub] spark pull request: [SPARK-2454] Do not assume drivers and executo...

2014-08-01 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1472#issuecomment-50954417 Closing this in favor of #1734. Please disregard this PR.

[GitHub] spark pull request: [SPARK-1740] [PySpark] kill the python worker

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1643#issuecomment-50954410 QA tests have started for PR 1643. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17751/consoleFull

[GitHub] spark pull request: [SPARK-2635] Fix race condition at SchedulerBa...

2014-08-01 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/1525#issuecomment-50954397 My preference is also to remove this for standalone mode (as mentioned in the original PR, #900) -- but adding @tgravescs who looked quite a bit at the original PR

[GitHub] spark pull request: [SPARK-2454] Do not ship spark home to Workers

2014-08-01 Thread andrewor14
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/1734 [SPARK-2454] Do not ship spark home to Workers When standalone Workers launch executors, they inherit the Spark home set by the driver. This means if the worker machines do not share the same di

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50954390 QA results for PR 1623: This patch PASSES unit tests. This patch merges cleanly. This patch adds the following public classes (experimental): class AutoSerializer(Pickle

[GitHub] spark pull request: [SPARK-1740] [PySpark] kill the python worker

2014-08-01 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1643#issuecomment-50954392 @JoshRosen I have redone this PR based on your cleanup; please review again.

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50954307 Calling `_hack_namedtuple()` should set up pickling for any `namedtuple` subclasses defined up to that point. It looks like we re-assign to `collections.namedtuple`, b
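
The idea discussed in this thread, wrapping `collections.namedtuple` so that the classes it creates can be pickled, can be illustrated roughly as follows. This is a minimal sketch and not the code from the pull request: the names `_restore`, `_make_picklable`, and `patched_namedtuple` are hypothetical, and the sketch leaves `collections.namedtuple` itself untouched instead of re-assigning it.

```python
import collections
import pickle

# Keep a handle on the original factory so the wrapper can delegate to it.
_original_namedtuple = collections.namedtuple


def _restore(name, fields, values):
    """Rebuild an instance from its class name, field names, and values."""
    cls = patched_namedtuple(name, fields)
    return cls(*values)


def _make_picklable(cls):
    """Pickle instances by (name, fields, values), not by class reference."""
    name, fields = cls.__name__, cls._fields

    def __reduce__(self):
        return (_restore, (name, fields, tuple(self)))

    cls.__reduce__ = __reduce__
    return cls


def patched_namedtuple(*args, **kwargs):
    """Hypothetical stand-in for a patched collections.namedtuple."""
    return _make_picklable(_original_namedtuple(*args, **kwargs))


if __name__ == "__main__":
    # The variable name differs from the type name, so the class cannot be
    # looked up by reference; the custom __reduce__ avoids that lookup.
    P = patched_namedtuple("Point", ["x", "y"])
    copy = pickle.loads(pickle.dumps(P(1, 2)))
    print(copy)
```

Because the class is recreated from its name and fields on the receiving side, it does not need to be importable there. Classes created before such a wrapper is installed would still have to be patched separately, which appears to be the concern raised in the comment above.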

[GitHub] spark pull request: [SPARK-2801][MLlib]: DistributionGenerator ren...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1732#issuecomment-50954230 QA results for PR 1732: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-1672][WIP] Separate partitioning in ALS

2014-08-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/593

[GitHub] spark pull request: [SPARK-1580] Estimate ALS communication and co...

2014-08-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/493

[GitHub] spark pull request: [SPARK-1580][MLLIB] Estimate ALS communication...

2014-08-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1731

[GitHub] spark pull request: StatCounter on NumPy arrays [PYSPARK][SPARK-20...

2014-08-01 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1725#issuecomment-50954198 LGTM. @JoshRosen Could you help take a look at this?

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50954102 Users may call namedtuple to create a class at any time, so this hack should be delayed until pickle is called, which means we have to check many times.

[GitHub] spark pull request: Add timestamps to block manager events.

2014-08-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/654#issuecomment-50954032 Do you mind opening a [JIRA](http://issues.apache.org/jira/browse/SPARK) issue for this and updating your pull request title to reference that issue (e.g. [SPARK-XXX] My

[GitHub] spark pull request: StatCounter on NumPy arrays [PYSPARK][SPARK-20...

2014-08-01 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/1725#discussion_r15727298 --- Diff: python/pyspark/tests.py --- @@ -38,12 +38,19 @@ from pyspark.shuffle import Aggregator, InMemoryMerger, ExternalMerger _have_scip

[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-50953931 QA tests have started for PR 1369. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17748/consoleFull

[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...

2014-08-01 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-50953905 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2752]spark sql cli should not exit when...

2014-08-01 Thread scwf
Github user scwf closed the pull request at: https://github.com/apache/spark/pull/1661

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50953790 Your latest commit improves things, but I still think the static method approach would be better, since that way we wouldn't wind up calling `_hack_namedtuple()` so oft

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50953756 I re-opened the JIRA. Please use the same JIRA number for your new PR. Thanks!

[GitHub] spark pull request: Adding OWL-QN optimizer for L1 regularizations...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/840#issuecomment-50953718 @codedeft Could you add `[SPARK-1892][MLLIB]` to the title of this PR, so that it shows up in the results if people search for the JIRA or `[MLLIB]`? Thanks!

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-01 Thread miccagiann
Github user miccagiann commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50953699 Xiangrui, I see that the JIRA issue is closed. Should we create a new one for the `LogisticRegressionWithSGD` and for `SVMWithSGD`?

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-01 Thread miccagiann
Github user miccagiann commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50953670 Alright, I was fixing my branches so that my new commits are included correctly in the new PR I am going to create.

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50953649 QA tests have started for PR 1623. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17747/consoleFull

[GitHub] spark pull request: [SPARK-2752]spark sql cli should not exit when...

2014-08-01 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1661#issuecomment-50953641 @marmbrus Yes, we can close this. @scwf Thanks all the same for bringing this issue to the table and working on this!

[GitHub] spark pull request: [SPARK-1580][MLLIB] Estimate ALS communication...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1731#issuecomment-50953592 Merged into master. Thanks @tmyklebu for the work!

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50953556 QA tests have started for PR 1623. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17746/consoleFull

[GitHub] spark pull request: [SPARK-2212][SQL] Hash Outer Join (follow-up b...

2014-08-01 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1721#issuecomment-50953475 That's cool, thank you @yhuai :)

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50953479 I see, CloudPickle also needs this hack.

[GitHub] spark pull request: [MLlib] [SPARK-2510]word2vec: Distributed Repr...

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15727134 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,401 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] spark pull request: [SPARK-2801][MLlib]: DistributionGenerator ren...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1732#issuecomment-50953416 QA tests have started for PR 1732. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17745/consoleFull

[GitHub] spark pull request: [MLlib] [SPARK-2510]word2vec: Distributed Repr...

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15727095 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,401 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] spark pull request: [MLlib] [SPARK-2510]word2vec: Distributed Repr...

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15727075 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,401 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-08-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1361

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1624

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50953229 Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50953197 QA results for PR 1624: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-1580][MLLIB] Estimate ALS communication...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1731#issuecomment-50953188 QA results for PR 1731: This patch PASSES unit tests. This patch merges cleanly. This patch adds the following public classes (experimental): case class OutLinkBlock(ele

[GitHub] spark pull request: [SPARK-1812] remove default args to overloaded...

2014-08-01 Thread avati
Github user avati commented on the pull request: https://github.com/apache/spark/pull/1704#issuecomment-50953179 On Fri, Aug 1, 2014 at 6:52 PM, Patrick Wendell wrote: > Is this a comprehensive list of cases that need to be addressed? One issue > is that this will brea

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50953161 Did you run that exact file with PySpark? The important bits are that namedtuple is imported and an instance is created before any PySpark imports, and we launch a job

[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

2014-08-01 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1715#discussion_r15727020 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -311,6 +311,15 @@ private[spark] class SparkSubmitArguments(args: S

[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

2014-08-01 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1715#discussion_r15727018 --- Diff: bin/spark-sql --- @@ -26,11 +26,16 @@ set -o posix # Figure out where Spark is installed FWDIR="$(cd `dirname $0`/..; pwd)" -if

[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

2014-08-01 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1715#discussion_r15727021 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -311,6 +311,15 @@ private[spark] class SparkSubmitArguments(args: S

[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

2014-08-01 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1715#discussion_r15727017 --- Diff: bin/beeline --- @@ -17,29 +17,14 @@ # limitations under the License. # -# Figure out where Spark is installed --- End diff

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1733#issuecomment-50953135 QA results for PR 1733: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1733#issuecomment-50953122 QA tests have started for PR 1733. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17744/consoleFull

[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...

2014-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-50953076 QA results for PR 1369: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2515][mllib] Chi Squared test

2014-08-01 Thread dorx
GitHub user dorx opened a pull request: https://github.com/apache/spark/pull/1733 [SPARK-2515][mllib] Chi Squared test You can merge this pull request into a Git repository by running: $ git pull https://github.com/dorx/spark chisquare Alternatively you can review and apply t

[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple

2014-08-01 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-50952992 It works on my Mac; have you applied the patch? It should be registered before dumps.

[GitHub] spark pull request: [SPARK-2316] Avoid O(blocks) operations in lis...

2014-08-01 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/1679#issuecomment-50952987 @pwendell @andrewor14 Yes - the run went fine. I didn't see any listener bus overflows and the UI was fine. Also I used to previously see 1 CPU fully occupied b

[GitHub] spark pull request: StatCounter on NumPy arrays [PYSPARK][SPARK-20...

2014-08-01 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1725#discussion_r15726945 --- Diff: python/pyspark/tests.py --- @@ -38,12 +38,19 @@ from pyspark.shuffle import Aggregator, InMemoryMerger, ExternalMerger _have_scipy = F

[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-08-01 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15726900 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -208,6 +211,96 @@ case class EndsWith(left: Exp

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15726898 --- Diff: bin/run-example --- @@ -29,7 +29,9 @@ if [ -n "$1" ]; then else echo "Usage: ./bin/run-example [example-args]" 1>&2 echo " - se

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-08-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1434#discussion_r15726872 --- Diff: extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache So

[GitHub] spark pull request: [SPARK-2764] Simplify daemon.py process struct...

2014-08-01 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1680#discussion_r15726868 --- Diff: python/pyspark/daemon.py --- @@ -174,20 +116,41 @@ def handle_sigchld(*args): # Initialization complete sys.stdout.close()
