[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-11-09 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-155231687 awesome, nice job all! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-1267][PYSPARK] Adds pip installer for p...

2015-08-20 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/8318#discussion_r37580512 --- Diff: python/pyspark/__init__.py --- @@ -36,6 +36,31 @@ Finer-grained cache persistence levels. +import os +import sys

[GitHub] spark pull request: [WIP] [SPARK-9805] [MLLIB] [PYTHON] [STREAMING...

2015-08-11 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/8087#issuecomment-130090318 Nice! I think this is a solid strategy. Maybe in the next round of changes make that `20.0`, which will presumably be used throughout, a var shared by all the tests
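A minimal sketch of the "shared var" suggestion, assuming the tests are `unittest.TestCase` subclasses. The comment does not say what `20.0` represents, so the sketch assumes it is a timeout in seconds. The names `StreamingTestBase`, `timeout`, and `wait_for` are hypothetical and not taken from the PR.

```python
import time
import unittest


class StreamingTestBase(unittest.TestCase):
    # Hypothetical shared value (assumed to be a timeout in seconds), defined
    # once so every test that waits on streaming output uses the same number.
    timeout = 20.0

    def wait_for(self, condition, poll_interval=0.01):
        """Poll `condition` until it returns True or `self.timeout` elapses."""
        deadline = time.monotonic() + self.timeout
        while time.monotonic() < deadline:
            if condition():
                return True
            time.sleep(poll_interval)
        return False


class ExampleStreamingTest(StreamingTestBase):
    def test_condition_met(self):
        # Trivial condition that is already satisfied.
        self.assertTrue(self.wait_for(lambda: True))
```

Changing the value then means editing one class attribute instead of every call site; a subclass can still override it if one test needs a different budget.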

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-07-30 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-126520577 Sorry for the delay @yu-iskw, I'm going through it today, comments soon.

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-07-14 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-121499180 @yu-iskw @mengxr definitely! I'll take a look by Friday as well.

[GitHub] spark pull request: [SPARK-4127] [MLlib] [PySpark] Python bindings...

2015-06-24 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/6744#discussion_r33204154 --- Diff: docs/mllib-linear-methods.md --- @@ -768,6 +768,58 @@ will get better! /div +div data-lang=python markdown=1

[GitHub] spark pull request: [SPARK-4127] [MLlib] [PySpark] Python bindings...

2015-06-24 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/6744#discussion_r33203877 --- Diff: python/pyspark/mllib/regression.py --- @@ -570,6 +571,92 @@ def train(cls, data, isotonic=True): return IsotonicRegressionModel

[GitHub] spark pull request: [SPARK-4127] [MLlib] [PySpark] Python bindings...

2015-06-24 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/6744#discussion_r33203819 --- Diff: python/pyspark/mllib/regression.py --- @@ -570,6 +571,92 @@ def train(cls, data, isotonic=True): return IsotonicRegressionModel

[GitHub] spark pull request: [SPARK-4127] [MLlib] [PySpark] Python bindings...

2015-06-24 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/6744#discussion_r33203800 --- Diff: python/pyspark/mllib/regression.py --- @@ -570,6 +571,92 @@ def train(cls, data, isotonic=True): return IsotonicRegressionModel

[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-18 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/6499#discussion_r32750538 --- Diff: python/pyspark/mllib/clustering.py --- @@ -264,6 +270,192 @@ def train(cls, rdd, k, convergenceTol=1e-3, maxIterations=100, seed=None, initia

[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-18 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/6499#discussion_r32751544 --- Diff: python/pyspark/mllib/tests.py --- @@ -863,6 +876,107 @@ def test_model_transform(self): eprod.transform(sparsevec

[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-18 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/6499#discussion_r32751786 --- Diff: python/pyspark/mllib/tests.py --- @@ -863,6 +876,107 @@ def test_model_transform(self): eprod.transform(sparsevec

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32355824 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,631 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32355773 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,631 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32355804 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,631 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32355863 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,631 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32355952 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,631 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32356016 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,631 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32357421 --- Diff: python/pyspark/mllib/clustering.py --- @@ -192,6 +196,107 @@ def train(cls, rdd, k, convergenceTol=1e-3, maxIterations=100, seed=None

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32359748 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,631 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32357267 --- Diff: python/pyspark/mllib/clustering.py --- @@ -192,6 +196,107 @@ def train(cls, rdd, k, convergenceTol=1e-3, maxIterations=100, seed=None

[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-111632458 Ok great, thanks @MechCoder! I'll give it a full go-over this weekend.

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-111632563 Thanks for the effort @yu-iskw, I think this has come together nicely! The new linkageMatrix and adjacencyList features are very nice. I left a few comments almost

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-06-12 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r32356208 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,631 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-01 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-107819707 Cool, excited to look at this! Can definitely take a pass after you update.

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-05-01 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-98132265 @yu-iskw that makes sense! I do think the linkage matrix / merge list is a general enough data structure for this algorithm that it's definitely worth having
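For readers unfamiliar with the structure mentioned above: a linkage matrix (merge list) records one merge per row. The sketch below assumes the SciPy-style layout `[left_id, right_id, distance, size]`, where ids `0..n-1` are the original points and id `n + i` is the cluster created by row `i`. It builds one with naive single-linkage agglomeration purely to illustrate the data structure; it is not the algorithm in this PR, and `single_linkage_matrix` is a hypothetical helper.

```python
import math


def single_linkage_matrix(points):
    """Return a merge list for `points` as rows [left_id, right_id, distance, size].

    Naive O(n^3) single-linkage agglomeration, for illustration only.
    """
    n = len(points)
    # Active clusters: cluster id -> list of member point indices.
    clusters = {i: [i] for i in range(n)}
    rows = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merged = clusters.pop(a) + clusters.pop(b)
        rows.append([a, b, d, len(merged)])
        # The merged cluster gets the next unused id, n + (row index).
        clusters[next_id] = merged
        next_id += 1
    return rows
```

For the points `(0, 0)`, `(0, 1)`, `(5, 0)` this yields `[[0, 1, 1.0, 2], [2, 3, 5.0, 3]]`: points 0 and 1 merge first into cluster 3, which then absorbs point 2. Because every merge is one row, the full tree (and any flat cut of it) can be recovered from the `n - 1` rows alone, which is part of why the format is general.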

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-26 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r29121200 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,574 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-26 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r29121407 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala --- @@ -0,0 +1,116 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-26 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r29121409 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala --- @@ -0,0 +1,116 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-26 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-96512684 @yu-iskw I'm still going through the patch, but so far it's looking good! I've also been testing it locally. Is there a reason you removed the `toMergeList

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-26 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r29120850 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,574 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-26 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r29121114 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,574 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-26 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5267#discussion_r29121153 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,574 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-26 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-96511859 @yu-iskw I'm not familiar with any other self-contained metrics (there are a bunch of metrics for relating estimated clusters to some known ground-truth clustering

[GitHub] spark pull request: [SPARK-6998][MLlib] Make StreamingKMeans 'Seri...

2015-04-19 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5582#issuecomment-94347795 yup, seems fine to me too, this was probably an unintentional omission, as both `StreamingLinearRegressionWithSGD` and `StreamingLogisticRegressionWithSGD` do extend

[GitHub] spark pull request: [SPARK-3147][MLLib] A/B testing

2015-04-07 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/4716#issuecomment-90807615 @mengxr @feynmanliang sure thing! This looks really cool, will try to go through it in the next couple days.

[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...

2015-04-02 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5037#issuecomment-89164396 Great, thanks @tdas @mengxr!

[GitHub] spark pull request: [SPARK-6517][mllib] Implement the Algorithm of...

2015-04-01 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5267#issuecomment-88386004 @yu-iskw great putting this new version together, I'd be happy to do a review (especially re: the algorithm), should be able to get to it in the next few days

[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...

2015-03-26 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/5037#issuecomment-86736605 Agreed @tdas! Discussed with @mengxr offline today and I think we're satisfied with this fix. Any additional suggestions, or extra tests? I want to make

[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...

2015-03-16 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5037#discussion_r26500716 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.scala --- @@ -59,6 +59,8 @@ class

[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...

2015-03-16 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5037#discussion_r26498476 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.scala --- @@ -59,6 +59,8 @@ class

[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...

2015-03-16 Thread freeman-lab
GitHub user freeman-lab opened a pull request: https://github.com/apache/spark/pull/5037 [SPARK-6345][STREAMING][MLLIB] Fix for training with prediction This patch fixes a reported bug causing model updates to not properly propagate to model predictions during streaming regression
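One common way this class of bug arises: prediction captures a reference to the model when it is wired up, while training later replaces the model with a new object, so predictions keep using the old one. The plain-Python sketch below illustrates that general pattern with hypothetical names (`StreamingModelHolder`, `predictor_stale`, `predictor_fresh`); it is not the code from this patch and does not claim to be the exact cause of the reported bug.

```python
class StreamingModelHolder:
    """Hypothetical holder whose model is replaced after each training batch."""

    def __init__(self, weight):
        self.model = lambda x: weight * x

    def update(self, new_weight):
        # Training swaps in a new model object rather than mutating the old one.
        self.model = lambda x: new_weight * x

    def predictor_stale(self):
        # Bug pattern: binds the model object that exists *now*,
        # so later updates are never seen by the returned function.
        model = self.model
        return lambda x: model(x)

    def predictor_fresh(self):
        # Fix pattern: look up the current model on every call.
        return lambda x: self.model(x)
```

After `holder = StreamingModelHolder(1.0)`, creating both predictors and then calling `holder.update(2.0)`, the stale predictor still returns `3.0` for input `3.0`, while the fresh one returns `6.0`.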

[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...

2015-03-16 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5037#discussion_r26520257 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala --- @@ -114,7 +114,7 @@ abstract class

[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...

2015-03-16 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/5037#discussion_r26520319 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.scala --- @@ -59,6 +59,8 @@ class

[GitHub] spark pull request: [SPARK-5601][MLLIB] make streaming linear algo...

2015-02-08 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/4432#issuecomment-73317241 @mengxr nice patch! I left two pretty minor comments about expanding the tests, otherwise looking good.

[GitHub] spark pull request: [SPARK-5601][MLLIB] make streaming linear algo...

2015-02-08 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/4432#discussion_r24276343 --- Diff: mllib/src/test/java/org/apache/spark/mllib/regression/JavaStreamingLinearRegressionSuite.java --- @@ -0,0 +1,80 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-5601][MLLIB] make streaming linear algo...

2015-02-08 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/4432#discussion_r24273893 --- Diff: mllib/src/test/java/org/apache/spark/mllib/regression/JavaStreamingLinearRegressionSuite.java --- @@ -0,0 +1,80 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-5601][MLLIB] make streaming linear algo...

2015-02-08 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/4432#discussion_r24273989 --- Diff: mllib/src/test/java/org/apache/spark/mllib/regression/JavaStreamingLinearRegressionSuite.java --- @@ -0,0 +1,80 @@ +/* --- End diff

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-02-03 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-72793042 Thanks for the detailed look @tdas! Think I addressed both nits.

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-02-03 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r24063184 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala --- @@ -210,6 +211,20 @@ class JavaStreamingContext

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-02-03 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r24063473 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -671,7 +674,11 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-02-03 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r24064747 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -671,7 +674,11 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-03 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/4306#discussion_r24038692 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala --- @@ -58,14 +58,14 @@ abstract class

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/4306#issuecomment-72555492 Thanks for the review! I think I dealt with everything, and in a couple places I tweaked the corresponding point in `StreamingLinearRegression` for parity

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/4306#discussion_r23966222 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLogisticRegression.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/4306#discussion_r23965614 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionWithSGD.scala --- @@ -0,0 +1,97

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/4306#issuecomment-72583404 Should we just set `initialWeights=Vectors.dense(0.0)` by default?

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r23981497 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -361,6 +363,25 @@ class StreamingContext private[streaming

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/4306#issuecomment-72580780 @mengxr the test failure is real, I think there's a conflict due to this recent change to master (https://github.com/apache/spark/commit

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/4306#issuecomment-72588966 Nice idea! Just took a stab, see what you think. Was reproducing the test failure locally, and it is now fixed with this change.

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/4306#issuecomment-72581477 Ok, the problem is that we've been setting `initialWeights` to `null` by default and then checking that weights are defined before starting training (because we

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-02-02 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r23980541 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -361,6 +363,25 @@ class StreamingContext private[streaming

[GitHub] spark pull request: [SPARK-4979][MLLIB] Streaming logisitic regres...

2015-02-02 Thread freeman-lab
GitHub user freeman-lab opened a pull request: https://github.com/apache/spark/pull/4306 [SPARK-4979][MLLIB] Streaming logisitic regression This adds support for streaming logistic regression with stochastic gradient descent, in the same manner as the existing implementation
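A minimal sketch of the idea behind streaming logistic regression with SGD: keep a weight vector and nudge it with a gradient step each time a new mini-batch arrives. It also fails fast when no initial weights were set, echoing the `initialWeights` discussion in this thread. `StreamingLogisticRegressionSketch` and its methods are hypothetical plain-Python names, not MLlib's API; for simplicity it takes a single unregularized gradient step per batch with no intercept.

```python
import math


class StreamingLogisticRegressionSketch:
    """Online logistic regression: one log-loss gradient step per mini-batch."""

    def __init__(self, step_size=0.5, initial_weights=None):
        self.step_size = step_size
        # None means "not yet set"; training refuses to start until it is.
        self.weights = None if initial_weights is None else list(initial_weights)

    def set_initial_weights(self, weights):
        self.weights = list(weights)
        return self

    def _check_initialized(self):
        # Fail fast with a clear message rather than training an undefined model.
        if self.weights is None:
            raise ValueError("initial weights must be set before training")

    def predict_proba(self, x):
        self._check_initialized()
        z = sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def train_on_batch(self, batch):
        """Update weights from `batch`, a list of (label, features), label in {0, 1}."""
        self._check_initialized()
        if not batch:
            return self
        grad = [0.0] * len(self.weights)
        for label, x in batch:
            # Log-loss gradient for one example: (p - y) * x
            err = self.predict_proba(x) - label
            for i, xi in enumerate(x):
                grad[i] += err * xi
        n = len(batch)
        self.weights = [w - self.step_size * g / n
                        for w, g in zip(self.weights, grad)]
        return self
```

In a streaming setting, `train_on_batch` would be called once per incoming batch, so the model keeps adapting as data arrives while `predict` always uses the latest weights.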

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-02-01 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-72391803 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-01-30 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-72285668 @JoshRosen I finished the refactored tests and added better handling of the `getBytes` based on your suggestion.
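Spark's `SparkContext.binaryRecords` loads a flat binary file assuming every record has the same length. Separately, Hadoop's `BytesWritable.getBytes()` returns the whole backing buffer, which can be longer than `getLength()`, so code that turns raw bytes into records should copy exactly the valid bytes. The pure-Python sketch below shows fixed-length splitting with an explicit copy per record; `split_fixed_length_records` and `decode_int_pairs` are hypothetical helpers, not Spark APIs, and the `<ii` layout (two little-endian 32-bit ints per record) is an assumed example format.

```python
import struct


def split_fixed_length_records(data, record_length):
    """Split `data` into consecutive records of exactly `record_length` bytes.

    Raises ValueError if the length does not divide evenly, rather than
    silently returning a truncated final record.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if len(data) % record_length != 0:
        raise ValueError("data length %d is not a multiple of %d"
                         % (len(data), record_length))
    # bytes() copies each slice, so no record shares a larger backing buffer.
    return [bytes(data[i:i + record_length])
            for i in range(0, len(data), record_length)]


def decode_int_pairs(records):
    """Decode records assumed to hold two little-endian int32 values each."""
    return [struct.unpack("<ii", rec) for rec in records]
```

For example, `decode_int_pairs(split_fixed_length_records(struct.pack("<6i", 1, 2, 3, 4, 5, 6), 8))` gives `[(1, 2), (3, 4), (5, 6)]`.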

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-01-30 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r23886182 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -657,6 +657,10 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-01-28 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-71929328 Great, thanks @JoshRosen, will finish this up ASAP!

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-08 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22640591 --- Diff: data/mllib/sample_hierarchical_data.csv --- @@ -0,0 +1,150 @@ +5.1,3.5,1.4,0.2 --- End diff -- Good point =) Leave

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22633425 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22633847 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22633951 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22633997 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22634802 --- Diff: examples/src/main/python/mllib/hierarchical_clustering.py --- @@ -0,0 +1,84 @@ +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-69134341 Hi @yu-iskw and @rnowling, I've spent time reviewing the code and using it in both Python and Scala. Overall great work, terrific to see my little gist turned

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22634203 --- Diff: python/pyspark/mllib/clustering.py --- @@ -88,6 +92,162 @@ def train(cls, rdd, k, maxIterations=100, runs=1, initializationMode=k-means

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22633758 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22633778 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22634674 --- Diff: docs/mllib-clustering.md --- @@ -154,6 +156,175 @@ section of the Spark Quick Start guide. Be sure to also include *spark-mllib* to your

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22634865 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22634887 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22634895 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22634890 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632101 --- Diff: data/mllib/sample_hierarchical_data.csv --- @@ -0,0 +1,150 @@ +5.1,3.5,1.4,0.2 --- End diff -- It might be nice if this could

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632146 --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaHierarchicalClustering.java --- @@ -0,0 +1,73 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632182 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632172 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632194 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringSuite.scala --- @@ -0,0 +1,330 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632220 --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaHierarchicalClustering.java --- @@ -0,0 +1,73 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632265 --- Diff: python/pyspark/mllib/clustering.py --- @@ -88,6 +92,162 @@ def train(cls, rdd, k, maxIterations=100, runs=1, initializationMode=k-means

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632512 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632647 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632654 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632678 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632686 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632804 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-07 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/2906#discussion_r22632919 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala --- @@ -0,0 +1,627 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2015-01-05 Thread freeman-lab
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-68794407 Hey all, thanks for the nudge =) I've been going through it, will get you feedback ASAP.

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-01-05 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22495402 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -233,6 +236,47 @@ class InputStreamsSuite extends

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-01-05 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22495425 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -233,6 +236,47 @@ class InputStreamsSuite extends

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-01-05 Thread freeman-lab
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22496437 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -373,6 +393,25 @@ class StreamingContext private[streaming