Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5267#issuecomment-155231687
awesome, nice job all!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/8318#discussion_r37580512
--- Diff: python/pyspark/__init__.py ---
@@ -36,6 +36,31 @@
Finer-grained cache persistence levels.
+import os
+import sys
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/8087#issuecomment-130090318
Nice! I think this is a solid strategy. Maybe in the next round of changes
make that `20.0`, which will presumably be used throughout, a var shared by all
the tests
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5267#issuecomment-126520577
sorry for the delay @yu-iskw, i'm going through it today, comments soon
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5267#issuecomment-121499180
@yu-iskw @mengxr definitely! I'll take a look by Friday as well.
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/6744#discussion_r33204154
--- Diff: docs/mllib-linear-methods.md ---
@@ -768,6 +768,58 @@ will get better!
/div
+div data-lang=python markdown=1
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/6744#discussion_r33203877
--- Diff: python/pyspark/mllib/regression.py ---
@@ -570,6 +571,92 @@ def train(cls, data, isotonic=True):
return IsotonicRegressionModel
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/6744#discussion_r33203819
--- Diff: python/pyspark/mllib/regression.py ---
@@ -570,6 +571,92 @@ def train(cls, data, isotonic=True):
return IsotonicRegressionModel
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/6744#discussion_r33203800
--- Diff: python/pyspark/mllib/regression.py ---
@@ -570,6 +571,92 @@ def train(cls, data, isotonic=True):
return IsotonicRegressionModel
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/6499#discussion_r32750538
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -264,6 +270,192 @@ def train(cls, rdd, k, convergenceTol=1e-3,
maxIterations=100, seed=None, initia
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/6499#discussion_r32751544
--- Diff: python/pyspark/mllib/tests.py ---
@@ -863,6 +876,107 @@ def test_model_transform(self):
eprod.transform(sparsevec
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/6499#discussion_r32751786
--- Diff: python/pyspark/mllib/tests.py ---
@@ -863,6 +876,107 @@ def test_model_transform(self):
eprod.transform(sparsevec
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32355824
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,631 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32355773
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,631 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32355804
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,631 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32355863
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,631 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32355952
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,631 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32356016
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,631 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32357421
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -192,6 +196,107 @@ def train(cls, rdd, k, convergenceTol=1e-3,
maxIterations=100, seed=None
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32359748
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,631 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32357267
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -192,6 +196,107 @@ def train(cls, rdd, k, convergenceTol=1e-3,
maxIterations=100, seed=None
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/6499#issuecomment-111632458
Ok great, thanks @MechCoder ! I'll give it a full go-over this weekend.
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5267#issuecomment-111632563
Thanks for the effort @yu-iskw I think this has come together nicely! The
new linkageMatrix and adjacencyList features are very nice. I left a few
comments almost
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r32356208
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,631 @@
+/*
+ * Licensed
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/6499#issuecomment-107819707
Cool, excited to look at this! Can definitely take a pass after you update.
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5267#issuecomment-98132265
@yu-iskw that makes sense! I do think the linkage matrix / merge list is a
general enough data structure for this algorithm that it's definitely worth
having
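The linkage matrix / merge list mentioned here is a compact encoding of a cluster hierarchy. Below is a minimal Python sketch of one common form of it. It assumes SciPy's linkage-matrix convention: rows of `[left, right, distance, size]`, with leaves numbered `0..n-1` and the k-th merge creating cluster id `n + k`. The function name and the `(left, right, distance)` merge-list format are hypothetical illustrations, not the API added in the PR.

```python
# Hypothetical sketch: convert an ordered merge list into a SciPy-style
# linkage matrix. Leaves are 0..n-1; the k-th merge creates cluster n + k.
from typing import List, Tuple

Merge = Tuple[int, int, float]  # (left cluster id, right cluster id, distance)


def to_linkage_matrix(merges: List[Merge], n_leaves: int) -> List[List[float]]:
    """Return one row [left, right, distance, size] per merge."""
    sizes = {i: 1 for i in range(n_leaves)}  # every leaf starts as a singleton
    rows = []
    for k, (left, right, dist) in enumerate(merges):
        # Each cluster may be merged exactly once, and never with itself.
        if left == right or left not in sizes or right not in sizes:
            raise ValueError(f"merge {k} refers to an unknown or already-merged cluster")
        size = sizes.pop(left) + sizes.pop(right)
        sizes[n_leaves + k] = size  # the new cluster takes the next id
        rows.append([float(left), float(right), float(dist), float(size)])
    return rows


# Three leaves: merge 0 and 1 (creating cluster 3), then merge 2 with 3.
print(to_linkage_matrix([(0, 1, 0.5), (2, 3, 1.2)], 3))
# prints [[0.0, 1.0, 0.5, 2.0], [2.0, 3.0, 1.2, 3.0]]
```

Storing the cluster size in each row lets consumers, such as dendrogram plotting code, work from the matrix alone without revisiting the original points.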
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r29121200
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,574 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r29121407
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala
---
@@ -0,0 +1,116 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r29121409
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala
---
@@ -0,0 +1,116 @@
+/*
+ * Licensed
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5267#issuecomment-96512684
@yu-iskw I'm still going through the patch, but so far it's looking good!
I've also been testing it locally.
Is there a reason you removed the `toMergeList
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r29120850
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,574 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r29121114
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,574 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5267#discussion_r29121153
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,574 @@
+/*
+ * Licensed
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5267#issuecomment-96511859
@yu-iskw I'm not familiar with any other self-contained metrics (there are
a bunch of metrics for relating estimated clusters to some known ground-truth
clustering
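The within-cluster sum of squared errors (WSSSE) is an example of a "self-contained" metric in the sense used here. It needs only the points, their cluster assignments, and the cluster centers, with no ground-truth labels. It is the same kind of cost MLlib reports for k-means. The following is a minimal stdlib sketch; the function name and the dict-of-centers input are assumptions made for illustration.

```python
# Illustrative sketch: WSSSE needs only points, assignments, and centers.
from typing import Dict, Sequence


def wssse(points: Sequence[Sequence[float]],
          labels: Sequence[int],
          centers: Dict[int, Sequence[float]]) -> float:
    """Sum of squared Euclidean distances from each point to its assigned center."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
print(wssse(pts, [0, 0, 1], {0: [0.0, 1.0], 1: [10.0, 10.0]}))  # prints 2.0
```

A lower value means tighter clusters. It always decreases as clusters are split further, so it is better suited to comparing clusterings with the same number of clusters.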
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5582#issuecomment-94347795
yup, seems fine to me too, this was probably an unintentional omission, as
both `StreamingLinearRegressionWithSGD` and
`StreamingLogisticRegressionWithSGD` do extend
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/4716#issuecomment-90807615
@mengxr @feynmanliang sure thing! This looks really cool, will try to go
through it in the next couple days.
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5037#issuecomment-89164396
Great thanks @tdas @mengxr !
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5267#issuecomment-88386004
@yu-iskw great putting this new version together, I'd be happy to do a
review (especially re: the algorithm), should be able to get to it in the next
few days
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/5037#issuecomment-86736605
Agreed @tdas ! Discussed with @mengxr offline today and I think we're
satisfied with this fix. Any additional suggestions, or extra tests?
I want to make
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5037#discussion_r26500716
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.scala
---
@@ -59,6 +59,8 @@ class
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5037#discussion_r26498476
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.scala
---
@@ -59,6 +59,8 @@ class
GitHub user freeman-lab opened a pull request:
https://github.com/apache/spark/pull/5037
[SPARK-6345][STREAMING][MLLIB] Fix for training with prediction
This patch fixes a reported bug causing model updates to not properly
propagate to model predictions during streaming regression
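A bug where model updates fail to reach predictions is the kind that can arise when a prediction step captures a snapshot of the model once, instead of looking up the latest model each time. The excerpt does not show the actual Scala change. The following Python sketch therefore only illustrates that general pitfall, using a hypothetical `StreamingModelHolder` class.

```python
# Simplified illustration, not the Spark code: a predictor that captures the
# model once goes stale, while one that looks it up per call sees updates.
from typing import Callable


class StreamingModelHolder:
    """Hypothetical holder whose weight is replaced after each training batch."""

    def __init__(self, weight: float) -> None:
        self.weight = weight

    def update(self, new_weight: float) -> None:
        # Stands in for the result of training on one streaming batch.
        self.weight = new_weight

    def stale_predictor(self) -> Callable[[float], float]:
        w = self.weight  # captured when the predictor is built; never refreshed
        return lambda x: w * x

    def live_predictor(self) -> Callable[[float], float]:
        return lambda x: self.weight * x  # reads the latest weight on every call


holder = StreamingModelHolder(0.0)
stale, live = holder.stale_predictor(), holder.live_predictor()
holder.update(2.0)
print(stale(3.0), live(3.0))  # prints 0.0 6.0
```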
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5037#discussion_r26520257
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
---
@@ -114,7 +114,7 @@ abstract class
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/5037#discussion_r26520319
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.scala
---
@@ -59,6 +59,8 @@ class
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/4432#issuecomment-73317241
@mengxr nice patch! I left two pretty minor comments about expanding the
tests, otherwise looking good.
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/4432#discussion_r24276343
--- Diff:
mllib/src/test/java/org/apache/spark/mllib/regression/JavaStreamingLinearRegressionSuite.java
---
@@ -0,0 +1,80 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/4432#discussion_r24273893
--- Diff:
mllib/src/test/java/org/apache/spark/mllib/regression/JavaStreamingLinearRegressionSuite.java
---
@@ -0,0 +1,80 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/4432#discussion_r24273989
--- Diff:
mllib/src/test/java/org/apache/spark/mllib/regression/JavaStreamingLinearRegressionSuite.java
---
@@ -0,0 +1,80 @@
+/*
--- End diff
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/3803#issuecomment-72793042
Thanks for the detailed look @tdas! Think I addressed both nits.
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r24063184
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
---
@@ -210,6 +211,20 @@ class JavaStreamingContext
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r24063473
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -671,7 +674,11 @@ class SparkContext(config: SparkConf) extends Logging
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r24064747
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -671,7 +674,11 @@ class SparkContext(config: SparkConf) extends Logging
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/4306#discussion_r24038692
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
---
@@ -58,14 +58,14 @@ abstract class
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/4306#issuecomment-72555492
Thanks for the review! I think I dealt with everything, and in a couple
places I tweaked the corresponding point in `StreamingLinearRegression` for
parity
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/4306#discussion_r23966222
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLogisticRegression.scala
---
@@ -0,0 +1,74 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/4306#discussion_r23965614
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionWithSGD.scala
---
@@ -0,0 +1,97
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/4306#issuecomment-72583404
Should we just set `initialWeights=Vectors.dense(0.0)` by default?
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r23981497
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -361,6 +363,25 @@ class StreamingContext private[streaming
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/4306#issuecomment-72580780
@mengxr the test failure is real, I think there's a conflict due to this
recent change to master
(https://github.com/apache/spark/commit
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/4306#issuecomment-72588966
Nice idea! Just took a stab, see what you think. Was reproducing the test
failure locally, and it is now fixed with this change.
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/4306#issuecomment-72581477
Ok, the problem is that we've been setting `initialWeights` to `null` by
default and then checking that weights are defined before starting training
(because we
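The pattern described here, with no default for `initialWeights` and a check before training starts, can be sketched in Python as a fail-fast validation step. The function name and the extra dimension check are assumptions for illustration, not the code in the PR.

```python
# Hypothetical sketch: validate user-supplied initial weights before the
# first streaming batch, instead of silently substituting a default.
from typing import List, Optional, Sequence


def resolve_initial_weights(initial: Optional[Sequence[float]],
                            num_features: int) -> List[float]:
    """Return the initial weights as floats, or raise if they are unusable."""
    if initial is None:  # analogous to the null default
        raise ValueError("initial weights must be set before training starts")
    if len(initial) != num_features:
        raise ValueError(f"expected {num_features} initial weights, got {len(initial)}")
    return [float(w) for w in initial]


print(resolve_initial_weights([0, 0, 0], 3))  # prints [0.0, 0.0, 0.0]
```

Failing early makes a missing setting visible immediately. The trade-off, raised elsewhere in this thread, is that a zero-vector default would be more convenient but requires knowing the feature dimension up front.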
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r23980541
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -361,6 +363,25 @@ class StreamingContext private[streaming
GitHub user freeman-lab opened a pull request:
https://github.com/apache/spark/pull/4306
[SPARK-4979][MLLIB] Streaming logistic regression
This adds support for streaming logistic regression with stochastic
gradient descent, in the same manner as the existing implementation
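Streaming logistic regression with SGD can be pictured on a single machine as follows. Each incoming batch runs a few gradient steps on the mean log-loss, starting from the weights left by the previous batch. This is a minimal sketch with hypothetical function names and step sizes. It is not MLlib's `StreamingLogisticRegressionWithSGD`, which performs the update on each batch's RDD.

```python
# Minimal single-machine sketch of streaming logistic regression with SGD:
# each batch continues training from the weights produced by the last batch.
import math
from typing import Iterable, List, Sequence, Tuple

Example = Tuple[float, Sequence[float]]  # (label in {0, 1}, feature vector)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(weights: List[float], batch: Sequence[Example], step_size: float) -> List[float]:
    """One gradient step on the mean log-loss over the batch."""
    grad = [0.0] * len(weights)
    for label, x in batch:
        err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - label
        for j, xj in enumerate(x):
            grad[j] += err * xj
    n = max(len(batch), 1)
    return [w - step_size * g / n for w, g in zip(weights, grad)]


def train_on_stream(batches: Iterable[Sequence[Example]],
                    initial_weights: Sequence[float],
                    step_size: float = 0.5,
                    iters_per_batch: int = 10) -> List[float]:
    weights = list(initial_weights)
    for batch in batches:  # stands in for one micro-batch per streaming interval
        for _ in range(iters_per_batch):
            weights = sgd_step(weights, batch, step_size)
    return weights


# Feature 0 is a constant bias term; the label is 1 when feature 1 is positive.
stream = [[(1.0, [1.0, 2.0]), (0.0, [1.0, -2.0])]] * 5
w = train_on_stream(stream, [0.0, 0.0])
print(sigmoid(w[0] + 2.0 * w[1]) > 0.5, sigmoid(w[0] - 2.0 * w[1]) < 0.5)  # prints True True
```

Because the weights carry over between batches, the model keeps adapting as new data arrives instead of being retrained from scratch.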
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/3803#issuecomment-72391803
Jenkins, retest this please
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/3803#issuecomment-72285668
@JoshRosen I finished the refactored tests and added better handling of the
`getBytes` based on your suggestion.
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r23886182
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -657,6 +657,10 @@ class SparkContext(config: SparkConf) extends Logging
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/3803#issuecomment-71929328
Great thanks @JoshRosen will finish this up ASAP!
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22640591
--- Diff: data/mllib/sample_hierarchical_data.csv ---
@@ -0,0 +1,150 @@
+5.1,3.5,1.4,0.2
--- End diff --
Good point =) Leave
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22633425
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22633847
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala
---
@@ -0,0 +1,126 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22633951
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala
---
@@ -0,0 +1,126 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22633997
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala
---
@@ -0,0 +1,126 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22634802
--- Diff: examples/src/main/python/mllib/hierarchical_clustering.py ---
@@ -0,0 +1,84 @@
+#
+# Licensed to the Apache Software Foundation (ASF
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/2906#issuecomment-69134341
Hi @yu-iskw and @rnowling , I've spent time reviewing the code and using it
in both Python and Scala. Overall great work, terrific to see my little gist
turned
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22634203
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -88,6 +92,162 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode=k-means
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22633758
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22633778
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22634674
--- Diff: docs/mllib-clustering.md ---
@@ -154,6 +156,175 @@ section of the Spark
Quick Start guide. Be sure to also include *spark-mllib* to your
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22634865
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22634887
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22634895
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22634890
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632101
--- Diff: data/mllib/sample_hierarchical_data.csv ---
@@ -0,0 +1,150 @@
+5.1,3.5,1.4,0.2
--- End diff --
It might be nice if this could
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632146
--- Diff:
examples/src/main/java/org/apache/spark/examples/mllib/JavaHierarchicalClustering.java
---
@@ -0,0 +1,73 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632182
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala
---
@@ -0,0 +1,126 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632172
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632194
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringSuite.scala
---
@@ -0,0 +1,330 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632220
--- Diff:
examples/src/main/java/org/apache/spark/examples/mllib/JavaHierarchicalClustering.java
---
@@ -0,0 +1,73 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632265
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -88,6 +92,162 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode=k-means
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632512
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632647
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632654
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632678
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632686
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala
---
@@ -0,0 +1,126 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632804
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/2906#discussion_r22632919
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
---
@@ -0,0 +1,627 @@
+/*
+ * Licensed
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/2906#issuecomment-68794407
Hey all, thanks for the nudge =) I've been going through it, will get you
feedback ASAP.
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r22495402
--- Diff:
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala ---
@@ -233,6 +236,47 @@ class InputStreamsSuite extends
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r22495425
--- Diff:
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala ---
@@ -233,6 +236,47 @@ class InputStreamsSuite extends
Github user freeman-lab commented on a diff in the pull request:
https://github.com/apache/spark/pull/3803#discussion_r22496437
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -373,6 +393,25 @@ class StreamingContext private[streaming