Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-02 Thread DB Tsai
Hi Deb, The PR is here https://github.com/apache/spark/pull/53 Hi Evan, I think we need to refactor the optimization methods and also the way we write algorithms. For example, if I want to use the new optimization method in LogisticRegression.scala, I need to implement

[GitHub] spark pull request: Remove remaining references to incubation

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/51#issuecomment-36448947 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: Update io.netty from 4.0.13 Final to 4.0.17.Fi...

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/41#issuecomment-36449234 Sorry @ngbinh you misunderstood me. I think the problem is the git commit metadata doesn't actually contain the author information. It could be that the email or the author

[GitHub] spark pull request: SPARK-1084.2 (resubmitted)

2014-03-02 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/32#issuecomment-36453964 Done, rebased.

[GitHub] spark pull request: SPARK-1084.2 (resubmitted)

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/32#issuecomment-36455220 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12952/

[GitHub] spark pull request: Ignore RateLimitedOutputStreamSuite for now.

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/54#issuecomment-36469429 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12953/

[GitHub] spark pull request: Ignore RateLimitedOutputStreamSuite for now.

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/54#issuecomment-36469428 Merged build finished.

[GitHub] spark pull request: SPARK-1084.2 (resubmitted)

2014-03-02 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/32#issuecomment-36469726 Thanks, merged into master.

[GitHub] spark pull request: SPARK-1145: Memory mapping with many small blo...

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/43#discussion_r10198184 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -84,12 +84,27 @@ private class DiskStore(blockManager: BlockManager, diskManager:

[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-36471485 Merged build triggered.

[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-36471529 One or more automated tests failed Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12954/

[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-36471528 Merged build finished.

[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-02 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-36471736 exceed with 5 chars, sorry. Fixed.

[GitHub] spark pull request: SPARK-1158: Fix flaky RateLimitedOutputStreamS...

2014-03-02 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/55 SPARK-1158: Fix flaky RateLimitedOutputStreamSuite. There was actually a problem with the RateLimitedOutputStream implementation where the first second doesn't write anything because of integer

[GitHub] spark pull request: SPARK-1158: Fix flaky RateLimitedOutputStreamS...

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/55#issuecomment-36473419 @tdas @pwendell

[GitHub] spark pull request: SPARK-1158: Fix flaky RateLimitedOutputStreamS...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/55#issuecomment-36473443 Merged build triggered.

[GitHub] spark pull request: SPARK-1158: Fix flaky RateLimitedOutputStreamS...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/55#issuecomment-36473445 Merged build started.

[GitHub] spark pull request: Add Shortest-path computations to graphx.lib w...

2014-03-02 Thread andy327
Github user andy327 commented on the pull request: https://github.com/apache/spark/pull/10#issuecomment-36473467 See JIRA [SPARK-1159]: https://spark-project.atlassian.net/browse/SPARK-1159

Re: Development methodology

2014-03-02 Thread Debasish Das
Hi Reynold, I checked, and Atlassian Stash also has a pull request feature... The Spark README says: Contributions via GitHub pull requests are gladly accepted from their original author. What happens if a Spark pull request comes from Stash? Will you guys accept it or all pull requests

Re: Development methodology

2014-03-02 Thread Reynold Xin
Hi Deb, I am not sure how you can create a pull request coming from stash. Maybe I am not understanding this correctly, but there are only two official Spark repositories: 1. Apache git 2. Github On Sun, Mar 2, 2014 at 4:41 PM, Debasish Das debasish.da...@gmail.com wrote: Hi Reynold, I

[GitHub] spark pull request: SPARK-1158: Fix flaky RateLimitedOutputStreamS...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/55#issuecomment-36474899 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12955/

[GitHub] spark pull request: Add Jekyll tag to isolate production-only do...

2014-03-02 Thread pwendell
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/57 Add Jekyll tag to isolate production-only doc components. (0.9 version) You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark

[GitHub] spark pull request: fix #SPARK-1149 Bad partitioners can cause Spa...

2014-03-02 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/44#discussion_r10199454 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -950,6 +952,8 @@ class SparkContext( resultHandler: (Int, U) => Unit,

[GitHub] spark pull request: Add Jekyll tag to isolate production-only do...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/56#issuecomment-36476860 Merged build triggered.

[GitHub] spark pull request: Add Jekyll tag to isolate production-only do...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/57#issuecomment-36476859 Merged build started.

[GitHub] spark pull request: Add Jekyll tag to isolate production-only do...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/56#issuecomment-36476861 Merged build started.

[GitHub] spark pull request: Add Jekyll tag to isolate production-only do...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/57#issuecomment-36476906 One or more automated tests failed Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12956/

[GitHub] spark pull request: Add Jekyll tag to isolate production-only do...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/57#issuecomment-36476905 Merged build finished.

[GitHub] spark pull request: Remove the remoteFetchTime metric.

2014-03-02 Thread kayousterhout
GitHub user kayousterhout opened a pull request: https://github.com/apache/spark/pull/62 Remove the remoteFetchTime metric. This metric is confusing: it adds up all of the time to fetch shuffle inputs, but fetches often happen in parallel, so remoteFetchTime can be much

[GitHub] spark pull request: SPARK-1145: Memory mapping with many small blo...

2014-03-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/43#discussion_r10200502 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -84,12 +84,27 @@ private class DiskStore(blockManager: BlockManager, diskManager:

[GitHub] spark pull request: Remove the remoteFetchTime metric.

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/62#issuecomment-36480711 Merged build triggered.

[GitHub] spark pull request: Removed accidentally checked in comment

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/61#issuecomment-36480713 Merged build triggered.

[GitHub] spark pull request: Remove the remoteFetchTime metric.

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/62#issuecomment-36480712 Merged build started.

[GitHub] spark pull request: Removed accidentally checked in comment

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/61#issuecomment-36480714 Merged build started.

[GitHub] spark pull request: simplify the implementation of CoarseGrainedSc...

2014-03-02 Thread CodingCat
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/63 simplify the implementation of CoarseGrainedSchedulerBackend There are 5 main data structures in the class. After reading the source code, I found that some of them are actually not used, some of

[GitHub] spark pull request: simplify the implementation of CoarseGrainedSc...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/63#issuecomment-36481016 Can one of the admins verify this patch?

[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-36481419 Jenkins, this is ok to test

[GitHub] spark pull request: SPARK-1156: allow user to login into a cluster...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/58#issuecomment-36481449 Why do we want to support this then? Maybe we should just make the spark-ec2 script not let you launch a cluster without slaves.

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/60#issuecomment-36481638 Jenkins, this is ok to test

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/60#issuecomment-36481666 CC @benh

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/60#issuecomment-36481768 Jenkins, test this please

[GitHub] spark pull request: SPARK-1156: allow user to login into a cluster...

2014-03-02 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/58#issuecomment-36481890 I think the better way to fix this is to not allow users to start a cluster without slaves, but to allow them to log in to an all-slaves-lost cluster?

[GitHub] spark pull request: SPARK-1156: allow user to login into a cluster...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/58#issuecomment-36481941 Anyway maybe let's do it like this: if you test it with this change and see that all the commands (stop, resume, etc) still work, then we can keep it. But we should also

[GitHub] spark pull request: Remove the remoteFetchTime metric.

2014-03-02 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/62#issuecomment-36481963 Hmm -- I have been confused by this before, but if I am reading the comment right, this could be useful to get an estimate of the raw network bandwidth used for

[GitHub] spark pull request: [SPARK-972] Added detailed callsite info for V...

2014-03-02 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/34#issuecomment-36482065 @mateiz Do you think this is a good case for a [namedtuple](http://docs.python.org/2/library/collections.html#collections.namedtuple)?

[GitHub] spark pull request: Remove the remoteFetchTime metric.

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/62#issuecomment-36482478 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12959/

[GitHub] spark pull request: Remove the remoteFetchTime metric.

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/62#issuecomment-36482477 Merged build finished.

[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-36482496 Merged build triggered.

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201076 --- Diff: core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201080 --- Diff: core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201097 --- Diff: core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201177 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager: BlockManager) extends

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201181 --- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala --- @@ -23,9 +23,27 @@ import java.nio.ByteBuffer import

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201182 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -534,8 +539,9 @@ private[spark] class BlockManager( // If we're

[GitHub] spark pull request: Remove the remoteFetchTime metric.

2014-03-02 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/62#issuecomment-36483177 Okay -- That seems like a separate conversation. This change looks good to me.

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/50#issuecomment-36483263 Hey Kyle, thanks for bringing this to the new repo. I looked through it and made a few comments. Another concern though is that it would be good to make this work for

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201268 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -549,34 +555,43 @@ private[spark] class BlockManager( var

[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-36484337 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12962/

[GitHub] spark pull request: SPARK-1173. Improve scala streaming docs.

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/64#issuecomment-36486791 Thanks Aaron. I've merged this.

[GitHub] spark pull request: [Proposal] SPARK-1171: simplify the implementa...

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/63#issuecomment-36486838 Jenkins, add to whitelist.

[GitHub] spark pull request: [Proposal] SPARK-1171: simplify the implementa...

2014-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/63#issuecomment-36486939 Merged build finished.

[GitHub] spark pull request: SPARK-1173. Improve scala streaming docs.

2014-03-02 Thread kimballa
Github user kimballa commented on the pull request: https://github.com/apache/spark/pull/64#issuecomment-36487463 Here you go

[GitHub] spark pull request: SPARK-1173. Improve scala streaming docs.

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/64#issuecomment-36487507 Actually you will need to submit another PR. I've already merged this one (but github is laggy because it is waiting for the asf git bot to synchronize). Sorry about the

[GitHub] spark pull request: SPARK-1173. (#2) Fix typo in Java streaming ex...

2014-03-02 Thread kimballa
GitHub user kimballa opened a pull request: https://github.com/apache/spark/pull/65 SPARK-1173. (#2) Fix typo in Java streaming example. Companion commit to pull request #64, fix the typo on the Java side of the docs. You can merge this pull request into a Git repository by

[GitHub] spark pull request: SPARK-1173. (#2) Fix typo in Java streaming ex...

2014-03-02 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/65#issuecomment-36487798 I merged this one too.