[GitHub] spark pull request: [SPARK-3889] Attempt to avoid SIGBUS by not mm...

2014-10-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2742#issuecomment-58728319 Note: this is required since there are heap and VM limits enforced, so we juggle available memory around so that jobs can run to completion! On 11-Oct-2014 4:56 am

[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

2015-02-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/4818#issuecomment-76471465 This is specific to vcores and not memory, IIRC. A solution might be to check the vcores returned and modify it to what we requested if found to be 1 when the flag is set (we
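A minimal Scala sketch of the suggestion as stated above; the method name and flag are illustrative, not the PR's code:

    // Illustrative only: when the flag is set and YARN reports the default of 1 vcore,
    // fall back to the number of cores that were actually requested.
    def effectiveVCores(reported: Int, requested: Int, overrideSingleVCore: Boolean): Int =
      if (overrideSingleVCore && reported == 1) requested else reported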

[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

2015-02-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/4818#issuecomment-76487019 Looks good to me - pending addressing Tom's comment about what the default should be.

[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

2015-02-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/4818#issuecomment-76515153 @sryza When CPU scheduling is enabled (ref @tgravescs comment here and in JIRA) it must be validated. Just as we validate memory and while prioritizing based on

[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...

2015-02-28 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/4818#issuecomment-76520133 AFAIK this is not documented or part of the YARN interfaces/public contract: I would prefer that Spark depended on defined interfaces which are reasonably stable

[GitHub] spark pull request: [spark] [SPARK-6168] Expose some of the collec...

2015-03-18 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/5084 [spark] [SPARK-6168] Expose some of the collection classes as experimental You can merge this pull request into a Git repository by running: $ git pull https://github.com/mridulm/spark master

[GitHub] spark pull request: [spark] [SPARK-6168] Expose some of the collec...

2015-03-18 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/5084#issuecomment-83194558 @pwendell Can we merge this into 1.3 as well? Else we will have to wait for 1.4 ...

[GitHub] spark pull request: [spark] [SPARK-6168] Expose some of the collec...

2015-03-21 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/5084#issuecomment-84309441 These are not generic Scala collections - but specific to using Spark at scale. Since we already have them in Spark core, it is better to expose them as experimental

[GitHub] spark pull request: [spark] [SPARK-6168] Expose some of the collec...

2015-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/5084#issuecomment-86221601 Looks like I was not getting notifications for this PR - so I could not participate in the discussion; sorry for the delay! There are a few issues to be

[GitHub] spark pull request: [spark] [SPARK-6168] Expose some of the collec...

2015-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/5084#issuecomment-86239409 @srowen This is not a "wish" - having led (and leading) multiple efforts which have been nontrivial uses of Spark, I do think this is required ch

[GitHub] spark pull request: [spark] [SPARK-6168] Expose some of the collec...

2015-03-27 Thread mridulm
Github user mridulm closed the pull request at: https://github.com/apache/spark/pull/5084

[GitHub] spark pull request: [spark] [SPARK-6168] Expose some of the collec...

2015-03-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/5084#issuecomment-87048930 Closing based on internal discussions

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-09 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66407791 -1 This is a broken change for multiple reasons - finalize of an out-of-scope variable can trigger close of the underlying fd, potential state issue with vars not being null when
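A minimal Scala sketch of the finalizer hazard referred to above (illustrative, not taken from the PR's diff):

    import java.io.FileInputStream
    import java.nio.channels.FileChannel

    // If only the channel is kept, the out-of-scope FileInputStream may be finalized
    // on a later GC, closing the shared file descriptor underneath the channel.
    def leakyChannel(path: String): FileChannel = {
      val in = new FileInputStream(path)   // unreachable once this method returns
      in.getChannel                        // shares the same fd as `in`
    }

    // Safer: keep the stream reachable while the channel is in use and close it explicitly.
    def withChannel[T](path: String)(body: FileChannel => T): T = {
      val in = new FileInputStream(path)
      try body(in.getChannel) finally in.close()
    }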

[GitHub] spark pull request: [SPARK-4609] Blacklist hosts rather than execu...

2014-12-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/3541#issuecomment-66424149 On Thu, Dec 4, 2014 at 2:57 AM, Davies Liu wrote: > @davies <https://github.com/davies> I am not sure I completely understood > your comment.

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66426107 Hmm, might be tricky to explain if you do not have sufficient context, but let me give it a shot. a) Streams in Java are not usually multiplexed - unless explicitly

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66443297 I think you are missing the point - we should not rely on specific implementation details of whether it is currently done or not - that leads to a brittle codebase

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66451641 I think I did say this will not go into Spark at the very beginning of my review :-) On the assumption that you would want to continue to improve Spark IO, I wanted

[GitHub] spark pull request: [SPARK-4609] Blacklist hosts rather than execu...

2014-12-02 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/3541#issuecomment-65272551 Thanks @kayousterhout for the ping! We are fairly aggressively blacklisting executors - not hosts. The assumption that a task failed on an executor in a host

[GitHub] spark pull request: [SPARK-4609] Blacklist hosts rather than execu...

2014-12-02 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/3541#issuecomment-65301702 @davies when you can have multiple executors per host, or an executor restarted on a host after failure, then this can manifest ... please refer to the comments that

[GitHub] spark pull request: [SPARK-4609] Blacklist hosts rather than execu...

2014-12-02 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/3541#issuecomment-65309912 Note: I am ignoring deterministic failure reasons here (which will fail on any host and usually point to a bug in user or Spark code). Task failure could be due to

[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40707049 Most of the changes in the diff look unrelated to what is mentioned in the summary. In addition, they introduce additional bugs. Please clean up the diffs and

[GitHub] spark pull request: Fix thread leak

2014-04-23 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/504 Fix thread leak mvn test fails (intermittently) due to a thread leak - since ScalaTest runs all tests in the same VM.

[GitHub] spark pull request: Windows fixes

2014-04-23 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/505 Windows fixes Unfortunately, this is not exhaustive - in particular, Hive tests still fail due to path issues.

[GitHub] spark pull request: [SPARK-1485][MLLIB] Implement Butterfly AllRed...

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/506#issuecomment-41144433 Might be a good idea to move this out of MLlib and push this into core itself. The utility of this PR seems more fundamental than just for ML (assuming it does

[GitHub] spark pull request: SPARK-1586 Windows build fixes

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/505#issuecomment-41181073 I am not sure why this has failed - since it works locally on both Linux and Windows for me. Not sure why StringEscapeUtils.escapeJava is failing for this ... let me

[GitHub] spark pull request: SPARK-1586 Windows build fixes

2014-04-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/505#discussion_r11914906 --- Diff: bin/compute-classpath.cmd --- @@ -1,69 +1,88 @@ -@echo off - -rem -rem Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: SPARK-1587 Fix thread leak

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/504#issuecomment-41195989 CC @tdas - probably leftovers from the GC patch?

[GitHub] spark pull request: [SPARK-1485][MLLIB] Implement Butterfly AllRed...

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/506#issuecomment-41206844 Agree, I was not suggesting that this specific change per se makes it into core. Just that there are a lot of applications for all-reduce support in Spark: and if it

[GitHub] spark pull request: SPARK-1586 Windows build fixes

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/505#issuecomment-41208762 CC @mateiz @rxin a lot of these changes have to do with incorrect string -> byte conversions. Plus the classpath computation fix. Please note that this is
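A hedged example of the class of bug being described (not the PR's diff): the no-argument string/byte conversions use the platform default charset, which differs between Linux and Windows.

    import java.nio.charset.StandardCharsets

    val platformDependent = "café".getBytes                          // e.g. windows-1252 on Windows
    val portable          = "café".getBytes(StandardCharsets.UTF_8)  // same bytes everywhere
    val decoded           = new String(portable, StandardCharsets.UTF_8)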

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-04-23 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/514 SPARK-1588 Re-add support for SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS This is what I did to unblock our jobs - please feel free to modify or create a new PR based on this in case this won't suffice

[GitHub] spark pull request: SPARK-1587 Fix thread leak

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/504#issuecomment-41211197 Not some, I think most :-) Wanted to run this past you since you have better context, in case I am missing something

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/514#issuecomment-41214136 Unrelated failures IMO? Jenkins, retest this please

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/514#issuecomment-41214164 Jenkins, retest this please

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/514#issuecomment-41214374 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/514#issuecomment-41219973 Again unrelated failures IMO ... CC @pwendell ... any idea why these tests keep failing?

[GitHub] spark pull request: SPARK-1586 Windows build fixes

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/505#issuecomment-41224920 The failures are while executing test cases from the Hive project on Windows. Mostly paths getting mangled and so on - I do have Cygwin, so that is not the issue (which is

[GitHub] spark pull request: SPARK-1586 Windows build fixes

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/505#issuecomment-41244129 Quite a lot of them still fail - all failures now are from sql/hive - and all of them are path-related issues IIRC. I am not sure if they were due to the paths

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-04-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/514#issuecomment-41244177 Ah OK, that sucks ...

[GitHub] spark pull request: SPARK-1586 Windows build fixes

2014-04-24 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/505#issuecomment-41360908 Sounds good, thanks! I was testing Windows to ensure there are no encoding/etc. issues with the 2G patch actually - and this is a spin-off of that :-)

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-04-24 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/514#issuecomment-41360996 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-04-24 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/514#issuecomment-41363418 Thanks Patrick - looks like the tests passed now! This was just a quick-and-dirty change based on what I did to unblock our immediate jobs - so it might be suboptimal

[GitHub] spark pull request: Add recursive directory file search to fileInp...

2014-04-27 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/537#discussion_r12029172 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala --- @@ -327,18 +327,18 @@ class StreamingContext private[streaming

[GitHub] spark pull request: SPARK-1623. Broadcast cleaner should use getCa...

2014-04-27 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/546#discussion_r12029270 --- Diff: core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala --- @@ -229,7 +229,7 @@ private[spark] object HttpBroadcast extends Logging

[GitHub] spark pull request: SPARK-1623. Broadcast cleaner should use getCa...

2014-04-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/546#issuecomment-41497440 Are you actually seeing problems, or is this a cleanup exercise to use the appropriate API? Creation of the file happens from within Spark and is not externally provided

[GitHub] spark pull request: SPARK-1632. Remove unnecessary boxing in compa...

2014-04-27 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/559#discussion_r12029894 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -337,8 +337,8 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request: SPARK-1648 Support closing JIRA's as part of m...

2014-04-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/570#issuecomment-41501708 Not a review of the PR, but this is a great addition! Btw, does it also update the branch(es) that the PR was committed to? What about leaving the JIRA unresolved

[GitHub] spark pull request: SPARK-1623. Broadcast cleaner should use getCa...

2014-04-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/546#issuecomment-41503498 It goes back to the problem we are trying to solve. If the set/map can contain arbitrary paths then file.getCanonical is unavoidable. But then (multiple) IO and

[GitHub] spark pull request: SPARK-1648 Support closing JIRA's as part of m...

2014-04-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/570#issuecomment-41503564 I meant, depending on which branch it is committed to, update Fix Version in JIRA? The usual expectation is for this field to be appropriately updated: though I am

[GitHub] spark pull request: SPARK-1648 Support closing JIRA's as part of m...

2014-04-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/570#issuecomment-41511160 Yes, that is exactly what I was referring to - since the committer specifies the branches to commit to; use that to populate the JIRA field. If this is being done

[GitHub] spark pull request: SPARK-1623. Broadcast cleaner should use getCa...

2014-05-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/546#issuecomment-42110226 It is not about a few uses here or there - either the Spark codebase as a whole moves to a) canonical paths always; or always sticks to b) paths relative to cwd and/or what is
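An illustrative Scala sketch of the consistency issue being discussed; the paths are made up:

    import java.io.File

    // The same file can reach a tracking set under two spellings unless one convention
    // (always canonical, or always relative to a fixed base) is applied everywhere.
    val viaRelative = new File("work/../work/broadcast_0")
    val viaAbsolute = new File(new File("work").getAbsolutePath, "broadcast_0")

    val tracked = scala.collection.mutable.HashSet(viaRelative)
    tracked.contains(viaAbsolute)                          // false: different spellings
    tracked.map(_.getCanonicalFile)
      .contains(viaAbsolute.getCanonicalFile)              // true, but getCanonicalFile does IO per call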

[GitHub] spark pull request: SPARK-1656: Fix potential resource leaks

2014-05-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/577#discussion_r12257197 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -77,7 +77,12 @@ private class DiskStore(blockManager: BlockManager, diskManager

[GitHub] spark pull request: SPARK-1656: Fix potential resource leaks

2014-05-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/577#issuecomment-42110570 Might be a good idea to abstract out the try/finally idiom. @mateiz, any thoughts? We have a bunch of places where resource cleanup does not happen properly - which
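A minimal sketch of the kind of helper being suggested, assuming a simple loan pattern over java.io.Closeable (not Spark's actual API):

    import java.io.{BufferedReader, Closeable, FileReader}

    // Run `body` with the resource and always close it, even if `body` throws.
    def withResource[R <: Closeable, T](resource: R)(body: R => T): T =
      try body(resource) finally resource.close()

    // Usage: the reader is closed even if readLine() throws.
    def firstLine(path: String): String =
      withResource(new BufferedReader(new FileReader(path)))(_.readLine())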

[GitHub] spark pull request: SPARK-1667 re-fetch fails occasionally

2014-05-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/604#discussion_r12257233 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala --- @@ -203,14 +203,22 @@ object BlockFetcherIterator { // these

[GitHub] spark pull request: SPARK-1667 re-fetch fails occasionally

2014-05-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/604#discussion_r12257245 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -71,13 +71,13 @@ private[spark] class DiskBlockManager(shuffleManager

[GitHub] spark pull request: SPARK-1667 re-fetch fails occasionally

2014-05-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/604#issuecomment-4256 Can you add a test case to verify this? Where None is returned and validated. I suspect we have also observed this in the past - but I never got around to

[GitHub] spark pull request: Fix SPARK-1629: Spark should inline use of com...

2014-05-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/569#issuecomment-42115677 I am still catching up on PRs and bugs. Why was this changed? Hacky solutions based on string parsing of properties lead to fragility in case of chang

[GitHub] spark pull request: Remove hardcoded Spark version string to use S...

2014-05-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/600#issuecomment-42115749 Please ensure it works in both the sbt and Maven cases.

[GitHub] spark pull request: Fix SPARK-1629: Spark should inline use of com...

2014-05-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/569#issuecomment-42120375 The Maven dependency still shows org.apache.commons:commons-lang3:jar - am I missing something here? Btw, we do depend on it for repl tests too ...

[GitHub] spark pull request: Fix SPARK-1629: Spark should inline use of com...

2014-05-04 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/569#issuecomment-42132244 If we are depending on something and bundling it, we might as well use it instead of duplicating code and having to maintain the changes: assuming it is intuitive

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/634#issuecomment-42143958 Better would be to make this configurable (and increase it, not remove it). Currently, we actually sleep for much longer in our jobs to ensure a minimum number of

[GitHub] spark pull request: Fix SPARK-1629: Spark should inline use of com...

2014-05-04 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/569#issuecomment-42144008 Yes, lang3 - not lang: some of the methods (for example, the escape method used in the repl) are actually broken in lang, but work in lang3. On Sun, May 4
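A small illustration of the lang3 point; the input string is arbitrary:

    // Note the lang3 package: org.apache.commons.lang3, not org.apache.commons.lang.
    import org.apache.commons.lang3.StringEscapeUtils

    val escaped = StringEscapeUtils.escapeJava("line1\nline2 \"quoted\" é")
    // Newlines, quotes and non-ASCII characters come back as \n, \" and \u00E9 escapes.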

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-05 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/634#issuecomment-42172635 The scheduling and data locality come later. This precedes all that. To give an example - suppose we need 200 containers to run a job; as soon as we start, we
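A hedged sketch of the "wait for enough containers before scheduling" idea described above; the names, defaults and polling approach are assumptions, not Spark's implementation:

    // Wait until at least `minExecutors` have registered, or the deadline passes,
    // instead of scheduling work onto the handful that registered first.
    def awaitMinimumExecutors(currentExecutors: () => Int,
                              minExecutors: Int,
                              maxWaitMs: Long,
                              pollMs: Long = 100L): Boolean = {
      val deadline = System.currentTimeMillis() + maxWaitMs
      while (currentExecutors() < minExecutors && System.currentTimeMillis() < deadline) {
        Thread.sleep(pollMs)
      }
      currentExecutors() >= minExecutors
    }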

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-05 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/634#issuecomment-42172774 Btw, I don't recall exactly why the actual value of 3 seconds is there; I think earlier Spark used to crash in case there are no containers available when a job is going to

[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...

2014-05-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/448#discussion_r12346057 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala --- @@ -51,18 +51,12 @@ class EdgeRDD[@specialized ED: ClassTag]( override

[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...

2014-05-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/448#discussion_r12345988 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -138,7 +138,7 @@ abstract class RDD[T: ClassTag]( * it is computed. This can only

[GitHub] spark pull request: SPARK-1713. Use a thread pool for launching ex...

2014-05-06 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/663#issuecomment-42356011 That sounds like a reasonable approach - will know more when the PR is updated. The initial idea behind a separate thread for each was exactly that - we expected low hundreds

[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...

2014-05-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/448#discussion_r12347791 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -138,7 +138,7 @@ abstract class RDD[T: ClassTag]( * it is computed. This can only

[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...

2014-05-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/448#discussion_r12347808 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala --- @@ -51,18 +51,12 @@ class EdgeRDD[@specialized ED: ClassTag]( override

[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...

2014-05-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/448#discussion_r12348272 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -138,7 +138,7 @@ abstract class RDD[T: ClassTag]( * it is computed. This can only

[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...

2014-05-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/448#discussion_r12357557 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -138,7 +138,7 @@ abstract class RDD[T: ClassTag]( * it is computed. This can only

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12381217 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -523,6 +504,90 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12380305 --- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala --- @@ -20,7 +20,7 @@ package org.apache.spark.deploy private[spark

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-05-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/514#issuecomment-42525085 Closing this assuming it has been resolved through other PRs. Please let me know in case this is not the case.

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/731#discussion_r12511528 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -466,30 +466,14 @@ private[spark] class Master( * launched an

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/731#discussion_r12511541 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -532,6 +516,99 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/731#discussion_r12511580 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -532,6 +516,99 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/731#discussion_r12511602 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -466,30 +466,14 @@ private[spark] class Master( * launched an

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/731#discussion_r12511553 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -532,6 +516,99 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/731#discussion_r12512027 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -532,6 +516,99 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/731#discussion_r12512000 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -466,30 +466,14 @@ private[spark] class Master( * launched an

[GitHub] spark pull request: use Iterator#size in RDD#count

2014-05-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/736#issuecomment-42968273 This is not equivalent performance-wise from a casual look. Even assuming everything else is the same, it is still invoking a function in a loop versus direct addition.
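A sketch of the performance concern being raised, under the assumption that the hand-written loop avoids a per-element higher-order call:

    // Proposed shortcut: walk the iterator through the collections API.
    def countViaSize(iter: Iterator[_]): Long = iter.size.toLong

    // Existing style: direct addition in a tight while loop, keeping a Long counter.
    def countViaLoop(iter: Iterator[_]): Long = {
      var count = 0L
      while (iter.hasNext) {
        iter.next()
        count += 1L
      }
      count
    }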

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-14 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12380922 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -523,6 +504,90 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12381063 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -523,6 +504,90 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12380632 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -523,6 +504,90 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12380369 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -457,35 +457,16 @@ private[spark] class Master( * launched an

[GitHub] spark pull request: SPARK-1588 Re-add support for SPARK_YARN_USER_...

2014-05-15 Thread mridulm
Github user mridulm closed the pull request at: https://github.com/apache/spark/pull/514

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12380276 --- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala --- @@ -28,6 +28,7 @@ private[spark] class ApplicationDescription

[GitHub] spark pull request: improve performance of MemoryStore#tryToPut by...

2014-05-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43387715 IMO this makes things fragile. First off, it is not MT safe. Secondly, it does not handle corner cases - for example, exception handling.

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-05-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-43386951 As I mentioned in the JIRA, I don't see value in this change - it is a corner case trying to save about 5 lines of straightforward code while adding to the publi

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-05-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-43388853 I have elaborated in the JIRA, but I will repeat it again for clarity: this is adding an API for a specific case - it assumes a single serialization type

[GitHub] spark pull request: SPARK-1864 Look in spark conf instead of syste...

2014-05-17 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/808#issuecomment-43401491 Though I don't think there is any issue with this change specifically, it would be better if at least someone else reviews a PR and gives a go b

[GitHub] spark pull request: SPARK-1864 Look in spark conf instead of syste...

2014-05-17 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/808#issuecomment-43401687 Ah! Great :-) Would have been clearer if at least asfgit had indicated who committed it in the message. On 17-May-2014 2:22 pm, "Aaron Davidson"

[GitHub] spark pull request: improve performance of MemoryStore#tryToPut by...

2014-05-17 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43413071 Use of dropping is not my safe On 17-May-2014 9:42 pm, "Wenchen Fan" wrote: > This is thread safe. tryToPut call ensureFreeSpace in a synchronize

[GitHub] spark pull request: improve performance of MemoryStore#tryToPut by...

2014-05-19 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43485552 It should read MT safe - phone "autocorrected" it, sigh. There could be any number of reasons for dropping a block to fail (including disk issues, etc).

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-20 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43618603 It is not MT safe because the PR is checking/modifying shared state (like the dropping variable) in an unsafe manner. I will comment in detail on the patch later today
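An illustrative sketch of the check-then-act concern; the class and field names are made up and are not the PR's code:

    class PutTracker {
      private var dropping = false   // shared state touched by multiple threads

      // Unsafe: two threads can both observe dropping == false and proceed.
      def unsafeTryStartDrop(): Boolean =
        if (!dropping) { dropping = true; true } else false

      // Guard both the check and the update under one lock.
      def safeTryStartDrop(): Boolean = synchronized {
        if (!dropping) { dropping = true; true } else false
      }
    }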

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-20 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/791#discussion_r12840665 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -166,45 +166,51 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-20 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/791#discussion_r12840780 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -166,45 +166,51 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-20 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/791#discussion_r12840885 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -243,10 +250,13 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-20 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/791#discussion_r12841208 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -166,45 +166,51 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/791#discussion_r12888619 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -243,10 +250,13 @@ private class MemoryStore(blockManager: BlockManager
