[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2016-08-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 sorry, didn't see that one. Will fix --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #14835: [SPARK-17243] [Web UI] Spark 2.0 History Server w...

2016-08-30 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14835#discussion_r76764190 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala --- @@ -100,6 +100,7 @@ class HistoryServerSuite extends

[GitHub] spark issue #14718: [SPARK-16711] YarnShuffleService doesn't re-init properl...

2016-08-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14718 LevelDB is JNI so you can't shade it; there's been some careful review so that YARN NMs and Spark shuffle are in sync here. It's jackson versions which break things.

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r76593850 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -244,6 +244,31 @@ class SparkHadoopUtil extends Logging

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2016-08-28 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 Test failures are timeout-related; unlikely to be due to this patch ``` Test Result (2 failures / +2) org.apache.spark.sql.hive.HiveSparkSubmitSuite.dir

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r76515026 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -225,14 +274,26 @@ class HistoryServer

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r76514866 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -244,6 +244,31 @@ class SparkHadoopUtil extends Logging

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-27 Thread steveloughran
GitHub user steveloughran reopened a pull request: https://github.com/apache/spark/pull/9571 [SPARK-11373] [CORE] Add metrics to the History Server and FsHistoryProvider This adds metrics to the history server, with the `FsHistoryProvider` metering its load, performance

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-27 Thread steveloughran
Github user steveloughran closed the pull request at: https://github.com/apache/spark/pull/9571

[GitHub] spark issue #14835: [SPARK-17243] [Web UI] Spark 2.0 History Server won't lo...

2016-08-26 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14835 I'll let other people review the source code in detail, except note that people calling the REST API may want to ask for all entries, even if the web view asks for fewer. 1. the rest
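A minimal sketch of that point, in Python for illustration only: `list_applications` is a hypothetical helper, not the History Server's actual REST code. The web view can pass a small cap, while a REST caller that omits the limit gets every entry.

```python
from typing import List, Optional


def list_applications(apps: List[str], limit: Optional[int] = None) -> List[str]:
    """Hypothetical listing call: honour a limit only when one is supplied."""
    if limit is None:
        # REST callers that pass no limit get the full list.
        return list(apps)
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # The web view can ask for just the first `limit` entries.
    return apps[:limit]
```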

[GitHub] spark pull request #14835: [SPARK-17243] [Web UI] Spark 2.0 History Server w...

2016-08-26 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14835#discussion_r76491726 --- Diff: dev/.rat-excludes --- @@ -101,3 +101,4 @@ org.apache.spark.scheduler.ExternalClusterManager .*\.sql .Rbuildignore

[GitHub] spark pull request #14827: [SPARK-17259] [build] [WiP] Hadoop 2.7 profile to...

2016-08-26 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14827#discussion_r76401596 --- Diff: pom.xml --- @@ -2511,8 +2511,11 @@ hadoop-2.7 + --- End diff -- How can I set this profile

[GitHub] spark issue #14827: [SPARK-17259] [build] [WiP] Hadoop 2.7 profile to depend...

2016-08-26 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14827 This patch tries to set the default version to 2.7; I'll see if SBT picks it up. This is *not* something I'm proposing for the final merge; there I expect people to still go

[GitHub] spark pull request #14827: [SPARK-17259] [build] [WiP] Hadoop 2.7 profile to...

2016-08-26 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/14827 [SPARK-17259] [build] [WiP] Hadoop 2.7 profile to depend on Hadoop 2.7.3 ## What changes were proposed in this pull request? increment the `hadoop.version` value in the `hadoop-2.7

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Having looked at the source code, `FileSystem.globStatus()` uses the glob patterns, which are not the same as the posix regexp ones. [org.apache.hadoop.fs.GlobPattern](http://grepcode.com
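To illustrate the difference, here is a sketch using Python's `fnmatch` as a stand-in for Hadoop's `GlobPattern`. That substitution is an assumption: the two glob dialects differ in details (Hadoop also accepts `{a,b}` alternation, which `fnmatch` does not). The point shown is that one pattern string selects different files when read as a glob and as a regular expression.

```python
import fnmatch
import re

names = ["part-0000.txt", "part-0001.csv", "part-xtxt"]
pattern = "part-*.txt"

# As a glob, "*" is "any run of characters" and "." is a literal dot.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, pattern)]

# As a regular expression, "-*" is "zero or more hyphens" and "." is
# "any single character", so the same string selects different names.
regex_hits = [n for n in names if re.fullmatch(pattern, n)]
```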

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-24 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r76110009 --- Diff: docs/streaming-programming-guide.md --- @@ -644,13 +644,39 @@ methods for creating DStreams from files as input sources

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-24 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r76105141 --- Diff: docs/streaming-programming-guide.md --- @@ -644,13 +644,39 @@ methods for creating DStreams from files as input sources

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75945926 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -196,29 +192,33 @@ class FileInputDStream[K, V, F

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75945790 --- Diff: docs/streaming-programming-guide.md --- @@ -644,13 +644,39 @@ methods for creating DStreams from files as input sources

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The logic has got complex enough that it merits unit tests. I'm pulling it into SparkHadoopUtil itself and writing some for the possible cases: simple, glob matches one, glob matches 1+, glob doesn't match

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 1. updated the code to bypass the glob routine when there is no wildcard; this bypasses something fairly inefficient. 1. reporting FNFE on that base dir differently; skip the stack trace
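A sketch of the no-wildcard bypass, with tests covering the cases listed for unit testing (simple path, glob matching one, glob matching several, glob matching nothing). The metacharacter set and the `resolve` helper are assumptions for illustration, and `fnmatch` stands in for `FileSystem.globStatus()`; unlike Hadoop's globber, `fnmatch`'s `*` also matches `/`.

```python
import fnmatch
from typing import Iterable, List

# Characters treated as glob metacharacters in this sketch.
GLOB_CHARS = frozenset("{}[]*?\\")


def is_glob(pattern: str) -> bool:
    return any(c in GLOB_CHARS for c in pattern)


def resolve(pattern: str, existing: Iterable[str]) -> List[str]:
    """Return the paths in `existing` that `pattern` refers to."""
    paths = list(existing)
    if not is_glob(pattern):
        # Fast path: a plain path needs a single lookup, no pattern expansion.
        return [pattern] if pattern in paths else []
    # Slow path: expand the pattern against everything that exists.
    return sorted(p for p in paths if fnmatch.fnmatchcase(p, pattern))
```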

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 I've now done the [s3a streaming test/example](https://github.com/steveloughran/spark/blob/features/SPARK-7481-cloud/cloud/src/main/scala/org/apache/spark/cloud/s3/examples/S3Streaming.scala

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Actually, I've just noticed that DStream behaviour isn't in sync with the streaming programming guide, which says "files written in nested directories not supported)". That is: S

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 LGTM. I was trying to see if there was a way to create a good test here by triggering the takes-too-long codepath and having a counter, but there's no obvious way to do that deterministically

[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...

2016-08-22 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14038 There's no performance problem from filtering just on names. It's when people try to filter on more complex things (file type, timestamp) they need to call `getFileStatus(path)` and that's

[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...

2016-08-22 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14038 Oh, I don't want to take on any more work...I just think you should make the predicate passed in something that goes `FileStatus => Boolean` instead of `String => Boolean`, and
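A sketch of why the predicate type matters, using a hypothetical `FakeStore` that counts metadata round trips and a `FileStatus` stand-in (not Hadoop's class). Filtering on modification time through a name-only predicate costs one `get_file_status()` call per file; a status predicate reuses what the single listing call already returned.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileStatus:
    """Stand-in for a listing entry: a path plus metadata."""
    path: str
    length: int
    modification_time: int


class FakeStore:
    """Toy remote store that counts metadata round trips."""

    def __init__(self, files: List[FileStatus]):
        self._files: Dict[str, FileStatus] = {f.path: f for f in files}
        self.calls = 0

    def list_status(self) -> List[FileStatus]:
        self.calls += 1  # a single listing returns every entry's metadata
        return list(self._files.values())

    def get_file_status(self, path: str) -> FileStatus:
        self.calls += 1  # one extra round trip per file
        return self._files[path]


def newer_than_by_name(store: FakeStore, cutoff: int) -> List[str]:
    names = [s.path for s in store.list_status()]

    # A String => Boolean predicate sees only the name, so it must go back
    # to the store for the timestamp.
    def pred(name: str) -> bool:
        return store.get_file_status(name).modification_time > cutoff

    return [n for n in names if pred(n)]


def newer_than_by_status(store: FakeStore, cutoff: int) -> List[str]:
    # A FileStatus => Boolean predicate reuses the listing's metadata.
    def pred(status: FileStatus) -> bool:
        return status.modification_time > cutoff

    return [s.path for s in store.list_status() if pred(s)]
```

Both functions return the same files; only the number of store calls differs.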

[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14659 Chris: maybe the CallerContext class could check for bad characters, including spaces, newlines, "," and quotation marks .. the usual things to break parsers. Th
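A minimal sketch of such a check, assuming the flagged set is space, CR/LF newlines, comma and both quote characters. `find_bad_chars` and `validate_caller_context` are hypothetical names, not methods of Hadoop's `CallerContext`.

```python
from typing import Set

# Characters flagged as likely to break log parsers. This exact set
# (space, CR, LF, comma, double and single quote) is an assumption.
BAD_CHARS = frozenset(" \r\n,\"'")


def find_bad_chars(context: str) -> Set[str]:
    return {c for c in context if c in BAD_CHARS}


def validate_caller_context(context: str) -> str:
    """Return `context` unchanged, or raise if it contains a flagged character."""
    bad = find_bad_chars(context)
    if bad:
        raise ValueError(f"caller context contains disallowed characters: {sorted(bad)!r}")
    return context
```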

[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14659 Having some problems adding you as a contributor (JIRA scale issues, browser problems); I've asked others to try and do it. Start with the coding; I'll sort out the contributor entry

[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...

2016-08-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14601 I'd like to propose that the list of filesystem properties to propagate is actually defined as a list in a spark property, default could be "fs.s3a, fs.s3n, fs.s3, fs.swift, fs
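A sketch of that proposal, with hypothetical helper names: the property value is parsed as a comma-separated list of prefixes, and matching Hadoop configuration keys are selected for propagation. The default list in the comment is cut off, so the test uses only the four prefixes shown. Matching on `prefix + "."` keeps `fs.s3` from also capturing `fs.s3a.*` keys.

```python
from typing import Dict, List


def parse_prefixes(value: str) -> List[str]:
    """Split a comma-separated property value, dropping blanks and whitespace."""
    return [p.strip() for p in value.split(",") if p.strip()]


def select_fs_properties(conf: Dict[str, str], prefixes: List[str]) -> Dict[str, str]:
    """Return the entries of `conf` whose key lies under one of `prefixes`."""
    # Require a "." after the prefix so that "fs.s3" does not match "fs.s3a.*".
    dotted = tuple(p + "." for p in prefixes)
    return {k: v for k, v in conf.items() if k.startswith(dotted)}
```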

[GitHub] spark pull request #14601: [SPARK-13979][Core] Killed executor is re spawned...

2016-08-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14601#discussion_r75584298 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -107,6 +107,14 @@ class SparkHadoopUtil extends Logging

[GitHub] spark issue #12695: [SPARK-14914] Normalize Paths/URIs for windows.

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12695 As #13868 does adopt `org.apache.hadoop.fs.Path`, I don't see this patch being needed, though it may highlight some places where the new code may need applying

[GitHub] spark issue #12695: [SPARK-14914] Normalize Paths/URIs for windows.

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12695 If you are working with Windows paths, Hadoop's Path class already contains the code to do this, stabilised and handling the corner cases

[GitHub] spark pull request #14601: [SPARK-13979][Core] Killed executor is re spawned...

2016-08-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14601#discussion_r75584303 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -102,11 +102,19 @@ class SparkHadoopUtil extends Logging

[GitHub] spark issue #14718: [SPARK-16711] YarnShuffleService doesn't re-init properl...

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14718 Moving the jackson/leveldb dependencies isn't going to create problems on the YARN shuffle CP, is it? Given the versions aren't changing, I'm not too worried; I just want to make sure

[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14659 That CallerContext class doesn't list Spark as one of the users in its LimitedPrivate scope. Add a Hadoop patch there and I'll get it in. This avoids arguments later when someone breaks the API

[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14038 Path filtering in Hadoop FS calls on anything other than filename is very suboptimal; in #14731 you can see where the filtering has been postponed until after the listing, when the full

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75584026 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -293,8 +290,8 @@ class FileInputDStream[K, V, F

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75584030 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -241,16 +233,21 @@ class FileInputDStream[K, V, F

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 to be precise: the caching of file modification times is superfluous. It's there to avoid the cost of executing `getFileStatus()` on previously scanned files. Once you use the FileStatus

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-08-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 I'm going to scan through and tune them elsewhere; really I'm going by uses of the listFiles calls. There's actually no significant use elsewhere that I can see; just a couple

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-20 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/14731 [SPARK-17159] [streaming]: optimise check for new files in FileInputDStream ## What changes were proposed in this pull request? This PR optimises the filesystem metadata reads

[GitHub] spark issue #14371: [SPARK-16736] [core] + SQL purge superfluous fs calls

2016-08-17 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14371 Rebased the patch against master; addressed @vanzin's comments. The `mkdirs()` change in `HDFSBackedStateStoreProvider` was done after reviewing code in Hadoop, esp. HDFS and RawLocal. When

[GitHub] spark pull request #14371: [SPARK-16736] [core] + SQL purge superfluous fs c...

2016-08-17 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r75114999 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala --- @@ -231,13 +232,17 @@ private[streaming] class

[GitHub] spark pull request #14371: [SPARK-16736] [core] + SQL purge superfluous fs c...

2016-08-17 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r75114918 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -443,6 +445,9 @@ private

[GitHub] spark pull request #14371: [SPARK-16736] [core] + SQL purge superfluous fs c...

2016-08-17 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r75114636 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -278,14 +278,15 @@ private

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r74963037 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -226,6 +259,135 @@ class HistoryServer

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r74962913 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -226,6 +259,135 @@ class HistoryServer

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r74962863 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -226,6 +259,135 @@ class HistoryServer

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r74960995 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -226,6 +259,135 @@ class HistoryServer

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r74911975 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -114,28 +123,45 @@ class HistoryServer( * this UI

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r74910950 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -667,6 +710,123 @@ private[history] class

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r74910734 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -667,6 +710,123 @@ private[history] class

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-08-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r74910796 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -667,6 +710,123 @@ private[history] class

[GitHub] spark issue #14646: [SPARK-17058] [build] Add maven snapshots-and-staging pr...

2016-08-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14646 I'm adding the ability to test against staged releases, such as Hadoop 2.7.3 RC1. With this profile added, testing that Spark runs with the new RC is a matter of setting the version with a -D

[GitHub] spark issue #14646: [SPARK-17058] [build] Add maven snapshots-and-staging pr...

2016-08-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14646 I'd be against making it the default, for a few reasons: 1. You don't want to accidentally pick up some staging artifact or upstream snapshot. 2. I don't know how SBT/Ivy handles remote

[GitHub] spark issue #14646: [SPARK-17058] [build] Add maven snapshots-and-staging pr...

2016-08-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14646 Note that Jenkins, being SBT-based, isn't going to explore the codepath here

[GitHub] spark pull request #14646: [SPARK-17058] [build] Add maven snapshots-and-sta...

2016-08-15 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/14646 [SPARK-17058] [build] Add maven snapshots-and-staging profile to build/test against staging artifacts ## What changes were proposed in this pull request? Adds a `snapshots

[GitHub] spark pull request #13830: [SPARK-16121] ListingFileCatalog does not list in...

2016-08-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/13830#discussion_r74684998 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -73,21 +73,67 @@ class

[GitHub] spark pull request #14371: [SPARK-16736] Core+ SQL superfluous fs calls

2016-08-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r74684971 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala --- @@ -231,13 +232,17 @@ private[streaming] class

[GitHub] spark issue #14371: [SPARK-16736] Core+ SQL superfluous fs calls

2016-08-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14371 Pulled the WiP; happy for full reviews, though I'm on vacation right now, so can't handle feedback just yet

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72452587 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -340,13 +341,15 @@ private

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72451806 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -301,14 +303,23 @@ private[spark] object

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72451702 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -90,8 +90,13 @@ private[spark] class EventLoggingListener

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72451554 --- Diff: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala --- @@ -240,7 +248,7 @@ private[spark] object ReliableCheckpointRDD

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72450542 --- Diff: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala --- @@ -166,17 +166,25 @@ private[spark] object ReliableCheckpointRDD

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72446633 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1410,10 +1410,12 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72429947 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1410,10 +1410,12 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72426957 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -301,14 +303,23 @@ private[spark] object

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72426791 --- Diff: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala --- @@ -166,17 +166,25 @@ private[spark] object ReliableCheckpointRDD

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72426474 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -404,6 +404,27 @@ class SparkHadoopUtil extends Logging

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72425952 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1410,10 +1410,12 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-26 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14371#discussion_r72302178 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -90,8 +90,13 @@ private[spark] class EventLoggingListener

[GitHub] spark pull request #14371: [SPARK-16736] WiP Core+ SQL superfluous fs calls

2016-07-26 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/14371 [SPARK-16736] WiP Core+ SQL superfluous fs calls ## What changes were proposed in this pull request? A review of the code, working back from Hadoop's `FileSystem.exists
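A local-filesystem sketch of the kind of call being purged, with Python's `os` calls standing in for Hadoop's `FileSystem` API (on an object store, each probe is a remote request). Probing with `exists()` before acting doubles the metadata calls; acting directly and treating "not found" as absence needs one, and a `mkdir -p`-style call needs no probe at all. The function names are hypothetical.

```python
import os
from typing import Optional


def read_if_present_lbyl(path: str) -> Optional[bytes]:
    # Look before you leap: one existence probe, then the open itself.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return f.read()


def read_if_present(path: str) -> Optional[bytes]:
    # Just open it: a single call, and no window for the file to vanish
    # between the probe and the open.
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None


def ensure_dir(path: str) -> None:
    # Idempotent, like `mkdir -p`: no exists() check beforehand.
    os.makedirs(path, exist_ok=True)
```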

[GitHub] spark pull request #13830: [SPARK-16121] ListingFileCatalog does not list in...

2016-07-26 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/13830#discussion_r72242750 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -73,21 +73,67 @@ class

[GitHub] spark issue #14163: [SPARK-15923][YARN] Spark Application rest api returns '...

2016-07-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14163 LGTM. Clarifies that it is yarn-cluster mode only, not client mode.

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2016-07-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 This patch adds separate average values of the load times vs merge times per event; this shows a ~2x difference in replay from load in the test case. These `.time` gauges are little
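A sketch of an average-time-per-event gauge, with one instance each for load and merge so that the two can be compared. `AverageTimePerEvent` is a hypothetical class, not the patch's metric type, and the numbers in the test are made up.

```python
class AverageTimePerEvent:
    """Hypothetical gauge: total time spent divided by events processed."""

    def __init__(self) -> None:
        self.total_ns = 0
        self.events = 0

    def record(self, duration_ns: int, events: int) -> None:
        if duration_ns < 0 or events < 0:
            raise ValueError("duration and event count must be non-negative")
        self.total_ns += duration_ns
        self.events += events

    def value(self) -> float:
        # Report 0.0 until something has been recorded, rather than dividing by zero.
        return self.total_ns / self.events if self.events else 0.0
```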

[GitHub] spark issue #12004: [SPARK-7481][build] [WIP] Add Hadoop 2.6+ spark-cloud mo...

2016-06-28 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Downgrading to a WIP: to work reliably it needs [HADOOP-12636](https://issues.apache.org/jira/browse/HADOOP-12636) in the Hadoop code, else the presence of `hadoop-aws.jar` on the CP without

[GitHub] spark pull request #13218: [SPARK-15440] [Core] [Deploy] Add CSRF Filter for...

2016-06-17 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/13218#discussion_r67489509 --- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestCsrfPreventionFilter.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #13218: [SPARK-15440] [Core] [Deploy] Add CSRF Filter for REST A...

2016-06-17 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/13218 I can see there is fear of breaking things, especially with third party clients. There's also the risk of cross-version submissions; the REST API is meant to be stable enough for backwards

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2016-06-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 One other metric set I'm thinking of relates to a JIRA on app UIs not being visible: making the time of last scan a metric, both as an epoch time and diff from current time. That would let
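A sketch of the proposed pair of gauges, both derived from one stored timestamp. `LastScanMetrics` is hypothetical, and the injectable clock is there only so that the age can be tested deterministically.

```python
import time
from typing import Callable, Optional


class LastScanMetrics:
    """Hypothetical gauges for the most recent completed scan."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._last_scan: Optional[float] = None

    def scan_completed(self) -> None:
        self._last_scan = self._clock()

    def last_scan_epoch(self) -> Optional[float]:
        # Gauge 1: when the last scan finished, in seconds since the epoch.
        return self._last_scan

    def seconds_since_last_scan(self) -> Optional[float]:
        # Gauge 2: how long ago that was; None until the first scan completes.
        if self._last_scan is None:
            return None
        return self._clock() - self._last_scan
```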

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-06-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r67326183 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -395,7 +429,8 @@ private[history] class FsHistoryProvider

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-06-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r67325987 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala --- @@ -110,3 +127,87 @@ private[history] abstract

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2016-06-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 Updated patch. Addresses indentation, found and eliminated one more call to {{initialize()}} outside of constructor. Adds a whole new counter, `event.replay.count`, which counts

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-06-10 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r66614402 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -667,6 +700,90 @@ private[history] class FsHistoryProvider

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-06-10 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r66613829 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -278,6 +303,9 @@ private[history] class FsHistoryProvider

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-06-10 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r66612954 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -114,28 +123,45 @@ class HistoryServer( * this UI

[GitHub] spark issue #7786: [SPARK-9468][Yarn][Core] Avoid scheduling tasks on preemp...

2016-06-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/7786 > But I'm just trying to point out that the current change doesn't really make things better. Without killing the executor, you'll still be holding on to resources, except now you would

[GitHub] spark pull request #13579: [SPARK-15844] [core] HistoryServer doesn't come u...

2016-06-09 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/13579 [SPARK-15844] [core] HistoryServer doesn't come up if spark.authenticate = true ## What changes were proposed in this pull request? During history server startup, the spark

[GitHub] spark issue #7786: [SPARK-9468][Yarn][Core] Avoid scheduling tasks on preemp...

2016-06-09 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/7786 @vanzin I suspect that if you get told you are being pre-empted, you aren't likely to get containers elsewhere —pre-emption is a sign of demand being too high, and your queue lower priority

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2016-06-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r66204561 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -114,28 +123,45 @@ class HistoryServer( * this UI

[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...

2016-05-26 Thread steveloughran
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-221970852 thanks

[GitHub] spark pull request: [MINOR][SPARKR][DOC] Add a description for run...

2016-05-20 Thread steveloughran
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/13217#issuecomment-220592819 1. It's nice to see someone sitting down to deal with the windows test problem. 1. Hadoop 2.8+ will fail meaningfully here, with an exception including

[GitHub] spark pull request: [MINOR][SPARKR][DOC] Add a description for run...

2016-05-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/13217#discussion_r64032005 --- Diff: R/WINDOWS.md --- @@ -11,3 +11,19 @@ include Rtools and R in `PATH`. directory in Maven in `PATH`. 4. Set `MAVEN_OPTS` as described

[GitHub] spark pull request: [SPARK-7481][build][WIP] Add Hadoop 2.6+ spark...

2016-05-06 Thread steveloughran
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/12004#issuecomment-217536672 For anyone trying to run these tests, they'll need a test xml file and refer to it ``` mvn test -Phadoop-2.6 -Dcloud.test.configuration.file

[GitHub] spark pull request: [SPARK-13232][YARN] Fix executor node label

2016-05-06 Thread steveloughran
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/11129#issuecomment-217520612 It's actually being fixed right now in Hadoop 2.8, which will take a while to surface. [YARN-4925](https://issues.apache.org/jira/browse/YARN-4925

[GitHub] spark pull request: [SPARK-13148] [YARN] document zero-keytab Oozi...

2016-04-26 Thread steveloughran
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/11033#issuecomment-214831899 @tgravescs the logging bit of the patch is in sync with master/ . Is there anything else you want me to do regarding the documentation to get

[GitHub] spark pull request: [SPARK-13513] [SQL] verify Feb 29 works on a l...

2016-04-26 Thread steveloughran
Github user steveloughran closed the pull request at: https://github.com/apache/spark/pull/11394

[GitHub] spark pull request: [SPARK-7481][build][WIP] Add Hadoop 2.6+ spark...

2016-04-26 Thread steveloughran
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/12004#issuecomment-214801641 Note that as this module only builds on Hadoop >= 2.6; jenkins won't be compiling it. The tests are designed to skip running if no config file to cl

[GitHub] spark pull request: [SPARK-7481][build][WIP] Add Hadoop 2.6+ spark...

2016-04-26 Thread steveloughran
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/12004#issuecomment-214753851 Oh, and there's an initial documentation page on spark + cloud infrastructure, which tries to make clear that object stores are not real filesystems

[GitHub] spark pull request: [SPARK-7481][build][WIP] Add Hadoop 2.6+ spark...

2016-04-26 Thread steveloughran
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/12004#issuecomment-214747671 The latest version of this does, among other things, call FileSystem.toString after operations. In HADOOP-13028, along with seek optimisation
