[GitHub] spark issue #17872: [SPARK-20608] allow standby namenodes in spark.yarn.acce...

2017-05-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17872 at a glance, patch LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #17872: [SPARK-20608] allow standby namenodes in spark.ya...

2017-05-05 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17872#discussion_r114985141 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/HadoopFSCredentialProvider.scala --- @@ -81,8 +90,15

[GitHub] spark pull request #17834: [SPARK-7481] [build] Add spark-hadoop-cloud modul...

2017-05-05 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17834#discussion_r114982772 --- Diff: hadoop-cloud/pom.xml --- @@ -0,0 +1,185 @@ + + +http://maven.apache.org/POM/4.0.0; + xmlns:xsi="http://www.w3.org

[GitHub] spark pull request #17834: [SPARK-7481] [build] Add spark-hadoop-cloud modul...

2017-05-05 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17834#discussion_r114982436 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,203 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #17834: [SPARK-7481] [build] Add spark-hadoop-cloud modul...

2017-05-05 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17834#discussion_r114982357 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,203 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #17870: [SPARK-20608] allow standby namenodes in spark.ya...

2017-05-05 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17870#discussion_r114972572 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/security/HDFSCredentialProvider.scala --- @@ -75,8 +84,15 @@ private[security] class

[GitHub] spark issue #17834: [SPARK-7481] [build] Add spark-hadoop-cloud module to pu...

2017-05-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17834 OK, now I understand. let me revert that bit of the patch

[GitHub] spark issue #17834: [SPARK-7481] [build] Add spark-hadoop-cloud module to pu...

2017-05-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17834 I've just pushed up an update which changes the module name; tested in maven and SBT; hadoop cloud JAR dependencies pulled down. A JAR is created, it's just a stub one. As a result

[GitHub] spark pull request #17834: [SPARK-7481] [build] Add spark-hadoop-cloud modul...

2017-05-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17834#discussion_r114652036 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,190 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #17834: [SPARK-7481] [build] Add spark-hadoop-cloud modul...

2017-05-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17834#discussion_r114646578 --- Diff: cloud/pom.xml --- @@ -0,0 +1,106 @@ + --- End diff -- OK

[GitHub] spark pull request #17834: [SPARK-7481] [build] Add spark-hadoop-cloud modul...

2017-05-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17834#discussion_r114644953 --- Diff: pom.xml --- @@ -1145,6 +1150,70 @@ + + --- End diff -- OK, I'll

[GitHub] spark issue #17834: [SPARK-7481] [build] Add spark-hadoop-cloud module to pu...

2017-05-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17834 The last one was on all the doc comments, and I believe I've addressed them both with the little typos and by focusing the docs on the main points for Spark users: how stores differ from

[GitHub] spark pull request #17834: [SPARK-7481] [build] Add spark-hadoop-cloud modul...

2017-05-02 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/17834 [SPARK-7481] [build] Add spark-hadoop-cloud module to pull in object store access. ## What changes were proposed in this pull request? Add a new `spark-hadoop-cloud` module

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-05-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r114331264 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-05-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r114330941 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-05-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 github isn't letting me reopen this, so I'm going to submit the patch with reworked docs as a new PR. The machines do not like me today.

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r114056158 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113997967 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113995508 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113979717 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113979701 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113976778 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113976861 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113976699 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113976264 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113971968 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113970275 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113969133 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113967945 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113962864 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113950929 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113950943 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-28 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113950840 --- Diff: docs/cloud-integration.md --- @@ -0,0 +1,512 @@ +--- +layout: global +displayTitle: Integration with Cloud Infrastructures

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113750071 --- Diff: pom.xml --- @@ -1145,6 +1150,70 @@ + + --- End diff -- I'm

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113749971 --- Diff: pom.xml --- @@ -621,6 +621,11 @@ ${fasterxml.jackson.version

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113749781 --- Diff: cloud/pom.xml --- @@ -0,0 +1,117 @@ + + +http://maven.apache.org/POM/4.0.0; + xmlns:xsi="http://www.w3.org/2001/XMLS

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113749812 --- Diff: docs/storage-openstack-swift.md --- @@ -19,41 +20,32 @@ Although not mandatory, it is recommended to configure the proxy server of Swift

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113749428 --- Diff: cloud/pom.xml --- @@ -0,0 +1,158 @@ + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLS

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113725231 --- Diff: cloud/pom.xml --- @@ -0,0 +1,117 @@ + + +http://maven.apache.org/POM/4.0.0; + xmlns:xsi="http://www.w3.org/2001/XMLS

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/12004#discussion_r113725132 --- Diff: assembly/pom.xml --- @@ -226,5 +226,19 @@ provided + + + + cloud --- End

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Reynold, I know very much about the time of reviewers, I put 1+h a day on the hadoop codebase reviewing stuff, generally trying to review the work of non-colleagues, so as to pull

[GitHub] spark pull request #12004: [SPARK-7481] [build] Add spark-cloud module to pu...

2017-04-24 Thread steveloughran
Github user steveloughran closed the pull request at: https://github.com/apache/spark/pull/12004

[GitHub] spark pull request #17747: [SPARK-11373] [CORE] Add metrics to the FsHistory...

2017-04-24 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/17747 [SPARK-11373] [CORE] Add metrics to the FsHistoryProvider ## What changes were proposed in this pull request? This adds metrics to the history server, with the `FsHistoryProvider

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2017-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 I'm going to close this PR and start one based on a reapplication of this patch onto master; gets rid of all the merge pain and is intended to be more minimal. The latest comments of this one

[GitHub] spark pull request #17745: [SPARK-17159][Streaming] optimise check for new f...

2017-04-24 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/17745 [SPARK-17159][Streaming] optimise check for new files in FileInputDStream ## What changes were proposed in this pull request? Changes to `FileInputDStream` to eliminate multiple

[GitHub] spark pull request #17743: [SPARK-20448][DOCS] Document how FileInputDStream...

2017-04-24 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/17743 [SPARK-20448][DOCS] Document how FileInputDStream works with object storage Change-Id: I88c272444ca734dc2cbc2592607c11287b90a383 ## What changes were proposed in this pull request

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-04-24 Thread steveloughran
Github user steveloughran closed the pull request at: https://github.com/apache/spark/pull/14731

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Ok. what is the way? Do I write a formal proposal? Because right now there is no reliable way to get the full dependency graph of Spark + hadoop cloud JARs + direct cloud provider

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2017-04-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 > I was secretly hoping you'd just give up on this patch, since it will generate a lot of conflicts with the code I'm working on in parallel.. No. Sorry I do susp

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2017-04-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r111942037 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -310,77 +338,87 @@ private[history] class

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2017-04-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r111934303 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -729,6 +778,116 @@ private[history] class

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2017-04-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r111914872 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala --- @@ -99,6 +104,19 @@ private[history] abstract

[GitHub] spark pull request #9571: [SPARK-11373] [CORE] Add metrics to the History Se...

2017-04-18 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9571#discussion_r111913717 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala --- @@ -410,34 +409,25 @@ private[history] class CacheMetrics

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-04-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 thanks.

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r111365777 --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala --- @@ -1021,4 +1021,19 @@ class UtilsSuite extends SparkFunSuite

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2017-04-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 Line lengths fixed, tests all happy. @vanzin, any chance of adding this to your review list?

[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

2017-04-12 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17149#discussion_r04548 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -285,7 +285,7 @@ private[spark] class

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-04-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 @squito Is this ready to go in? Like I warned, I'm not going to add tests for this, not on its own

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-04-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen anything else I need to do here?

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-04-08 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 @srowen anything else I need to do here?

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-08 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r110517523 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int

[GitHub] spark pull request #17149: [SPARK-19257][SQL]location for table/partition/da...

2017-04-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17149#discussion_r110420176 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -285,7 +285,7 @@ private[spark] class

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 Any comments on the latest patch? Anyone?

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Is there anything else I need to do here?

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-29 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 I don't have the time/plans to do the test here, as it's a fairly complex piece of test setup for what a review should show isn't doing anything other than guarantee the outcome of

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-22 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r107381697 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 looking some more, yes, as `tryWithSafeFinallyAndFailureCallbacks` wraps task commit, it guarantees that the original cause doesn't get lost. The abortJob code isn't so well guarded

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-03-21 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r107194624 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -27,7 +27,8 @@ import scala.collection.JavaConverters

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-03-21 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r107152771 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -27,7 +27,8 @@ import scala.collection.JavaConverters

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-03-21 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r107152263 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala --- @@ -557,4 +557,16 @@ trait TestSuiteBase extends SparkFunSuite

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 Created [SPARK-20045](https://issues.apache.org/jira/browse/SPARK-20045). I think there's room to improve resilience in the abort code, primarily to ensure that the underlying failure cause

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-21 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 I haven't reviewed that bit of code: make it a separate JIRA and assign to me. This one I came across in the HADOOP-2.8.0 RC3 testing; the underlying fix there is going in, but the spark code

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 The latest patch embraces the fact that 2.6 is the base hadoop version so the `hadoop-aws` JAR is always pulled in, dependencies set up. One thing to bear in mind here that the [Phase I

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Any more comments?

[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...

2017-03-20 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 Note that as [the exception handler](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L244) tries to close

[GitHub] spark pull request #17364: [SPARK-20038] [core]: move the currentWriter=null...

2017-03-20 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/17364 [SPARK-20038] [core]: move the currentWriter=null assignments into finally {} … ## What changes were proposed in this pull request? have

[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-20 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r107001274 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-19 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 I haven't forgotten this; I've just been trying to make the module POM-only, while adding support for Hadoop 2.6 builds, which is causing some issues downstream. Specifically, my downstream

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 FWIW, if there is something related to serialization that people should be pushing for in Hadoop 3, it is making all the little types serializable, such as `Path`, `FileStatus` and the like

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 oh, this sucks. Find anyone who experienced "The great protobuf update of 2012" and ask them if they want to do it again. Looking at the issues, AVRO-997 catches out &

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-03-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The Hadoop FS Spec has now been updated to declare exactly what HDFS does w.r.t timestamps, and warn that what other filesystems and object stores do are implementation and installation

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 Check with @busbey about binary compatibility with older generated/compiled classes; that's the recurrent problem with protobuf

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 Looking @ hadoop source, there's not much in hadoop common in terms of use of `import org.apache.avro`, but the `avro.Utf8` surfaces, and someone has tagged `fs.Path` as `@Stringable`, which

[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17163 It's invariably the transient stuff, isn't it? Mvnrepo on Avro 1.8.1 logs [jackson as a compile time dependency](http://mvnrepository.com/artifact/org.apache.avro/avro/1.8.1); that's

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-03-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 comments?

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-04 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17120#discussion_r104286483 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -1253,8 +1253,26 @@ class FileStreamSourceSuite

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17120 I know that's the *current* use case, but I'm thinking about future confusion, especially as the use case you espoused, "move from s3n to s3a within the same window" is

[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser

2017-03-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17080 thanks. One thing I realised last night is that logging the session token, even at debug level, would have been a security risk. So it's very good that the log statement got cut, even

[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser

2017-03-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17080 @srowen don't worry, been tracking this: I filed the JIRA. Core code is good (i.e. property/env var names). One thing to bear in mind, the existing code propagates the env vars even

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17120 -1, non binding I understand the rationale for this, to aid migration from s3/s3n to s3a, but given the need is schema independence, you should be using the full path name from

[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...

2017-03-01 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/9571 Style police. FWIW I think the lines that failed were already >100 chars, it was just they got indented slightly more. ``` Scalastyle checks failed at following occurrences: [er

[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser

2017-02-28 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17080 I agree. I was just checking the files to make sure the strings were consistent/correct, rather than trusting the documentation

[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluser

2017-02-27 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17080 LGTM. Verified option name in `org.apache.hadoop.fs.s3a.Constants` file; env var name in `com.amazonaws.SDKGlobalConfiguration`
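
For readers unfamiliar with what is being verified here, the following is a minimal sketch of the idea, not Spark's actual code: AWS credentials found in environment variables are copied into the matching S3A options. The option name `fs.s3a.session.token` and the env var `AWS_SESSION_TOKEN` are the two strings the comment refers to. The helper `propagate_s3_env`, the use of plain dictionaries, and the rule that the token is only copied when the key pair is present are assumptions made for illustration.

```python
# Hypothetical sketch; plain dicts stand in for the process environment
# and the Hadoop configuration. This is not Spark's implementation.
S3A_ACCESS_KEY = "fs.s3a.access.key"
S3A_SECRET_KEY = "fs.s3a.secret.key"
S3A_SESSION_TOKEN = "fs.s3a.session.token"


def propagate_s3_env(env, conf):
    """Copy AWS credentials from env (a mapping) into conf (a dict)."""
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    # Assumption: propagate only when both halves of the key pair are set.
    if key_id and secret:
        conf[S3A_ACCESS_KEY] = key_id
        conf[S3A_SECRET_KEY] = secret
        # Session credentials carry a third value, the session token.
        token = env.get("AWS_SESSION_TOKEN")
        if token:
            conf[S3A_SESSION_TOKEN] = token
    return conf
```

In real use `env` would be `os.environ`; passing a dictionary keeps the sketch testable and avoids touching real secrets.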

[GitHub] spark pull request #16990: [SPARK-19660][CORE][SQL] Replace the configuratio...

2017-02-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/16990#discussion_r103185158 --- Diff: sql/hive/src/test/resources/ql/src/test/queries/clientpositive/smb_mapjoin_25.q --- @@ -19,7 +19,7 @@ select * from (select a.key from

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-02-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r103184528 --- Diff: docs/streaming-programming-guide.md --- @@ -615,35 +615,114 @@ which creates a DStream from text data received over a TCP socket

[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-02-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r103183646 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -140,7 +137,7 @@ class FileInputDStream[K, V, F

[GitHub] spark pull request #16990: [SPARK-19660][CORE][SQL] Replace the configuratio...

2017-02-27 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/16990#discussion_r103183030 --- Diff: sql/hive/src/test/resources/ql/src/test/queries/clientpositive/smb_mapjoin_25.q --- @@ -19,7 +19,7 @@ select * from (select a.key from

[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...

2017-02-25 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14601 1. It's good to have some tests 2. I note that `appendS3AndSparkHadoopConfigurations()` has a weakness in how it propagates env vars: no propagation of the session environment

[GitHub] spark issue #16990: [SPARK-19660][CORE][SQL] Replace the configuration prope...

2017-02-25 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/16990 LGTM, though you'd have to go do the full coverage to verify that there's not a typo in any of the strings. This is why although Spark has adopted the more readable inline strings, I'm more

[GitHub] spark issue #14601: [SPARK-13979][Core] Killed executor is re spawned withou...

2017-02-25 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14601 spark.hadoop.fs.* would work. The (not yet shipped in ASF code) Azure Data Lake FS has, for reasons I don't know and have only just noticed, adopted "dfs.adl" as their prefi
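
The point about prefixes can be illustrated with a small sketch, offered under stated assumptions rather than as Spark's code. Spark copies `spark.hadoop.*` properties into the Hadoop configuration with the `spark.hadoop.` prefix removed; a filter restricted to `fs.` options would therefore skip anything published under `dfs.adl`. The function `hadoop_options`, its `allowed` parameter, and the sample keys in the usage note are hypothetical.

```python
SPARK_HADOOP_PREFIX = "spark.hadoop."


def hadoop_options(spark_conf, allowed=("fs.",)):
    """Return Hadoop options found in spark_conf, prefix stripped.

    Only options whose stripped name starts with one of the `allowed`
    prefixes are kept (hypothetical filter, for illustration only).
    """
    result = {}
    for key, value in spark_conf.items():
        if not key.startswith(SPARK_HADOOP_PREFIX):
            continue  # not a Hadoop passthrough property
        name = key[len(SPARK_HADOOP_PREFIX):]
        if name.startswith(tuple(allowed)):
            result[name] = value
    return result
```

With the default filter, a key such as `spark.hadoop.dfs.adl.example` (a made-up name) is dropped; calling `hadoop_options(conf, allowed=("fs.", "dfs.adl."))` keeps it, which is the kind of special case the comment is flagging.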
