[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186469794 --- Diff: hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/BindingParquetOutputCommitter.scala --- @@ -0,0 +1,122

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186469740 --- Diff: hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/BindingParquetOutputCommitter.scala --- @@ -0,0 +1,122

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186467415 --- Diff: hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala --- @@ -0,0 +1,260

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186466925 --- Diff: hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala --- @@ -0,0 +1,260

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186467007 --- Diff: hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala --- @@ -0,0 +1,260

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186466513 --- Diff: hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala --- @@ -0,0 +1,260

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186466367 --- Diff: hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala --- @@ -0,0 +1,260

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186466259 --- Diff: hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/BindingParquetOutputCommitter.scala --- @@ -0,0 +1,122

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186463919 --- Diff: hadoop-cloud/src/main/scala/org/apache/spark/internal/io/cloud/PathCommitterConstants.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol bin...

2018-05-07 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21066#discussion_r186463018 --- Diff: hadoop-cloud/src/main/scala/org/apache/spark/internal/io/cloud/PathCommitterConstants.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed

[GitHub] spark issue #19404: [SPARK-21760] [Streaming] Fix for Structured streaming t...

2018-04-26 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19404 I think the sync is important, but that you just need to handle the case of "fs doesn't support it". Thinking about this a bit more, I didn't like my proposed patch. Bette

[GitHub] spark issue #21146: [SPARK-23654][BUILD][WiP] remove jets3t as a dependency ...

2018-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/21146 As promised, dependencies fail ``` diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/pr-deps/spark-deps-hadoop-2.6 index 32b2e4f..609eeb9 100644 --- a/dev/deps/spark-deps

[GitHub] spark pull request #21146: [SPARK-23654][BUILD][WiP] remove jets3t as a depe...

2018-04-24 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/21146 [SPARK-23654][BUILD][WiP] remove jets3t as a dependency of spark ## What changes were proposed in this pull request? With the update of bouncy-castle JAR in Spark 2.3, jets3t doesn't

[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...

2018-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 thank you! I guess that means I'm down for the hive JAR, doesn't it :) Better make a list of patches which should go in, I think internally we have 1+ kerberos related (https

[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...

2018-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 I've also added a comment to [SPARK-18673](https://issues.apache.org/jira/browse/SPARK-18673) offering to fix the org.spark-project.hive JAR, but only once this patch is in. This bit

[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...

2018-04-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 I can and do build Hadoop with this local version enabled, so it's easy enough to set things up locally. Indeed, the ability to change Hadoop version, [HADOOP-13852](https://issues.apache.org

[GitHub] spark issue #19404: [SPARK-21760] [Streaming] Fix for Structured streaming t...

2018-04-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19404 BTW, perf wise: hflush() is required to block until the flush has got to the store (visible to others), and with hsync actually saved to the durable store. So it will take time, but if you

[GitHub] spark pull request #19404: [SPARK-21760] [Streaming] Fix for Structured stre...

2018-04-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19404#discussion_r183482802 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala --- @@ -139,6 +139,9 @@ abstract class

[GitHub] spark pull request #19404: [SPARK-21760] [Streaming] Fix for Structured stre...

2018-04-23 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19404#discussion_r183480609 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -123,6 +123,7 @@ class HDFSMetadataLog[T

[GitHub] spark issue #19404: [SPARK-21760] [Streaming] Fix for Structured streaming t...

2018-04-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19404 Problem here is that a stream which doesn't implement hflush/hsync is required to throw an exception; it's a way of guaranteeing that if hsync/hflush does complete, the action has done what

[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...

2018-04-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 @vanzin : The followup to this is #21066; I could move the compile time changes there but if you are going to have POMs playing with dependencies, seems best to have it all in one place

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/21060 * from the ASF process-police perspective, something like versioning/backport policy is something which should be done on the ASF dev list...consider asking in user@ to see what people's

[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/21071 I like this, but you'll need people with authority to trigger the builds and reviews. There's some discussion kicked off last week on the ASF incubator about the fact that htrace has

[GitHub] spark pull request #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21071#discussion_r181810726 --- Diff: core/src/main/scala/org/apache/spark/trace/SparkAppTracer.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/21071 + @rdblue --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with ...

2018-04-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r181680034 --- Diff: hadoop-cloud/pom.xml --- @@ -38,7 +38,32 @@ hadoop-cloud + + target/scala-${scala.binary.version

[GitHub] spark pull request #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with ...

2018-04-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r181677376 --- Diff: assembly/pom.xml --- @@ -254,6 +254,14 @@ spark-hadoop-cloud_${scala.binary.version} ${project.version

[GitHub] spark pull request #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with ...

2018-04-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r181676700 --- Diff: hadoop-cloud/pom.xml --- @@ -38,7 +38,32 @@ hadoop-cloud + --- End diff -- it's in an adjacent PR

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/21060 This is one of those great problems in software engineering: no good answer. I think case-by-case is generally the best tactic, with a bias against feature backport, though my track record

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21048#discussion_r181672758 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala --- @@ -0,0 +1,347

[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-04-16 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20704 @megaserg : if you are writing to GCS, Azure, algorithm 2 is fine. If S3 is the target, then it's only safe to use with a consistent store (Hadoop 3.0 +S3Guard, Amazon Consistent EMR); you

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21048#discussion_r181486717 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala --- @@ -0,0 +1,347

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21048#discussion_r181485619 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala --- @@ -0,0 +1,347

[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/21066 RAT test was on a 0-byte .keep file in `src/test/scala` as the maven plugin adding a profile-specific test source path needs an original one. Easiest fix is just to add a real scala

[GitHub] spark pull request #21066: [SPARK-23977][CLOUD][Wip] Add commit protocol bin...

2018-04-13 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/21066 [SPARK-23977][CLOUD][Wip] Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism ## What changes were proposed in this pull request? This patch has on SPARK-23807

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21048#discussion_r181357794 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManagerSuite.scala --- @@ -0,0 +1,192

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21048#discussion_r181357072 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala --- @@ -0,0 +1,347

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21048#discussion_r181355839 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala --- @@ -0,0 +1,347

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/21048#discussion_r181355640 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala --- @@ -0,0 +1,347

[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...

2018-04-13 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 @jerryshao comments? I know without the patched hive or mutant hadoop build Spark doesn't work with Hadoop 3, but this sets everything up to build consistently, which is a prerequisite

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 The jetty problem has been dealt with; because of the shading declaration of jetty-util as provided (it isn't needed in spark any more), it wasn't getting into dist/jars even for those

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-11 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 I should add that the spark-shell doesn't bring up the Azure client, though it's happy with the rest, because of jetty-utils not making it into dist/jars...I fear this is shading related

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 Test failures are org.apache.spark.sql.sources.BucketedWriteWithoutHiveSupportSuite.; [SPARK-23894](https://issues.apache.org/jira/browse/SPARK-23894

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-09 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 bq. I think you should also update "test-dependencies.sh" to make the new deps file work. I did, but then things failed because the artifacts were only visible if you

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 Test failures are all in ` org.apache.spark.sql.sources.BucketedWriteWithoutHiveSupportSuite`. I don't see how these pom changes could have affected

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 Failure is the test dependencies failing as the checker is trying to pull in hadoop-3.1.0 & it's still in ASF staging ``` Performing Maven install for hadoop-3 Using `mvn` from

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 I saw that, but given there isn't much in the way of a 2.8 profile I thought it was more of a wish list than a requirement. How do I go about creating

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 sbt isn't going to test this profile, obviously. Ran both the mvn and sbt package targets with profiles hadoop-3,hadoop-cloud,yarn,Psnapshots-and-staging

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 @jerryshao the latest revision only has the POM changes, and that also excludes the build profile option to compile the hadoop-3 source trees. It also switches the hadoop 3.1 version

[GitHub] spark issue #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with rele...

2018-04-03 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20923 bq. I think we could separate cloud related stuffs to another PR, and fix only build related stuff in this PR OK

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-04-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r178823511 --- Diff: pom.xml --- @@ -2671,6 +2671,15 @@ + + hadoop-3 + +3.1.0-SNAPSHOT

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r178072258 --- Diff: hadoop-cloud/pom.xml --- @@ -177,6 +214,188

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r178060744 --- Diff: hadoop-cloud/pom.xml --- @@ -141,13 +93,98 @@ httpcore ${hadoop.deps.scope

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r178057319 --- Diff: hadoop-cloud/pom.xml --- @@ -177,6 +214,188

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r178054506 --- Diff: hadoop-cloud/pom.xml --- @@ -177,6 +214,188

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20923#discussion_r178054451 --- Diff: hadoop-cloud/pom.xml --- @@ -177,6 +214,188

[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

2018-03-28 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/20923 [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with relevant POM fix ups, cloud-storage artifacts and binding ## What changes were proposed in this pull request? 1. Adds a `hadoop-3

[GitHub] spark pull request #20824: [SPARK-23683][SQL][FOLLOW-UP] FileCommitProtocol....

2018-03-16 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20824#discussion_r175050086 --- Diff: core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala --- @@ -145,15 +146,23 @@ object FileCommitProtocol

[GitHub] spark pull request #20824: [SPARK-23683][SQL][FOLLOW-UP] FileCommitProtocol....

2018-03-15 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20824#discussion_r174829859 --- Diff: core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala --- @@ -145,15 +146,23 @@ object FileCommitProtocol

[GitHub] spark pull request #20824: [SPARK-23683][SQL][FOLLOW-UP] FileCommitProtocol....

2018-03-15 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20824#discussion_r174740987 --- Diff: core/src/test/scala/org/apache/spark/internal/io/FileCommitProtocolInstantiationSuite.scala --- @@ -0,0 +1,146 @@ +/* + * Licensed

[GitHub] spark issue #20824: [SPARK-23683][SQL][FOLLOW-UP] FileCommitProtocol.instant...

2018-03-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20824 Fixed the title, used the new JIRA.

[GitHub] spark pull request #20824: With SPARK-20236, FileCommitProtocol.instantiate(...

2018-03-14 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20824#discussion_r174629458 --- Diff: core/src/test/scala/org/apache/spark/internal/io/FileCommitProtocolInstantiationSuite.scala --- @@ -0,0 +1,146 @@ +/* + * Licensed

[GitHub] spark pull request #20824: With SPARK-20236, FileCommitProtocol.instantiate(...

2018-03-14 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/20824 With SPARK-20236, FileCommitProtocol.instantiate() looks for a three … ## What changes were proposed in this pull request? With SPARK-20236, `FileCommitProtocol.instantiate

[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-02 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20704 kicks in downstream depending on the order of imports; maven is closest-first in the graph. If you explicitly add hadoop-client in your deps at the top then everything gets reconciled

[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...

2018-02-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20490 @rdblue thanks. That was what I thought (the output coordinator doesn't tell incoming speculative work to abort until any actively committing task attempt has returned, I was just worried

[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...

2018-02-12 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/20490 Been having talks with colleagues last week and want to check something. How exactly do Spark executors abort speculative jobs without waiting for them to get into the ready-to-commit

[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-06 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166448459 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -117,20 +118,43 @@ object

[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-06 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166447570 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -117,20 +118,43 @@ object

[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19885 LGTM. Effective use of parameterization.

[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...

2018-01-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 Done.

[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...

2018-01-04 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 WiP: [a_zero_rename_committer.pdf](https://github.com/steveloughran/zero-rename-committer/files/1604894/a_zero_rename_committer.pdf) I would really like some early review of the spark

[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...

2017-12-30 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 > I actually feel like this is something hadoop should be documenting ... we are talking about how committers we happen to know work, rather than talking about the general contr

[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...

2017-12-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 > Check if the same jobId already is committed and then remove existing files and commit again. if your job doesn't allow overwrite, that's mostly implicit; it's only in concurr

[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...

2017-12-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 Thought some more on this. Here's a possible workflow for failures which can arise from job attempt recycling 1. Stage 1, Job ID 0, attempt 1, kicks off task 0 attempt 1

[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...

2017-12-15 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 Job ID is only used in the normal FileOutputCommitter to generate unique paths, using `s" _temporary/$jobid_$job-attempt"` for the file (i.e. job-attempt-ID, which is jobID+attempt).

[GitHub] spark pull request #19848: [SPARK-22162] Executors and the driver should use...

2017-12-14 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19848#discussion_r157064132 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -908,6 +918,40 @@ class NewFakeFormatWithCallback() extends

[GitHub] spark pull request #19848: [SPARK-22162] Executors and the driver should use...

2017-12-14 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19848#discussion_r157063770 --- Diff: core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala --- @@ -70,7 +70,8 @@ object SparkHadoopMapRedUtil extends Logging

[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...

2017-12-14 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19848 > I was hoping you would know the hadoop committer semantics better than me I might, but that's only because I spent time with a debugger and asking people the history of thi

[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-07 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19885 I'd recommend the tests are parameterized, generating a separate test for each URI pair, and including the values on a failure. Plan for a future where all you have is a stack trace from

[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19885 if you make a path of each of these and call getFileSystem() on them, you will end up with two different FS instances in the same JVM. But they'll both be talking to the same namenode using

[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19885 @vanzin it's too late for this, but I don't see any reason why `FileSystem.getCanonicalUri` should be kept protected. If someone wants to volunteer with the spec changes to filesystem.md

[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19885 User info isn't picked up from the URL, it's taken off your Kerberos credentials. If you are running HDFS unkerberized, then UGI takes it from the environment variable `HADOOP_USER_NAME

[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-05 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19885 Hi. If the comparison is isolated to a method testing URIs, rather than filesystems, it should be straightforward to write a suite of tests for this, with lists of URIs expected

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r14237 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148643598 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148596100 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148595625 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148542687 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148507385 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148507067 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148503478 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148325887 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -84,9 +86,9 @@ * This method will only

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148325560 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -72,7 +74,7 @@ * should still "

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148325280 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -37,13 +37,19 @@ * The preferred locations where

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148263405 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -75,8 +82,10 @@ /** * Aborts

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148227459 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -34,11 +34,17 @@ /** * Proceed

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148226526 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -75,8 +82,10 @@ /** * Aborts

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-01 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148225333 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -37,13 +37,19 @@ * The preferred locations where

[GitHub] spark issue #19269: [SPARK-22026][SQL] data source v2 write path

2017-10-24 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19269 thx; I'll see about passing it all the way down past FileOutputFormat

[GitHub] spark issue #19269: [SPARK-22026][SQL] data source v2 write path

2017-10-23 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19269 w.r.t. init, I'm thinking it's critical to get the DataFrameWriter.extraOptions down the tree. This lets committers be tuned on a query-by-query basis for things like conflict management
