Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186469794
--- Diff:
hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/BindingParquetOutputCommitter.scala
---
@@ -0,0 +1,122
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186469740
--- Diff:
hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/BindingParquetOutputCommitter.scala
---
@@ -0,0 +1,122
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186467415
--- Diff:
hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala
---
@@ -0,0 +1,260
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186466925
--- Diff:
hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala
---
@@ -0,0 +1,260
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186467007
--- Diff:
hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala
---
@@ -0,0 +1,260
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186466513
--- Diff:
hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala
---
@@ -0,0 +1,260
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186466367
--- Diff:
hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala
---
@@ -0,0 +1,260
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186466259
--- Diff:
hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/BindingParquetOutputCommitter.scala
---
@@ -0,0 +1,122
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186463919
--- Diff:
hadoop-cloud/src/main/scala/org/apache/spark/internal/io/cloud/PathCommitterConstants.scala
---
@@ -0,0 +1,87 @@
+/*
+ * Licensed
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21066#discussion_r186463018
--- Diff:
hadoop-cloud/src/main/scala/org/apache/spark/internal/io/cloud/PathCommitterConstants.scala
---
@@ -0,0 +1,87 @@
+/*
+ * Licensed
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19404
I think the sync is important, but that you just need to handle the case of
"fs doesn't support it".
Thinking about this a bit more, I didn't like my proposed patch. Bette
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/21146
As promised, dependencies fail
```
diff --git a/dev/deps/spark-deps-hadoop-2.6
b/dev/pr-deps/spark-deps-hadoop-2.6
index 32b2e4f..609eeb9 100644
--- a/dev/deps/spark-deps
GitHub user steveloughran opened a pull request:
https://github.com/apache/spark/pull/21146
[SPARK-23654][BUILD][WiP] remove jets3t as a dependency of spark
## What changes were proposed in this pull request?
With the update of bouncy-castle JAR in Spark 2.3; jets3t doesn't
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
thank you! I guess that means I'm down for the hive JAR, doesn't it :)
Better make a list of patches which should go in, I think internally we
have 1+ kerberos related (https
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
I've also added a comment to
(SPARK-18673)[https://issues.apache.org/jira/browse/SPARK-18673] offering to
fix the org.spark-project.hive JAR, but only once this patch is in. This bit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
I can and do build Hadoop with this local version enabled, so it's easy
enough to set things up locally. Indeed, the ability to change the Hadoop version,
[HADOOP-13852](https://issues.apache.org
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19404
BTW, perf wise: hflush() is required to block until the flush has got to
the store (visible to others), and with hsync actually saved to the durable
store. So it will take time, but if you
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19404#discussion_r183482802
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala
---
@@ -139,6 +139,9 @@ abstract class
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19404#discussion_r183480609
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
---
@@ -123,6 +123,7 @@ class HDFSMetadataLog[T
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19404
Problem here is that a stream which doesn't implement hflush/hsync is
required to throw an exception; it's a way of guaranteeing that if hsync/hflush
does complete, the action has done what
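The guarantee described above — a stream that cannot implement hflush/hsync must throw, so that a completed call really means the data was flushed — can be sketched in plain Java. This is an illustrative model, not Hadoop's actual `Syncable` API; all class names here are invented for the example.

```java
// Hedged sketch (illustrative names, not Hadoop's real classes): a stream
// that cannot implement hsync must throw rather than silently return, so a
// successful hsync() is a real guarantee that the data was persisted.
interface Syncable {
    void hsync();
}

class DurableStream implements Syncable {
    public void hsync() { /* pretend the data reached durable storage */ }
}

class NonSyncableStream implements Syncable {
    public void hsync() {
        // Throwing is what makes a completed hsync() meaningful.
        throw new UnsupportedOperationException("hsync not supported");
    }
}

class SyncHelper {
    // Caller-side handling of the "fs doesn't support it" case:
    static boolean trySync(Syncable s) {
        try {
            s.hsync();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

The caller decides what a `false` result means for its durability requirements; the one thing it must never do is assume a no-op hsync succeeded.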
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
@vanzin : The followup to this is #21066; I could move the compile time
changes there but if you are going to have POMs playing with dependencies,
seems best to have it all in one place
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/21060
* from the ASF process-police perspective, something like
versioning/backport policy is something which should be done on the ASF dev
list...consider asking in user@ to see what people's
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/21071
I like this, but you'll need people with authority to trigger the builds
and reviews.
There's some discussion kicked off last week on the ASF incubator about the
fact that htrace has
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21071#discussion_r181810726
--- Diff: core/src/main/scala/org/apache/spark/trace/SparkAppTracer.scala
---
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/21071
+ @rdblue
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r181680034
--- Diff: hadoop-cloud/pom.xml ---
@@ -38,7 +38,32 @@
hadoop-cloud
+
+
target/scala-${scala.binary.version
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r181677376
--- Diff: assembly/pom.xml ---
@@ -254,6 +254,14 @@
spark-hadoop-cloud_${scala.binary.version}
${project.version
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r181676700
--- Diff: hadoop-cloud/pom.xml ---
@@ -38,7 +38,32 @@
hadoop-cloud
+
--- End diff --
it's in an adjacent PR
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/21060
This is one of those great problems in software engineering: no good
answer. I think case-by-case is generally the best tactic, with a bias against
feature backport, though my track record
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21048#discussion_r181672758
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
---
@@ -0,0 +1,347
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20704
@megaserg : if you are writing to GCS or Azure, algorithm 2 is fine. If S3 is
the target, then it's only safe to use with a consistent store (Hadoop 3.0
+ S3Guard, Amazon Consistent EMR); you
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21048#discussion_r181486717
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
---
@@ -0,0 +1,347
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21048#discussion_r181485619
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
---
@@ -0,0 +1,347
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/21066
RAT test was on a 0-byte .keep file in `src/test/scala`, as the maven
plugin adding a profile-specific test source path needs an original one.
Easiest fix is just to add a real scala
GitHub user steveloughran opened a pull request:
https://github.com/apache/spark/pull/21066
[SPARK-23977][CLOUD][Wip] Add commit protocol binding to Hadoop 3.1
PathOutputCommitter mechanism
## What changes were proposed in this pull request?
This patch has on SPARK-23807
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21048#discussion_r181357794
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManagerSuite.scala
---
@@ -0,0 +1,192
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21048#discussion_r181357072
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
---
@@ -0,0 +1,347
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21048#discussion_r181355839
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
---
@@ -0,0 +1,347
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/21048#discussion_r181355640
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
---
@@ -0,0 +1,347
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
@jerryshao comments? I know without the patched hive or mutant hadoop build
Spark doesn't work with Hadoop 3, but this sets everything up to build
consistently, which is a prerequisite
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
The jetty problem has been dealt with; because of the shading declaration
of jetty-util as provided (it isn't needed in spark any more), it wasn't
getting into dist/jars even for those
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
I should add that the spark-shell doesn't bring up the Azure client, though
it's happy with the rest, because of jetty-utils not making it into dist/jars...I
fear this is shading related
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
Test failures are
org.apache.spark.sql.sources.BucketedWriteWithoutHiveSupportSuite.;
[SPARK-23894](https://issues.apache.org/jira/browse/SPARK-23894
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
bq. I think you should also update "test-dependencies.sh" to make the new
deps file work.
I did, but then things failed because the artifacts were only visible if
you
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
Test failures are all in `
org.apache.spark.sql.sources.BucketedWriteWithoutHiveSupportSuite`. I don't see
how these pom changes could have affected
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
Failure is the test dependencies failing as the checker is trying to pull
in hadoop-3.1.0 & it's still in ASF staging
```
Performing Maven install for hadoop-3
Using `mvn` from
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
I saw that, but given there isn't much in the way of a 2.8 profile, it was
more of a wish list than a requirement. How do I go about creating
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
sbt isn't going to test this profile, obviously. Ran both the mvn and sbt
package targets with profiles hadoop-3,hadoop-cloud,yarn,Psnapshots-and-staging
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
@jerryshao the latest revision only has the POM changes, and that also
excludes the build profile option to compile the hadoop-3 source trees
It also switches the hadoop 3.1 version
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20923
bq. I think we could separate cloud related stuffs to another PR, and fix
only build related stuff in this PR
OK
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r178823511
--- Diff: pom.xml ---
@@ -2671,6 +2671,15 @@
+
+ hadoop-3
+
+3.1.0-SNAPSHOT
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r178072258
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r178060744
--- Diff: hadoop-cloud/pom.xml ---
@@ -141,13 +93,98 @@
httpcore
${hadoop.deps.scope
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r178057319
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r178054506
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20923#discussion_r178054451
--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188
GitHub user steveloughran opened a pull request:
https://github.com/apache/spark/pull/20923
[SPARK-23807][BUILD][WIP] Add Hadoop 3 profile with relevant POM fix ups,
cloud-storage artifacts and binding
## What changes were proposed in this pull request?
1. Adds a `hadoop-3
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20824#discussion_r175050086
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ---
@@ -145,15 +146,23 @@ object FileCommitProtocol
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20824#discussion_r174829859
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ---
@@ -145,15 +146,23 @@ object FileCommitProtocol
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20824#discussion_r174740987
--- Diff:
core/src/test/scala/org/apache/spark/internal/io/FileCommitProtocolInstantiationSuite.scala
---
@@ -0,0 +1,146 @@
+/*
+ * Licensed
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20824
Fixed the title, used the new JIRA.
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20824#discussion_r174629458
--- Diff:
core/src/test/scala/org/apache/spark/internal/io/FileCommitProtocolInstantiationSuite.scala
---
@@ -0,0 +1,146 @@
+/*
+ * Licensed
GitHub user steveloughran opened a pull request:
https://github.com/apache/spark/pull/20824
With SPARK-20236, FileCommitProtocol.instantiate() looks for a three …
## What changes were proposed in this pull request?
With SPARK-20236, `FileCommitProtocol.instantiate
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20704
kicks in downstream depending on the order of imports; maven is
closest-first in the graph. If you explicitly add hadoop-client in your deps at
the top then everything gets reconciled
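The "closest-first" resolution described above can be sketched as follows. This is a hedged model of Maven's nearest-wins rule, not Maven's actual code; the `Dep` record and depth numbers are illustrative.

```java
import java.util.*;

// Hedged sketch, not Maven's actual resolver: "closest-first" (nearest-wins)
// version mediation, modelled as keeping the shallowest declaration of each
// artifact in the dependency graph.
class NearestWins {
    record Dep(String artifact, String version, int depth) {}

    static Map<String, String> resolve(List<Dep> deps) {
        Map<String, Dep> nearest = new HashMap<>();
        for (Dep d : deps) {
            Dep cur = nearest.get(d.artifact());
            // the declaration closer to the root of the POM graph wins
            if (cur == null || d.depth() < cur.depth()) {
                nearest.put(d.artifact(), d);
            }
        }
        Map<String, String> out = new HashMap<>();
        nearest.forEach((a, d) -> out.put(a, d.version()));
        return out;
    }
}
```

This is why declaring hadoop-client directly at the top of your own POM (depth 1) overrides whatever version arrives transitively deeper in the graph.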
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20490
@rdblue thanks. That was what I thought (the output coordinator doesn't
tell incoming speculative work to abort until any actively committing task
attempt has returned, I was just worried
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/20490
Been having talks with colleagues last week and want to check something.
How exactly do Spark executors abort speculative jobs without waiting for
them to get into the ready-to-commit
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20490#discussion_r166448459
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
---
@@ -117,20 +118,43 @@ object
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20490#discussion_r166447570
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
---
@@ -117,20 +118,43 @@ object
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19885
LGTM. Effective use of parameterization
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
Done.
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
WiP:
[a_zero_rename_committer.pdf](https://github.com/steveloughran/zero-rename-committer/files/1604894/a_zero_rename_committer.pdf)
I would really like some early review of the spark
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
> I actually feel like this is something hadoop should be documenting ...
we are talking about how committers we happen to know work, rather than talking
about the general contr
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
> Check if the same jobId already is committed and then remove existing
files and commit again.
if your job doesn't allow overwrite, that's mostly implicit; it's only in
concurr
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
Thought some more on this.
Here's a possible workflow for failures which can arise from job attempt
recycling
1. Stage 1, Job ID 0, attempt 1, kicks off task 0 attempt 1
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
Job ID is only used in the normal FileOutputCommitter to generate unique
paths, using `s"_temporary/${jobid}_$job-attempt"` for the file (i.e. the
job-attempt ID, which is jobID+attempt).
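A hedged sketch of that naming scheme, with illustrative names only — the real logic lives in Hadoop's FileOutputCommitter:

```java
// Hedged sketch of the _temporary/<jobId>_<attempt> path layout described
// above; method and parameter names are invented for the example.
class JobPaths {
    static String jobAttemptPath(String dest, int jobId, int attempt) {
        return dest + "/_temporary/" + jobId + "_" + attempt;
    }
}
```

Uniqueness therefore depends on the jobID+attempt pair being unique per destination, which is exactly what breaks down under the job-attempt-recycling scenarios discussed earlier in the thread.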
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19848#discussion_r157064132
--- Diff:
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -908,6 +918,40 @@ class NewFakeFormatWithCallback() extends
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19848#discussion_r157063770
--- Diff:
core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala ---
@@ -70,7 +70,8 @@ object SparkHadoopMapRedUtil extends Logging
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19848
> I was hoping you would know the hadoop committer semantics better than me
I might, but that's only because I spent time with a debugger and asking
people the history of thi
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19885
I'd recommend the tests are parameterized, generating a separate test for
each URI pair, and including the values on a failure. Plan for a future where
all you have is a stack trace from
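The parameterized pattern suggested above — one generated check per URI pair, with both values embedded in the failure message — might look like this. The URIs and class names are illustrative, and this uses plain `java.net.URI` equality rather than any Hadoop filesystem comparison.

```java
import java.net.URI;
import java.util.List;

// Hedged sketch of a parameterized URI-comparison test: each pair carries its
// expected result, and a failure message includes both values so a bare stack
// trace still identifies the failing case.
class UriPairCheck {
    record Case(String left, String right, boolean expectEqual) {}

    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    static void runAll(List<Case> cases) {
        for (Case c : cases) {
            boolean actual = sameUri(c.left(), c.right());
            if (actual != c.expectEqual()) {
                throw new AssertionError(
                    "comparing <" + c.left() + "> with <" + c.right() + ">");
            }
        }
    }
}
```

Adding a new case is then a one-line change to the list rather than a new test method.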
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19885
if you make a path of each of these and call getFileSystem() on them, you
will end up with two different FS instances in the same JVM. But they'll both
be talking to the same namenode using
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19885
@vanzin its too late for this, but I don't see any reason why
`FileSystem.getCanonicalUri` should be kept protected. If someone wants to
volunteer with the spec changes to filesystem.md
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19885
User info isn't picked up from the URL, it's taken off your Kerberos
credentials. If you are running HDFS unkerberized, then UGI takes it from the
environment variable `HADOOP_USER_NAME
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19885
Hi.
If the comparison is isolated to a method testing URIs, rather than
filesystems, it should be straightforward to write a suite of tests for this,
with lists of URIs expected
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r14237
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148643598
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148596100
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148595625
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148542687
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148507385
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148507067
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148503478
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -50,28 +53,34
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148325887
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java
---
@@ -84,9 +86,9 @@
* This method will only
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148325560
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java
---
@@ -72,7 +74,7 @@
* should still "
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148325280
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java ---
@@ -37,13 +37,19 @@
* The preferred locations where
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148263405
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -75,8 +82,10 @@
/**
* Aborts
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148227459
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java
---
@@ -34,11 +34,17 @@
/**
* Proceed
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148226526
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
---
@@ -75,8 +82,10 @@
/**
* Aborts
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148225333
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java ---
@@ -37,13 +37,19 @@
* The preferred locations where
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19269
thx; I'll see about passing it all the way down past FileOutputFormat
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19269
w.r.t init, I'm thinking it's critical to get the
DataframeWriter.extraOptions down the tree. This lets committers be tuned on a
query-by-query basis for things like conflict management