[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261292#comment-16261292 ] Steve Loughran commented on SPARK-22526: S3a uses the AWS S3 client, which uses httpclient

[jira] [Commented] (SPARK-22374) STS ran into OOM in a secure cluster

2017-11-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261444#comment-16261444 ] Steve Loughran commented on SPARK-22374: We need to do something about this, it is dangerously

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-11-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255861#comment-16255861 ] Steve Loughran commented on SPARK-22240: I think there are two separate issues and I'm adding

[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2017-11-01 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233937#comment-16233937 ] Steve Loughran commented on SPARK-2984: --- [~soumdmw] you asked bq. is there a simpler way to not

[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2017-11-01 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233924#comment-16233924 ] Steve Loughran commented on SPARK-2984: --- Darron: different stack trace, different parts of the code,

[jira] [Commented] (SPARK-22657) Hadoop fs implementation classes are not loaded if they are part of the app jar or other jar when --packages flag is used

2017-12-01 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274310#comment-16274310 ] Steve Loughran commented on SPARK-22657: No, more that we need to change how that service search

[jira] [Commented] (SPARK-21074) Parquet files are read fully even though only count() is requested

2017-12-07 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281901#comment-16281901 ] Steve Loughran commented on SPARK-21074: Is there any update on this? # I'd like to see if this

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217325#comment-16217325 ] Steve Loughran commented on SPARK-22240: I'm doing some testing with master & reading files off

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-24 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217398#comment-16217398 ] Steve Loughran commented on SPARK-22240: no, spark 2.2 doesn't fix this. I have to explicitly

[jira] [Commented] (SPARK-23977) Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism

2018-05-07 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465917#comment-16465917 ] Steve Loughran commented on SPARK-23977: It will need the hadoop-aws module and dependencies as

[jira] [Commented] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

2018-05-07 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465925#comment-16465925 ] Steve Loughran commented on SPARK-18673: Josh Rosen added some changes, particularly: *

[jira] [Commented] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

2018-05-07 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465950#comment-16465950 ] Steve Loughran commented on SPARK-18673: Good Q, [~Bidek]. That SPARK-23807 POM fixes up the

[jira] [Commented] (SPARK-23681) Switch OrcFileFormat to newer hadoop.mapreduce output classes

2018-05-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475790#comment-16475790 ] Steve Loughran commented on SPARK-23681: sorry, been offline * yes, cut the version * lower on my

[jira] [Created] (SPARK-24280) Speed up indexing of files in object stores by using listFiles(path, recursive=true)

2018-05-15 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-24280: -- Summary: Speed up indexing of files in object stores by using listFiles(path, recursive=true) Key: SPARK-24280 URL: https://issues.apache.org/jira/browse/SPARK-24280

[jira] [Commented] (SPARK-19790) OutputCommitCoordinator should not allow another task to commit after an ExecutorFailure

2018-05-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482512#comment-16482512 ] Steve Loughran commented on SPARK-19790: Update on this, having spent lots of time working in the

[jira] [Commented] (SPARK-24271) sc.hadoopConfigurations can not be overwritten in the same spark context

2018-05-25 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490474#comment-16490474 ] Steve Loughran commented on SPARK-24271: Disabling the s3 cache can be pretty inefficient, as

[jira] [Commented] (SPARK-24492) Endless attempted task when TaskCommitDenied exception writing to S3A

2018-06-09 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507113#comment-16507113 ] Steve Loughran commented on SPARK-24492: well, you've got a consistency problem as a task commit

[jira] [Updated] (SPARK-24492) Endless attempted task when TaskCommitDenied exception writing to S3A

2018-06-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-24492: --- Summary: Endless attempted task when TaskCommitDenied exception writing to S3A (was:

[jira] [Updated] (SPARK-24476) java.net.SocketTimeoutException: Read timed out under jets3t while running the Spark Structured Streaming

2018-06-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-24476: --- Component/s: (was: Spark Core) Structured Streaming >

[jira] [Commented] (SPARK-24476) java.net.SocketTimeoutException: Read timed out Exception while running the Spark Structured Streaming in 2.3.0

2018-06-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506002#comment-16506002 ] Steve Loughran commented on SPARK-24476: Switch from s3n to the s3a connector, see if it goes

[jira] [Updated] (SPARK-24476) java.net.SocketTimeoutException: Read timed out under jets3t while running the Spark Structured Streaming

2018-06-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-24476: --- Summary: java.net.SocketTimeoutException: Read timed out under jets3t while running the

[jira] [Commented] (SPARK-24492) Endless attempted task when TaskCommitDenied exception writing to S3A

2018-06-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506009#comment-16506009 ] Steve Loughran commented on SPARK-24492: the retry problem looks like something with the commit

[jira] [Updated] (SPARK-24476) java.net.SocketTimeoutException: Read timed out under jets3t while running the Spark Structured Streaming

2018-06-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-24476: --- Priority: Minor (was: Major) > java.net.SocketTimeoutException: Read timed out under

[jira] [Commented] (SPARK-23534) Spark run on Hadoop 3.0.0

2018-06-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506021#comment-16506021 ] Steve Loughran commented on SPARK-23534: [~jerryshao] is that the HDFS token identifier thing?

[jira] [Commented] (SPARK-24476) java.net.SocketTimeoutException: Read timed out under jets3t while running the Spark Structured Streaming

2018-06-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513933#comment-16513933 ] Steve Loughran commented on SPARK-24476: * Use S3A, as S3n is unsupported and deleted from the

[jira] [Updated] (SPARK-24273) Failure while using .checkpoint method to private S3 store via S3A connector

2018-05-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-24273: --- Summary: Failure while using .checkpoint method to private S3 store via S3A connector (was:

[jira] [Updated] (SPARK-24273) Failure while using .checkpoint method to private S3 store via S3A connector

2018-05-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-24273: --- Description: We are getting following error: {code}

[jira] [Commented] (SPARK-24273) Failure while using .checkpoint method to private S3 store via S3A connector

2018-05-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492159#comment-16492159 ] Steve Loughran commented on SPARK-24273: Of course, there's no need to send range headers on a 0

[jira] [Commented] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

2018-06-01 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497928#comment-16497928 ] Steve Loughran commented on SPARK-18673: Jerry, I list up the other commits I'd like to put in;

[jira] [Commented] (SPARK-24470) RestSubmissionClient to be robust against 404 & non json responses

2018-06-05 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502177#comment-16502177 ] Steve Loughran commented on SPARK-24470: stack from the issue {code} Running Spark using the

[jira] [Created] (SPARK-24470) RestSubmissionClient to be robust against 404 & non json responses

2018-06-05 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-24470: -- Summary: RestSubmissionClient to be robust against 404 & non json responses Key: SPARK-24470 URL: https://issues.apache.org/jira/browse/SPARK-24470 Project:

[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2018-06-04 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500560#comment-16500560 ] Steve Loughran commented on SPARK-20202: I think you could split things into two # a modified

[jira] [Commented] (SPARK-13446) Spark need to support reading data from Hive 2.0.0 metastore

2018-04-26 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453825#comment-16453825 ] Steve Loughran commented on SPARK-13446: [~Tavis]: can you paste in the stack you see? > Spark

[jira] [Commented] (SPARK-22240) S3 CSV number of partitions incorrectly computed

2017-10-28 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223400#comment-16223400 ] Steve Loughran commented on SPARK-22240: so this partition calculation problem is independent of

[jira] [Commented] (SPARK-7755) MetadataCache.refresh does not take into account _SUCCESS

2018-01-05 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313803#comment-16313803 ] Steve Loughran commented on SPARK-7755: --- I concur with [~hyukjin.kwon] here: if incomplete files are

[jira] [Resolved] (SPARK-18883) FileNotFoundException on _temporary directory

2018-01-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SPARK-18883. Resolution: Won't Fix I'm going to close as a WONTFIX, because the solution is "don't use

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326330#comment-16326330 ] Steve Loughran commented on SPARK-23050: Quick review of the code Yes, there's potentially a

[jira] [Comment Edited] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326330#comment-16326330 ] Steve Loughran edited comment on SPARK-23050 at 1/15/18 3:24 PM: - Quick

[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS

2018-01-17 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328928#comment-16328928 ] Steve Loughran commented on SPARK-21697: No, it's spark's ability to have hdfs:// URLs on the

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333477#comment-16333477 ] Steve Loughran commented on SPARK-23050: there's one thing which worries me here: the implication

[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark

2018-01-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327144#comment-16327144 ] Steve Loughran commented on SPARK-6305: --- It'll be related to HADOOP-12956 , HDFS-12829,

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324745#comment-16324745 ] Steve Loughran commented on SPARK-23050: this s3n is the amazon EMR closed source impl; nothing

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357266#comment-16357266 ] Steve Loughran commented on SPARK-23308: bq. Other option would be creating a special exception

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357387#comment-16357387 ] Steve Loughran commented on SPARK-23308: HADOOP-15216 covers S3A handling this failure with

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354784#comment-16354784 ] Steve Loughran commented on SPARK-23308: bq. I have not heard this come up before as an issue in

[jira] [Comment Edited] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354784#comment-16354784 ] Steve Loughran edited comment on SPARK-23308 at 2/7/18 12:26 AM: - bq. I

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360809#comment-16360809 ] Steve Loughran commented on SPARK-23308: BTW bq I should get at least ~82k partitions, thus the

[jira] [Comment Edited] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-14 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360809#comment-16360809 ] Steve Loughran edited comment on SPARK-23308 at 2/14/18 11:27 AM: -- BTW

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365495#comment-16365495 ] Steve Loughran commented on SPARK-23308: I'm going to recommend this is closed as a WONTFIX. Not

[jira] [Commented] (SPARK-11182) HDFS Delegation Token will be expired when calling "UserGroupInformation.getCurrentUser.addCredentials" in HA mode

2018-02-20 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369914#comment-16369914 ] Steve Loughran commented on SPARK-11182: bug is in HDFS; been fixed in 2.8.2+ with cherry

[jira] [Commented] (SPARK-23420) Datasource loading not handling paths with regex chars.

2018-02-19 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369330#comment-16369330 ] Steve Loughran commented on SPARK-23420: Can I note that if there's a colon in the path, it'd

[jira] [Commented] (SPARK-23683) FileCommitProtocol.instantiate to require 3-arg constructor for dynamic partition overwrite

2018-07-26 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558660#comment-16558660 ] Steve Loughran commented on SPARK-23683: If it's a regression, you could argue for it >

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-08-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574276#comment-16574276 ] Steve Loughran commented on SPARK-23050: bq. Is there any way we can avoid happening this? With

[jira] [Commented] (SPARK-22634) Update Bouncy castle dependency

2018-08-07 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572675#comment-16572675 ] Steve Loughran commented on SPARK-22634: If nothing else is using it, correct. And nothing is

[jira] [Updated] (SPARK-23654) Cut jets3t as a dependency of spark-core

2018-08-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23654: --- Summary: Cut jets3t as a dependency of spark-core (was: Cut jets3t and bouncy castle as

[jira] [Updated] (SPARK-23654) Cut jets3t and bouncy castle as dependencies of spark-core

2018-08-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23654: --- Summary: Cut jets3t and bouncy castle as dependencies of spark-core (was: Cut jets3t as a

[jira] [Commented] (SPARK-24787) Events being dropped at an alarming rate due to hsync being slow for eventLogging

2018-08-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581658#comment-16581658 ] Steve Loughran commented on SPARK-24787: yes, hsync updating the file length is the problem;

[jira] [Commented] (SPARK-25111) increment kinesis client/producer lib versions & aws-sdk to match

2018-08-15 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581663#comment-16581663 ] Steve Loughran commented on SPARK-25111: FWIW, it'd be interesting to do a followup & add the

[jira] [Created] (SPARK-25111) increment kinesis client/producer lib versions & aws-sdk to match

2018-08-13 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-25111: -- Summary: increment kinesis client/producer lib versions & aws-sdk to match Key: SPARK-25111 URL: https://issues.apache.org/jira/browse/SPARK-25111 Project: Spark

[jira] [Commented] (SPARK-22236) CSV I/O: does not respect RFC 4180

2018-08-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582930#comment-16582930 ] Steve Loughran commented on SPARK-22236: bq. One can always repartition after reading, and if

[jira] [Commented] (SPARK-22236) CSV I/O: does not respect RFC 4180

2018-08-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582933#comment-16582933 ] Steve Loughran commented on SPARK-22236: bq. But is the implication that we can never change the

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582911#comment-16582911 ] Steve Loughran commented on SPARK-24771: Linking to the previous PR, as that's got the

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583077#comment-16583077 ] Steve Loughran commented on SPARK-24771: All the wire stuff (e.g. to HDFS is protobuf).

[jira] [Commented] (SPARK-22236) CSV I/O: does not respect RFC 4180

2018-08-09 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575629#comment-16575629 ] Steve Loughran commented on SPARK-22236: I wouldn't recommend changing multiline=true by default

[jira] [Created] (SPARK-25183) Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; race conditions can arise

2018-08-21 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-25183: -- Summary: Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; race conditions can arise Key: SPARK-25183 URL:

[jira] [Commented] (SPARK-25180) Spark standalone failure in Utils.doFetchFile() if nslookup of local hostname fails

2018-08-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588005#comment-16588005 ] Steve Loughran commented on SPARK-25180: Stack {code} scala> text("hello all!") res10: String =

[jira] [Created] (SPARK-25180) Spark standalone failure in Utils.doFetchFile() if nslookup of local hostname fails

2018-08-21 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-25180: -- Summary: Spark standalone failure in Utils.doFetchFile() if nslookup of local hostname fails Key: SPARK-25180 URL: https://issues.apache.org/jira/browse/SPARK-25180

[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark

2018-08-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588053#comment-16588053 ] Steve Loughran commented on SPARK-6305: --- 1. exclusion of log4j 1.x you can only safely exclude it

[jira] [Commented] (SPARK-25180) Spark standalone failure in Utils.doFetchFile() if nslookup of local hostname fails

2018-08-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588001#comment-16588001 ] Steve Loughran commented on SPARK-25180: code snippet was some trivial CSV => ORC with both src

[jira] [Commented] (SPARK-25180) Spark standalone failure in Utils.doFetchFile() if nslookup of local hostname fails

2018-08-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588012#comment-16588012 ] Steve Loughran commented on SPARK-25180: Netty converts the UnknownHostException into an IOE in

[jira] [Commented] (SPARK-25180) Spark standalone failure in Utils.doFetchFile() if nslookup of local hostname fails

2018-08-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588035#comment-16588035 ] Steve Loughran commented on SPARK-25180: FWIW, there was no in-progress data at the dest store,

[jira] [Commented] (SPARK-20799) Unable to infer schema for ORC/Parquet on S3N when secrets are in the URL

2018-08-28 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595633#comment-16595633 ] Steve Loughran commented on SPARK-20799: [~jzijlstra] yes, the final listing. Note that in

[jira] [Commented] (SPARK-25155) Streaming from storage doesn't work when no directories exists

2018-08-19 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585438#comment-16585438 ] Steve Loughran commented on SPARK-25155: From SPARK-17159 I have a more cloud-optimized stream

[jira] [Commented] (SPARK-25126) avoid creating OrcFile.Reader for all orc files

2018-08-22 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589326#comment-16589326 ] Steve Loughran commented on SPARK-25126: + [~dongjoon] > avoid creating OrcFile.Reader for all

[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark

2018-08-29 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596404#comment-16596404 ] Steve Loughran commented on SPARK-6305: --- bq. Could be possible that nobody is swapping it out for

[jira] [Commented] (SPARK-25180) Spark standalone failure in Utils.doFetchFile() if nslookup of local hostname fails

2018-08-29 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596566#comment-16596566 ] Steve Loughran commented on SPARK-25180: Reviewing a bit more, I think the root cause was *

[jira] [Commented] (SPARK-24746) AWS S3 301 Moved Permanently error message even after setting fs.s3a.endpoint for bucket in Mumbai region.

2018-07-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534728#comment-16534728 ] Steve Loughran commented on SPARK-24746: Mumbai is v4 auth, which isn't directly supported in

[jira] [Commented] (SPARK-24492) Endless attempted task when TaskCommitDenied exception writing to S3A

2018-07-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534725#comment-16534725 ] Steve Loughran commented on SPARK-24492: I think you'll have to set the logs to debug level and

[jira] [Commented] (SPARK-21962) Distributed Tracing in Spark

2018-07-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541915#comment-16541915 ] Steve Loughran commented on SPARK-21962: Yes, assume that: * Htrace goes off the classpath,

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-16 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327649#comment-16327649 ] Steve Loughran commented on SPARK-23050: {quote} Is there an API to detect S3 like file systems?

[jira] [Commented] (SPARK-23123) Unable to run Spark Job with Hadoop NameNode Federation using ViewFS

2018-01-17 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328651#comment-16328651 ] Steve Loughran commented on SPARK-23123: I've never looked at ViewFS internals before, so treat

[jira] [Commented] (SPARK-23652) Spark Connection with S3

2018-03-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395428#comment-16395428 ] Steve Loughran commented on SPARK-23652: Don't use the s3:// connector which ships with the ASF

[jira] [Updated] (SPARK-23652) Verify error when using ASF s3:// connector.

2018-03-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23652: --- Summary: Verify error when using ASF s3:// connector. (was: Spark Connection with S3) >

[jira] [Updated] (SPARK-23652) Verify error when using ASF s3:// connector. & Jetty 0.9.4

2018-03-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23652: --- Priority: Minor (was: Critical) > Verify error when using ASF s3:// connector. & Jetty

[jira] [Commented] (SPARK-23652) Verify error when using ASF s3:// connector. & Jetty 0.9.4

2018-03-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395449#comment-16395449 ] Steve Loughran commented on SPARK-23652: this stack trace is just HADOOP-11086; tagging as a

[jira] [Commented] (SPARK-23654) cut jets3t as a dependency of spark-core

2018-03-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395451#comment-16395451 ] Steve Loughran commented on SPARK-23654: SPARK-22634 highlights that the spark-hadoop-cloud module

[jira] [Updated] (SPARK-23654) cut jets3t as a dependency of spark-core; exclude it from hadoop-cloud module as incompatible

2018-03-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23654: --- Summary: cut jets3t as a dependency of spark-core; exclude it from hadoop-cloud module as

[jira] [Created] (SPARK-23654) cut jets3t as a dependency of spark-core

2018-03-12 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-23654: -- Summary: cut jets3t as a dependency of spark-core Key: SPARK-23654 URL: https://issues.apache.org/jira/browse/SPARK-23654 Project: Spark Issue Type:

[jira] [Updated] (SPARK-23652) Verify error when using ASF s3:// connector. & Jetty 0.9.4

2018-03-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23652: --- Summary: Verify error when using ASF s3:// connector. & Jetty 0.9.4 (was: Verify error when

[jira] [Commented] (SPARK-23654) cut jets3t as a dependency of spark-core; exclude it from hadoop-cloud module as incompatible

2018-03-14 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398653#comment-16398653 ] Steve Loughran commented on SPARK-23654: # In Hadoop 3.x anyone trying to create an s3n client is

[jira] [Created] (SPARK-23683) FileCommitProtocol.instantiate to require 3-arg constructor for dynamic partition overwrite

2018-03-14 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-23683: -- Summary: FileCommitProtocol.instantiate to require 3-arg constructor for dynamic partition overwrite Key: SPARK-23683 URL: https://issues.apache.org/jira/browse/SPARK-23683

[jira] [Updated] (SPARK-23681) Switch OrcFileFormat to using newer hadoop.mapreduce output classes

2018-03-14 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23681: --- Summary: Switch OrcFileFormat to using newer hadoop.mapreduce output classes (was: Move

[jira] [Updated] (SPARK-23681) Switch OrcFileFormat to newer hadoop.mapreduce output classes

2018-03-14 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23681: --- Summary: Switch OrcFileFormat to newer hadoop.mapreduce output classes (was: Switch

[jira] [Resolved] (SPARK-23652) Verify error when using ASF s3:// connector. & Jetty 0.9.4

2018-03-14 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SPARK-23652. Resolution: Duplicate > Verify error when using ASF s3:// connector. & Jetty 0.9.4 >

[jira] [Created] (SPARK-23681) Move OrcFileFormat switch to using hadoop.mapreduce classes

2018-03-14 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-23681: -- Summary: Move OrcFileFormat switch to using hadoop.mapreduce classes Key: SPARK-23681 URL: https://issues.apache.org/jira/browse/SPARK-23681 Project: Spark

[jira] [Commented] (SPARK-22634) Update Bouncy castle dependency

2018-03-14 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398659#comment-16398659 ] Steve Loughran commented on SPARK-22634: moving to jets3t 0.9.4 breaks the (legacy) s3 & s3n

[jira] [Commented] (SPARK-22919) Bump Apache httpclient versions

2018-04-09 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431198#comment-16431198 ] Steve Loughran commented on SPARK-22919: going to highlight this appears to break Spark &

[jira] [Commented] (SPARK-23966) Refactoring all checkpoint file writing logic in a common interface

2018-04-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437201#comment-16437201 ] Steve Loughran commented on SPARK-23966: w.r.t FileContext.rename vs FileSystem.rename(), they

[jira] [Created] (SPARK-23977) Add committer binding to Hadoop 3.1 PathOutputCommitter Mechanism

2018-04-13 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-23977: -- Summary: Add committer binding to Hadoop 3.1 PathOutputCommitter Mechanism Key: SPARK-23977 URL: https://issues.apache.org/jira/browse/SPARK-23977 Project: Spark

[jira] [Updated] (SPARK-23977) Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism

2018-04-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23977: --- Summary: Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism (was: Add
