[jira] [Commented] (SPARK-27339) Decimal up cast to higher scale fails while reading parquet to Dataset

2022-09-27 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609958#comment-17609958 ] sam commented on SPARK-27339: - [~hyukjin.kwon][~wrschneider99] [~ksbalas]. We are working on

[jira] [Updated] (SPARK-40048) Cached partitions are traversed multiple times (invalidating Accumulator consistency)

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Summary: Cached partitions are traversed multiple times (invalidating Accumulator consistency) (was: Partitions a

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for effic

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for effic

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for effic

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Affects Version/s: 3.2.1 > Partitions are traversed multiple times invalidating Accumulator consistency >

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579231#comment-17579231 ] sam commented on SPARK-40048: - [~hyukjin.kwon] Unfortunately bumping to 3.2.1 did not fix th

[jira] [Comment Edited] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579229#comment-17579229 ] sam edited comment on SPARK-40048 at 8/13/22 9:30 AM: -- We tried `3.

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579229#comment-17579229 ] sam commented on SPARK-40048: - We tried `3.2.1` and I'm now looking at https://github.com/t

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579226#comment-17579226 ] sam commented on SPARK-40048: - Thanks [~hyukjin.kwon], but we hit a number of issues trying

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for effic

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578447#comment-17578447 ] sam commented on SPARK-40048: - I've found a very dodgy hack around with this: ``` def force

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for effic

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578369#comment-17578369 ] sam commented on SPARK-40048: - Also confirmed no eviction seems to be happening with https:

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for effic

[jira] [Created] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
sam created SPARK-40048: --- Summary: Partitions are traversed multiple times invalidating Accumulator consistency Key: SPARK-40048 URL: https://issues.apache.org/jira/browse/SPARK-40048 Project: Spark I

[jira] [Commented] (SPARK-10000) Consolidate storage and execution memory management

2022-07-28 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572486#comment-17572486 ] sam commented on SPARK-10000: - Is there any way to disable Spark from evicting rdd partition

[jira] [Commented] (SPARK-14289) Support multiple eviction strategies for cached RDD partitions

2022-07-28 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572485#comment-17572485 ] sam commented on SPARK-14289: - Is there any way to disable Spark from evicting rdd partition

[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM

2021-10-16 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Affects Version/s: 1.6.0 > Spark evicts RDD partitions instead of allowing OOM > -

[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM

2021-10-16 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Description: In the past Spark (pre 1.6) jobs would give OOM if an RDD could not fit into memory (when trying to

[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM

2021-10-16 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Description: In the past Spark jobs would give OOM if an RDD could not fit into memory (when trying to cache with

[jira] [Created] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM

2021-10-09 Thread sam (Jira)
sam created SPARK-36966: --- Summary: Spark evicts RDD partitions instead of allowing OOM Key: SPARK-36966 URL: https://issues.apache.org/jira/browse/SPARK-36966 Project: Spark Issue Type: Bug C

[jira] [Updated] (SPARK-34733) Spark UI not showing memory used of partitions in memory

2021-03-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-34733: Description: We have a job that caches RDDs into memory. We know the code to cache is working as the spark logs

[jira] [Updated] (SPARK-34733) Spark UI not showing memory used of partitions in memory

2021-03-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-34733: Description: We have a job that caches RDDs into memory. We know the code to cache is working as the spark logs

[jira] [Created] (SPARK-34733) Spark UI not showing memory used of partitions in memory

2021-03-13 Thread sam (Jira)
sam created SPARK-34733: --- Summary: Spark UI not showing memory used of partitions in memory Key: SPARK-34733 URL: https://issues.apache.org/jira/browse/SPARK-34733 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-34733) Spark UI not showing memory used of partitions in memory

2021-03-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-34733: Attachment: Screenshot 2021-03-13 at 16.31.06.png > Spark UI not showing memory used of partitions in memory > ---

[jira] [Commented] (SPARK-30101) spark.sql.shuffle.partitions is not in Configuration docs, but a very critical parameter

2019-12-05 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988676#comment-16988676 ] sam commented on SPARK-30101: - [~kabhwan] [~cloud_fan] [~sowen] > We may deal with it we st

[jira] [Updated] (SPARK-30101) spark.sql.shuffle.partitions is not in Configuration docs, but a very critical parameter

2019-12-03 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-30101: Description: I'm creating a `SparkSession` like this: ``` SparkSession .builder().appName("foo").master("lo

[jira] [Updated] (SPARK-30101) spark.sql.shuffle.partitions is not in Configuration docs, but a very critical parameter

2019-12-03 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-30101: Summary: spark.sql.shuffle.partitions is not in Configuration docs, but a very critical parameter (was: Dataset d

[jira] [Reopened] (SPARK-30101) Dataset distinct does not respect spark.default.parallelism

2019-12-03 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam reopened SPARK-30101: - What is expected, is what is documented. > Dataset distinct does not respect spark.default.parallelism > --

[jira] [Commented] (SPARK-30101) Dataset distinct does not respect spark.default.parallelism

2019-12-03 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986722#comment-16986722 ] sam commented on SPARK-30101: - [~cloud_fan] [~kabhwan] Well this is at least a documentation

[jira] [Created] (SPARK-30101) Dataset distinct does not respect spark.default.parallelism

2019-12-02 Thread sam (Jira)
sam created SPARK-30101: --- Summary: Dataset distinct does not respect spark.default.parallelism Key: SPARK-30101 URL: https://issues.apache.org/jira/browse/SPARK-30101 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-26770) Misleading/unhelpful error message when wrapping a null in an Option

2019-01-29 Thread sam (JIRA)
sam created SPARK-26770: --- Summary: Misleading/unhelpful error message when wrapping a null in an Option Key: SPARK-26770 URL: https://issues.apache.org/jira/browse/SPARK-26770 Project: Spark Issue Typ

[jira] [Updated] (SPARK-26534) Closure Cleaner Bug

2019-01-07 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-26534: Description: I've found a strange combination of closures where the closure cleaner doesn't seem to be smart enou

[jira] [Commented] (SPARK-26534) Closure Cleaner Bug

2019-01-07 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735699#comment-16735699 ] sam commented on SPARK-26534: - [~viirya] If I change to RDD I cannot reproduce either.  Thi

[jira] [Commented] (SPARK-26534) Closure Cleaner Bug

2019-01-06 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735211#comment-16735211 ] sam commented on SPARK-26534: - [~viirya] Your version is slightly different, can you reprodu

[jira] [Created] (SPARK-26534) Closure Cleaner Bug

2019-01-04 Thread sam (JIRA)
sam created SPARK-26534: --- Summary: Closure Cleaner Bug Key: SPARK-26534 URL: https://issues.apache.org/jira/browse/SPARK-26534 Project: Spark Issue Type: Bug Components: Spark Core Affect

[jira] [Comment Edited] (SPARK-2243) Support multiple SparkContexts in the same JVM

2018-11-12 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683600#comment-16683600 ] sam edited comment on SPARK-2243 at 11/12/18 11:24 AM: --- Big bonus o

[jira] [Commented] (SPARK-2243) Support multiple SparkContexts in the same JVM

2018-11-12 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683600#comment-16683600 ] sam commented on SPARK-2243: Big bonus of being able to create and shutdown SparkContexts is

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-05-30 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494852#comment-16494852 ] sam commented on SPARK-20144: - Regarding the original issue of sorting, I agree with [~srowe

[jira] [Created] (SPARK-24425) Regression from 1.6 to 2.x - Spark no longer respects input partitions, unnecessary shuffle required

2018-05-30 Thread sam (JIRA)
sam created SPARK-24425: --- Summary: Regression from 1.6 to 2.x - Spark no longer respects input partitions, unnecessary shuffle required Key: SPARK-24425 URL: https://issues.apache.org/jira/browse/SPARK-24425 Pr

[jira] [Commented] (SPARK-6190) create LargeByteBuffer abstraction for eliminating 2GB limit on blocks

2018-03-21 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407729#comment-16407729 ] sam commented on SPARK-6190: [~bdolbeare] [~UZiVcbfPXaNrMtT] I completely agree that it's dep

[jira] [Commented] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2018-01-10 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320041#comment-16320041 ] sam commented on SPARK-17998: - [~srowen] Thanks, no idea where I got that from, cursed weakly

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-10-13 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203882#comment-16203882 ] sam commented on SPARK-20144: - I think this is a regression. We used to be able to easily co

[jira] [Commented] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2017-10-13 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203877#comment-16203877 ] sam commented on SPARK-17998: - [~lwlin] I think this is a regression. We used to be able to

[jira] [Commented] (SPARK-22225) wholeTextFilesIterators

2017-10-10 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198326#comment-16198326 ] sam commented on SPARK-22225: - Thanks [~srowen] and [~hyukjin.kwon], I wasn't aware of either

[jira] [Commented] (SPARK-18965) wholeTextFiles() is not able to read large files

2017-10-09 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196872#comment-16196872 ] sam commented on SPARK-18965: - [~pradeep_misra] [~srowen]. Yes it's a new feature. What we

[jira] [Created] (SPARK-22225) wholeTextFilesIterators

2017-10-09 Thread sam (JIRA)
sam created SPARK-22225: --- Summary: wholeTextFilesIterators Key: SPARK-22225 URL: https://issues.apache.org/jira/browse/SPARK-22225 Project: Spark Issue Type: New Feature Components: Spark Cor

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-21 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057592#comment-16057592 ] sam commented on SPARK-21137: - [~srowen] I thought I already made a point about that? Please

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054176#comment-16054176 ] sam edited comment on SPARK-21137 at 6/19/17 3:20 PM: -- [~srowen] Ah

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054176#comment-16054176 ] sam commented on SPARK-21137: - [~srowen] Ah OK, sorry, not used to that process. On other pr

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054111#comment-16054111 ] sam edited comment on SPARK-21137 at 6/19/17 2:36 PM: -- [~srowen] >

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054111#comment-16054111 ] sam edited comment on SPARK-21137 at 6/19/17 2:35 PM: -- [~srowen] >

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054111#comment-16054111 ] sam commented on SPARK-21137: - [~srowen] > what stages are executing if any? *None, no tas

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron e

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054026#comment-16054026 ] sam edited comment on SPARK-21137 at 6/19/17 1:53 PM: -- [~srowen] As

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054026#comment-16054026 ] sam commented on SPARK-21137: - [~srowen] As I said in the description, which you may have mi

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053977#comment-16053977 ] sam commented on SPARK-21137: - [~srowen] So I've provided full reproduce steps here (includi

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron e

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053808#comment-16053808 ] sam edited comment on SPARK-21137 at 6/19/17 11:14 AM: --- [~srowen] S

[jira] [Reopened] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam reopened SPARK-21137: - Reopened after adding detail. > Spark cannot read many small files (wholeTextFiles) > --

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron e

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron e

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053808#comment-16053808 ] sam commented on SPARK-21137: - [~srowen] Sorry about the lack of detail Sean. I guess I just

[jira] [Created] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
sam created SPARK-21137: --- Summary: Spark cannot read many small files (wholeTextFiles) Key: SPARK-21137 URL: https://issues.apache.org/jira/browse/SPARK-21137 Project: Spark Issue Type: Bug C

[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2017-02-01 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848259#comment-15848259 ] Sam commented on SPARK-5159: We are still having exactly this issue, any advice would be great

[jira] [Commented] (SPARK-11075) Spark SQL Thrift Server authentication issue on kerberized yarn cluster

2017-02-01 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848257#comment-15848257 ] Sam commented on SPARK-11075: - We are still having exactly this issue, any advice would be gr

[jira] [Commented] (SPARK-16666) Kryo encoder for custom complex classes

2016-08-05 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410454#comment-15410454 ] Sam commented on SPARK-16666: - [~clockfly] in your code sample, there is a case class for Poi

[jira] [Updated] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-21 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam updated SPARK-16666: Description: I'm trying to create a dataset with some geo data using spark and esri. If `Foo` only have `Point` fi

[jira] [Updated] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-21 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam updated SPARK-16666: Description: I'm trying to create a dataset with some geo data using spark and esri. If `Foo` only have `Point` fi

[jira] [Updated] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-21 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam updated SPARK-16666: Description: I'm trying to create a dataset with some geo data using spark and esri. If `Foo` only have `Point` fi

[jira] [Created] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-21 Thread Sam (JIRA)
Sam created SPARK-16666: --- Summary: Kryo encoder for custom complex classes Key: SPARK-16666 URL: https://issues.apache.org/jira/browse/SPARK-16666 Project: Spark Issue Type: Question Componen

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-27 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1503#comment-1503 ] sam commented on SPARK-11853: - Fair enough [~srowen] I'll concede that dependency management

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-23 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Affects Version/s: (was: 1.5.1) 1.5.0 > java.lang.ClassNotFoundException with spray-json

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-21 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020545#comment-15020545 ] sam commented on SPARK-11853: - OK, I'll look into that next week and see if I can put togethe

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-21 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020381#comment-15020381 ] sam commented on SPARK-11853: - spark-submit --master yarn-client --class my.class.Main my.jar

[jira] [Comment Edited] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-20 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018032#comment-15018032 ] sam edited comment on SPARK-11853 at 11/20/15 1:40 PM: --- [~srowen] W

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-20 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018032#comment-15018032 ] sam commented on SPARK-11853: - [~srowen] We are not using --jars or anything like that, just

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO ENVIRONMEN

[jira] [Reopened] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam reopened SPARK-11853: - > java.lang.ClassNotFoundException with spray-json on EMR > --- >

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Summary: java.lang.ClassNotFoundException with spray-json on EMR (was: java.lang.ClassNotFoundException for no rea

[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014683#comment-15014683 ] sam commented on SPARK-3877: Actually ignore, as per comment in duplicate, can't seem to repro

[jira] [Closed] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam closed SPARK-11854. --- Tried to produce a minimal app to reproduce, couldn't, probably issue lies between keyboard and chair. I assumed it was

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014667#comment-15014667 ] sam commented on SPARK-11853: - Just ran a simple 1 line app with `sc.makeRDD` locally trying

[jira] [Comment Edited] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014144#comment-15014144 ] sam edited comment on SPARK-11854 at 11/19/15 10:56 PM: [~srowen]

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014635#comment-15014635 ] sam commented on SPARK-11853: - [~srowen] // you're not sure what version you're running her

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO ENVIRONMEN

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014149#comment-15014149 ] sam commented on SPARK-11853: - [~srowen] Take another look, I edited the description shortly

[jira] [Commented] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014144#comment-15014144 ] sam commented on SPARK-11854: - [~srowen] It's on emr-4.1.0 with latest Spark EMR uses (so 1.5

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO ENVIRONMEN

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO ENVIRONMEN

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO ENVIRONMEN

[jira] [Updated] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11854: Description: When an yarn application fails (-yarn-cluster- yarn-client mode), the exit code of spark-submit is sti

[jira] [Created] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
sam created SPARK-11854: --- Summary: The exit code of spark-submit is still 0 when an yarn application fails Key: SPARK-11854 URL: https://issues.apache.org/jira/browse/SPARK-11854 Project: Spark Issue

[jira] [Updated] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11854: Target Version/s: (was: 1.1.1, 1.2.0) > The exit code of spark-submit is still 0 when an yarn application fails >

[jira] [Updated] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11854: Affects Version/s: (was: 1.1.0) 1.5.1 > The exit code of spark-submit is still 0 when an

[jira] [Updated] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11854: Fix Version/s: (was: 1.2.0) (was: 1.1.1) > The exit code of spark-submit is still 0 when

[jira] [Created] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
sam created SPARK-11853: --- Summary: java.lang.ClassNotFoundException for no reason Key: SPARK-11853 URL: https://issues.apache.org/jira/browse/SPARK-11853 Project: Spark Issue Type: Bug Affects Vers

[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-18 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011690#comment-15011690 ] sam commented on SPARK-3877: Is this really fixed?? I'm getting this on 1.5.0 using EMR. [~tg

[jira] [Commented] (SPARK-4492) Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

2015-07-30 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647446#comment-14647446 ] sam commented on SPARK-4492: I imagine building a fat jar for running with `java -cp` is possi
