[GitHub] spark issue #22673: [SPARK-20144] Allow reading files in order with spark.sq...

2018-10-08 Thread darabos
Github user darabos commented on the issue: https://github.com/apache/spark/pull/22673 > @darabos . Thank you for trying to make a contribution. However, we had better discuss on that JIRA first before making a PR. Especially, for SPARK-20144 which is discussed already, it d

[GitHub] spark pull request #22673: [SPARK-20144] Allow reading files in order with s...

2018-10-08 Thread darabos
Github user darabos commented on a diff in the pull request: https://github.com/apache/spark/pull/22673#discussion_r223412612 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1368,6 +1368,17 @@ class CSVSuite extends

[GitHub] spark pull request #22673: [SPARK-20144] Allow reading files in order with s...

2018-10-08 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/22673 [SPARK-20144] Allow reading files in order with spark.sql.files.allowReordering=false ## What changes were proposed in this pull request? I'm adding `spark.sql.files.allowReord
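
A minimal sketch of how the flag proposed in this PR would be used, assuming the `spark.sql.files.allowReordering` name from the title; this is a proposal under discussion, not a released Spark option, and the input path is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ordered-read").getOrCreate()
// Proposed (unmerged) option: keep the listing order of input files instead of
// packing them into partitions by size.
spark.conf.set("spark.sql.files.allowReordering", "false")
val df = spark.read.text("/data/logs")  // hypothetical input path
```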

[GitHub] spark issue #16678: [SPARK-19209] [WIP] JDBC: Fix "No suitable driver" on th...

2017-01-23 Thread darabos
Github user darabos commented on the issue: https://github.com/apache/spark/pull/16678 (I used `make-distribution.sh` because when I built with `build/mvn -DskipTests clean package` I could not reproduce the issue. I think `-Phive` is probably the culprit, but I have not experimented

[GitHub] spark issue #16678: [SPARK-19209] [WIP] JDBC: Fix "No suitable driver" on th...

2017-01-23 Thread darabos
Github user darabos commented on the issue: https://github.com/apache/spark/pull/16678 Thanks for the quick pull request! > @darabos Could you make a manual test and see whether this changes can resolve your issue? Unfortunately it does not. I built the code w
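
For context, a hedged sketch of the kind of JDBC read that can hit "No suitable driver" when the driver class is not visible where the connection is made; the URL, table, and driver class shown here are illustrative, not taken from the issue:

```scala
// Setting the `driver` option forces the named JDBC driver class to be loaded
// and registered; without it (or with the jar missing from the classpath)
// DriverManager may find no driver for the URL.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db.example.com:5432/testdb")  // hypothetical
  .option("dbtable", "public.events")                             // hypothetical
  .option("driver", "org.postgresql.Driver")
  .load()
```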

[GitHub] spark issue #14975: Correct fetchsize property name in docs

2016-09-16 Thread darabos
Github user darabos commented on the issue: https://github.com/apache/spark/pull/14975 > +1. There are other occurrences of "FetchSize" or "fetchSize" in the code, though none will matter for users (i.e. test names). But feel free to fix them. Done.

[GitHub] spark issue #14975: Correct fetchsize property name in docs

2016-09-16 Thread darabos
Github user darabos commented on the issue: https://github.com/apache/spark/pull/14975 > @darabos do you want to close this out or should I do the update? Sorry! I’ll try to do it tonight. If I don’t report back, consider me eaten by a monster.

[GitHub] spark issue #14975: Correct fetchsize property name in docs

2016-09-08 Thread darabos
Github user darabos commented on the issue: https://github.com/apache/spark/pull/14975 > +1. There are other occurrences of "FetchSize" or "fetchSize" in the code, though none will matter for users (i.e. test names). But feel free to fix them. Good

[GitHub] spark pull request #14975: Correct fetchsize property name in docs

2016-09-06 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/14975 Correct fetchsize property name in docs ## What changes were proposed in this pull request? Replace `fetchSize` with `fetchsize` in the docs. ## How was this patch tested
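
A short example of the corrected option name in use; the connection URL and table are hypothetical, and the value is just an illustration:

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("fetchsize", "1000")  // rows fetched per round trip; option name as corrected in the docs
// URL and table are hypothetical
val df = spark.read.jdbc("jdbc:postgresql://db.example.com/testdb", "public.events", props)
```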

[GitHub] spark issue #13618: [SPARK-15796] [CORE] Reduce spark.memory.fraction defaul...

2016-06-11 Thread darabos
Github user darabos commented on the issue: https://github.com/apache/spark/pull/13618 Thanks Sean! 0.66 would probably work well. But I think @gaborfeher tested only with 0.6, and this value seemed to be the conclusion on JIRA ([comment](https://issues.apache.org/jira/browse/SPARK
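
For reference, a hedged sketch of overriding the unified-memory fraction discussed here; 0.6 mirrors the value tested above and is not a recommendation for every workload:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.memory.fraction", "0.6")        // fraction of (heap - reserved) shared by execution and storage
  .set("spark.memory.storageFraction", "0.5") // portion of that fraction protected from eviction
```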

[GitHub] spark pull request: [SPARK-9858][SPARK-9859][SPARK-9861][SQL] Add ...

2016-03-02 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/9276#issuecomment-191230900 Is `spark.sql.adaptive.enabled` documented somewhere? It’s not in http://spark.apache.org/docs/1.6.0/configuration.html.

[GitHub] spark pull request: [SPARK-11403] Log something when killing execu...

2015-11-03 Thread darabos
Github user darabos closed the pull request at: https://github.com/apache/spark/pull/9355

[GitHub] spark pull request: [SPARK-11403] Log something when killing execu...

2015-11-03 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/9355#issuecomment-153429008 I've done an artificial test with Spark 1.5.1 and got the `# -XX:OnOutOfMemoryError="kill %p"` message on stderr. Maybe I just missed this origina
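
For context, a hedged illustration of the JVM flag quoted above. On YARN, Spark attaches it to executors automatically (via YarnSparkHadoopUtil), so the snippet below only shows what the equivalent user-set option would look like; the JVM prints just the `#` comment line to stderr before running the command, which is why a killed executor looks almost silent:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-XX:OnOutOfMemoryError='kill %p'")
```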

[GitHub] spark pull request: [SPARK-11403] Log something when killing execu...

2015-11-03 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/9355#issuecomment-153413263 Sorry, I kept putting off experimenting with this, but I’ll do it now. I’m pretty sure I checked both stdout and stderr from the executor, but not 100%.

[GitHub] spark pull request: [SPARK-11403] Log something when killing execu...

2015-10-29 Thread darabos
Github user darabos commented on a diff in the pull request: https://github.com/apache/spark/pull/9355#discussion_r43393025 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala --- @@ -238,7 +238,7 @@ object YarnSparkHadoopUtil { if

[GitHub] spark pull request: [SPARK-11403] Log something when killing execu...

2015-10-29 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/9355 [SPARK-11403] Log something when killing executors due to OOME Without anything printed it's very hard to figure out why the executor disappeared. https://issues.apache.org/jira/b

[GitHub] spark pull request: [SPARK-8893] Add runtime checks against non-po...

2015-07-13 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/7285#issuecomment-120898722 > Ah, I think this may have to be a check higher up, on the argument to `repartition`? this looks too low level. An RDD with 0 partitions is OK, just not repartition

[GitHub] spark pull request: [SPARK-8902] Correctly print hostname in error

2015-07-08 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/7288#issuecomment-119695715 > @darabos it would be good to file a trivial JIRA, and explain briefly how the output differs before and after this change. In general we should try to track

[GitHub] spark pull request: [SPARK-8893] Add runtime checks against non-po...

2015-07-08 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/7285#issuecomment-119624289 `org.apache.spark.rdd.PairRDDFunctionsSuite` and `org.apache.spark.JavaAPISuite` trigger the checks. I’ll try to do something.

[GitHub] spark pull request: [SPARK-8893] Add runtime checks against non-po...

2015-07-08 Thread darabos
Github user darabos commented on a diff in the pull request: https://github.com/apache/spark/pull/7285#discussion_r34146057 --- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala --- @@ -78,6 +78,10 @@ private[spark] class CoalescedRDD[T: ClassTag

[GitHub] spark pull request: Correctly print hostname in error

2015-07-08 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/7288 Correctly print hostname in error With "+" the strings are separate expressions, and format() is called on the last string before concatenation. (So substitution does not happen.) Witho
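
A tiny illustration of the bug being described, using hypothetical names (the real message lives in the YARN error-reporting code):

```scala
val host = "worker-3.example.com"  // hypothetical

// Buggy: "+" joins two separate expressions, so format() applies only to the
// second literal and the first "%s" is printed verbatim, never substituted.
val bad = "Lost executor on host %s: " + "exit status %d".format(143)

// Fixed: parenthesize (or use a single literal) so format() sees the whole string.
val good = ("Lost executor on host %s: exit status %d").format(host, 143)
```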

[GitHub] spark pull request: [SPARK-8893] Add runtime checks against non-po...

2015-07-08 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/7285 [SPARK-8893] Add runtime checks against non-positive number of partitions https://issues.apache.org/jira/browse/SPARK-8893 > What does `sc.parallelize(1 to 3).repartition(p).collect` ret
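
A minimal sketch of the kind of runtime check this PR adds; the names are illustrative, and the actual checks live in places like CoalescedRDD rather than a helper like this:

```scala
import org.apache.spark.rdd.RDD

// Illustrative guard: fail fast with a clear message instead of returning
// surprising results when a caller asks for zero or negative partitions.
def repartitionChecked[T](rdd: RDD[T], numPartitions: Int): RDD[T] = {
  require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
  rdd.repartition(numPartitions)
}
```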

[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2015-06-24 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/3346#issuecomment-114877984 Thanks for the note, @andrewor14. @Forevian is not working with Spark lately, but I'm happy to take over this change from him. From a superficial look at the co

[GitHub] spark pull request: Fix maxTaskFailures comment

2015-06-03 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/6621#issuecomment-108632096 Yes, it's not the intuitive definition for me either. But it's in http://spark.apache.org/docs/latest/configuration.html: > Number of individual

[GitHub] spark pull request: Fix maxTaskFailures comment

2015-06-03 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/6621 Fix maxTaskFailures comment If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code support this reading; I think it’s just this comment tha
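
A small example of the semantics being clarified; the property name `spark.task.maxFailures` is real, the comment paraphrases the corrected reading:

```scala
import org.apache.spark.SparkConf

// With maxFailures = 1, the first failure of any task aborts the task set --
// i.e. the value counts total allowed attempts, not retries after a failure.
val conf = new SparkConf().set("spark.task.maxFailures", "1")
```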

[GitHub] spark pull request: Remove outdated remark about take(n).

2015-02-11 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/4533#issuecomment-73910087 Oh, thanks. I never looked into how `allowLocal` works. Looks like it results in local execution if the number of affected partitions is 1 (https://github.com

[GitHub] spark pull request: Remove outdated remark about take(n).

2015-02-11 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/4533 Remove outdated remark about take(n). Looking at the code, I believe this remark about `take(n)` computing partitions on the driver is no longer correct. Apologies if I'm wrong. This

[GitHub] spark pull request: [SPARK-5102][Core]subclass of MapStatus needs ...

2015-01-12 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/4007#issuecomment-69580933 Thanks for the quick fix!

[GitHub] spark pull request: Do not include SPARK_CLASSPATH if empty

2014-12-11 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/3678 Do not include SPARK_CLASSPATH if empty My guess for fixing https://issues.apache.org/jira/browse/SPARK-4831.

[GitHub] spark pull request: Fix comment

2014-11-24 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/3432 Fix comment This file is for Hive 0.13.1 I think.

[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-09-17 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-55912866 @Forevian, can you please update it to merge cleanly? Then hunt down a reviewer! It would be great to have this in 1.2. It would make our code significantly more

[GitHub] spark pull request: Add SSDs to block device mapping

2014-09-01 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/2081#issuecomment-54044479 I’ve tested this now with `ec2/spark-ec2 -s 1 --instance-type m3.2xlarge --region=us-east-1 launch` and the machines have mounted the SSDs. Thanks!

[GitHub] spark pull request: Add SSDs to block device mapping

2014-09-01 Thread darabos
Github user darabos commented on a diff in the pull request: https://github.com/apache/spark/pull/2081#discussion_r16948878 --- Diff: ec2/spark_ec2.py --- @@ -342,6 +343,15 @@ def launch_cluster(conn, opts, cluster_name): device.delete_on_termination = True

[GitHub] spark pull request: Add SSDs to block device mapping

2014-08-29 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/2081#issuecomment-53856243 Wow, you're right, I hadn't read this line before. > When you launch an M3 instance, we ignore any instance store volumes specified in the block d

[GitHub] spark pull request: Add SSDs to block device mapping

2014-08-21 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/2081 Add SSDs to block device mapping On `m3.2xlarge` instances the 2x80GB SSDs are inaccessible if not added to the block device mapping when the instance is created. They work when added with this

[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-08-13 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-52090746 @Forevian is on vacation from tomorrow to next Tuesday. But if you have any questions I can try to answer until then. @pwendell, are you interested in this?

[GitHub] spark pull request: Use the Executor's ClassLoader in sc.objectFil...

2014-07-11 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/181#issuecomment-48796852 I think Jenkins means to say it's all good. ~~~ Attempting to post to Github: {"body": "QA results for PR 181: - This pat

[GitHub] spark pull request: [SPARK-2403] Catch all errors during serializa...

2014-07-08 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/1329#issuecomment-48373710 > LGTM. Regarding the initial problem you observed, did you see the actual exception via the DAGScheduler's OneForOneStrategy failure? Or were there no log

[GitHub] spark pull request: [SPARK-2403] Catch all errors during serializa...

2014-07-08 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/1329#issuecomment-48368953 Thanks! I’ve added the suggested changes.

[GitHub] spark pull request: [SPARK-2403] Catch all errors during serializa...

2014-07-08 Thread darabos
Github user darabos commented on a diff in the pull request: https://github.com/apache/spark/pull/1329#discussion_r14665377 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -768,6 +768,10 @@ class DAGScheduler( abortStage(stage

[GitHub] spark pull request: [SPARK-2403] Catch all errors during serializa...

2014-07-08 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/1329 [SPARK-2403] Catch all errors during serialization in DAGScheduler https://issues.apache.org/jira/browse/SPARK-2403 Spark hangs for us whenever we forget to register a class with Kryo. This
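
A hedged sketch of the user-side setup the description refers to, not the DAGScheduler change in this PR: with Kryo registration required, an unregistered class fails loudly at serialization time instead of leaving the job hanging (the `Record` class is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

case class Record(id: Long, payload: String)  // hypothetical user class

val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")  // unregistered classes now raise an error
  .registerKryoClasses(Array(classOf[Record]))
val sc = new SparkContext(conf)
```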

[GitHub] spark pull request: Use the Executor's ClassLoader in sc.objectFil...

2014-07-01 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/181#issuecomment-47647756 I was so slow, Bogdan has already fixed this in #821. Anyway, here's the belated test. It's probably still useful to avoid regressions. I tested the test by

[GitHub] spark pull request: Use the Executor's ClassLoader in sc.objectFil...

2014-06-25 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/181#issuecomment-47074387 Sorry for leaving this hanging. I’ll take a look at the test.

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-17 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-46303967 Thanks Patrick! @ankurdave: Do you want to add this to the storage UI? I can probably do it too if you’re busy.

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-13 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-46019029 It's a failure in `pyspark/sql.py`, but I can't reproduce it locally either in my branch or in upstream master. How did Jenkins do it? File "

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-13 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-46007190 Thanks for the feedback! I've added JSON (de)serialization code for the new field. Patched in your change (thanks!). And added one more line to the top of the stack

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-11 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-45817646 Thanks for fixing SPARK-2070! This works without the excludes now.

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-08 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-45448060 Thanks Patrick! Binary compatibility is quite a mystery to me.

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-07 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-45409069 Okay, now it is a binary incompatibility: [error] * method getCallSite()java.lang.String in class org.apache.spark.SparkContext has now a different result

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-06 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-45390572 I have some fixing to do.

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-06 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-45383511 I forgot `scalastyle`, sorry :(. Try again?

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-06 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-45342809 @ankurdave: Cool! For some reason I didn't wire up RDDs, only stages. Your change should complement this nicely. @rxin: I went with "details" i

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-06 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/981#issuecomment-45321679 Wow, 40k stages? :open_mouth: To estimate the memory use, I guess a large stack trace could be ~10 kB, so it would be 400 MB total. Would that be noticeable compared to

[GitHub] spark pull request: SPARK-2035: Store call stack for stages, displ...

2014-06-05 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/981 SPARK-2035: Store call stack for stages, display it on the UI. I'm not sure about the test -- I get a lot of unrelated failures for some reason. I'll try to sort it out. But hopefully the
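
A rough sketch of the idea behind the feature, with illustrative frame filtering; the actual implementation (around SparkContext's call-site utilities) is more careful about which frames to skip:

```scala
// Capture the user-code portion of the current stack as a single string that a
// UI "details" pane could display for the stage.
def callSiteDetails(maxDepth: Int = 20): String =
  Thread.currentThread.getStackTrace
    .dropWhile(f => f.getClassName.startsWith("org.apache.spark") ||
                    f.getClassName.startsWith("java.lang.Thread"))
    .take(maxDepth)
    .mkString("\n")
```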

[GitHub] spark pull request: Do not re-use objects in the EdgePartition/Edg...

2014-04-02 Thread darabos
Github user darabos commented on a diff in the pull request: https://github.com/apache/spark/pull/276#discussion_r11199524 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartition.scala --- @@ -84,19 +87,13 @@ class EdgePartition[@specialized(Char, Int, Boolean

[GitHub] spark pull request: Do not re-use objects in the EdgePartition/Edg...

2014-04-01 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/276#issuecomment-39199208 Thanks for the comments! The description of the GC effects was very educational. I made the suggested changes. Let me know if you'd like to see something else ch

[GitHub] spark pull request: Do not re-use objects in the EdgePartition/Edg...

2014-03-31 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/276#issuecomment-39077046 Sorry, the new JIRA link is https://issues.apache.org/jira/browse/SPARK-1188. Thanks!

[GitHub] spark pull request: Do not re-use objects in the EdgePartition/Edg...

2014-03-31 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/276 Do not re-use objects in the EdgePartition/EdgeTriplet iterators. This avoids a silent data corruption issue (https://spark-project.atlassian.net/browse/SPARK-1188) and has no performance impact by
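
A self-contained illustration (not GraphX code) of why object reuse in an iterator silently corrupts results: anything that holds on to the yielded objects, such as collecting them into an array, ends up with many references to one mutable instance:

```scala
final class Edge(var srcId: Long, var attr: Int)

val reusing = new Iterator[Edge] {
  private val edge = new Edge(0L, 0)  // single mutable instance, reused for every element
  private var i = 0
  def hasNext: Boolean = i < 3
  def next(): Edge = { edge.srcId = i.toLong; edge.attr = i * 10; i += 1; edge }
}

// Every array slot points at the same object, so all values reflect the last mutation.
println(reusing.toArray.map(_.attr).mkString(", "))  // prints "20, 20, 20", not "0, 10, 20"
```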

[GitHub] spark pull request: SPARK-1230: [WIP] Enable SparkContext.addJars(...

2014-03-28 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/119#issuecomment-38927661 Hi! I'd like to use TestUtils from this pull request in https://github.com/apache/spark/pull/181. If this pull request needs more time, perhaps the TestUtils code

[GitHub] spark pull request: Use the Executor's ClassLoader in sc.objectFil...

2014-03-28 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/181#issuecomment-38928010 I couldn't figure out how to get a separate jar built for use in this test. (I'm new to Java/Scala build systems.) Anyway you say it would be brittle. I'll

[GitHub] spark pull request: Use the Executor's ClassLoader in sc.objectFil...

2014-03-28 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/181#issuecomment-38909539 Sorry, the delay is my fault. I was too busy to get around to the test so far, but I still intend to do it. At least I’ve _read_ some Spark tests :).

[GitHub] spark pull request: Use the Executor's ClassLoader in sc.objectFil...

2014-03-19 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/181#issuecomment-38114317 I'll fix the style error and look at writing a test. Thanks for the pointers! What do you think about using Thread.currentThread.getContextClassLoad

[GitHub] spark pull request: Use the Executor's ClassLoader in sc.objectFil...

2014-03-19 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/181 Use the Executor's ClassLoader in sc.objectFile(). This makes it possible to read classes from the object file which were specified in the user-provided jars. (By default ObjectInputStream
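
A minimal sketch of the general technique, as an illustration rather than the exact Spark patch: subclass ObjectInputStream so class resolution goes through an explicitly supplied loader (e.g. the executor's, or Thread.currentThread.getContextClassLoader) instead of the JVM's default:

```scala
import java.io.{InputStream, ObjectInputStream, ObjectStreamClass}

// Resolve deserialized classes through `loader`, so classes that exist only in
// user-provided jars (added via --jars or sc.addJar) can be found.
class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    Class.forName(desc.getName, false, loader)
}
```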