Github user darabos commented on the issue:
https://github.com/apache/spark/pull/22673
> @darabos. Thank you for trying to make a contribution. However, we had
better discuss this on the JIRA first before making a PR. Especially for
SPARK-20144, which is discussed already, it d
Github user darabos commented on a diff in the pull request:
https://github.com/apache/spark/pull/22673#discussion_r223412612
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1368,6 +1368,17 @@ class CSVSuite extends
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/22673
[SPARK-20144] Allow reading files in order with
spark.sql.files.allowReordering=false
## What changes were proposed in this pull request?
I'm adding `spark.sql.files.allowReord
Github user darabos commented on the issue:
https://github.com/apache/spark/pull/16678
(I used `make-distribution.sh` because when I built with `build/mvn
-DskipTests clean package` I could not reproduce the issue. I think `-Phive` is
probably the culprit, but I have not experimented
Github user darabos commented on the issue:
https://github.com/apache/spark/pull/16678
Thanks for the quick pull request!
> @darabos Could you make a manual test and see whether this changes can
resolve your issue?
Unfortunately it does not. I built the code w
Github user darabos commented on the issue:
https://github.com/apache/spark/pull/14975
> +1. There are other occurrences of "FetchSize" or "fetchSize" in the
code, though none will matter for users (i.e. test names). But feel free to fix
them.
Done.
Github user darabos commented on the issue:
https://github.com/apache/spark/pull/14975
> @darabos do you want to close this out or should I do the update?
Sorry! I'll try to do it tonight. If I don't report back, consider me eaten
by a monster.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
Github user darabos commented on the issue:
https://github.com/apache/spark/pull/14975
> +1. There are other occurrences of "FetchSize" or "fetchSize" in the
code, though none will matter for users (i.e. test names). But feel free to fix
them.
Good
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/14975
Correct fetchsize property name in docs
## What changes were proposed in this pull request?
Replace `fetchSize` with `fetchsize` in the docs.
## How was this patch tested
Github user darabos commented on the issue:
https://github.com/apache/spark/pull/13618
Thanks Sean! 0.66 would probably work well. But I think @gaborfeher tested
only with 0.6, and this value seemed to be the conclusion on JIRA
([comment](https://issues.apache.org/jira/browse/SPARK
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/9276#issuecomment-191230900
Is `spark.sql.adaptive.enabled` documented somewhere? It's not in
http://spark.apache.org/docs/1.6.0/configuration.html.
Github user darabos closed the pull request at:
https://github.com/apache/spark/pull/9355
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/9355#issuecomment-153429008
I've done an artificial test with Spark 1.5.1 and got the `#
-XX:OnOutOfMemoryError="kill %p"` message on stderr. Maybe I just missed this
origina
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/9355#issuecomment-153413263
Sorry, I kept putting off experimenting with this, but I'll do it now. I'm
pretty sure I checked both stdout and stderr from the executor, but not 100%.
Github user darabos commented on a diff in the pull request:
https://github.com/apache/spark/pull/9355#discussion_r43393025
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -238,7 +238,7 @@ object YarnSparkHadoopUtil {
if
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/9355
[SPARK-11403] Log something when killing executors due to OOME
Without anything printed, it's very hard to figure out why the executor
disappeared.
https://issues.apache.org/jira/b
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/7285#issuecomment-120898722
> Ah, I think this may have to be a check higher up, on the argument to
`repartition`? this looks too low level. An RDD with 0 partitions is OK, just
not repartition
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/7288#issuecomment-119695715
> @darabos it would be good to file a trivial JIRA, and explain briefly how
the output differs before and after this change. In general we should try to
track
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/7285#issuecomment-119624289
`org.apache.spark.rdd.PairRDDFunctionsSuite` and
`org.apache.spark.JavaAPISuite` trigger the checks. I'll try to do something.
Github user darabos commented on a diff in the pull request:
https://github.com/apache/spark/pull/7285#discussion_r34146057
--- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala ---
@@ -78,6 +78,10 @@ private[spark] class CoalescedRDD[T: ClassTag
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/7288
Correctly print hostname in error
With `+` the strings are separate expressions, and `format()` is called on
the last string before concatenation. (So substitution does not happen.)
Witho
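The precedence issue described in #7288 can be shown with a minimal Scala sketch (the strings here are illustrative, not the actual Spark error message):

```scala
val host = "worker-1"

// Buggy: `.format` binds only to the last string literal, so the
// placeholder in the first literal survives into the output unformatted.
val buggy = "Failed to connect to %s" + " (reason: %s)".format("timeout")

// Fixed: parenthesize (or use a single literal) so `.format` sees the
// whole concatenated string.
val fixed = ("Failed to connect to %s" + " (reason: %s)").format(host, "timeout")

println(buggy)  // Failed to connect to %s (reason: timeout)
println(fixed)  // Failed to connect to worker-1 (reason: timeout)
```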
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/7285
[SPARK-8893] Add runtime checks against non-positive number of partitions
https://issues.apache.org/jira/browse/SPARK-8893
> What does `sc.parallelize(1 to 3).repartition(p).collect` ret
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/3346#issuecomment-114877984
Thanks for the note, @andrewor14. @Forevian has not been working with Spark
lately, but I'm happy to take over this change from him. From a superficial
look at the co
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/6621#issuecomment-108632096
Yes, it's not the intuitive definition for me either. But it's in
http://spark.apache.org/docs/latest/configuration.html:
> Number of individual
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/6621
Fix maxTaskFailures comment
If maxTaskFailures is 1, the task set is aborted after 1 task failure.
Other documentation and the code support this reading; I think it's just this
comment tha
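The reading being corrected in #6621, abort after maxTaskFailures failures rather than maxTaskFailures + 1, can be modeled in a line (a simplified stand-in, not the actual TaskSetManager logic):

```scala
// Simplified model of the abort condition: with maxTaskFailures = 1,
// the first task failure already aborts the task set.
def shouldAbort(numFailures: Int, maxTaskFailures: Int): Boolean =
  numFailures >= maxTaskFailures

println(shouldAbort(numFailures = 1, maxTaskFailures = 1))  // true
println(shouldAbort(numFailures = 1, maxTaskFailures = 4))  // false
```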
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/4533#issuecomment-73910087
Oh, thanks. I never looked into how `allowLocal` works.
Looks like it results in local execution if the number of affected
partitions is 1
(https://github.com
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/4533
Remove outdated remark about take(n).
Looking at the code, I believe this remark about `take(n)` computing
partitions on the driver is no longer correct. Apologies if I'm wrong.
This
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/4007#issuecomment-69580933
Thanks for the quick fix!
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/3678
Do not include SPARK_CLASSPATH if empty
My guess for fixing https://issues.apache.org/jira/browse/SPARK-4831.
You can merge this pull request into a Git repository by running:
$ git pull https
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/3432
Fix comment
This file is for Hive 0.13.1 I think.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/darabos/spark patch-2
Alternatively you can
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/1345#issuecomment-55912866
@Forevian, can you please update it to merge cleanly? Then hunt down a
reviewer! It would be great to have this in 1.2. It would make our code
significantly more
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/2081#issuecomment-54044479
I've tested this now with `ec2/spark-ec2 -s 1 --instance-type m3.2xlarge
--region=us-east-1 launch` and the machines have mounted the SSDs. Thanks!
Github user darabos commented on a diff in the pull request:
https://github.com/apache/spark/pull/2081#discussion_r16948878
--- Diff: ec2/spark_ec2.py ---
@@ -342,6 +343,15 @@ def launch_cluster(conn, opts, cluster_name):
device.delete_on_termination = True
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/2081#issuecomment-53856243
Wow, you're right, I hadn't read this line before.
> When you launch an M3 instance, we ignore any instance store volumes
specified in the block d
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/2081
Add SSDs to block device mapping
On `m3.2xlarge` instances the 2x80GB SSDs are inaccessible if not added to
the block device mapping when the instance is created. They work when added
with this
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/1345#issuecomment-52090746
@Forevian is on vacation from tomorrow to next Tuesday. But if you have any
questions I can try to answer until then. @pwendell, are you interested in this?
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/181#issuecomment-48796852
I think Jenkins means to say it's all good.
~~~
Attempting to post to Github:
{"body": "QA results for PR 181:
- This pat
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/1329#issuecomment-48373710
> LGTM. Regarding the initial problem you observed, did you see the actual
exception via the DAGScheduler's OneForOneStrategy failure? Or were there no
log
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/1329#issuecomment-48368953
Thanks! I've added the suggested changes.
Github user darabos commented on a diff in the pull request:
https://github.com/apache/spark/pull/1329#discussion_r14665377
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -768,6 +768,10 @@ class DAGScheduler(
abortStage(stage
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/1329
[SPARK-2403] Catch all errors during serialization in DAGScheduler
https://issues.apache.org/jira/browse/SPARK-2403
Spark hangs for us whenever we forget to register a class with Kryo. This
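The failure mode behind SPARK-2403, a task closure the serializer rejects taking down the scheduler instead of failing the stage, can be reproduced with plain Java serialization. A hedged sketch of the fix (names are illustrative, not the actual DAGScheduler code):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.util.control.NonFatal

// Stand-in for a class the serializer rejects (not java.io.Serializable).
class UnregisteredClosure

// Illustrative version of "catch everything during serialization and fail
// the stage with the real error, instead of letting the scheduler hang".
def trySerialize(obj: AnyRef): Either[Throwable, Array[Byte]] =
  try {
    val buf = new ByteArrayOutputStream()
    new ObjectOutputStream(buf).writeObject(obj)
    Right(buf.toByteArray)
  } catch {
    case NonFatal(e) => Left(e)  // e.g. NotSerializableException: abort the stage
  }

println(trySerialize("a serializable value").isRight)  // true
println(trySerialize(new UnregisteredClosure).isLeft)  // true
```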
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/181#issuecomment-47647756
I was so slow; Bogdan has already fixed this in #821. Anyway, here's the
belated test. It's probably still useful to avoid regressions. I tested the
test by
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/181#issuecomment-47074387
Sorry for leaving this hanging. I'll take a look at the test.
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-46303967
Thanks Patrick!
@ankurdave: Do you want to add this to the storage UI? I can probably do it
too if you're busy.
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-46019029
It's a failure in `pyspark/sql.py`, but I can't reproduce it locally either
in my branch or in upstream master. How did Jenkins do it?
File "
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-46007190
Thanks for the feedback! I've added JSON (de)serialization code for the new
field. Patched in your change (thanks!). And added one more line to the top of
the stack
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-45817646
Thanks for fixing SPARK-2070! This works without the excludes now.
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-45448060
Thanks Patrick! Binary compatibility is quite a mystery to me.
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-45409069
Okay, now it is a binary incompatibility:
[error] * method getCallSite()java.lang.String in class
org.apache.spark.SparkContext has now a different result
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-45390572
I have some fixing to do.
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-45383511
I forgot `scalastyle`, sorry :(. Try again?
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-45342809
@ankurdave: Cool! For some reason I didn't wire up RDDs, only stages. Your
change should complement this nicely.
@rxin: I went with "details" i
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/981#issuecomment-45321679
Wow, 40k stages? :open_mouth: To estimate the memory use, I guess a large
stack trace could be ~10 kB, so it would be 400 MB total. Would that be
noticeable compared to
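For what it's worth, that back-of-the-envelope estimate is arithmetically sound:

```scala
// 40,000 stages at roughly 10 kB of stack trace each.
val totalBytes = 40000L * 10 * 1024
println(totalBytes / (1024 * 1024))  // 390, i.e. roughly the 400 MB estimated
```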
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/981
SPARK-2035: Store call stack for stages, display it on the UI.
I'm not sure about the test -- I get a lot of unrelated failures for some
reason. I'll try to sort it out. But hopefully the
Github user darabos commented on a diff in the pull request:
https://github.com/apache/spark/pull/276#discussion_r11199524
--- Diff:
graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartition.scala ---
@@ -84,19 +87,13 @@ class EdgePartition[@specialized(Char, Int, Boolean
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/276#issuecomment-39199208
Thanks for the comments! The description of the GC effects was very
educational. I made the suggested changes. Let me know if you'd like to see
something else ch
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/276#issuecomment-39077046
Sorry, the new JIRA link is
https://issues.apache.org/jira/browse/SPARK-1188. Thanks!
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/276
Do not re-use objects in the EdgePartition/EdgeTriplet iterators.
This avoids a silent data corruption issue
(https://spark-project.atlassian.net/browse/SPARK-1188) and has no performance
impact by
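The class of bug #276 fixes, an iterator that mutates and re-yields a single object, can be demonstrated in a few lines (an illustrative edge class, not GraphX's actual Edge/EdgePartition):

```scala
// Illustrative mutable edge; the real GraphX types differ.
final class MutableEdge(var srcId: Long, var dstId: Long)

// Re-using one object across next() calls is cheap, but anything that
// retains the yielded references (e.g. .toArray, caching) sees only the
// final state: silent data corruption.
def reusingIterator(pairs: Array[(Long, Long)]): Iterator[MutableEdge] = {
  val edge = new MutableEdge(0L, 0L)
  pairs.iterator.map { case (s, d) =>
    edge.srcId = s
    edge.dstId = d
    edge
  }
}

val collected = reusingIterator(Array((1L, 2L), (3L, 4L))).toArray
// Both array slots hold the same object, now in its last state (3, 4).
println(collected.map(e => (e.srcId, e.dstId)).toList)
```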
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/119#issuecomment-38927661
Hi! I'd like to use TestUtils from this pull request in
https://github.com/apache/spark/pull/181. If this pull request needs more time,
perhaps the TestUtils code
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/181#issuecomment-38928010
I couldn't figure out how to get a separate jar built for use in this test.
(I'm new to Java/Scala build systems.) Anyway, you say it would be brittle. I'll
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/181#issuecomment-38909539
Sorry, the delay is my fault. I was too busy to get around to the test so
far, but I still intend to do it. At least I've _read_ some Spark tests :).
Github user darabos commented on the pull request:
https://github.com/apache/spark/pull/181#issuecomment-38114317
I'll fix the style error and look at writing a test. Thanks for the
pointers!
What do you think about using Thread.currentThread.getContextClassLoad
GitHub user darabos opened a pull request:
https://github.com/apache/spark/pull/181
Use the Executor's ClassLoader in sc.objectFile().
This makes it possible to read classes from the object file which were
specified in the user-provided jars. (By default ObjectInputStream
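The truncated parenthetical is about `ObjectInputStream`'s default class resolution, which consults a loader that may not see user-supplied jars. The standard remedy is overriding `resolveClass` to go through an explicit loader; a sketch under that assumption (names are illustrative, not the exact Spark patch):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream,
  ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// An ObjectInputStream that resolves classes through a given loader
// (e.g. the executor's, which can see the user-provided jars), falling
// back to the default resolution if that loader cannot find the class.
class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    try Class.forName(desc.getName, false, loader)
    catch { case _: ClassNotFoundException => super.resolveClass(desc) }
}

case class Payload(msg: String)

// Round-trip a value through the loader-aware stream.
val buf = new ByteArrayOutputStream()
new ObjectOutputStream(buf).writeObject(Payload("round trip"))
val restored = new LoaderAwareObjectInputStream(
  new ByteArrayInputStream(buf.toByteArray),
  Thread.currentThread.getContextClassLoader).readObject().asInstanceOf[Payload]
println(restored.msg)
```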