Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/388#discussion_r11558664
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -98,8 +98,12 @@ private[spark] class TaskSchedulerImpl(
var
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/266#issuecomment-40273037
I did not notice this earlier.
The toByteArray method is insanely expensive for anything nontrivial.
A better solution would be to replace use of
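The cost mridulm describes can be seen directly: `java.io.ByteArrayOutputStream.toByteArray()` allocates a fresh array and copies the entire internal buffer on every call. A minimal demonstration in plain Java (class name `ToByteArrayCost` is illustrative, not from Spark):

```java
import java.io.ByteArrayOutputStream;

// ByteArrayOutputStream.toByteArray() performs a full copy of the internal
// buffer, so every call costs O(size) time and allocates a fresh array --
// expensive when the stream holds a nontrivial amount of data.
public class ToByteArrayCost {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[1024 * 1024]);   // 1 MB of data
        byte[] a = out.toByteArray();       // copy #1
        byte[] b = out.toByteArray();       // copy #2, a distinct array
        System.out.println(a == b);         // false: two separate 1 MB copies
    }
}
```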
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/376#issuecomment-40273076
@mateiz I have to admit that I ignored the importance of providing
`minSplits`. I encountered a problem just now: I have 20,000 files and call
`wholeTextFiles(dir)`
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/266
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/290#issuecomment-40273164
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14072/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/290#issuecomment-40273163
Merged build finished. All automated tests passed.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/290#issuecomment-40273215
Thanks - merged this and picked it into 1.0. @andrewor14: get some sleep.
Github user techaddict commented on the pull request:
https://github.com/apache/spark/pull/388#issuecomment-40273462
@pwendell Done :+1: anything else?
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/266#issuecomment-40273651
I think we can replace it with a custom impl - where we decide that it is
ok to waste some memory within some threshold in case the copy is much more
expensive -
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/290
Github user haosdent closed the pull request at:
https://github.com/apache/spark/pull/346
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/397
Added a FastByteArrayOutputStream that exposes the underlying array to
avoid unnecessary mem copy.
This should fix the extra memory copy introduced by #266.
@mridulm @pwendell @mateiz
You can
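The idea in #397 can be sketched in plain Java. This is a hypothetical minimal version (not Spark's actual `FastByteArrayOutputStream`; the class name and `toByteBuffer` method here are illustrative): grow an internal array as bytes arrive, then hand out a `ByteBuffer` view over that same array instead of copying it the way `toByteArray()` does.

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: an output stream that exposes its internal buffer
// as a ByteBuffer view, avoiding the defensive copy made by
// ByteArrayOutputStream.toByteArray().
public class ExposedByteArrayOutputStream extends OutputStream {
    private byte[] buf = new byte[32];
    private int count = 0;

    private void ensureCapacity(int extra) {
        if (count + extra > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, count + extra));
        }
    }

    @Override
    public void write(int b) {
        ensureCapacity(1);
        buf[count++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        ensureCapacity(len);
        System.arraycopy(b, off, buf, count, len);
        count += len;
    }

    /** No defensive copy: the returned buffer shares storage with buf. */
    public ByteBuffer toByteBuffer() {
        return ByteBuffer.wrap(buf, 0, count);
    }
}
```

The trade-off debated in this thread follows directly: the exposed array may be larger than `count` (wasted memory unless trimmed), and the caller must not keep writing after taking the view.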
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40274836
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40274840
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40274869
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40274870
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14073/
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/290#issuecomment-40274906
Thanks. You and @tdas too.
Github user baishuo-ailk commented on the pull request:
https://github.com/apache/spark/pull/390#issuecomment-40275213
thank you @pwendell
Github user baishuo commented on the pull request:
https://github.com/apache/spark/pull/390#issuecomment-40275299
thank you @pwendell
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/266#issuecomment-40275465
Hold up a sec -- the array copy is not new. It was merely hidden in the
call to `trim()` before, or to `ByteBuffer.allocate()`. Yes, it's better to
avoid it if possible.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/397#discussion_r11559113
--- Diff:
core/src/main/scala/org/apache/spark/util/io/FastByteArrayOutputStream.scala ---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40276213
I actually meant something like this:
(This is from an internal WIP branch to tackle the ByteBuffer to
Seq[ByteBuffer])
Ideally I should submit this via a PR, but
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40277866
So I think I agree with the overall direction here, but want to make a few
comments to clarify why. Apologies if I'm stating the obvious.
The management of the
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40278958
Your summarization is fairly accurate @srowen. To add, my initial approach
was to subclass to minimize code :-)
The reason why I moved away from it was because I did
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/379#issuecomment-40279118
@andrewor14 ,@tdas, mind reviewing this?
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40279192
You could deprecate and override `toByteArray` to throw an exception, etc.,
to be extra-safe. They work, the result just may not have much meaning
independently. Your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40279518
Build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40279524
Build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40279828
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40279833
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40279862
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14075/
Github user iven commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40280398
I've finally got this working, and fixed several bugs in the original PR.
It's really hard to get Spark (0.9 and higher) working on Mesos. Here are
some notes:
*
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40280554
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40280557
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40280597
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14074/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40280596
Build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40281446
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14076/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40281445
Merged build finished. All automated tests passed.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/353#issuecomment-40281879
Jenkins, retest this please.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/353#issuecomment-40281918
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/353#issuecomment-40281922
Merged build started.
Github user iven commented on the pull request:
https://github.com/apache/spark/pull/60#issuecomment-40282298
I've no idea why the test fails.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/353#issuecomment-40283049
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/353#issuecomment-40283050
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14077/
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40284529
There are two issues here:
a) If we are going to override and deprecate/throw exception for every
method which is not exposed by OutputStream - while overriding
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/322#issuecomment-40284736
Jenkins, retest this please.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/322#issuecomment-40284773
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/322#issuecomment-40284781
Merged build started.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/30#issuecomment-40284983
@sryza this is failing due to a python syntax error. In general if you
wouldn't mind it would be good to run tests locally before pushing, since
spinning up the test
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40285054
Sure, I myself was not suggesting that we should make them throw
exceptions. If one really wanted to prohibit their use, that would be a way to
do so even when
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/322#issuecomment-40285726
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14078/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/322#issuecomment-40285724
Merged build finished. All automated tests passed.
Github user ahirreddy commented on a diff in the pull request:
https://github.com/apache/spark/pull/363#discussion_r11560720
--- Diff: docs/sql-programming-guide.md ---
@@ -318,4 +391,24 @@ Row[] results = hiveCtx.hql("FROM src SELECT key,
value").collect();
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/30#issuecomment-40286775
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/30#issuecomment-40286771
Merged build triggered.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/30#issuecomment-40286819
My bad. I made a change after running tests and should have re-run them.
Posted a patch that fixes the syntax error.
Github user ahirreddy commented on a diff in the pull request:
https://github.com/apache/spark/pull/363#discussion_r11560776
--- Diff: python/pyspark/rdd.py ---
@@ -1387,6 +1387,95 @@ def _jrdd(self):
def _is_pipelinable(self):
return not (self.is_cached or
Github user ahirreddy commented on a diff in the pull request:
https://github.com/apache/spark/pull/363#discussion_r11560795
--- Diff: python/pyspark/rdd.py ---
@@ -1387,6 +1387,95 @@ def _jrdd(self):
def _is_pipelinable(self):
return not (self.is_cached or
Github user ahirreddy commented on a diff in the pull request:
https://github.com/apache/spark/pull/363#discussion_r11560881
--- Diff: python/pyspark/rdd.py ---
@@ -1387,6 +1387,95 @@ def _jrdd(self):
def _is_pipelinable(self):
return not (self.is_cached or
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/30#issuecomment-40287754
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/30#issuecomment-40287755
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14079/
Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/397#discussion_r11561006
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/util/RawTextSender.scala ---
@@ -43,15 +45,15 @@ object RawTextSender extends Logging {
Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/397#discussion_r11561015
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -1001,9 +1003,9 @@ private[spark] class BlockManager(
blockId:
Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40288679
Having a toByteBuffer method definitely seems reasonable to me, the only
issue is that ByteBuffer does not provide a good stream-compatible API. So it
would either still
Github user ahirreddy commented on a diff in the pull request:
https://github.com/apache/spark/pull/363#discussion_r11561160
--- Diff: python/pyspark/rdd.py ---
@@ -1387,6 +1387,95 @@ def _jrdd(self):
def _is_pipelinable(self):
return not (self.is_cached or
Github user ahirreddy commented on a diff in the pull request:
https://github.com/apache/spark/pull/363#discussion_r11561162
--- Diff: python/pyspark/context.py ---
@@ -460,6 +463,225 @@ def sparkUser(self):
return self._jsc.sc().sparkUser()
Github user ahirreddy commented on a diff in the pull request:
https://github.com/apache/spark/pull/363#discussion_r11561164
--- Diff: python/run-tests ---
@@ -56,6 +56,9 @@ run_test pyspark/mllib/clustering.py
run_test pyspark/mllib/recommendation.py
run_test
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/363#issuecomment-40289144
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/363#issuecomment-40289149
Merged build started.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40289835
@aarondav I personally like your second method. That alone is probably just
what is needed. Callers who actually want a `ByteBuffer` can wrap easily with
this info. In
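srowen's point that callers "can wrap easily" rests on `ByteBuffer.wrap(array, offset, length)`, which builds a view over an existing array without copying any bytes. A small illustration in plain Java (class name `WrapNoCopy` is made up for the example):

```java
import java.nio.ByteBuffer;

// ByteBuffer.wrap(array, offset, length) creates a view over the existing
// array: no bytes are copied, so a caller holding the raw array plus its
// valid length can produce a ByteBuffer on demand, and later writes to the
// array remain visible through the buffer.
public class WrapNoCopy {
    public static void main(String[] args) {
        byte[] arr = {1, 2, 3, 4, 5};
        ByteBuffer bb = ByteBuffer.wrap(arr, 0, 3);  // expose first 3 bytes
        arr[0] = 42;                                 // mutate backing array
        System.out.println(bb.get(0));               // 42: shared storage
        System.out.println(bb.remaining());          // 3
    }
}
```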
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/363#issuecomment-40289932
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/363#issuecomment-40289938
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/363#issuecomment-40291396
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/363#issuecomment-40291397
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14080/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/363#issuecomment-40292065
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14081/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/363#issuecomment-40292064
Merged build finished.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/215#issuecomment-40294619
@velvia mind closing this?
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/381
Github user velvia closed the pull request at:
https://github.com/apache/spark/pull/215
Github user velvia commented on the pull request:
https://github.com/apache/spark/pull/215#issuecomment-40295287
Ok.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40295319
Ok pushed a new version that avoids the extra trim.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40295427
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40295467
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40295468
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14082/
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40295844
We should wait until questions like these are answered before we move to
scala-logging.
https://github.com/typesafehub/scala-logging/issues/4
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/397#issuecomment-40296425
@rxin hm looks like this RAT exclude isn't working. Can take another crack
at it later tonight.
https://github.com/apache/spark/blob/master/.rat-excludes#L43
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/394#issuecomment-40296579
Jenkins, test this please
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/394#issuecomment-40296582
Seems like a good catch.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/394#issuecomment-40296590
Merged build started.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/392#issuecomment-40296603
Is there any way to do this test in Python instead of in bash? It looks
like complicated and potentially brittle bash code.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/394#issuecomment-40296586
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/394#issuecomment-40296609
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/394#issuecomment-40296610
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14083/
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/391#issuecomment-40296943
Is 1.6 the oldest version it works with now, or could it also work with 1.5
or older?
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/376#issuecomment-40296996
While a minSplits for all New API Hadoop files would be useful, I think
that's too complicated to do in 1.0, so it would be fine to just add it for
wholeTextFiles now.
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/376#discussion_r11562788
--- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala ---
@@ -24,10 +24,13 @@ import org.apache.hadoop.conf.{Configurable,
Configuration}
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/376#issuecomment-40297045
BTW the current approach looks good, we should just merge this for now and
maybe open a JIRA for the other types of files.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/389#issuecomment-40297101
Good catch
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/389#issuecomment-40297100
Jenkins, test this please
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/389#issuecomment-40297146
Merged build triggered.