Github user debasish83 commented on the pull request:
https://github.com/apache/spark/pull/3098#issuecomment-89729377
I meant MAP... what's the MAP on the Netflix dataset you have seen before, and
with what lambda? I am running MAP experiments with various factorization
formulations includ
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/5362#issuecomment-89729144
Writing to stdout/stderr defeats the point of a logging framework, no. I
think you could argue that some of these other messages aren't vital at log
level ("setting up", "
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/4537#issuecomment-89728917
Jenkins, retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have t
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3098#issuecomment-89728639
@debasish83 do you mean RMSE? It is well-defined but not very useful. MAP
is the useful metric. I think that only a rank-dependent metric makes sense.
---
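The rank-dependent metric being discussed can be made concrete with a small sketch of MAP (mean average precision). This is one common convention; implementations (including Spark's `RankingMetrics`) differ in details such as the denominator, so treat it as illustrative rather than the exact formula used in the PR:

```python
def average_precision(recommended, relevant, k=10):
    """Average precision at k for one user: the mean of precision@i
    taken at each rank i where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)  # precision at this rank
    return score / min(len(relevant), k)

def mean_average_precision(rec_by_user, rel_by_user, k=10):
    """MAP: average of per-user average precision."""
    users = list(rel_by_user)
    return sum(average_precision(rec_by_user[u], rel_by_user[u], k)
               for u in users) / len(users)
```

Because precision is accumulated only at the ranks of relevant items, swapping a relevant item into an earlier position always raises the score, which is what makes the metric rank-dependent (unlike RMSE on implicit 0/1 labels).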
Github user piaozhexiu commented on the pull request:
https://github.com/apache/spark/pull/5362#issuecomment-89728738
@srowen I'd like to turn down pretty much every INFO message from the YARN
client except the AM URL. (See below.) As can be seen, none of these is useful
for end users exc
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/5362#issuecomment-89727695
No, println isn't appropriate here. That removes control over the logging
entirely. Instead, what log messages do you find noisy? Maybe they can be
turned *down* since thi
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5362#issuecomment-89724217
Can one of the admins verify this patch?
---
GitHub user piaozhexiu opened a pull request:
https://github.com/apache/spark/pull/5362
[SPARK-6712][YARN] Allow lowering the log level in the YARN client while
keeping the AM tracking URL printed
In YARN mode, log messages are quite verbose in interactive shells
(spark-shell, spark-sql, pysp
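The kind of change users can already make today is a log4j override; a minimal sketch, assuming log4j 1.x properties syntax (the logger names here are illustrative, not necessarily the exact ones the PR targets):

```properties
# Quiet the verbose YARN-client INFO chatter in interactive shells,
# while leaving the rest of Spark at its default level.
log4j.logger.org.apache.spark.deploy.yarn=WARN
log4j.logger.org.apache.hadoop.yarn.client=WARN
```

The catch the PR addresses is that a blanket WARN level like this also silences the one INFO line users do want, the AM tracking URL, hence the need to keep that message visible separately.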
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/5178#issuecomment-89716003
@maropu, yeah, I think it is a common case for YARN mode. We often specify
more executors than NodeManagers, which means there is more than one executor on
one machine.
---
Github user bien commented on the pull request:
https://github.com/apache/spark/pull/5351#issuecomment-89713636
The behavior I was seeing was that RandomTree training tasks were spending
~90% of their time doing GC, and when I turned on verbose GC I would see that
most of the time was
Github user debasish83 commented on the pull request:
https://github.com/apache/spark/pull/3098#issuecomment-89706247
@coderxiang @mengxr If I have a dataset with implicit feedback (click or 0), then MAP
is not that well defined, right, since in the label set everything is 1.0 and so
there is no orde
Github user nchammas commented on a diff in the pull request:
https://github.com/apache/spark/pull/5173#discussion_r27773756
--- Diff: python/pyspark/cloudpickle.py ---
@@ -40,164 +40,126 @@
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN
Github user debasish83 commented on the pull request:
https://github.com/apache/spark/pull/3098#issuecomment-89697236
@srowen For the Netflix dataset, what's the MAP you have seen before... I started
experiments on the Netflix dataset... lambda is 0.065 for Netflix as well, right?
For MovieLens
Github user nchammas commented on the pull request:
https://github.com/apache/spark/pull/5173#issuecomment-89697205
> TODO: ec2/spark-ec2.py is not fully tested with python3.
I can help with this. Do we want to hold off other spark-ec2 PRs until this
one goes in? Do we have a
Github user nchammas commented on a diff in the pull request:
https://github.com/apache/spark/pull/5173#discussion_r27773735
--- Diff: python/pyspark/sql/functions.py ---
@@ -116,7 +114,7 @@ def __init__(self, func, returnType):
def _create_judf(self):
f
Github user zzcclp commented on the pull request:
https://github.com/apache/spark/pull/4537#issuecomment-89694634
@koeninger, I can't visit [this
url](https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28872/);
it's a 404.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5361#issuecomment-89686461
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/5361#issuecomment-89671362
Jenkins, test this please.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5361#issuecomment-89661772
Can one of the admins verify this patch?
---
GitHub user 31z4 opened a pull request:
https://github.com/apache/spark/pull/5361
[SPARK-6661] Python type errors should print type, not object
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/31z4/spark spark-6661
Alternatively
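The fix described by the PR title, reporting a value's type rather than the value itself in Python error messages, amounts to something like the following (a hypothetical helper; the actual call sites live in PySpark):

```python
def require_string(name, value):
    """Raise a TypeError that names the offending value's type
    instead of dumping the value itself into the message: a repr of
    a large object (an RDD, a DataFrame) makes the error unreadable."""
    if not isinstance(value, str):
        raise TypeError("%s should be a string, not %s"
                        % (name, type(value).__name__))
    return value
```

The design point is small but real: `type(value).__name__` is bounded and always informative, while `repr(value)` can be huge, misleading, or itself raise.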
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5298#issuecomment-89639987
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5270#issuecomment-89639900
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/5268
---
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/5268#issuecomment-89639203
Merging this in master. Thanks.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5213#issuecomment-89633708
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4469#issuecomment-89633239
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user saucam commented on the pull request:
https://github.com/apache/spark/pull/5298#issuecomment-89632303
Hmm, I see. I'll definitely go through these PRs. Anyway, fixed the
whitespace problem here.
---
Github user wbraik commented on the pull request:
https://github.com/apache/spark/pull/2077#issuecomment-89632207
Does anyone have a good example of an application that produces multiple
(different) jobs that we could use to test this on?
---
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/5298#issuecomment-89631527
Ah, I'm also considering similar optimizations for Spark 1.4 :)
The tricky part here is that, when scanning the Parquet table, Spark needs
to call `ParquetInput
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5268#issuecomment-89629603
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/5275#issuecomment-89623421
@zhzhan I'm right now designing partitioning support for the data sources
API, and will hopefully make the design doc next week. Will come back to this
PR after that. W
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/5298#issuecomment-89624702
ok to test
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5298#issuecomment-89624832
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user liancheng closed the pull request at:
https://github.com/apache/spark/pull/4107
---
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/4107#issuecomment-89621672
Yeah, agreed. Closing this. Though the `callWithAlternatives` utility
function can be quite neat for simple lightweight reflection tricks.
---
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/4945#issuecomment-89621194
The thing that makes me hesitant here is whether we should stick to Hive,
because Hive's behavior is actually error prone and unintuitive. In Hive, `IN`
is implemented
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/4851#issuecomment-89615036
No. With the metastore adapter layer, we can always keep our tests
consistent with the most recent Hive version.
---
Github user saucam commented on the pull request:
https://github.com/apache/spark/pull/4469#issuecomment-89613886
Hi @marmbrus, this is a pretty common scenario in production, where the
data is generated in some directory and then later partitions are added to
tables using alter tabl
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/5345#issuecomment-89611702
@adachij2002 Would you mind adding a test case for this in `CliSuite`? We
can pass `--database ` via `extraArgs` in `runCliWithin` there.
---
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/5348#discussion_r27770121
--- Diff: docs/sql-programming-guide.md ---
@@ -1034,6 +1034,79 @@ df3.printSchema()
+### Hive metastore Parquet table conversion
+
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/5349#issuecomment-89610598
We need a properly configured Hive environment to run the test. I can add a
simple `TestHive`-like class to do metastore / warehouse configurations though.
---
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/5263
---
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/5263#issuecomment-89608688
Thanks for working on this! Merging to master.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4764#issuecomment-89604947
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5268#issuecomment-89604923
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user saucam commented on the pull request:
https://github.com/apache/spark/pull/4764#issuecomment-89604768
Fixed the test case of zero count when there is no data. Rebased with the
latest master. Please retest.
---
Github user debasish83 commented on a diff in the pull request:
https://github.com/apache/spark/pull/3098#discussion_r27769592
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala ---
@@ -167,23 +169,66 @@ object MovieLensALS {
.setProduct
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/5353
---
Github user debasish83 commented on the pull request:
https://github.com/apache/spark/pull/5005#issuecomment-89594722
@mengxr any insight on it? The runtime issue is only in the first iteration,
and I think you can point out if there is any obvious issue in the way I call
the solver... loo
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/5353#issuecomment-89594648
LGTM, merging to master and branch-1.3.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5359#issuecomment-89591756
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user hqzizania closed the pull request at:
https://github.com/apache/spark/pull/5360
---
GitHub user hqzizania opened a pull request:
https://github.com/apache/spark/pull/5360
[SPARKR-92] Phase 2: implement sum(rdd)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hqzizania/spark R3
Alternatively you can review and a
GitHub user Lewuathe opened a pull request:
https://github.com/apache/spark/pull/5359
Implement missing methods for MultivariateStatisticalSummary
Add the following methods in PySpark for MultivariateStatisticalSummary:
- normL1
- normL2
You can merge this pull request into a Git rep
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/5334#discussion_r27768822
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -226,7 +224,7 @@ private[sql] case class ParquetRelation2(
p
Github user dragos commented on the pull request:
https://github.com/apache/spark/pull/5144#issuecomment-89547152
I still want to run this on a local cluster before I say LGTM, but the code
looks good so far!
---
Github user dragos commented on a diff in the pull request:
https://github.com/apache/spark/pull/5144#discussion_r27768214
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/DriverQueue.scala
---
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5358#issuecomment-89536841
Can one of the admins verify this patch?
---
Github user dreamquster closed the pull request at:
https://github.com/apache/spark/pull/5346
---
Github user dreamquster commented on the pull request:
https://github.com/apache/spark/pull/5346#issuecomment-89536605
OK, I split it into two pull requests.
---
GitHub user dreamquster opened a pull request:
https://github.com/apache/spark/pull/5358
[SQL] SPARK-6489: Optimize lateral view with explode to not read unnecessary
columns.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dreamquste
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4865#issuecomment-89527734
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5357#issuecomment-89527728
Can one of the admins verify this patch?
---
GitHub user dreamquster opened a pull request:
https://github.com/apache/spark/pull/5357
[SQL] SPARK-6548: Adding stddev to DataFrame functions
remerge SPARK-6548
https://github.com/apache/spark/pull/5228
You can merge this pull request into a Git repository by running:
$
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5301#issuecomment-89516868
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/5301#issuecomment-89516426
ok to test
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5350#issuecomment-89514738
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4038#issuecomment-89514490
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29