Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19705
OK, now I question my own testing... does Maven not run the Scalastyle checks?
Or did I not run the tests properly somehow? I just ran `mvn test` from the root,
and it all seemed to work on my machine.
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19705
@gatorsmile @srowen I think this is set now.
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/19705
[SPARK-22308][test-maven] Support alternative unit testing styles in
external applications
Continuation of PR #19529
(https://github.com/apache/spark/pull/19529#issuecomment-340252119)
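For readers unfamiliar with the goal, here is a minimal sketch of the kind of test this PR is meant to enable: an external application mixing the shared-session trait into a ScalaTest style other than FunSuite. The suite name and test body are hypothetical; only SharedSparkSession and the FlatSpec style come from the PR itself.

```scala
import org.scalatest.FlatSpec

import org.apache.spark.sql.test.SharedSparkSession

// Hypothetical external-application suite written in FlatSpec style rather
// than the FunSuite style that Spark's own tests use.
class ExampleFlatSpec extends FlatSpec with SharedSparkSession {
  "a DataFrame" should "be usable from a FlatSpec-style test" in {
    val df = spark.range(10).toDF("n") // `spark` is provided by SharedSparkSession
    assert(df.count() == 10)
  }
}
```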
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
Hm... I was always testing with sbt, because Maven was so slow to do
anything.
Will do.
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r147038647
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala ---
@@ -17,86 +17,8 @@
package org.apache.spark.sql.test
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r147038487
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala
---
@@ -29,7 +31,14 @@ import
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r147038504
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala ---
@@ -17,86 +17,8 @@
package org.apache.spark.sql.test
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r146887435
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala ---
@@ -17,86 +17,8 @@
package org.apache.spark.sql.test
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r146885975
--- Diff: core/src/test/scala/org/apache/spark/SharedSparkContext.scala ---
@@ -29,10 +29,25 @@ trait SharedSparkContext extends BeforeAndAfterAll
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r146885926
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ---
@@ -52,17 +55,142 @@ import org.apache.spark.util
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
I assume the "unknown error code, -9" is:
[error] running
/home/jenkins/workspace/SparkPullRequestBuilder/build/sbt -Phadoop-2.6 -Pflume
-Phive-thriftserver -Pyarn -
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
Yeah, as predicted, that made PlanTest very easy to review, but it didn't do
as well with SQLTestUtils. I suspect I reordered functions and whatnot when I
was moving stuff around.
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
@gatorsmile sounds good, giving that a try now... assuming tests pass,
I'll check it in and see if it's any better.
I've so far done this for PlanTest and SQLTestUtils.
PlanTest I
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
@gatorsmile the code changes aren't huge; there's almost no new code here.
It's all just moving code around from one file to another in order to expose a
SharedSparkSession with no dependence
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r146312166
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/test/GenericFlatSpecSuite.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r146311684
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r146311180
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ---
@@ -52,249 +36,23 @@ import org.apache.spark.util
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/19529#discussion_r146308234
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala
---
@@ -18,158 +18,9 @@
package
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
Documentation removed as per @srowen's request in the associated JIRA
issue [SPARK-22308].
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
Nope, using lazy val initialization won't work; at the very least,
UnsafeKryoSerializerSuite modifies conf before context construction.
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
There is one small hack in the way this was done, which is documented - see
the comments and documentation on SharedSparkSession.initializeSession and
SharedSparkContext.initializeContext. I
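To make the ordering constraint from the last two comments concrete, here is a heavily reduced, hypothetical sketch of the pattern (not the actual Spark code): construction is an explicit, idempotent step inside beforeAll(), so a subclass gets a window to mutate the conf first, which an implicitly forced lazy val cannot guarantee.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite, Suite}

// Reduced sketch of the SharedSparkContext pattern under discussion.
trait SharedContextSketch extends BeforeAndAfterAll { self: Suite =>
  val conf = new SparkConf(false).setMaster("local").setAppName("test")
  @transient protected var sc: SparkContext = _

  // Explicit, idempotent initialization: nothing constructs the context
  // before beforeAll(), so subclasses can still mutate `conf`.
  protected def initializeContext(): Unit = {
    if (sc == null) sc = new SparkContext(conf)
  }

  override def beforeAll(): Unit = {
    super.beforeAll()
    initializeContext()
  }

  override def afterAll(): Unit = {
    try { if (sc != null) sc.stop() } finally { super.afterAll() }
  }
}

// The UnsafeKryoSerializerSuite-style case: the conf tweak has to land
// before the context exists, and the explicit step guarantees that.
class KryoTweakSuite extends FunSuite with SharedContextSketch {
  override def beforeAll(): Unit = {
    conf.set("spark.kryo.unsafe", "true")
    super.beforeAll() // the context is constructed here, after the tweak
  }

  test("the context was built with the modified conf") {
    assert(sc.getConf.get("spark.kryo.unsafe") == "true")
  }
}
```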
Github user nkronenfeld commented on the issue:
https://github.com/apache/spark/pull/19529
I made my original changes here by using
git mv PlanTest.scala PlanTestBase.scala
git mv SQLTestUtils.scala SQLTestUtilsBase.scala
git mv SharedSQLContext.scala
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/19529
Support alternative unit testing styles in external applications
## What changes were proposed in this pull request?
Support unit tests of external code (i.e., applications that use Spark
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-96656969
I'll see; I guess I can always reopen.
If I can get some work towards this done (some deprecation, with new
versions of the necessary functions), I'll resubmit
Github user nkronenfeld closed the pull request at:
https://github.com/apache/spark/pull/5565
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-95221601
Could you leave it open a little bit longer? I want to see if I can shorten
the list of differences, and having it active here makes it a little easier to
see
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94472545
@srowen - the examples are pretty simple - it happens as soon as you are
passing an RDD to something, rather than passing something to an RDD. For
instance
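The truncated example presumably resembled the following hedged sketch (all names hypothetical): a helper written against RDD cannot be handed a DStream, even though the body would be identical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object NormalizeExample {
  // Hypothetical application helper: because it takes an RDD, the
  // identical logic cannot accept a DStream.
  def normalize(data: RDD[Double]): RDD[Double] = {
    val max = data.max()
    data.map(_ / max)
  }

  // The streaming side can only reuse it one batch at a time via transform,
  // which also quietly changes the semantics (the max is now per batch).
  def normalizePerBatch(data: DStream[Double]): DStream[Double] =
    data.transform(rdd => normalize(rdd))
}
```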
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94471584
@pwendell - something like reduceByKey having the opposite argument order
of everything else in the code seems like an obvious mistake in the API
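One plausible reading of the inconsistency, sketched with the PairRDDFunctions signatures as I recall them from that era (treat as illustrative, not authoritative):

```scala
import org.apache.spark.{HashPartitioner, SparkContext}

object ArgOrderDemo {
  def demo(sc: SparkContext): Unit = {
    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
    val part = new HashPartitioner(4)

    pairs.reduceByKey(part, _ + _)              // partitioner first, function last
    pairs.foldByKey(0, part)(_ + _)             // partitioner last, function curried
    pairs.aggregateByKey(0, part)(_ + _, _ + _) // same shape as foldByKey
  }
}
```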
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94519103
@srowen - If I understand your question, you're asking why we can't take
our application, written for an RDD, and just put the whole thing inside a call
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94507802
@pwendell I get all that... does that mean we are stuck with those
deprecated versions forever then? Or is there some timeframe over which
deprecated functions
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94510832
@srowen are you suggesting putting the entire application inside the
foreachRDD call?
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94554558
@srowen I'm sure, for what it's worth, that part works out - it's whether
the surrounding application structure can fit in that paradigm that is the
issue
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94330557
Thanks for taking a look at it.
I'll take a look and write up here exactly what API changes this makes -
while I don't think there are many, there aren't
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94332668
OK, I think there are only 3 real API changes, though one of them appears
in multiple places.
* `RDD`
* `flatMap[U](f: T => TraversableOnce[U])` changed
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5565#issuecomment-94351472
No, we do that at the moment.
But doing it that way results in a few rather ugly constructs in the
application code that can be painful, as soon
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/5565
Common interfaces between RDD, DStream, and DataFrame
This PR is the beginning of an attempt to put a common interface between
the main distributed collection types in Spark.
I've
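A hedged sketch of the shape such a common interface might take; the trait name and method set here are entirely illustrative and may differ from what the PR actually proposes.

```scala
import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical unifying trait: just enough surface for application code to
// be written once against either backing collection.
trait DistributedCollection[T] {
  def map[U: ClassTag](f: T => U): DistributedCollection[U]
  def filter(f: T => Boolean): DistributedCollection[T]
}

class RDDCollection[T](rdd: RDD[T]) extends DistributedCollection[T] {
  def map[U: ClassTag](f: T => U) = new RDDCollection(rdd.map(f))
  def filter(f: T => Boolean) = new RDDCollection(rdd.filter(f))
}

class DStreamCollection[T](stream: DStream[T]) extends DistributedCollection[T] {
  def map[U: ClassTag](f: T => U) = new DStreamCollection(stream.map(f))
  def filter(f: T => Boolean) = new DStreamCollection(stream.filter(f))
}
```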
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5140#issuecomment-92497626
@andrewor14 - is there anything else that needs to get done here?
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/5140#issuecomment-85117276
I'm not sure how Mesos and YARN clusters are started/stopped (nor do I have
such clusters on which to test), so I'm not sure how this will affect them. I
think
Github user nkronenfeld closed the pull request at:
https://github.com/apache/spark/pull/3699
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3699#issuecomment-85112967
I'm redoing this in the latest code, remaking the PR from scratch, to
alleviate merge issues. I'll post the new PR here when it's made.
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/5140
[SPARK-4848] Stand-alone cluster: Allow differences between workers with
multiple instances
This re-applies #3699 on the latest code and fixes SPARK-4848.
I've changed the stand
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3699#issuecomment-66998011
Jenkins, test this please
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/3699
[SPARK-4848] Stand-alone cluster: Allow differences between workers with
multiple instances
I've changed the stand-alone cluster run scripts to allow different workers
to have different
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3699#issuecomment-66957137
I'm not sure how Mesos and YARN clusters are started/stopped (nor do I have
such clusters on which to test), so I'm not sure how this will affect them. I
think
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/3570#discussion_r21572834
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -281,7 +282,9 @@ object AccumulatorParam {
private object Accumulators
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3570#issuecomment-66380241
Comment fixed. I'm trying to test the MiMa-related changes to see if they
work, and I'm having problems running MiMa on my machine. I'll probably just push
them
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3570#issuecomment-66398560
I thought I'd done so; it looks like it lost my changes.
I'll fix that ASAP.
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3570#issuecomment-66399588
Sorry, I must have accidentally hit cancel instead of comment the first time.
Should be set now.
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3570#issuecomment-66238332
Any word on this?
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3570#issuecomment-65883687
Should I back out the correction to the MiMa failure?
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3570#issuecomment-65883762
Great... I think outside of the MiMa issue it should be all set, unless I can
figure out a way to unit test it. So far, my best methods of testing it
involve
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/3570#issuecomment-65883848
Oh, a note for when you're reviewing: I didn't move the clear call, I just
added a second one; I saw no particular harm in leaving the old one there too,
just
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/3570
Clear local copies of accumulators as soon as we're done with them
I suspect this PR is not quite ready for merging, but this seemed
the best way of getting a few comments
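For readers without the diff handy, a generic, hypothetical sketch of the leak pattern a change like this targets; none of these names are Spark's actual internals. Per-thread local copies pile up in a registry and should be dropped as soon as their values have been harvested.

```scala
import scala.collection.mutable

// Hypothetical per-thread registry, loosely shaped like the code path the
// PR touches in Accumulators.scala; not the actual Spark implementation.
object LocalRegistrySketch {
  private val localCopies = mutable.Map[Thread, mutable.Map[Long, Any]]()

  def register(id: Long, value: Any): Unit = synchronized {
    localCopies.getOrElseUpdate(Thread.currentThread, mutable.Map()) += (id -> value)
  }

  def harvest(): Map[Long, Any] = synchronized {
    val t = Thread.currentThread
    val result = localCopies.getOrElse(t, mutable.Map()).toMap
    // The point under discussion: clear the thread's copies as soon as they
    // have been read, rather than letting them linger until reuse.
    localCopies -= t
    result
  }
}
```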
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/2637
Rectify generic parameter names between SparkContext and AccumulablePa...
AccumulableParam gave its generic parameters as 'R, T', whereas
SparkContext labeled them 'T, R'.
Trivial
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/1535#issuecomment-50836302
OK, @pwendell, I think it's set now. Let me know if there are merge
problems; I can resubmit on a clean branch if necessary.
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/1535#issuecomment-50621396
Thanks, @pwendell. I can revert it back if you want - is that preferable to
the way it is now, with the option to include the memory info or not?
I'll start
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/1535#issuecomment-50487526
If I'm reading that correctly, that test failure is from an MLlib change
that has nothing to do with what I've done? Perhaps I'll just try it again;
maybe it's a bad
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/1535#issuecomment-50388139
I just parameterized the memory so one can display it or not as desired
(with not displaying it as the default) - is that sufficient?
I forgot to put
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/1535#issuecomment-49870919
I'm not sure what to do about this test failure; all I've changed is
toDebugString, and this is in a Spark Streaming test which never calls that, so
I'm pretty sure
Github user nkronenfeld commented on a diff in the pull request:
https://github.com/apache/spark/pull/1535#discussion_r15308845
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1269,6 +1269,19 @@ abstract class RDD[T: ClassTag](
/** A description
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/1532
Add caching information to RDD.toDebugString
I find it useful to see where in an RDD's DAG data is cached, so I figured
others might too.
I've added both the caching level
Github user nkronenfeld closed the pull request at:
https://github.com/apache/spark/pull/1532
GitHub user nkronenfeld opened a pull request:
https://github.com/apache/spark/pull/1535
Add caching information to rdd.toDebugString
I find it useful to see where in an RDD's DAG data is cached, so I figured
others might too.
I've added both the caching level
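A quick sketch of the feature in use; the file path is a placeholder and the annotated output is illustrative, not the exact string the patch produces.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object DebugStringDemo extends App {
  val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("demo"))

  val words  = sc.textFile("data.txt").flatMap(_.split(" ")) // placeholder input
  val counts = words.map(w => (w, 1)).reduceByKey(_ + _).persist(StorageLevel.MEMORY_ONLY)

  // With this patch, the lineage printout also marks which RDDs are
  // persisted, and at what storage level, rather than showing the bare DAG.
  println(counts.toDebugString)

  sc.stop()
}
```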
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/1535#issuecomment-49783708
Done, and I also left a comment on Greg Owen's PR from yesterday asking him
for formatting comments
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/1535#issuecomment-49797427
Sorry, I forgot to move one small formatting fix over from the old branch;
I'll check that in as soon as I test it.
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/1535#issuecomment-49825329
Thanks Mark, I had no idea that existed.
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-38506453
I think my initial sliding-method PR addresses @pwendell's concerns here,
but runs afoul of some other concerns raised by @rxin about code complexity