Github user rxin commented on the issue:
https://github.com/apache/spark/pull/15538
I don't think we are going to upgrade Parquet for branch-2.1, since it's
way past that point. Let's merge this for 2.1.
Merging in master/branch-2.1.
---
If your project is set up for it, y
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
@ericl Does this LGTY?
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
I like this. It centralizes and yet elaborates the handling of the issue,
does seem to be the right mechanism to use, and, I presume, does work reliably.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68424/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #68424 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68424/consoleFull)**
for PR 15538 at commit
[`247ef91`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #68424 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68424/consoleFull)**
for PR 15538 at commit
[`247ef91`](https://github.com/apache/spark/commit/2
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
Pushed a new version which I think is cleaner than before. I tested all 8
scenarios manually.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I'll work on a revision and try to push something today.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
If `ParquetFileFormat` had a static init block somehow, we'd be done, right?
Because the logging config in that static initializer would have to execute,
once, during classloading and therefore before
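A minimal Java sketch of the static-initializer idea raised in this comment. `StaticInitSketch` and `Holder` are hypothetical names standing in for a class such as `ParquetFileFormat`; the counter only stands in for whatever logging configuration would run there. It shows that the JVM runs a static initializer once, when the class is first initialized, no matter how many instances are created afterwards.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticInitSketch {
    // Counts how many times Holder's static initializer has run.
    public static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Hypothetical stand-in for a class that needs one-time logging setup.
    public static class Holder {
        static {
            // One-time setup (for example, logging configuration) would go here.
            // The JVM executes this block once, on first initialization of Holder.
            INIT_COUNT.incrementAndGet();
        }

        public Holder() {
            // Nothing to do: the one-time setup does not depend on the constructor.
        }
    }

    public static void main(String[] args) {
        System.out.println("before first use: " + INIT_COUNT.get());
        new Holder();
        new Holder();
        System.out.println("after two instances: " + INIT_COUNT.get());
    }
}
```

Whether this fits the PR depends on the class being initialized early enough on every JVM involved, which is the point still under discussion in the thread.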
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
Sorry, I don't see how Java code will make this patch any better. We're not
really missing a static initializer. We just need _some_ initializer to run
early enough.
If we can tolerate putt
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
OK, maybe. It is not just about being hacky but being likely to break. If
this really calls for a static initializer, how about a Java helper class to do
the work? How about piggybacking on the Logg
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
> Yeah I see, but this is getting to be quite hacky **just to turn off log
messages**
This isn't just a few annoying log messages. This is an _avalanche_ of log
messages, each of which cont
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68289/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #68289 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68289/consoleFull)**
for PR 15538 at commit
[`34e3997`](https://github.com/apache/spark/commit/
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
@rxin I'm not working on the Parquet upgrade this week. I think we'll have
to punt on it.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
Yeah I see, but this is getting to be quite hacky just to turn off log
messages, trying to make sure static init is triggered from a custom
deserializer. It feels like this should be punted until Par
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/15538
@mallman are you also working on the Parquet upgrade? If we don't get it in in
the next day or two, we shouldn't merge that into branch-2.1 anymore since it
is way past feature freeze.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I've pushed a rebase. I re-tested this PR using the methodology I describe
in the description for both local and remote executors.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
> If this is necessary, then isn't it simpler to leave the log
configuration call where it was, so that it doesn't depend on the constructor?
that wasn't the actual problem was it, just the logging?
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #68289 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68289/consoleFull)**
for PR 15538 at commit
[`34e3997`](https://github.com/apache/spark/commit/3
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
Hi Guys,
Unfortunately debugging our Spark job sucked up all my Spark time last
week, and I still have more to do on it this week. Because of that, it doesn't
look like I'll have time to wo
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
@rxin One of our weekly Spark jobs is choking, and fixing it could take the
rest of the week. I suggest someone else take the lead on the Parquet 1.9
upgrade if they can devote their time to it.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
Yes, I am working on it. I'm planning to have a PR in no later than EOW.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/15538
@mallman are you working on the Parquet 1.9 upgrade? It would be great to get that
in sooner rather than later because in the past Parquet upgrades have tended to bring a lot
of issues, so it'd be good to go through more Q
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
Ah right. Let me redirect 18140 to 13127 because all of these end up being
resolved by "upgrade to 1.9"
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I found two such tickets. How should we organize this in Jira?
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
Yeah, there's a ticket for it to solve a different Parquet bug. That might
be the path of least resistance.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
In fact, if no one else is working on the Parquet upgrade it probably makes
more sense for me to contribute that than continue working on this PR. I'll
check with the dev mailing list.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
Are we planning to incorporate the Parquet 1.9 libraries into Spark 2.1? If
so, then this PR should be unnecessary.
Hopefully.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
If this is necessary, then isn't it simpler to leave the log configuration
call where it was, so that it doesn't depend on the constructor? That wasn't
the actual problem, was it, just the logging? If
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67795/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67795 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67795/consoleFull)**
for PR 15538 at commit
[`7f858c6`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67794/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67794 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67794/consoleFull)**
for PR 15538 at commit
[`e544397`](https://github.com/apache/spark/commit/
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
Getting this redirection to work on remote executors was quite involved.
The additional complexities derive from the following:
1. Java doesn't call the default (or any) constructor in deser
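A minimal Java sketch of the first point, using only JDK serialization. `DeserializationSketch` and `Format` are hypothetical names, not Spark classes. It shows that deserializing a `Serializable` object does not run that class's constructor, and that a private `readObject` method is one standard hook where initialization could be triggered instead. Whether the PR uses this hook is not stated in the truncated comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeserializationSketch {
    public static int constructorCalls = 0;
    public static int readObjectCalls = 0;

    // Hypothetical serializable class whose constructor does some setup.
    public static class Format implements Serializable {
        private static final long serialVersionUID = 1L;

        public Format() {
            // Runs for "new Format()", but not when an instance is deserialized.
            constructorCalls++;
        }

        // Standard Java serialization hook, invoked for each deserialized instance.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            readObjectCalls++;
        }
    }

    // Serializes the object to bytes and reads it back, as a remote JVM would.
    public static Format roundTrip(Format original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Format) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        roundTrip(new Format());
        System.out.println("constructor calls: " + constructorCalls);
        System.out.println("readObject calls: " + readObjectCalls);
    }
}
```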
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67795 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67795/consoleFull)**
for PR 15538 at commit
[`7f858c6`](https://github.com/apache/spark/commit/7
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67794 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67794/consoleFull)**
for PR 15538 at commit
[`e544397`](https://github.com/apache/spark/commit/e
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
OK, well whatever works here, even if somehow the previous version is
what's needed to get it to work.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
Actually, this may not be working for remote executors. I tested this patch
running in local mode, but in running a version of this with actual remote
executors I'm seeing the original parquet log o
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
@srowen I ran my manual tests for this build and they worked as expected.
Can you merge this PR?
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67707 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67707/consoleFull)**
for PR 15538 at commit
[`1fc3c93`](https://github.com/apache/spark/commit/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67707/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #3377 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3377/consoleFull)**
for PR 15538 at commit
[`1fc3c93`](https://github.com/apache/spark/commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67707 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67707/consoleFull)**
for PR 15538 at commit
[`1fc3c93`](https://github.com/apache/spark/commit/1
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #3377 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3377/consoleFull)**
for PR 15538 at commit
[`1fc3c93`](https://github.com/apache/spark/commit/
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15538
Jenkins add to whitelist
Github user ericl commented on the issue:
https://github.com/apache/spark/pull/15538
Jenkins retest this please
On Thu, Oct 27, 2016, 12:34 PM Michael Allman
wrote:
Looks like the test failed for reasons unrelated to this PR. Can someone
trigger a retest, plea
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
Looks like the test failed for reasons unrelated to this PR. Can someone
trigger a retest, please?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67658/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67658 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67658/consoleFull)**
for PR 15538 at commit
[`1fc3c93`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67658 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67658/consoleFull)**
for PR 15538 at commit
[`1fc3c93`](https://github.com/apache/spark/commit/1
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I'm still seeing the torrent of `CorruptStatistics` errors in the Jenkins
build log, even though I don't see them running the tests locally with sbt.
Maybe it's a maven versus sbt build issue.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67552/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67552 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67552/consoleFull)**
for PR 15538 at commit
[`02df8c2`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67552 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67552/consoleFull)**
for PR 15538 at commit
[`02df8c2`](https://github.com/apache/spark/commit/0
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
So raising the log threshold looks like it didn't do anything for Jenkins,
but when I run the tests locally it does just the trick. \*sigh\*
Anyway, might as well push a rebase and see what
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I like this test failure:
```
org.apache.spark.sql.sources.CreateTableAsSelectSuite.(It is not a test)
```
Anyway, I don't think this is related to this PR.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67542/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67542 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67542/consoleFull)**
for PR 15538 at commit
[`dec65c7`](https://github.com/apache/spark/commit/
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
The "CorruptStatistics" stack traces (which I agree are really annoying)
are being logged because Parquet logs them at the WARN level, and Spark's
default logging threshold when running tests is WAR
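A minimal Java sketch of the threshold effect described in this comment. It uses `java.util.logging` from the JDK rather than Spark's actual test logging setup, and the logger name and `CountingHandler` are hypothetical. It shows that a message logged at WARNING passes a WARNING threshold but is dropped once the threshold is raised to SEVERE.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogThresholdSketch {
    // Counts the records that actually reach the handler.
    public static class CountingHandler extends Handler {
        public int published = 0;

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                published++;
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Logs one WARNING under the given threshold and reports how many records got through.
    public static int warningsSeen(Level threshold) {
        // Hypothetical logger name, chosen only for this sketch.
        Logger logger = Logger.getLogger("sketch.parquet.CorruptStatistics");
        logger.setUseParentHandlers(false); // keep the sketch's output off the console
        CountingHandler handler = new CountingHandler();
        logger.addHandler(handler);
        logger.setLevel(threshold);
        try {
            logger.warning("simulated warning");
            return handler.published;
        } finally {
            logger.removeHandler(handler);
        }
    }

    public static void main(String[] args) {
        System.out.println("threshold WARNING: " + warningsSeen(Level.WARNING));
        System.out.println("threshold SEVERE: " + warningsSeen(Level.SEVERE));
    }
}
```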
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67542 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67542/consoleFull)**
for PR 15538 at commit
[`dec65c7`](https://github.com/apache/spark/commit/d
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I'll take a closer look at that.
Github user ericl commented on the issue:
https://github.com/apache/spark/pull/15538
I see ~10k lines of "CorruptStatistics" stack traces in the Jenkins log.
Though, it also seems to show up in the test logs for this PR, so maybe it
wouldn't affect this.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
@ericl What do you mean it's polluting test output?
Github user ericl commented on the issue:
https://github.com/apache/spark/pull/15538
Also LGTM as-is; it's polluting test output which is kind of annoying.
Testing the log output might be a little overkill since this is not critical
functionality and has an upstream fix.
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I've spent a couple hours today working on a unit test which captures
stdout during a parquet write operation to validate that it has no parquet
logging output. I haven't got it working yet, but I'l
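A minimal Java sketch of the kind of check described in this comment: capture stdout while an action runs, then assert on what was printed. `StdoutCaptureSketch` and `captureStdout` are hypothetical names, and the action here is a plain `println` standing in for a Parquet write. It is not the test from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class StdoutCaptureSketch {
    // Runs the action with System.out redirected to a buffer and returns what was printed.
    public static String captureStdout(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream capture = new PrintStream(buffer, true);
        System.setOut(capture);
        try {
            action.run();
        } finally {
            capture.flush();
            System.setOut(original); // always restore the real stdout
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the operation under test.
        String output = captureStdout(() -> System.out.println("write finished"));
        boolean noisy = output.contains("CorruptStatistics");
        System.out.println("captured " + output.length() + " characters, noisy = " + noisy);
    }
}
```

One limitation to keep in mind: code that stored a reference to the original `System.out` (or that writes to `System.err`) before the redirect will bypass the buffer, which may be part of why a test like this is hard to get working.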
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67279/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67279 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67279/consoleFull)**
for PR 15538 at commit
[`099afca`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67279 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67279/consoleFull)**
for PR 15538 at commit
[`099afca`](https://github.com/apache/spark/commit/0
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I pushed a commit to improve the documentation. I also removed a couple of
unused imports (boy scout rule).
Github user mallman commented on the issue:
https://github.com/apache/spark/pull/15538
I could use some advice on writing a unit test for this. Do you guys know
if there is a precedent in the codebase that covers a situation like this? I'd
like to reuse existing code if possible.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67150/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15538
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67150 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67150/consoleFull)**
for PR 15538 at commit
[`6101b83`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15538
**[Test build #67150 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67150/consoleFull)**
for PR 15538 at commit
[`6101b83`](https://github.com/apache/spark/commit/6