Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Thank you, @gatorsmile !!!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled an
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18640
Thanks! Merging to master.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Thank you so much, @rxin , @cloud-fan , @sameeragarwal , @mridulm , @viirya
!
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18640
lgtm
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Hi, @cloud-fan , @rxin , @sameeragarwal and @mridulm .
Could you merge this PR?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Thank you again, @viirya .
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/18640
LGTM
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18640
LGTM besides some minor questions, @rxin any more comments on this?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Hi, @sameeragarwal and @mridulm .
I cannot see any clear reason for the objection here. Also, there is
positive feedback from @ash211 on dev@spark, too. This PR will bring an
improve
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80576/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #80576 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80576/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #80576 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80576/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0f
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Retest this please.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Hi, @mridulm, @sameeragarwal , and @rxin .
Please let me know if there is something for me to do here.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Thank you so much, @sameeragarwal .
---
Github user sameeragarwal commented on the issue:
https://github.com/apache/spark/pull/18640
LGTM; unless @rxin still has some strong objections?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80466/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #80466 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80466/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #80466 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80466/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0f
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Retest this please.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
@rxin . Could you make a decision on this PR? Do we still need to put this
into `sql/hive` for some reason?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Sure. Thank you so much, @omalley !
---
Github user omalley commented on the issue:
https://github.com/apache/spark/pull/18640
I would also comment that in the long term, Spark should move to using the
vectorized reader in ORC's core. That would remove the dependence on ORC's
mapreduce module, which provides row by row shim
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Thank you again for coming and reviewing this PR, @rxin , @kiszk , @mridulm
, @omalley .
So far, we have discussed the following.
1. `Why are we adding this to core? Why not just the h
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
@rxin . How can I proceed with this PR now? Could you give me some advice again?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Thank you, @omalley .
@rxin . I think we had better depend on Apache ORC libraries as is in this
PR.
---
Github user omalley commented on the issue:
https://github.com/apache/spark/pull/18640
@rxin The ORC core library's dependency tree is aggressively kept as small
as possible. I've gone through and excluded unnecessary jars from our
dependencies. I also kick back pull requests that add
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Hi, @rxin
Since ORC 1.4.0, the ORC community provides small shaded jar files to improve
general usability. This PR uses the following.
- orc-core-1.4.0-nohive.jar (1.4MB)
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18640
I just checked the dependency size. They look pretty reasonable, roughly 2
MBs in total (although I do worry in the future whether ORC would bring in a
lot more jars).
cc @omalley any guidance
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Until now, I think ORC is the same as most other data sources (CSV,
JDBC, JSON, PARQUET, TEXT), which live inside `sql/core` now. If that is an
architectural plan of Apache Spark 2.3, I wil
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18640
Why don't we then create a separate orc module? Just copy a few of the
files over?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
I agree with the following, but this does not block those users. This is
only better than putting the dependency on Hive because it also supports more
the other users who are using ML and stor
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18640
To the best of my knowledge almost everybody runs with Hive anyway and the
vast majority of users that run ORC are Hive users. In hindsight we probably
should have put most of the data source dependenc
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/18640
LGTM, great to see progress on ORC support.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Thank you for the review, @kiszk .
The example may be #17980 , #17924, and #17943 .
If possible, in this PR, I want to focus on only `Dependency on ORC` issue.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Thank you for the review, @rxin .
We can use ORC like Parquet now. Parquet is inside `sql/core`, not
`sql/hive`.
---
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/18640
Can we add any smaller code to use this, too?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18640
Why are we adding this to core? Why not just the hive module?
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Hi, @liancheng , @zhzhan , @rxin , @marmbrus .
I'm pinging you since you worked on #6194 before.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80221/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #80221 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80221/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #80221 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80221/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0f
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80055/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #80055 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80055/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #80055 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80055/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0f
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Retest this please.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79951/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #79951 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79951/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #79951 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79951/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0f
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Retest this please.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
Hi, @rxin , @srowen , @sameeragarwal , @cloud-fan , @hvanhovell ,
@gatorsmile , @ueshin , @viirya , @kiszk .
Could you review this small PR about a dependency change?
This is a s
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18640
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79627/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #79627 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79627/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18640
**[Test build #79627 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79627/testReport)**
for PR 18640 at commit
[`0f29656`](https://github.com/apache/spark/commit/0f
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18640
This aims to reduce the review scope for #17980 .
cc @kiszk .
---