dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-661348318
For the following, I'd like to ask for your help if you are interested. I
believe we want to build a better Apache Spark in the community together.
> If you generalize the
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-661345335
For the following, I added test coverage via SPARK-32318 at
master/3.0/2.4. Are you suggesting that's not enough?
> Finally I do want to point out that there is no m
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-661344352
@hvanhovell . The following is completely wrong because the above optimization
was one of the recommendations for many Hortonworks customers to save their
HDFS usage. I knew
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-660693225
cc @cloud-fan and @gatorsmile once more.
This is an automated message from the Apache Git Service.
To resp
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-660690119
Retest this please.
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-659147381
Thank you for the quick update, @aokolnychyi . Also, thank you all for your
opinions.
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658831236
BTW, @aokolnychyi . I merged the corner case test case. Could you rebase
this onto master? Then, we can discuss how to proceed with this PR in a narrower
direction.
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658830542
cc @marmbrus and @gatorsmile since they know the existing customers well and
are good at protecting them.
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658828708
@hvanhovell . I agree with you on the following.
> AFAIK nested ordering can be ignored from a relation algebra point of
view.
> Regarding the shuffles. ...
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706
No. It depends on the file format, not on the Spark side.
For example, in the above example, ORC files are small because ORC supports a
special encoding when the data is sort
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658558813
I made a PR to add test coverage for the above case.
- https://github.com/apache/spark/pull/29118
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658550248
Very sorry, guys. Due to the above regression, I'll revert this commit
urgently. We can rethink this PR.
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658549984
**AFTER SPARK-32276**
```
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2,
x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
scala
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475
To generate small Parquet/ORC files, we do the above tricks, don't we?
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658543717
Oops. Sorry, guys. It seems that I missed something during testing. For the
following case, we should not remove `Sort`.
**BEFORE THIS PR**
```scala
scala> Seq
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658538140
Also, cc @gatorsmile and @cloud-fan
dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-657713339
Thank you for pinging me, @aokolnychyi .