Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
@ueshin Thanks for the fix
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/16541
I sent a pr #17473.
Github user brkyvz commented on the issue:
https://github.com/apache/spark/pull/16541
This PR unfortunately broke Scala 2.10 compilation
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.10/4110/console
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16541
thanks, merging to master!
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16541
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16541
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75263/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16541
**[Test build #75263 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75263/testReport)**
for PR 16541 at commit
[`d04e043`](https://github.com/apache/spark/commit/d
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16541
**[Test build #75263 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75263/testReport)**
for PR 16541 at commit
[`d04e043`](https://github.com/apache/spark/commit/d0
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16541
retest this please
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16541
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16541
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75255/
Test FAILed.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16541
LGTM
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16541
**[Test build #75255 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75255/testReport)**
for PR 16541 at commit
[`d04e043`](https://github.com/apache/spark/commit/d0
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16541
ok to test
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
Thanks. Made the suggested changes in my latest commit.
I also encountered a minor problem when doing final testing. When using a
collection type that is a type alias (e.g., scala.List)
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16541
LGTM except 2 minor comments
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
That seems to be the case here, yes.
What about the other benefits I mentioned (adding support for Java `List`s
and future Scala 2.13 compatibility)? I think the codegen is also more
s
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16541
I didn't look into the details here, but very often scanning data twice
doesn't necessarily slow things down, especially in the case of sequential
scan.
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
Well, technically yes. But I would say it's a little more than that.
The current approach to deserialization of `Seq`s is to copy the data into
an array, construct a `WrappedArray` (whi
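
To make the comparison in this comment concrete, here is a minimal Scala sketch of the two strategies. It is an illustration only and not the code that Spark or this PR generates: the names `SeqDeserializationSketch`, `viaWrappedArray`, and `viaBuilder` are hypothetical, and the builder-based variant is an assumption inferred from the branch name `dataset-seq-builder` that appears elsewhere in the thread.

```scala
import scala.collection.mutable

// Hypothetical names throughout; this is not Spark's generated code.
object SeqDeserializationSketch {

  // Strategy described in the comment: copy the elements into an array,
  // wrap the array as a Seq, then convert to the requested collection type.
  def viaWrappedArray(data: Iterator[Int], size: Int): List[Int] = {
    val arr = new Array[Int](size)
    var i = 0
    while (i < size && data.hasNext) {
      arr(i) = data.next()
      i += 1
    }
    // WrappedArray is deprecated in Scala 2.13 (alias of ArraySeq) but still compiles.
    val wrapped = mutable.WrappedArray.make[Int](arr)
    wrapped.toList // a second pass over the data to build the target collection
  }

  // Assumed alternative: append straight into the target collection's builder,
  // so no intermediate array or wrapper is allocated.
  def viaBuilder(data: Iterator[Int], size: Int): List[Int] = {
    val builder = List.newBuilder[Int]
    builder.sizeHint(size)
    data.foreach(builder += _)
    builder.result()
  }
}
```

Both methods return the same `List`; they differ only in the intermediate allocation and the extra pass. As noted in other comments in this thread, the benchmarks showed almost the same results before and after the change.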
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16541
is it a performance improvement? there is no difference in the benchmark
results
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
Also please note the [UnsafeArrayData-producing
branch](https://github.com/michalsenkyr/spark/compare/dataset-seq-builder...michalsenkyr:dataset-seq-builder-unsafe)
that is not yet merged into
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
Would it be possible for somebody to review this PR for me? I have a few
ideas that are dependent on this and I'd like to get to work on them. Most
notably support for Java Lists.
Maybe @cl
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
Apologies for taking so long.
I tried modifying the serialization logic as best as I could to serialize
into `UnsafeArrayData` ([branch
diff](https://github.com/michalsenkyr/spark/comp
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
I added the benchmarks based on the code you provided but I am getting
almost the same results before and after the optimization (see description). So
either the added benefit is really small o
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16541
Can we get additional performance improvement if we could generate
`UnsafeArrayData` instead of `GenericArrayData` for this statement ```/* 104 */
final ArrayData serializefromobject_value = fal
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/16541
It looks like a similar optimization to
https://github.com/apache/spark/pull/15044.
Does [this
code](https://github.com/apache/spark/pull/15044/files#diff-d6f03c9d3e82f3774d1110559b039a6d)
hel
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
Added benchmarks.
I didn't find any standardized way of benchmarking codegen so I wrote a
simple script for Spark Shell. Benchmarks were run on a laptop so the
collections couldn't be
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
Added codegen comparison for a simple `List` dataset.
I will also prepare a benchmark and add some results later. Those will be
for `List`, `mutable.Queue` and `Seq`. Where `List` and `mutab
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16541
Is this a perf optimization? If yes, can you show some benchmarks? Also for
codegen it's good to show the generated code before/after this change. You can
get that with
```
df.queryExecuti
Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16541
Also, the new `CollectObjects` copies quite a bit of code from
`MapObjects`. Should I move the code into a common trait in order to reduce
duplication, or should I leave it as is?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16541
Can one of the admins verify this patch?