Github user cloud-fan closed the pull request at:
https://github.com/apache/spark/pull/22388
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22374#discussion_r217075954
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -216,7 +216,12 @@ class UnivocityParser
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22388
I can't recall the exact conflicts. Only 2 commits have touched these
2 files since my PR, and I carefully checked that these changes are still
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22344
thanks, merging to master/2.4!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22395
LGTM, cc @viirya @gatorsmile
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22353
ok to test
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22355#discussion_r217054661
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallback.scala ---
@@ -37,19 +37,22
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22401
It's not only about `avg`, it's also about `sum`.
I don't think the decision was made randomly; IIRC we did check other
databases and picked the best option we could.
`sum
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22390
can you send a new PR for 2.2? thanks
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22390
thanks, merging to master/2.4/2.3!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22391
thanks, merging to 2.3!
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22396#discussion_r217008531
--- Diff: docs/sql-programming-guide.md ---
@@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22402
cc @jose-torres @tdas @zsxwing
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/22402
[SPARK-25414][SS] The numInputRows metrics can be incorrect for streaming
self-join
## What changes were proposed in this pull request?
For self-join/self-union, Spark will produce
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22390
LGTM
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22380
thanks, merging to master!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22326
IIUC, you are pulling out the join condition containing a Python UDF and creating a
filter above the join. The join then becomes a cross join, which usually runs very
slowly. I think we should keep the cross
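A plain-Python sketch (not Spark's join implementation; all function names are hypothetical) of why a cross join followed by a filter tends to be slow: the pulled-out predicate is evaluated once per row pair, i.e. |left| × |right| times. For contrast it includes a hash equi-join, which only visits rows whose keys match; that shortcut is only available when the condition has an equality part the join itself can use.

```python
from itertools import product


def cross_join_then_filter(left, right, predicate):
    """Cross join followed by a filter: the predicate runs on every row pair."""
    evaluations = 0
    result = []
    for l, r in product(left, right):
        evaluations += 1
        if predicate(l, r):
            result.append((l, r))
    return result, evaluations


def hash_equi_join(left, right, left_key, right_key):
    """Equi-join: build a hash table on the right side, then probe it per left row."""
    table = {}
    for r in right:
        table.setdefault(right_key(r), []).append(r)
    result = []
    for l in left:
        for r in table.get(left_key(l), []):
            result.append((l, r))
    return result


left = [(i, f"l{i}") for i in range(100)]
right = [(i, f"r{i}") for i in range(100)]

filtered, n_evals = cross_join_then_filter(left, right, lambda l, r: l[0] == r[0])
hashed = hash_equi_join(left, right, lambda l: l[0], lambda r: r[0])

print(len(filtered), n_evals)  # 100 10000: same 100 matches, but 10000 predicate calls
```

Both functions return the same rows here; the difference is that the filter-over-cross-join version pays for every pair regardless of how selective the predicate is.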
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22353
So you need a way to reliably report some extra information, like the file path,
in the event logs, but don't want to show it in the UI as it may be too long.
Basically we shouldn't put
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22378
LGTM
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22378#discussion_r216580577
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ---
@@ -750,4 +751,27 @@ class InsertSuite extends QueryTest
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22371
In my opinion, it's not worth spending time on it. The lock is unlikely
to be a bottleneck, and it's better to keep it simple even if it's sub-optimal
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22371
@ConeyLiu an executor may be lost and then come back, and we may then have 2
identical tasks running on the same executor
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22343
To clarify: this is just a workaround for when we hit a problematic Hive Parquet
table (one with case-insensitive duplicate field names in the Parquet file)
and want to read it with the native
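For illustration, a hypothetical helper (not part of Spark) showing what case-insensitive duplicate field names look like: physical fields such as `id` and `ID` collapse to the same column under case-insensitive resolution, so a reader cannot pick one unambiguously. It uses Python's `str.lower()` as a stand-in for whatever case folding the real resolver applies.

```python
from collections import defaultdict


def case_insensitive_duplicates(field_names):
    """Return {lowercased name: [original names]} for names that collide ignoring case."""
    groups = defaultdict(list)
    for name in field_names:
        groups[name.lower()].append(name)
    # Only keys backed by more than one physical field are ambiguous.
    return {key: names for key, names in groups.items() if len(names) > 1}


parquet_fields = ["id", "ID", "name", "Value", "value"]
print(case_insensitive_duplicates(parquet_fields))
# {'id': ['id', 'ID'], 'value': ['Value', 'value']}
```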
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22390#discussion_r216575397
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -263,10 +263,12 @@ object BooleanSimplification
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22388
yes
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22387
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22373
thanks, merging to master/2.4!
@mgaido91 can you send a new PR to 2.3? it conflicts
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22353
Although the event log is in JSON format, it's mostly for internal use: it is
loaded by the history server and used to build the Spark UI. For compatibility, we
only focus on making the history server able
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22382
thanks, merging to 2.2!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22387
LGTM
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/22388
Revert [SPARK-24882][SQL] improve data source v2 API from branch 2.4
## What changes were proposed in this pull request?
As discussed in the dev list, we don't want to include
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22388
cc @rxin @rdblue
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18544
what's the status here?
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20673
What's the status of this PR?
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21308#discussion_r216329544
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DeleteSupport.java ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22373
Can you also update the PR description? thanks!
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22373#discussion_r216323311
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -256,4 +256,9 @@ class VectorAssemblerSuite
assert
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22380
ok to test
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22373
@maropu says it's OK to revert that part, @mgaido91 can you do that? thanks!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22380
cc @tdas @zsxwing @mgaido91
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/22380
[SPARK-25278][SQL][followup] remove the hack in ProgressReporter
## What changes were proposed in this pull request?
It turns out it's a bug that a `DataSourceV2ScanExec` instance may
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22373
I think we should allow `struct` function to take empty arguments.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/18516#discussion_r216292382
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -292,14 +296,17 @@ trait
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/18516#discussion_r216292172
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -292,14 +296,17 @@ trait
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22284
thanks, merging to master/2.4!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22377
thanks, merging to master/2.4!
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22343#discussion_r216218261
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
@@ -69,12 +69,25 @@ class ParquetOptions
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22343
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21968
thanks, merging to master!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18142
> Spark SQL is designed to be compatible with the Hive Metastore, SerDes
and UDFs.
This is different from `Spark can run any Hive SQL`. Spark can load and use
Hive UDFs, with the ri
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22318
Can you define the scope of this PR? In which cases should we change the
references in the join condition?
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22371
How much performance can we gain here? I don't think shuffle writing will be
bottlenecked by this lock.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22010
I think this works. Can we post some Spark web UI screenshots to confirm
the shuffle is indeed eliminated?
BTW one idea to simplify the implementation:
```
def distinct
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21433#discussion_r216209985
--- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala ---
@@ -53,10 +55,16 @@ class RDDInfo(
}
private[spark] object
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18142
> BTW, I believe there's no particular standard for backticks themselves
since different DBMS uses different backtick implementations.
You are right, but the SQL standard does define
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22343#discussion_r216204114
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
@@ -69,12 +69,25 @@ class ParquetOptions
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22359
thanks, merging to master/2.4!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22351
I'm surprised Hive changes the view text set by Spark. Is it a problem for
views? cc @gatorsmile @jiangxb1987 @hvanhovell
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22343
@dongjoon-hyun does the orc conversion need the same fix?
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22343#discussion_r216191236
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala ---
@@ -1390,7 +1395,11 @@ class
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22343
ok to test
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22318
How does this work? When we have duplicated attributes in the join
condition, how can we know which attribute comes from which side
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18142
On second thought, isn't it a bug?
```
hive> SELECT `d100.udf100`(`emp`.`name`) FROM `emp`;
USER
```
This clearly violates SQL semantics: the string ins
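A simplified sketch (hypothetical parser, not Spark's or Hive's) of the quoting rule at stake: a dot inside a single pair of backticks is part of the name, so the quoted name d100.udf100 in the query above is one identifier, whereas quoting d100 and udf100 separately would name a database and a function. Escaped (doubled) backticks and whitespace are deliberately not handled.

```python
def parse_multipart_identifier(text):
    """Split a dotted identifier into its parts; backtick-quoted text is one part.

    Simplified: doubled-backtick escapes and surrounding whitespace are not handled.
    """
    parts = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == "`":
            in_quotes = not in_quotes  # toggle quoting; the backtick itself is dropped
        elif ch == "." and not in_quotes:
            parts.append("".join(current))  # an unquoted dot ends the current part
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError(f"unterminated backtick in identifier: {text}")
    parts.append("".join(current))
    return parts


print(parse_multipart_identifier("`d100.udf100`"))    # ['d100.udf100']
print(parse_multipart_identifier("`d100`.`udf100`"))  # ['d100', 'udf100']
print(parse_multipart_identifier("`emp`.`name`"))     # ['emp', 'name']
```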
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22361
LGTM, I'm merging it to unblock the 2.4 RC, thanks!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22262
ok to test
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22270#discussion_r215869136
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest
Github user cloud-fan closed the pull request at:
https://github.com/apache/spark/pull/22354
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22354
thanks, merging to 2.3!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22352
thanks, merging to master/2.4 (since it's a followup)
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22346
thanks, merging to 2.3!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22354
cc @tgravescs @jiangxb1987 @gatorsmile
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/22354
[SPARK-23243][CORE][2.3] Fix RDD.repartition() data correctness issue
backport https://github.com/apache/spark/pull/22112 to 2.3
---
An alternative fix for https
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22352
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
I'm preparing a PR for 2.3, thanks for the reminder!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22284
This is a bug in SQL metrics, let's include it in Spark 2.4.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22284
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22352
LGTM
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22171
Is there a standard for how CSV should store decimal values?
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22338
Since the change looks safer to me and it does fix the regression, I'm
merging it to unblock the 2.4 release. Please continue to investigate the root
cause, thanks
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18142
hmm, then it's too late. Maybe we can add it in Spark 2.3.2, cc @jerryshao
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22346
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18142
@HyukjinKwon Thanks for the note! I think this behavior is better, I'm
adding a `release_note` tag to the JIRA ticket, so that we don't forget to
mention it in release notes
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22320
thanks, merging to master!
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215479502
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -82,7 +83,7 @@ case class
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
@tgravescs thanks for testing it out! I've created
https://issues.apache.org/jira/browse/SPARK-25341 and
https://issues.apache.org/jira/browse/SPARK-25342 to track the followup.
I think
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22336
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22179
Do we have any compatibility issues here? It seems fine to me as we have already
shaded Kryo.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22319
thanks, merging to master!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22338
This basically reverts the use of the memory block in the hash computation; now the
memory block is just a holder of the base object and base offset. This does fix
the regression, but will we also lose the perf
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22340
How does the non-test mode resolve the class path issue?
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22338
Thanks for working on it! Would it help if we moved these hash methods
to `MemoryBlock`? e.g. the code could be `int halfWord =bytes[offset + i
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215248202
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -82,7 +83,7 @@ case class
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215247634
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -754,6 +754,54 @@ class HiveDDLSuite
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215246692
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22320
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22336
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
retest this please
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22319
retest this please