Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Great! Thanks all
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
@HyukjinKwon no worries, I understand. We gotta be 100% thorough here.
Thanks for the help
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r156971910
--- Diff: python/pyspark/sql/types.py ---
@@ -1083,7 +1083,11 @@ def _infer_schema(row):
elif hasattr(row, "_fields"): #
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r156339240
--- Diff: python/pyspark/sql/types.py ---
@@ -1083,7 +1083,8 @@ def _infer_schema(row):
elif hasattr(row, "_fields"): #
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r156338937
--- Diff: python/pyspark/sql/types.py ---
@@ -1083,7 +1083,8 @@ def _infer_schema(row):
elif hasattr(row, "_fields"): #
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r15602
--- Diff: python/pyspark/sql/types.py ---
@@ -1083,7 +1083,8 @@ def _infer_schema(row):
elif hasattr(row, "_fields"): #
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Good catch @HyukjinKwon! I reverted those changes and added a test to cover
this regression.
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r155238727
--- Diff: python/pyspark/sql/session.py ---
@@ -405,7 +401,7 @@ def _createFromLocal(self, data, schema):
data = list(data
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r155238617
--- Diff: python/pyspark/sql/session.py ---
@@ -405,7 +401,7 @@ def _createFromLocal(self, data, schema):
data = list(data
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
@HyukjinKwon done, with test added.
```
>>> spark.createDataFrame(spark.sparkContext.parallelize([[None, 1], ["a", None], [1, 1]]), schema=["a", "b"])
```
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Fixed
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Friendly ping -- I've fixed that @ueshin.
Is there anything else I should look at to get this to be merged?
/cc @Hyukji
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r154036405
--- Diff: python/pyspark/sql/types.py ---
@@ -1108,19 +1109,33 @@ def _has_nulltype(dt):
return isinstance(dt, NullType)
-def
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Hi all,
I have changed the error message to be like #18521. Here are some examples:
```
_merge_type(
    StructType([StructField("f1", ArrayType(MapType(
```
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r153579071
--- Diff: python/pyspark/sql/types.py ---
@@ -1108,19 +1109,23 @@ def _has_nulltype(dt):
return isinstance(dt, NullType)
-def
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r153578612
--- Diff: python/pyspark/sql/tests.py ---
@@ -1722,6 +1723,83 @@ def test_infer_long_type(self):
self.assertEqual(_infer_type(2**61), LongType
Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r153160307
--- Diff: python/pyspark/sql/tests.py ---
@@ -1722,6 +1723,83 @@ def test_infer_long_type(self):
self.assertEqual(_infer_type(2**61), LongType
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Maybe a more performant way to build the path for the error message would be to
propagate it *up* the stack, by try/catching the errors and prepending the path
segments as the exception bubbles up.
But this way seems
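The bottom-up approach described above, where each recursion level only pays the cost of building the path when a conflict actually occurs, might look like the following minimal sketch. This is illustrative only: the `merge` helper and its dict-based "schemas" are stand-ins, not PySpark's actual `_merge_type` API.

```python
# Sketch of propagating a field path *up* the stack: each recursive
# level catches the conflict and prepends its own field name, so the
# path is only constructed on the error path, never on success.

def merge(a, b):
    """Merge two toy 'schemas' (dicts mapping field name -> type)."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for key in a:
            if key in b:
                try:
                    out[key] = merge(a[key], b[key])
                except TypeError as e:
                    # Prepend this level's field name to the path.
                    raise TypeError("field %s: %s" % (key, e)) from None
            else:
                out[key] = a[key]
        out.update({k: v for k, v in b.items() if k not in a})
        return out
    if a is not b:
        raise TypeError("can not merge %s and %s" % (a.__name__, b.__name__))
    return a

try:
    merge({"f1": {"f2": int}}, {"f1": {"f2": str}})
except TypeError as e:
    print(e)  # field f1: field f2: can not merge int and str
```

Note the trade-off the comment alludes to: the success path stays allocation-free, at the cost of re-raising an exception per nesting level when a conflict is found.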
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Hey all,
## Error message
I revamped the error message and made it "recursive", similar to
@HyukjinKwon's. Here's an example:
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
For sure @ueshin, I will add tests.
@HyukjinKwon understood!
How do we go about testing performance regression
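One common way to eyeball a performance regression in pure-Python code like schema inference is a quick `timeit` micro-benchmark run before and after the change. The snippet below is purely illustrative (a toy inference loop, not Spark's actual benchmark process):

```python
import timeit

# Micro-benchmark a toy per-row type-inference pass, the kind of hot
# loop that a change to _infer_schema/_merge_type could slow down.
rows = [{"a": i, "b": str(i), "c": None} for i in range(1000)]

def infer_all(rows):
    # Map each row to a {column -> type} dict, mimicking per-row inference.
    return [{k: type(v) for k, v in r.items()} for r in rows]

elapsed = timeit.timeit(lambda: infer_all(rows), number=100)
print("100 runs over 1000 rows: %.3fs" % elapsed)
```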
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
@ueshin I think this build failure was an outage; can we retest?
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is
disabled
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Jenkins, retest this please
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
Jenkins, please retest
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
@ueshin
The reason that I modified the case for StructType is that, in
session.py#341, for each Pandas DF row we obtain a StructType with StructFields
mapping column names to value type
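The per-row inference described above can be sketched with a stdlib-only toy: infer a {column -> type} "schema" for each row, treating `None` as an unknown `NullType`, then reduce-merge across rows so a later non-null value fills in a type left open by an earlier `None`. The names here (`infer_row_schema`, `merge_schemas`) are illustrative stand-ins for PySpark's `_infer_schema`/`_merge_type`.

```python
from functools import reduce

class NullType:
    """Placeholder for a type we can't determine yet (value was None)."""

def infer_row_schema(row):
    """row: dict of column name -> value (roughly one Pandas DF row)."""
    return {col: NullType if v is None else type(v) for col, v in row.items()}

def merge_schemas(a, b):
    merged = dict(a)
    for col, t in b.items():
        cur = merged.get(col, NullType)
        if cur is NullType:
            merged[col] = t            # a concrete type resolves NullType
        elif t is not NullType and t is not cur:
            raise TypeError("field %s: can not merge %s and %s"
                            % (col, cur.__name__, t.__name__))
    return merged

rows = [{"a": None, "b": 1}, {"a": "x", "b": None}, {"a": "y", "b": 2}]
schema = reduce(merge_schemas, map(infer_row_schema, rows))
print(schema)  # {'a': <class 'str'>, 'b': <class 'int'>}
```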
Github user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
@ebuildy fixed
GitHub user gberger opened a pull request:
https://github.com/apache/spark/pull/19792
[SPARK-22566][PYTHON] Better error message for `_merge_type` in Pandas to
Spark DF conversion
## What changes were proposed in this pull request?
It provides a better error message when
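The kind of improvement this PR describes can be sketched as threading a field path *down* the recursion, so a type conflict deep inside nested structs is reported with its full path rather than just the two leaf types. This is a simplified stand-in for `_merge_type` in `python/pyspark/sql/types.py`; the real signature and message format may differ.

```python
# Simplified stand-in for _merge_type: pass the current path down the
# recursion so the error names the offending nested field.

def merge_type(a, b, name="root"):
    if isinstance(a, dict) and isinstance(b, dict):  # struct-like
        merged = dict(a)
        for field, t in b.items():
            if field in a:
                merged[field] = merge_type(a[field], t,
                                           "%s.%s" % (name, field))
            else:
                merged[field] = t
        return merged
    if a is not b:
        raise TypeError("%s: can not merge type %s and %s"
                        % (name, a.__name__, b.__name__))
    return a

try:
    merge_type({"f1": {"f2": int}}, {"f1": {"f2": str}})
except TypeError as e:
    print(e)  # root.f1.f2: can not merge type int and str
```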