Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21465
merged to master, thanks @huaxingao !
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22273
retest this please
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22273
retest this please
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22273
retest this please
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239569077
--- Diff: python/pyspark/ml/param/shared.py ---
@@ -814,3 +814,25 @@ def getDistanceMeasure(self):
"""
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23236
> I suspect that might be because, as the resource usage is heavy,
StreamingLogisticRegressionWithSGD's training on the input batch stream can't
always catch up with the prediction batch stream.
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22275
merged to master, thanks @holdenk @viirya and @felixcheung !
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23236
True, the test is not that long under light resources. Locally, I saw a
couple seconds difference with the changes I mentioned. The weird thing is the
unmodified test completes after the 11th
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239245388
--- Diff: python/pyspark/ml/regression.py ---
@@ -705,12 +710,59 @@ def getNumTrees(self):
return self.getOrDefault(self.numTrees
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239240113
--- Diff: python/pyspark/ml/regression.py ---
@@ -705,12 +710,59 @@ def getNumTrees(self):
return self.getOrDefault(self.numTrees
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239243661
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239242316
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239243683
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r239211515
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23236
Seems ok to me, but there are a few silly things with this test that might
also help
* why is the `stepSize` so low at 0.01? I think it would be fine at 0.1,
but even conservatively at
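Not the PR's test code, just a minimal hypothetical sketch of this kind of
streaming test with the larger step size suggested above; the queue-stream
input and toy data stand in for the real harness:
```
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.classification import StreamingLogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext("local[2]", "step-size-sketch")
ssc = StreamingContext(sc, 1)  # 1-second batches

# Hypothetical training batches pushed through a queue stream.
batches = [sc.parallelize([LabeledPoint(1.0, [1.0]), LabeledPoint(0.0, [-1.0])])
           for _ in range(5)]
train_stream = ssc.queueStream(batches)

# The point above: stepSize=0.1 instead of 0.01 should let the model
# converge within far fewer input batches.
model = StreamingLogisticRegressionWithSGD(stepSize=0.1, numIterations=25)
model.setInitialWeights([0.0])
model.trainOn(train_stream)

ssc.start()
ssc.awaitTerminationOrTimeout(10)
ssc.stop(stopSparkContext=True)
```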
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/23203#discussion_r238868565
--- Diff: python/run-tests.py ---
@@ -93,17 +93,18 @@ def run_individual_python_test(target_dir, test_name,
pyspark_python):
"py
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238801573
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238808440
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238809338
--- Diff: python/pyspark/ml/classification.py ---
@@ -1242,40 +1255,36 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol,
HasLabelCol
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238801256
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
return [DecisionTreeClassificationModel(m) for m
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r238809091
--- Diff: python/pyspark/ml/regression.py ---
@@ -650,19 +650,20 @@ def getFeatureSubsetStrategy(self):
return self.getOrDefault
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23200
merged to master, thanks @HyukjinKwon
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/23200#discussion_r238454041
--- Diff: python/pyspark/mllib/tests/test_linalg.py ---
@@ -22,33 +22,18 @@
from numpy import array, array_equal, zeros, arange, tile, ones, inf
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22954
> @BryanCutler BTW, do you know the rough expected timing for Arrow 0.12.0
release?
I think we should be starting the release process soon, so maybe in a week
or
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r236880776
--- Diff: python/pyspark/ml/regression.py ---
@@ -705,12 +705,38 @@ def getNumTrees(self):
return self.getOrDefault(self.numTrees
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r236881042
--- Diff: python/pyspark/ml/regression.py ---
@@ -1030,9 +1056,9 @@ def featureImportances(self):
@inherit_doc
-class GBTRegressor
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/23055#discussion_r235524421
--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
import os
import sys
import time
-import resource
+# 'resource'
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/23055#discussion_r235523238
--- Diff: docs/configuration.md ---
@@ -189,7 +189,7 @@ of the most common options to set are:
limited to this amount. If not set, Spark will
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21465#discussion_r235128413
--- Diff: python/pyspark/ml/classification.py ---
@@ -1176,8 +1176,8 @@ def trees(self):
@inherit_doc
class GBTClassifier(JavaEstimator
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23077
Oh, I think the PR title should be SPARK-26105 too
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23077
>BTW, Bryan, do you have some time to work on the has_numpy stuff
Yup, I can do that
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23077
Oops, actually I think there is one more here
https://github.com/apache/spark/blob/master/python/pyspark/testing/mllibutils.py#L20
Other than that, looks good
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23063
cc @HyukjinKwon
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23063
Dist by line count:
```
348 ./test_algorithms.py
84 ./test_base.py
71 ./test_evaluation.py
314 ./test_feature.py
118 ./test_image.py
392 ./test_linalg.py
367
```
GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/23063
[SPARK-26033][PYTHON][TESTS] Break large ml/tests.py file into smaller files
## What changes were proposed in this pull request?
This PR breaks down the large ml/tests.py file that
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/23056#discussion_r234093063
--- Diff: python/pyspark/testing/mllibutils.py ---
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23056
cc @HyukjinKwon @squito
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23056
Dist by line count:
```
313 ./test_algorithms.py
201 ./test_feature.py
642 ./test_linalg.py
197 ./test_stat.py
523 ./test_streaming_algorithms.py
115 ./test_util.py
```
GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/23056
[SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py file into smaller
files
## What changes were proposed in this pull request?
This PR breaks down the large mllib/tests.py file
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23034
> Also, @BryanCutler, I think we can talk about the locations of
testing/...util.py later, once we've finished splitting the tests. Moving the
utils would probably cause fewer conflicts and should be g
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/23033
Looks like ML is using `QuietTest` also, so the import needs to be updated
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r233209385
--- Diff: R/pkg/R/SQLContext.R ---
@@ -189,19 +238,67 @@ createDataFrame <- function(data, schema = NULL,
samplingRatio = 1.0,
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/23021#discussion_r233183572
--- Diff: python/pyspark/testing/sqlutils.py ---
@@ -0,0 +1,268 @@
+#
--- End diff --
Maybe rename this file to `sql_testing_utils.py
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22954
I don't know R well enough to review that code, but the results look
awesome! Nice work @HyukjinKwon!!
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r232425279
--- Diff: R/pkg/R/SQLContext.R ---
@@ -189,19 +238,67 @@ createDataFrame <- function(data, schema = NULL,
samplingRatio = 1.0,
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22954#discussion_r232425031
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala
---
@@ -225,4 +226,25 @@ private[sql] object SQLUtils extends Logging
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22913
Sounds good, thanks @javierluraschi !
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22275
ping @HyukjinKwon and @viirya to maybe take another look at the recent
changes to make this cleaner, if you are able to. Thanks
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22275#discussion_r232145973
--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,28 @@ def test_timestamp_dst(self):
self.assertPandasEqual(pdf
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22275#discussion_r231311398
--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,28 @@ def test_timestamp_dst(self):
self.assertPandasEqual(pdf
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22913
I'm a little against adding this because the Arrow Java Vectors used so far
were chosen to match Spark's internal data, to keep things simple and avoid
lots of conversions on the Java
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22913#discussion_r230953015
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala
---
@@ -71,6 +71,7 @@ object ArrowUtils {
case d
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22275#discussion_r229522939
--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,28 @@ def test_timestamp_dst(self):
self.assertPandasEqual(pdf
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22275
Apologies for the delay in circling back to this. I reorganized a little to
simplify and expanded the comments to hopefully better describe the code.
A quick summary of the changes: I
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22871
Thanks @HyukjinKwon , looks good!
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22795
merged to master, thanks @HyukjinKwon !
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r227878740
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -63,7 +65,7 @@ private[spark] object PythonEvalType
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22795#discussion_r227875311
--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None,
functionType=None
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22795#discussion_r227593281
--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None,
functionType=None
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22795#discussion_r227592794
--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None,
functionType=None
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r227582390
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -63,7 +65,7 @@ private[spark] object PythonEvalType
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r227579686
--- Diff: python/pyspark/sql/tests.py ---
@@ -6323,6 +6333,33 @@ def ordered_window(self):
def unpartitioned_window(self):
return
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r227579436
--- Diff: python/pyspark/sql/tests.py ---
@@ -6481,12 +6516,116 @@ def test_invalid_args(self):
foo_udf = pandas_udf(lambda x: x
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22807#discussion_r227571481
--- Diff: python/pyspark/serializers.py ---
@@ -248,7 +248,14 @@ def create_array(s, t):
# TODO: see ARROW-2432. Remove when the
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22807#discussion_r227572701
--- Diff: python/pyspark/sql/tests.py ---
@@ -4961,6 +4961,31 @@ def foofoo(x, y):
).collect
)
+def
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22795#discussion_r227093067
--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None,
functionType=None
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22655
Thanks @viirya !
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r223774544
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -63,7 +65,7 @@ private[spark] object PythonEvalType
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r223505747
--- Diff: python/pyspark/worker.py ---
@@ -154,6 +154,47 @@ def wrapped(*series):
return lambda *a: (wrapped(*a), arrow_return_type
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r223507242
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -63,7 +65,7 @@ private[spark] object PythonEvalType
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r223506840
--- Diff: python/pyspark/worker.py ---
@@ -154,6 +154,47 @@ def wrapped(*series):
return lambda *a: (wrapped(*a), arrow_return_type
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22305
I think there is a typo in your example in the description
```
@pandas_udf('double', PandasUDFType.GROUPED_AGG)
def avg(v):
return v.mean()
return avg
```
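For reference, a hedged guess at how the corrected example reads (the stray
trailing `return avg` looks copied from inside a helper function); the toy
DataFrame below is made up for illustration:
```
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()

@pandas_udf('double', PandasUDFType.GROUPED_AGG)
def avg(v):
    return v.mean()

# Used through a grouped aggregation:
df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ("id", "v"))
df.groupBy("id").agg(avg(df["v"])).show()
```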
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22653#discussion_r223475373
--- Diff: python/pyspark/sql/tests.py ---
@@ -1149,6 +1149,75 @@ def test_infer_schema(self):
result = self.spark.sql("SELECT l[0].a
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22275
Thanks for the review @holdenk ! I haven't had time to follow up, but I'll
take a look through this and see what I can do about making thin
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22275#discussion_r223116201
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3279,34 +3280,33 @@ class Dataset[T] private[sql](
val timeZoneId
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22275#discussion_r223116082
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3279,34 +3280,33 @@ class Dataset[T] private[sql](
val timeZoneId
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22610#discussion_r223070065
--- Diff: python/pyspark/sql/functions.py ---
@@ -2909,6 +2909,11 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
can fail
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22610
> It is a pretty new one; does that mean we need to upgrade to the latest
PyArrow in order to use it? Since it is an option on Table.from_pandas, is it
possible to extend it to pyarrow.Ar
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22610
> Thanks, @BryanCutler. WDYT about documenting the type map thing?
I think that would help in the cases of dates/times because those can get a
little confusing. For primitives, I th
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22610
So pyarrow just added an option when converting from Pandas to raise an
error for unsafe casts. I'd have to try it out to see if it would prevent this
case though. It's a common o
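A small sketch of trying that option out, with made-up values; whether it
prevents the PR's exact case is the open question above:
```
import pandas as pd
import pyarrow as pa

s = pd.Series([1.5, 2.7])

# An unsafe cast silently truncates the floats to fit int32: values [1, 2].
print(pa.Array.from_pandas(s, type=pa.int32(), safe=False))

# With safe=True the same conversion raises instead of truncating.
try:
    pa.Array.from_pandas(s, type=pa.int32(), safe=True)
except pa.lib.ArrowInvalid as err:
    print("unsafe cast rejected:", err)
```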
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22610#discussion_r222501309
--- Diff: python/pyspark/worker.py ---
@@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type):
arrow_return_type = to_arrow_type
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22610
Thanks for looking into this @viirya ! I agree that there are lots of
cases where casting to another type is intentional and works fine, so this
isn't a bug. The only other idea I have
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22610#discussion_r222007837
--- Diff: python/pyspark/worker.py ---
@@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type):
arrow_return_type = to_arrow_type
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22540#discussion_r220283006
--- Diff: python/pyspark/worker.py ---
@@ -97,8 +97,9 @@ def verify_result_length(*a):
def wrap_grouped_map_pandas_udf(f, return_type
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22540#discussion_r220274047
--- Diff: python/pyspark/worker.py ---
@@ -97,8 +97,9 @@ def verify_result_length(*a):
def wrap_grouped_map_pandas_udf(f, return_type
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22540#discussion_r220272980
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala
---
@@ -131,11 +131,8 @@ object ArrowUtils {
} else
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22275
> generally, is this going to limit how much data to pass along because of
the bit length of the index?
So the index passed to python is the RecordBatch index, not an element
in
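To illustrate the distinction, a hypothetical sketch (not the PR's code): the
index is kept per record batch, not per element, so the bookkeeping stays tiny
regardless of how many rows each batch holds:
```
def reorder_batches(arrived_batches, batch_order):
    """Put each batch back in its logical position; batch_order[i] is the
    logical position of the batch that arrived i-th."""
    ordered = [None] * len(arrived_batches)
    for arrival_pos, logical_pos in enumerate(batch_order):
        ordered[logical_pos] = arrived_batches[arrival_pos]
    return ordered

# Batches arrived as [B2, B0, B1]; one small int per batch restores the order.
print(reorder_batches(["B2", "B0", "B1"], [2, 0, 1]))  # ['B0', 'B1', 'B2']
```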
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22477
Thanks @HyukjinKwon !
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22477
cc @HyukjinKwon
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22477
retest this please
GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/22477
[SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python
3.6 and Pandas 0.23
## What changes were proposed in this pull request?
Fix test that constructs a Pandas
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22275
@holdenk I was wondering if you had any thoughts on this? Thanks!
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20908
merged to master and branch-2.4, thanks @holdenk !
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22140
> Thanks for your understanding. Normally, we are very conservative about
introducing any potential behavior change into a released version.
Yes, I know. It seemed to me at the time
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22140
> Can we just simply take this out from branch-2.3?
Thanks @HyukjinKwon , that is fine with me. What do you think @gatorsm
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/22369#discussion_r216147674
--- Diff: docs/sql-programming-guide.md ---
@@ -1901,6 +1901,7 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22140
@gatorsmile it seemed like a straightforward bug to me. Rows with extra
values lead to incorrect output and exceptions when used in `DataFrames`, so it
did not seem like there was any possible
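A minimal made-up repro of the bug class being described (not the PR's test):
```
from pyspark.sql import Row

Person = Row("name", "age")   # a custom Row class with two fields

Person("Alice", 10)           # fine: value count matches the fields

try:
    Person("Alice", 10, 80.1)  # one extra value: rejected up front after the
                               # fix, silently misaligned columns before it
except ValueError as err:
    print(err)
```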
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22140
merged to master, branch 2.4 and 2.3. Thanks @xuanyuanking !
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22140
> yea, to me it makes less sense actually, but it seems at least to be working
for now:
good point, I guess it only fails when you supply a sch
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/22329
merged to branch-2.4