spark git commit: [SPARK-19866][ML][PYSPARK] Add local version of Word2Vec findSynonyms for spark.ml: Python API

2017-09-08 Thread holden
nts: 8598d03 Author: Xin Ren Authored: Fri Sep 8 12:09:00 2017 -0700 Committer: Holden Karau Committed: Fri Sep 8 12:09:00 2017 -0700 -- .../scala/org/apache/spark/ml/feature/Word2Vec.scala | 2 +- python/pyspark/ml/feature

spark git commit: [SPARK-15243][ML][SQL][PYTHON] Add missing support for unicode in Param methods & functions in dataframe

2017-09-08 Thread holden
k/commit/8598d03a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8598d03a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8598d03a Branch: refs/heads/master Commit: 8598d03a00a39dd23646bf752f9fed5d28e271c6 Parents: 8a4f228 Author: hyukjinkwon Authored: Fri Sep 8 11:57:33 201

spark git commit: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Values from Estimator

2017-08-22 Thread holden
ddc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/41bb1ddc Branch: refs/heads/master Commit: 41bb1ddc63298c004bb6a6bb6fff9fd4f6e44792 Parents: d56c262 Author: Bryan Cutler Authored: Tue Aug 22 17:40:50 2017 -0700 Committer: Holden Karau Committed: Tue Aug 22 17:40:50 2017 -0

spark git commit: [SPARK-21566][SQL][PYTHON] Python method for summary

2017-08-18 Thread holden
wip-us.apache.org/repos/asf/spark/diff/10be0184 Branch: refs/heads/master Commit: 10be01848ef28004a287940a4e8d8a044e14b257 Parents: a2db5c5 Author: Andrew Ray Authored: Fri Aug 18 18:10:54 2017 -0700 Committer: Holden Karau Committed: Fri Aug 18 18:10:54 2017 -0

spark git commit: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpark

2017-07-28 Thread holden
56f79cc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b56f79cc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b56f79cc Branch: refs/heads/master Commit: b56f79cc359d093d757af83171175cfd933162d1 Parents: 0ef9fe6 Author: hyukjinkwon Authored: Fri Jul 28 20:59:32 2017 -0700 Committer: Hold

spark git commit: [SPARK-21434][PYTHON][DOCS] Add pyspark pip documentation.

2017-07-21 Thread holden
Repository: spark Updated Branches: refs/heads/branch-2.2 88dccda39 -> da403b953 [SPARK-21434][PYTHON][DOCS] Add pyspark pip documentation. Update the Quickstart and RDD programming guides to mention pip. Built docs locally. Author: Holden Karau Closes #18698 from holdenk/SPARK-21434-

spark git commit: [SPARK-21434][PYTHON][DOCS] Add pyspark pip documentation.

2017-07-21 Thread holden
ilt docs locally. Author: Holden Karau Closes #18698 from holdenk/SPARK-21434-add-pyspark-pip-documentation. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cc00e99d Tree: http://git-wip-us.apache.org/repos/asf/spark/t

spark git commit: [SPARK-21394][SPARK-21432][PYTHON] Reviving callable object/partial function support in UDF in PySpark

2017-07-17 Thread holden
s.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4ce735ee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4ce735ee Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4ce735ee Branch: refs/heads/master Commit: 4ce735eed103f3bd055c087126acd1366

spark git commit: [SPARK-13534][PYSPARK] Using Apache Arrow to increase performance of DataFrame.toPandas

2017-07-10 Thread holden
ents: 2bfd5ac Author: Bryan Cutler Authored: Mon Jul 10 15:21:03 2017 -0700 Committer: Holden Karau Committed: Mon Jul 10 15:21:03 2017 -0700 -- bin/pyspark |2 +- dev/deps/spark-de

spark git commit: [SPARK-21278][PYSPARK] Upgrade to Py4J 0.10.6

2017-07-05 Thread holden
rg/repos/asf/spark/diff/c8d0aba1 Branch: refs/heads/master Commit: c8d0aba198c0f593c2b6b656c23b3d0fb7ea98a2 Parents: c8e7f44 Author: Dongjoon Hyun Authored: Wed Jul 5 16:33:23 2017 -0700 Committer: Holden Karau Committed: Wed Jul 5 16:33:23 2017 -0700

spark git commit: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name from the Python version

2017-05-09 Thread holden
ifferent hadoop versions, we can simply drop the hadoop information. If at a later point we need to start publishing different hadoop versions we can look at make different packages or similar. ## How was this patch tested? Ran `make-distribution` locally Author: Holden Karau Closes #17885 from

spark git commit: [SPARK-20627][PYSPARK] Drop the hadoop distirbution name from the Python version

2017-05-09 Thread holden
t hadoop versions, we can simply drop the hadoop information. If at a later point we need to start publishing different hadoop versions we can look at make different packages or similar. ## How was this patch tested? Ran `make-distribution` locally Author: Holden Karau Closes #17885 from

spark git commit: [SPARK-20442][PYTHON][DOCS] Fill up documentations for functions in Column API in PySpark

2017-04-29 Thread holden
org/repos/asf/spark/tree/d228cd0b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d228cd0b Branch: refs/heads/master Commit: d228cd0b0243773a1c834414a240d1c553ab7af6 Parents: 70f1bcd Author: hyukjinkwon Authored: Sat Apr 29 13:46:40 2017 -0700 Committer: Holden Karau Committed: Sat Apr 29 13

spark git commit: [SPARK-20132][DOCS] Add documentation for column string functions

2017-04-22 Thread holden
sf/spark/diff/8765bc17 Branch: refs/heads/master Commit: 8765bc17d0439032d0378686c4f2b17df2432abc Parents: b3c572a Author: Michael Patterson Authored: Sat Apr 22 19:58:54 2017 -0700 Committer: Holden Karau Committed: Sat Apr 22 19:58:54 20

spark git commit: [SPARK-20360][PYTHON] reprs for interpreters

2017-04-18 Thread holden
7dde.png) Hydrogen: ![screen shot 2017-04-17 at 3 49 55 pm](https://cloud.githubusercontent.com/assets/836375/25107664/a75e1ddc-2385-11e7-8477-258661833007.png) Author: Kyle Kelley Closes #17662 from rgbkrk/repr. (cherry picked from commit f654b39a63d4f9b118733733c7ed2a1b58649e3d) Signed-off-by: Hold

spark git commit: [SPARK-20360][PYTHON] reprs for interpreters

2017-04-18 Thread holden
18 12:35:27 2017 -0700 Committer: Holden Karau Committed: Tue Apr 18 12:35:27 2017 -0700 -- python/pyspark/context.py | 26 ++ python/pyspark/sql/session.py | 11 +++ 2 files changed, 37 in

spark git commit: [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked `collections.namedtuple` and port cloudpickle changes for PySpark to work with Python 3.6.0

2017-04-17 Thread holden
700 Committer: Holden Karau Committed: Mon Apr 17 10:03:42 2017 -0700 -- python/pyspark/cloudpickle.py | 98 ++ python/pyspark/serializers.py | 20 2 files changed, 87 insertions(+),

spark git commit: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collections.namedtuple` and port cloudpickle changes for PySpark to work with Python 3.6.0

2017-04-17 Thread holden
won Authored: Mon Apr 17 09:58:55 2017 -0700 Committer: Holden Karau Committed: Mon Apr 17 09:58:55 2017 -0700 -- python/pyspark/cloudpickle.py | 98 ++ python/pyspark/serializers.py | 20

spark git commit: [SPARK-20232][PYTHON] Improve combineByKey docs

2017-04-13 Thread holden
mit: 8ddf0d2a60795a2306f94df8eac6e265b1fe5230 Parents: fbe4216 Author: David Gingrich Authored: Thu Apr 13 12:43:28 2017 -0700 Committer: Holden Karau Committed: Thu Apr 13 12:43:28 2017 -0700 -- python/pyspark/rdd.py | 24 +++- 1 file chan

spark git commit: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell

2017-04-12 Thread holden
/git-wip-us.apache.org/repos/asf/spark/tree/99a94731 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99a94731 Branch: refs/heads/master Commit: 99a9473127ec389283ac4ec3b721d2e34434e647 Parents: 5408553 Author: Jeff Zhang Authored: Wed Apr 12 10:54:50 2017 -0700 Committer: Holden Karau Committed: We

spark git commit: [SPARK-19505][PYTHON] AttributeError on Exception.message in Python3

2017-04-11 Thread holden
org/repos/asf/spark/diff/6297697f Branch: refs/heads/master Commit: 6297697f975960a3006c4e58b4964d9ac40eeaf5 Parents: 123b4fb Author: David Gingrich Authored: Tue Apr 11 12:18:31 2017 -0700 Committer: Holden Karau Committed: Tue Apr 11 12:18:31 2017 -0

spark git commit: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvements

2017-04-05 Thread holden
ree/e2773996 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e2773996 Branch: refs/heads/master Commit: e2773996b8d1c0214d9ffac634a059b4923caf7b Parents: a2d8d76 Author: zero323 Authored: Wed Apr 5 11:47:40 2017 -0700 Committer: Holden Karau Committ

spark git commit: [SPARK-19955][PYSPARK] Jenkins Python Conda based test.

2017-03-29 Thread holden
ability. ## How was this patch tested? Updated shell scripts, ran tests locally with installed conda, ran tests in Jenkins. Author: Holden Karau Closes #17355 from holdenk/SPARK-19955-support-python-tests-with-conda. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://

spark git commit: [SPARK-12334][SQL][PYSPARK] Support read from multiple input paths for orc file in DataFrameReader.orc

2017-03-09 Thread holden
ree/cabe1df8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cabe1df8 Branch: refs/heads/master Commit: cabe1df8606e7e5b9e6efb106045deb3f39f5f13 Parents: 30b18e6 Author: Jeff Zhang Authored: Thu Mar 9 11:44:34 2017 -0800 Committer: Holden Karau Committed: Thu Mar 9 11:44:34 2017 -0

spark git commit: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated to python worker

2017-02-24 Thread holden
wip-us.apache.org/repos/asf/spark/tree/330c3e33 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/330c3e33 Branch: refs/heads/master Commit: 330c3e33bd10f035f49cf3d13357eb2d6d90dabc Parents: 5f74148 Author: Jeff Zhang Authored: Fri Feb 24 15:04:42 2017 -0800 Committer: Holden Karau Committed:

spark git commit: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-24 Thread holden
mit: 4a5e38f5747148022988631cae0248ae1affadd3 Parents: 8f33731 Author: zero323 Authored: Fri Feb 24 08:22:30 2017 -0800 Committer: Holden Karau Committed: Fri Feb 24 08:22:30 2017 -0800 -- python/pyspark/sql/functions.py | 11 ++- python/pyspark/

spark git commit: [SPARK-19160][PYTHON][SQL] Add udf decorator

2017-02-15 Thread holden
os/asf/spark/commit/c97f4e17 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c97f4e17 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c97f4e17 Branch: refs/heads/master Commit: c97f4e17de0ce39e8172a5a4ae81f1914816a358 Parents: 6eca21b Author: zero323 Authored: Wed Feb 15 10:16:34 2

spark git commit: [SPARK-19590][PYSPARK][ML] Update the document for QuantileDiscretizer in pyspark

2017-02-15 Thread holden
800 Committer: Holden Karau Committed: Wed Feb 15 10:12:07 2017 -0800 -- python/pyspark/ml/feature.py | 12 +++- 1 file changed, 11 insertions(+), 1 delet

spark git commit: [SPARK-18541][PYTHON] Add metadata parameter to pyspark.sql.Column.alias()

2017-02-14 Thread holden
nch: refs/heads/master Commit: 7b64f7aa03a49adca5fcafe6fff422823b587514 Parents: e0eeb0f Author: Sheamus K. Parkes Authored: Tue Feb 14 09:57:43 2017 -0800 Committer: Holden Karau Committed: Tue Feb 14 09:57:43 2017 -0800 -- pyt

spark git commit: [SPARK-19162][PYTHON][SQL] UserDefinedFunction should validate that func is callable

2017-02-14 Thread holden
wip-us.apache.org/repos/asf/spark/tree/e0eeb0f8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e0eeb0f8 Branch: refs/heads/master Commit: e0eeb0f89fffb52cd4d15970bdf00c3c5d1eea88 Parents: 9c4405e Author: zero323 Authored: Tue Feb 14 09:46:22 2017 -0800 Committer: Holden Karau Committed: Tue

spark git commit: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataFrame.replace docstring

2017-02-14 Thread holden
nts: 457850e Author: zero323 Authored: Tue Feb 14 09:42:24 2017 -0800 Committer: Holden Karau Committed: Tue Feb 14 09:42:24 2017 -0800 -- python/pyspark/sql/dataframe.py | 18 -- 1 file changed, 12 insertions(+)

spark git commit: [SPARK-19429][PYTHON][SQL] Support slice arguments in Column.__getitem__

2017-02-13 Thread holden
303 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e02ac303 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e02ac303 Branch: refs/heads/master Commit: e02ac303c6356cdf7fffec7361311d828a723afe Parents: 0169360 Author: zero323 Authored: Mon Feb 13 15:23:56 2017 -0800 Committer: Hol

spark git commit: [SPARK-19427][PYTHON][SQL] Support data type string as a returnType argument of UDF

2017-02-13 Thread holden
e7cd33 Author: zero323 Authored: Mon Feb 13 10:37:34 2017 -0800 Committer: Holden Karau Committed: Mon Feb 13 10:37:34 2017 -0800 -- python/pyspark/sql/functions.py | 8 +--- python/pyspark/sql/tests.py | 15 +++

spark git commit: [SPARK-19506][ML][PYTHON] Import warnings in pyspark.ml.util

2017-02-13 Thread holden
323 Closes #16846 from zero323/SPARK-19506. (cherry picked from commit 5e7cd3322b04f1dd207829b70546bc7ffdd63363) Signed-off-by: Holden Karau Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef4fb7eb Tree: http://git-

spark git commit: [SPARK-19506][ML][PYTHON] Import warnings in pyspark.ml.util

2017-02-13 Thread holden
332 Branch: refs/heads/master Commit: 5e7cd3322b04f1dd207829b70546bc7ffdd63363 Parents: 4321ff9 Author: zero323 Authored: Mon Feb 13 09:26:49 2017 -0800 Committer: Holden Karau Committed: Mon Feb 13 09:26:49 2017 -0800 -- python/pysp

spark git commit: [SPARK-19421][ML][PYSPARK] Remove numClasses and numFeatures methods in LinearSVC

2017-02-05 Thread holden
:51 2017 -0800 Committer: Holden Karau Committed: Sun Feb 5 19:06:51 2017 -0800 -- python/pyspark/ml/classification.py | 16 1 file changed, 16 deleti

spark git commit: [SPARK-14352][SQL] approxQuantile should support multi columns

2017-02-01 Thread holden
wip-us.apache.org/repos/asf/spark/tree/b0985764 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b0985764 Branch: refs/heads/master Commit: b0985764f00acea97df7399a6b337262fc97f5ee Parents: c5fcb7f Author: Zheng RuiFeng Authored: Wed Feb 1 14:11:28 2017 -0800 Committer: Holden Karau Committed:

spark git commit: [SPARK-19163][PYTHON][SQL] Delay _judf initialization to the __call__

2017-01-31 Thread holden
mit: 9063835803e54538c94d95bbddcb4810dd7a1c55 Parents: 081b7ad Author: zero323 Authored: Tue Jan 31 18:03:39 2017 -0800 Committer: Holden Karau Committed: Tue Jan 31 18:03:39 2017 -0800 -- python/pyspark/sql/functions.py |

spark git commit: [SPARK-17161][PYSPARK][ML] Add PySpark-ML JavaWrapper convenience function to create Py4J JavaArrays

2017-01-31 Thread holden
org/repos/asf/spark/diff/57d70d26 Branch: refs/heads/master Commit: 57d70d26c88819360cdc806e7124aa2cc1b9e4c5 Parents: ce112ce Author: Bryan Cutler Authored: Tue Jan 31 15:42:36 2017 -0800 Committer: Holden Karau Committed: Tue Jan 31 15:42:36 2017 -0

spark git commit: [SPARK-19064][PYSPARK] Fix pip installing of sub components

2017-01-25 Thread holden
est script & make-distribution. ## How was this patch tested? Updated sanity test script to import mllib and ml sub-components. Author: Holden Karau Closes #16465 from holdenk/SPARK-19064-fix-pip-install-sub-components. (cherry picked from commit 965c82d8c4b7f2d4dfbc45ec4d47d6b6588094c3) Sig

spark git commit: [SPARK-19064][PYSPARK] Fix pip installing of sub components

2017-01-25 Thread holden
est script & make-distribution. ## How was this patch tested? Updated sanity test script to import mllib and ml sub-components. Author: Holden Karau Closes #16465 from holdenk/SPARK-19064-fix-pip-install-sub-components. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://
