spark git commit: [SPARK-5853][SQL] Schema support in Row.

2015-02-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master a51d51ffa -> d380f324c [SPARK-5853][SQL] Schema support in Row. Author: Reynold Xin r...@databricks.com Closes #4640 from rxin/SPARK-5853 and squashes the following commits: 9c6f569 [Reynold Xin] [SPARK-5853][SQL] Schema support in Row

spark git commit: [SPARK-5853][SQL] Schema support in Row.

2015-02-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 c6a70694b -> d0701d9bf [SPARK-5853][SQL] Schema support in Row. Author: Reynold Xin r...@databricks.com Closes #4640 from rxin/SPARK-5853 and squashes the following commits: 9c6f569 [Reynold Xin] [SPARK-5853][SQL] Schema support
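
SPARK-5853 gives a `Row` access to its schema. The sketch below shows the idea in plain Python: a row stores its field names next to its values, so a field can be looked up by name as well as by position. `SchemaRow`, `field_index`, and `get_as` are hypothetical names chosen for illustration. They are not Spark's API.

```python
class SchemaRow:
    """Hypothetical row that carries its schema (here, just the field names)."""

    def __init__(self, field_names, values):
        if len(field_names) != len(values):
            raise ValueError("schema and values must have the same length")
        self.schema = tuple(field_names)
        self._values = tuple(values)
        # Precompute name -> position so lookups by name are O(1).
        self._index = {name: i for i, name in enumerate(self.schema)}

    def __getitem__(self, i):
        # Positional access, the same as on a row without a schema.
        return self._values[i]

    def field_index(self, name):
        if name not in self._index:
            raise KeyError(f"field {name!r} not in schema {self.schema}")
        return self._index[name]

    def get_as(self, name):
        # Lookup by name only works because the row knows its schema.
        return self._values[self.field_index(name)]


row = SchemaRow(["name", "age"], ["alice", 30])
print(row[0], row.get_as("age"))
```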

[5/5] spark git commit: [SPARK-5097][SQL] DataFrame

2015-01-27 Thread rxin
test cases to cover the new API - [x] Python support - [ ] Type alias SchemaRDD Author: Reynold Xin r...@databricks.com Author: Davies Liu dav...@databricks.com Closes #4173 from rxin/df1 and squashes the following commits: 0a1a73b [Reynold Xin] Merge branch 'df1' of github.com:rxin/spark into df1

[1/5] spark git commit: [SPARK-5097][SQL] DataFrame

2015-01-27 Thread rxin
Repository: spark Updated Branches: refs/heads/master b1b35ca2e -> 119f45d61 http://git-wip-us.apache.org/repos/asf/spark/blob/119f45d6/sql/core/src/test/scala/org/apache/spark/sql/execution/TgfSuite.scala -- diff --git

[4/5] spark git commit: [SPARK-5097][SQL] DataFrame

2015-01-27 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/119f45d6/python/pyspark/sql.py -- diff --git a/python/pyspark/sql.py b/python/pyspark/sql.py index 1990323..7d7550c 100644 --- a/python/pyspark/sql.py +++ b/python/pyspark/sql.py

[3/5] spark git commit: [SPARK-5097][SQL] DataFrame

2015-01-27 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/119f45d6/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala

[2/2] spark git commit: [SPARK-5445][SQL] Made DataFrame dsl usable in Java

2015-01-28 Thread rxin
ago. Author: Reynold Xin r...@databricks.com Closes #4241 from rxin/df-docupdate and squashes the following commits: c0f4810 [Reynold Xin] Fix Python merge conflict. 094c7d7 [Reynold Xin] Minor style fix. Reset Python tests. 3c89f4a [Reynold Xin] Package. dfe6962 [Reynold Xin] Updated Python

spark git commit: [SPARK-5307] SerializationDebugger

2015-01-30 Thread rxin
) - writeExternal data - externalizable object (class org.apache.spark.serializer.ExternalizableClass, org.apache.spark.serializer.ExternalizableClass@320bdadc) ``` Author: Reynold Xin r...@databricks.com Closes #4098 from rxin/SerializationDebugger and squashes the following commits: 553b3ff
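
The SerializationDebugger output above shows the chain of references that leads to the object that could not be serialized. The sketch below applies the same idea in Python, with `pickle` standing in for Java serialization. When an object fails to pickle, the helper recurses into its attributes and container elements to find the deepest reference that still fails. `find_unpicklable` is a hypothetical helper. It does not handle reference cycles or custom `__reduce__` logic.

```python
import pickle
import threading


def find_unpicklable(obj, path="root"):
    """Return the path to the deepest unpicklable reference, or None if obj pickles."""
    try:
        pickle.dumps(obj)
        return None
    except Exception:
        pass  # obj, or something it references, cannot be pickled; look inside it

    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []

    for child_path, child in children:
        found = find_unpicklable(child, child_path)
        if found is not None:
            return found
    # No child explains the failure, so obj itself is the culprit.
    return path


class Task:
    def __init__(self):
        self.name = "task-1"
        self.state = {"count": 1, "lock": threading.Lock()}


print(find_unpicklable(Task()))
```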

spark git commit: [SPARK-5307] Add a config option for SerializationDebugger.

2015-01-31 Thread rxin
#4297 from rxin/ser-config and squashes the following commits: f1d4629 [Reynold Xin] [SPARK-5307] Add a config option for SerializationDebugger. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/63640831 Tree: http://git-wip

spark git commit: Closes #4157

2015-01-25 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0d1e67ee9 -> d22ca1e92 Closes #4157 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d22ca1e9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d22ca1e9 Diff:

spark git commit: [SPARK-5058] Part 2. Typos and broken URL

2015-01-23 Thread rxin
Repository: spark Updated Branches: refs/heads/master e224dbb01 -> 09e09c548 [SPARK-5058] Part 2. Typos and broken URL - Also fixed java link Author: Jongyoul Lee jongy...@gmail.com Closes #4172 from jongyoul/SPARK-FIXDOC and squashes the following commits: 6be03e5 [Jongyoul Lee]

spark git commit: [SPARK-5058] Part 2. Typos and broken URL

2015-01-23 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.2 73cb806f7 -> ff2d7bd7b [SPARK-5058] Part 2. Typos and broken URL - Also fixed java link Author: Jongyoul Lee jongy...@gmail.com Closes #4172 from jongyoul/SPARK-FIXDOC and squashes the following commits: 6be03e5 [Jongyoul Lee]

spark git commit: [SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame.

2015-01-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 453d7999b -> c8e934ef3 [SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame. and [SPARK-5448][SQL] Make CacheManager a concrete class and field in SQLContext Author: Reynold Xin r...@databricks.com Closes #4242 from rxin

spark git commit: [SPARK-5093] Set spark.network.timeout to 120s consistently.

2015-01-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6c6f32574 -> bbcba3a94 [SPARK-5093] Set spark.network.timeout to 120s consistently. Author: Reynold Xin r...@databricks.com Closes #3903 from rxin/timeout-120 and squashes the following commits: 7c2138e [Reynold Xin] [SPARK-5093] Set

spark git commit: [SPARK-5141][SQL]CaseInsensitiveMap throws java.io.NotSerializableException

2015-01-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master 4554529dc -> 545dfcb92 [SPARK-5141][SQL]CaseInsensitiveMap throws java.io.NotSerializableException CaseInsensitiveMap throws java.io.NotSerializableException. Author: luogankun luogan...@gmail.com Closes #3944 from luogankun/SPARK-5141

[4/5] spark git commit: [SPARK-5123][SQL] Reconcile Java/Scala API for data types.

2015-01-13 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/f9969098/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/decimal/Decimal.scala -- diff --git

[2/5] spark git commit: [SPARK-5123][SQL] Reconcile Java/Scala API for data types.

2015-01-13 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/f9969098/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/api/java/UDFRegistration.scala

[5/5] spark git commit: [SPARK-5123][SQL] Reconcile Java/Scala API for data types.

2015-01-13 Thread rxin
/spark/pull/3925 Author: Reynold Xin r...@databricks.com Closes #3958 from rxin/SPARK-5123-datatype-2 and squashes the following commits: 66505cc [Reynold Xin] [SPARK-5123] Expose only one version of the data type APIs (i.e. remove the Java-specific API). Project: http://git-wip-us.apache.org

[3/5] spark git commit: [SPARK-5123][SQL] Reconcile Java/Scala API for data types.

2015-01-13 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/f9969098/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala -- diff --git

[1/5] spark git commit: [SPARK-5123][SQL] Reconcile Java/Scala API for data types.

2015-01-13 Thread rxin
Repository: spark Updated Branches: refs/heads/master 14e3f114e -> f9969098c http://git-wip-us.apache.org/repos/asf/spark/blob/f9969098/sql/core/src/test/scala/org/apache/spark/sql/DslQuerySuite.scala -- diff --git

spark git commit: [SQL] Remove the duplicated code

2015-02-09 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 6ddbca494 -> 663d34ec8 [SQL] Remove the duplicated code Author: Cheng Hao hao.ch...@intel.com Closes #4494 from chenghao-intel/tiny_code_change and squashes the following commits: 450dfe7 [Cheng Hao] remove the duplicated code

spark git commit: [SQL] Remove the duplicated code

2015-02-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master a2d33d0b0 -> bd0b5ea70 [SQL] Remove the duplicated code Author: Cheng Hao hao.ch...@intel.com Closes #4494 from chenghao-intel/tiny_code_change and squashes the following commits: 450dfe7 [Cheng Hao] remove the duplicated code Project:

spark git commit: [SPARK-5678] Convert DataFrame to pandas.DataFrame and Series

2015-02-09 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 fa67877c2 -> 43972b5d1 [SPARK-5678] Convert DataFrame to pandas.DataFrame and Series ``` pyspark.sql.DataFrame.to_pandas = to_pandas(self) unbound pyspark.sql.DataFrame method Collect all the rows and return a `pandas.DataFrame`.

[3/3] spark git commit: [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames

2015-02-14 Thread rxin
#4556 from rxin/SPARK-5752 and squashes the following commits: 5ef9910 [Reynold Xin] More fix 61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752 ff5832c [Reynold Xin] Fix python 749c675 [Reynold Xin] count(*) fixes. 5806df0 [Reynold Xin] Fix build break again

[2/3] spark git commit: [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames

2015-02-14 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/e98dfe62/sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala

spark git commit: [SPARK-5675][SQL] XyzType companion object should subclass XyzType

2015-02-09 Thread rxin
Closes #4463 from rxin/type-companion-object and squashes the following commits: 04d5d8d [Reynold Xin] Comment. 976e11e [Reynold Xin] [SPARK-5675][SQL]StringType case object should be subclass of StringType class Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip

spark git commit: [SPARK-5675][SQL] XyzType companion object should subclass XyzType

2015-02-09 Thread rxin
Closes #4463 from rxin/type-companion-object and squashes the following commits: 04d5d8d [Reynold Xin] Comment. 976e11e [Reynold Xin] [SPARK-5675][SQL]StringType case object should be subclass of StringType class (cherry picked from commit f48199eb354d6ec8675c2c1f96c3005064058d66) Signed-off

spark git commit: [HOTFIX] Ignore DirectKafkaStreamSuite.

2015-02-13 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9f31db061 -> 378c7eb0d [HOTFIX] Ignore DirectKafkaStreamSuite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/378c7eb0 Tree:

[2/3] spark git commit: [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames

2015-02-14 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/ba91bf5f/sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala

[1/3] spark git commit: [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames

2015-02-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0ce4e430a -> e98dfe627 http://git-wip-us.apache.org/repos/asf/spark/blob/e98dfe62/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala -- diff

spark git commit: [SPARK-5678] Convert DataFrame to pandas.DataFrame and Series

2015-02-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master de7806048 -> afb131637 [SPARK-5678] Convert DataFrame to pandas.DataFrame and Series ``` pyspark.sql.DataFrame.to_pandas = to_pandas(self) unbound pyspark.sql.DataFrame method Collect all the rows and return a `pandas.DataFrame`.

[4/4] spark git commit: [SPARK-5469] restructure pyspark.sql into multiple files

2015-02-09 Thread rxin
[SPARK-5469] restructure pyspark.sql into multiple files All the DataTypes moved into pyspark.sql.types The changes can be tracked by `--find-copies-harder -M25` ``` davies@localhost:~/work/spark/python$ git diff --find-copies-harder -M25 --numstat master.. 2 5

[2/4] spark git commit: [SPARK-5469] restructure pyspark.sql into multiple files

2015-02-09 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/08488c17/python/pyspark/sql/__init__.py -- diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py new file mode 100644 index 000..0a5ba00 --- /dev/null

[2/4] spark git commit: [SPARK-5469] restructure pyspark.sql into multiple files

2015-02-09 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/f0562b42/python/pyspark/sql/__init__.py -- diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py new file mode 100644 index 000..0a5ba00 --- /dev/null

[1/4] spark git commit: [SPARK-5469] restructure pyspark.sql into multiple files

2015-02-09 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 62b1e1fc0 -> f0562b423 http://git-wip-us.apache.org/repos/asf/spark/blob/f0562b42/python/pyspark/sql/types.py -- diff --git a/python/pyspark/sql/types.py

[1/4] spark git commit: [SPARK-5469] restructure pyspark.sql into multiple files

2015-02-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master d302c4800 -> 08488c175 http://git-wip-us.apache.org/repos/asf/spark/blob/08488c17/python/pyspark/sql/types.py -- diff --git a/python/pyspark/sql/types.py

[3/4] spark git commit: [SPARK-5469] restructure pyspark.sql into multiple files

2015-02-09 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/08488c17/python/pyspark/sql.py -- diff --git a/python/pyspark/sql.py b/python/pyspark/sql.py deleted file mode 100644 index 6a6dfbc..000 --- a/python/pyspark/sql.py +++

[3/4] spark git commit: [SPARK-5469] restructure pyspark.sql into multiple files

2015-02-09 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/f0562b42/python/pyspark/sql.py -- diff --git a/python/pyspark/sql.py b/python/pyspark/sql.py deleted file mode 100644 index 6a6dfbc..000 --- a/python/pyspark/sql.py +++

spark git commit: Add a config option to print DAG.

2015-02-09 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 f0562b423 -> dad05e068 Add a config option to print DAG. Add a config option spark.rddDebug.enable to check whether to print DAG info. When spark.rddDebug.enable is true, it will print information about DAG in the log. Author:

spark git commit: Add a config option to print DAG.

2015-02-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master 08488c175 -> 31d435ecf Add a config option to print DAG. Add a config option spark.rddDebug.enable to check whether to print DAG info. When spark.rddDebug.enable is true, it will print information about DAG in the log. Author:

spark git commit: [SPARK-5577] Python udf for DataFrame

2015-02-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master e0490e271 -> dc101b0e4 [SPARK-5577] Python udf for DataFrame Author: Davies Liu dav...@databricks.com Closes #4351 from davies/python_udf and squashes the following commits: d250692 [Davies Liu] fix conflict 34234d4 [Davies Liu] Merge

spark git commit: [HOTFIX] MLlib build break.

2015-02-05 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 fba2dc663 -> c83d118fa [HOTFIX] MLlib build break. (cherry picked from commit 6580929fa029c4010dd4170de9be9f18516f8e5a) Signed-off-by: Reynold Xin r...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-5602][SQL] Better support for creating DataFrame from local data collection

2015-02-04 Thread rxin
. Author: Reynold Xin r...@databricks.com Closes #4372 from rxin/localDataFrame and squashes the following commits: f696858 [Reynold Xin] style checker. 839ef7f [Reynold Xin] [SPARK-5602][SQL] Better support for creating DataFrame from local data collection. (cherry picked from commit

spark git commit: [SPARK-5602][SQL] Better support for creating DataFrame from local data collection

2015-02-04 Thread rxin
: Reynold Xin r...@databricks.com Closes #4372 from rxin/localDataFrame and squashes the following commits: f696858 [Reynold Xin] style checker. 839ef7f [Reynold Xin] [SPARK-5602][SQL] Better support for creating DataFrame from local data collection. Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-5538][SQL] Fix flaky CachedTableSuite

2015-02-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6b4c7f080 -> 206f9bc36 [SPARK-5538][SQL] Fix flaky CachedTableSuite Author: Reynold Xin r...@databricks.com Closes #4379 from rxin/CachedTableSuite and squashes the following commits: f2b44ce [Reynold Xin] [SQL] Fix flaky CachedTableSuite

spark git commit: [SPARK-5538][SQL] Fix flaky CachedTableSuite

2015-02-04 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 f05bfa633 -> 1901b19c5 [SPARK-5538][SQL] Fix flaky CachedTableSuite Author: Reynold Xin r...@databricks.com Closes #4379 from rxin/CachedTableSuite and squashes the following commits: f2b44ce [Reynold Xin] [SQL] Fix flaky

spark git commit: [Branch-1.3] [DOC] doc fix for date

2015-02-05 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 d1066e921 -> 17ef7f930 [Branch-1.3] [DOC] doc fix for date Trivial fix. Author: Daoyuan Wang daoyuan.w...@intel.com Closes #4400 from adrian-wang/docdate and squashes the following commits: 31bbe40 [Daoyuan Wang] doc fix for date

spark git commit: [Branch-1.3] [DOC] doc fix for date

2015-02-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master 081ac69f3 -> 6fa4ac1b0 [Branch-1.3] [DOC] doc fix for date Trivial fix. Author: Daoyuan Wang daoyuan.w...@intel.com Closes #4400 from adrian-wang/docdate and squashes the following commits: 31bbe40 [Daoyuan Wang] doc fix for date

spark git commit: [SPARK-5617][SQL] fix test failure of SQLQuerySuite

2015-02-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6fa4ac1b0 -> a83936e10 [SPARK-5617][SQL] fix test failure of SQLQuerySuite SQLQuerySuite test failure: [info] - simple select (22 milliseconds) [info] - sorting (722 milliseconds) [info] - external sorting (728 milliseconds) [info] - limit

spark git commit: [SPARK-5126][Core] Verify Spark urls before creating Actors so that invalid urls can crash the process.

2015-01-07 Thread rxin
Repository: spark Updated Branches: refs/heads/master d345ebebd -> 2b729d225 [SPARK-5126][Core] Verify Spark urls before creating Actors so that invalid urls can crash the process. Because `actorSelection` will return `deadLetters` for an invalid path, Worker keeps quiet for an invalid
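
SPARK-5126 makes an invalid URL fail fast instead of being silently routed to `deadLetters`. The sketch below is a minimal Python version of such an up-front check. It assumes the `spark://host:port` form. `parse_spark_url` is a hypothetical helper, not Spark's own validation code.

```python
from urllib.parse import urlparse


def parse_spark_url(url):
    """Return (host, port) for a spark://host:port URL, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme != "spark":
        raise ValueError(f"Invalid Spark URL {url!r}: scheme must be 'spark'")
    try:
        port = parsed.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        raise ValueError(f"Invalid Spark URL {url!r}: bad port") from None
    if not parsed.hostname or port is None:
        raise ValueError(f"Invalid Spark URL {url!r}: expected spark://host:port")
    if parsed.path or parsed.query or parsed.fragment or parsed.username:
        raise ValueError(f"Invalid Spark URL {url!r}: unexpected path, query, or user info")
    return parsed.hostname, port
```

Raising at parse time surfaces a typo in the master URL as an immediate error, rather than as a process that waits silently for replies that never come.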

spark git commit: [SPARK-5067][Core] Use '===' to compare well-defined case class

2015-01-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 939ba1f8f -> 72396522b [SPARK-5067][Core] Use '===' to compare well-defined case class A simple fix would be adding `assert(e1.appId == e2.appId)` for `SparkListenerApplicationStart`. But actually we can use `===` for well-defined case

spark git commit: [SPARK-5069][Core] Fix the race condition of TaskSchedulerImpl.dagScheduler

2015-01-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 72396522b -> 6c726a3fb [SPARK-5069][Core] Fix the race condition of TaskSchedulerImpl.dagScheduler It's not necessary to set `TaskSchedulerImpl.dagScheduler` in preStart. It's safe to set it after `initializeEventProcessActor()`. Author:

spark git commit: [SPARK-5083][Core] Fix a flaky test in TaskResultGetterSuite

2015-01-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6c726a3fb -> 27e7f5a72 [SPARK-5083][Core] Fix a flaky test in TaskResultGetterSuite Because `sparkEnv.blockManager.master.removeBlock` is asynchronous, we need to make sure the block has already been removed before calling

spark git commit: [SPARK-5074][Core] Fix a non-deterministic test failure

2015-01-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 27e7f5a72 -> 5c506cecb [SPARK-5074][Core] Fix a non-deterministic test failure Add `assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))` to make sure `sparkListener` receives the message. Author: zsxwing zsxw...@gmail.com Closes
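
The SPARK-5074 fix waits for the asynchronous listener bus to drain before asserting. The sketch below shows that pattern in Python: a background thread delivers events, and a pending-event counter guarded by a `threading.Condition` tracks what is still in flight. `ToyListenerBus` and `wait_until_empty` are hypothetical names, modelled loosely on `sc.listenerBus.waitUntilEmpty` from the commit rather than copied from Spark.

```python
import queue
import threading
import time


class ToyListenerBus:
    """Hypothetical async bus: post() returns at once, a thread delivers later."""

    def __init__(self):
        self._events = queue.Queue()
        self._cond = threading.Condition()
        self._pending = 0  # events posted but not yet delivered
        self.received = []
        threading.Thread(target=self._deliver_loop, daemon=True).start()

    def post(self, event):
        with self._cond:
            self._pending += 1
        self._events.put(event)

    def _deliver_loop(self):
        while True:
            event = self._events.get()
            time.sleep(0.01)  # simulate a slow listener
            self.received.append(event)
            with self._cond:
                self._pending -= 1
                self._cond.notify_all()

    def wait_until_empty(self, timeout_s):
        """Block until every posted event is delivered; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout=timeout_s)


bus = ToyListenerBus()
for i in range(3):
    bus.post(i)
# Asserting on bus.received right here would race with the delivery thread.
assert bus.wait_until_empty(timeout_s=5.0)
assert bus.received == [0, 1, 2]
```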

spark git commit: [SPARK-4688] Have a single shared network timeout in Spark

2015-01-05 Thread rxin
Repository: spark Updated Branches: refs/heads/master 5c506cecb -> d3f07fd23 [SPARK-4688] Have a single shared network timeout in Spark [SPARK-4688] Have a single shared network timeout in Spark Author: Varun Saxena vsaxena.va...@gmail.com Author: varunsaxena vsaxena.va...@gmail.com Closes
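
SPARK-4688 and SPARK-5093 converge on a single shared `spark.network.timeout` with a 120-second default. The sketch below shows the fallback rule this implies. It assumes a plain dict for the configuration and timeouts given in whole seconds. `example.rpc.timeout` is a hypothetical component-specific key, not a real Spark setting.

```python
DEFAULT_NETWORK_TIMEOUT_S = 120  # the 120s default named in SPARK-5093


def effective_timeout_s(conf, specific_key):
    """Use a component-specific timeout if one is set, else the shared network timeout."""
    if specific_key in conf:
        return int(conf[specific_key])
    return int(conf.get("spark.network.timeout", DEFAULT_NETWORK_TIMEOUT_S))


print(effective_timeout_s({}, "example.rpc.timeout"))
print(effective_timeout_s({"spark.network.timeout": "300"}, "example.rpc.timeout"))
```

With this rule, raising the shared key once adjusts every component that has not been given its own override.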

spark git commit: [HOTFIX] Build break due to https://github.com/apache/spark/pull/5128

2015-03-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master a41b9c600 -> 7a0da4770 [HOTFIX] Build break due to https://github.com/apache/spark/pull/5128 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7a0da477 Tree:

spark git commit: [SPARK-6299][CORE] ClassNotFoundException in standalone mode when running groupByKey with class defined in REPL

2015-03-17 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 47cce984e -> 5c16ced1e [SPARK-6299][CORE] ClassNotFoundException in standalone mode when running groupByKey with class defined in REPL ``` case class ClassA(value: String) val rdd = sc.parallelize(List(("k1", ClassA("v1")), ("k1",

spark git commit: [SQL][docs][minor] Fixed sample code in SQLContext scaladoc

2015-03-17 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 5c16ced1e -> 426816b5c [SQL][docs][minor] Fixed sample code in SQLContext scaladoc Error in the code sample of the `implicits` object in `SQLContext`. Author: Lomig Mégard lomig.meg...@gmail.com Closes #5051 from tarfaa/simple and

spark git commit: [SPARK-6299][CORE] ClassNotFoundException in standalone mode when running groupByKey with class defined in REPL

2015-03-17 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9667b9f9c -> f0edeae7f [SPARK-6299][CORE] ClassNotFoundException in standalone mode when running groupByKey with class defined in REPL ``` case class ClassA(value: String) val rdd = sc.parallelize(List(("k1", ClassA("v1")), ("k1", ClassA("v2")) ))

spark git commit: [SPARK-6357][GraphX] Add unapply in EdgeContext

2015-03-17 Thread rxin
Repository: spark Updated Branches: refs/heads/master 68707225f -> b3e6eca81 [SPARK-6357][GraphX] Add unapply in EdgeContext This extractor is mainly used for Graph#aggregateMessages*. Author: Takeshi YAMAMURO linguin@gmail.com Closes #5047 from maropu/AddUnapplyInEdgeContext and

spark git commit: [SPARK-6383][SQL]Fixed compiler and errors in Dataframe examples

2015-03-17 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 3ea38bc3d -> cee6d0877 [SPARK-6383][SQL]Fixed compiler and errors in Dataframe examples Author: Tijo Thomas tijopara...@gmail.com Closes #5068 from tijoparacka/fix_sql_dataframe_example and squashes the following commits: 6953ac1

spark git commit: Tighten up field/method visibility in Executor and made some code more clear to read.

2015-03-19 Thread rxin
and unnecessary fields. I cleaned it up a bit, and also tightened up the visibility of various fields/methods. Also added some inline documentation to help understand this code better. Author: Reynold Xin r...@databricks.com Closes #4850 from rxin/executor and squashes the following commits

[1/2] spark git commit: [SPARK-6428][SQL] Added explicit type for all public methods in sql/core

2015-03-20 Thread rxin
Repository: spark Updated Branches: refs/heads/master 257cde7c3 -> a95043b17 http://git-wip-us.apache.org/repos/asf/spark/blob/a95043b1/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala -- diff --git

[2/2] spark git commit: [SPARK-6428][SQL] Added explicit type for all public methods in sql/core

2015-03-20 Thread rxin
[SPARK-6428][SQL] Added explicit type for all public methods in sql/core Also implemented equals/hashCode when they are missing. This is done in order to enable automatic public method type checking. Author: Reynold Xin r...@databricks.com Closes #5104 from rxin/sql-hashcode-explicittype

spark git commit: [Docs] Replace references to SchemaRDD with DataFrame

2015-03-09 Thread rxin
Repository: spark Updated Branches: refs/heads/master f7c799204 -> 70f88148b [Docs] Replace references to SchemaRDD with DataFrame Author: Reynold Xin r...@databricks.com Closes #4952 from rxin/schemardd-df-reference and squashes the following commits: b2b1dbe [Reynold Xin] [Docs] Replace

spark git commit: [Docs] Replace references to SchemaRDD with DataFrame

2015-03-09 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 c152f9a7e -> 5e58f761f [Docs] Replace references to SchemaRDD with DataFrame Author: Reynold Xin r...@databricks.com Closes #4952 from rxin/schemardd-df-reference and squashes the following commits: b2b1dbe [Reynold Xin] [Docs

spark git commit: [SPARK-6296] [SQL] Added equals to Column

2015-03-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master e921a665c -> 25b71d8c1 [SPARK-6296] [SQL] Added equals to Column Author: Volodymyr Lyubinets vlyu...@gmail.com Closes #4988 from vlyubin/columncomp and squashes the following commits: 92d7c8f [Volodymyr Lyubinets] Added equals to Column

spark git commit: [SPARK-6296] [SQL] Added equals to Column

2015-03-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 bdc4682af -> d9e141cb7 [SPARK-6296] [SQL] Added equals to Column Author: Volodymyr Lyubinets vlyu...@gmail.com Closes #4988 from vlyubin/columncomp and squashes the following commits: 92d7c8f [Volodymyr Lyubinets] Added equals to

spark git commit: [SPARK-6210] [SQL] use prettyString as column name in agg()

2015-03-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 301278126 -> ad4756321 [SPARK-6210] [SQL] use prettyString as column name in agg() use prettyString instead of toString() (which includes the id of the expression) as column name in agg() Author: Davies Liu dav...@databricks.com Closes #5006

spark git commit: [SPARK-6210] [SQL] use prettyString as column name in agg()

2015-03-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master e360d5e4a -> b38e073fe [SPARK-6210] [SQL] use prettyString as column name in agg() use prettyString instead of toString() (which includes the id of the expression) as column name in agg() Author: Davies Liu dav...@databricks.com Closes #5006 from

[2/2] spark git commit: [SPARK-6428][SQL] Added explicit types for all public methods in catalyst

2015-03-24 Thread rxin
[SPARK-6428][SQL] Added explicit types for all public methods in catalyst I think after this PR, we can finally turn the rule on. There are still some smaller ones that need to be fixed, but those are easier. Author: Reynold Xin r...@databricks.com Closes #5162 from rxin/catalyst-explicit

[1/2] spark git commit: [SPARK-6428][SQL] Added explicit types for all public methods in catalyst

2015-03-24 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 dcf56aa8b -> 586e0d924 http://git-wip-us.apache.org/repos/asf/spark/blob/586e0d92/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala

spark git commit: [SPARK-6428][Streaming] Added explicit types for all public methods.

2015-03-24 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6930e965e -> 94598653b [SPARK-6428][Streaming] Added explicit types for all public methods. Author: Reynold Xin r...@databricks.com Closes #5110 from rxin/streaming-explicit-type and squashes the following commits: 2c2db32 [Reynold Xin

spark git commit: [DOCUMENTATION]Fixed Missing Type Import in Documentation

2015-03-24 Thread rxin
Repository: spark Updated Branches: refs/heads/master c14ddd97e -> c5cc41468 [DOCUMENTATION]Fixed Missing Type Import in Documentation Needed to import the types specifically, not the more general pyspark.sql Author: Bill Chambers wchamb...@ischool.berkeley.edu Author: anabranch

spark git commit: [spark-sql] a better exception message than scala.MatchError for unsupported types in Schema creation

2015-03-30 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 61813662a -> 4859c40e2 [spark-sql] a better exception message than scala.MatchError for unsupported types in Schema creation Currently if trying to register an RDD (or DataFrame in 1.3) as a table that has types that have no supported

spark git commit: [SPARK-6592][SQL] fix filter for scaladoc to generate API doc for Row class under catalyst dir

2015-03-30 Thread rxin
catalyst directory, however, we have a corner case that Row class is a public API under that directory we need to include Row into the scaladoc while still excluding other classes of catalyst project Thanks for the help on this patch from rxin and liancheng Author: CodingCat zhunans...@gmail.com

spark git commit: [SPARK-6603] [PySpark] [SQL] add SQLContext.udf and deprecate inferSchema() and applySchema

2015-03-30 Thread rxin
inferSchema() and applySchema(), show a warning for them. cc rxin Author: Davies Liu dav...@databricks.com Closes #5273 from davies/udf and squashes the following commits: 476e947 [Davies Liu] address comments c096fdb [Davies Liu] add SQLContext.udf and deprecate inferSchema() and applySchema

spark git commit: [SPARK-6119][SQL] DataFrame support for missing data handling

2015-03-30 Thread rxin
API. Author: Reynold Xin r...@databricks.com Closes #5274 from rxin/df-missing-value and squashes the following commits: 4ee1b98 [Reynold Xin] Improve error reporting in Python. 33a330c [Reynold Xin] Remove replace for now. bc4fdbb [Reynold Xin] Added documentation for replace. d56f5a5 [Reynold

spark git commit: [SPARK-6119][SQL] DataFrame support for missing data handling

2015-03-30 Thread rxin
to the Python API. Author: Reynold Xin r...@databricks.com Closes #5274 from rxin/df-missing-value and squashes the following commits: 4ee1b98 [Reynold Xin] Improve error reporting in Python. 33a330c [Reynold Xin] Remove replace for now. bc4fdbb [Reynold Xin] Added documentation for replace. d56f5a5
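
SPARK-6119 adds missing-data handling to DataFrame. In Python it is exposed as `dropna`/`fillna`, which SPARK-6623 aliases as `na.drop`/`na.fill`. The sketch below models the drop and fill semantics over plain Python rows, treating `None` as missing. It assumes `how`, `thresh`, and `subset` behave as in PySpark's `dropna`, where `thresh` overrides `how`. The helpers `drop_rows` and `fill_rows` are hypothetical and are not Spark's implementation. Unlike Spark, `fill_rows` ignores column types and fills any null with the given value.

```python
def drop_rows(rows, how="any", thresh=None, subset=None):
    """Keep rows per dropna-style rules: 'any' drops on one null, 'all' only if all null."""
    kept = []
    for row in rows:
        cols = subset if subset is not None else list(row)
        non_null = sum(1 for c in cols if row.get(c) is not None)
        if thresh is not None:  # thresh overrides how
            keep = non_null >= thresh
        elif how == "any":
            keep = non_null == len(cols)
        elif how == "all":
            keep = non_null > 0
        else:
            raise ValueError(f"how must be 'any' or 'all', got {how!r}")
        if keep:
            kept.append(row)
    return kept


def fill_rows(rows, value):
    """Replace None with a scalar, or with per-column values when given a dict."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for col, v in row.items():
            if v is None:
                if isinstance(value, dict):
                    if col in value:
                        new_row[col] = value[col]
                else:
                    new_row[col] = value
        filled.append(new_row)
    return filled


rows = [
    {"name": "alice", "age": 30},
    {"name": None, "age": 25},
    {"name": None, "age": None},
]
print(drop_rows(rows), drop_rows(rows, how="all"))
print(fill_rows(rows, {"name": "unknown"}))
```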

spark git commit: [SPARK-5124][Core] Move StopCoordinator to the receive method since it does not require a reply

2015-03-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master b8ff2bc61 -> 56775571c [SPARK-5124][Core] Move StopCoordinator to the receive method since it does not require a reply Hotfix for #4588 cc rxin Author: zsxwing zsxw...@gmail.com Closes #5283 from zsxwing/hotfix and squashes

spark git commit: [SPARK-6623][SQL] Alias DataFrame.na.drop and DataFrame.na.fill in Python.

2015-03-31 Thread rxin
Repository: spark Updated Branches: refs/heads/master f07e71406 -> b80a030e9 [SPARK-6623][SQL] Alias DataFrame.na.drop and DataFrame.na.fill in Python. To maintain consistency with the Scala API. Author: Reynold Xin r...@databricks.com Closes #5284 from rxin/df-na-alias and squashes

spark git commit: [SPARK-6625][SQL] Add common string filters to data sources.

2015-03-31 Thread rxin
, Solr. I also took this chance to improve documentation for the data source filters. Author: Reynold Xin r...@databricks.com Closes #5285 from rxin/ds-string-filters and squashes the following commits: f021727 [Reynold Xin] Fixed grammar. 7695a52 [Reynold Xin] [SPARK-6625][SQL] Add common string

spark git commit: [SPARK-6625][SQL] Add common string filters to data sources.

2015-03-31 Thread rxin
. I also took this chance to improve documentation for the data source filters. Author: Reynold Xin r...@databricks.com Closes #5285 from rxin/ds-string-filters and squashes the following commits: f021727 [Reynold Xin] Fixed grammar. 7695a52 [Reynold Xin] [SPARK-6625][SQL] Add common string

spark git commit: [SPARK-6623][SQL] Alias DataFrame.na.drop and DataFrame.na.fill in Python.

2015-03-31 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 a97d4e6bf -> cf651a46e [SPARK-6623][SQL] Alias DataFrame.na.drop and DataFrame.na.fill in Python. To maintain consistency with the Scala API. Author: Reynold Xin r...@databricks.com Closes #5284 from rxin/df-na-alias and squashes

spark git commit: [Doc] Improve Python DataFrame documentation

2015-03-31 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 c4c982a65 -> e527b3590 [Doc] Improve Python DataFrame documentation Author: Reynold Xin r...@databricks.com Closes #5287 from rxin/pyspark-df-doc-cleanup-context and squashes the following commits: 1841b60 [Reynold Xin] Lint. f2007f1

spark git commit: [Doc] Improve Python DataFrame documentation

2015-03-31 Thread rxin
Repository: spark Updated Branches: refs/heads/master 37326079d -> 305abe1e5 [Doc] Improve Python DataFrame documentation Author: Reynold Xin r...@databricks.com Closes #5287 from rxin/pyspark-df-doc-cleanup-context and squashes the following commits: 1841b60 [Reynold Xin] Lint. f2007f1

spark git commit: SPARK-6532 [BUILD] LDAModel.scala fails scalastyle on Windows

2015-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master fe15ea976 -> c3a52a082 SPARK-6532 [BUILD] LDAModel.scala fails scalastyle on Windows Use standard UTF-8 source / report encoding for scalastyle Author: Sean Owen so...@cloudera.com Closes #5211 from srowen/SPARK-6532 and squashes the

spark git commit: [SPARK-6633][SQL] Should be Contains instead of EndsWith when constructing sources.StringContains

2015-03-31 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 5a957fe0d -> d85164637 [SPARK-6633][SQL] Should be Contains instead of EndsWith when constructing sources.StringContains Author: Liang-Chi Hsieh vii...@gmail.com Closes #5299 from viirya/stringcontains and squashes the following

spark git commit: [SPARK-6633][SQL] Should be Contains instead of EndsWith when constructing sources.StringContains

2015-03-31 Thread rxin
Repository: spark Updated Branches: refs/heads/master beebb7ffc -> 2036bc599 [SPARK-6633][SQL] Should be Contains instead of EndsWith when constructing sources.StringContains Author: Liang-Chi Hsieh vii...@gmail.com Closes #5299 from viirya/stringcontains and squashes the following commits:

spark git commit: [DOC] Improvements to Python docs.

2015-03-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master f75f633b2 -> 5eef00d0c [DOC] Improvements to Python docs. Author: Reynold Xin r...@databricks.com Closes #5238 from rxin/pyspark-docs and squashes the following commits: c285951 [Reynold Xin] Reset deprecation warning. 8c1031e [Reynold

spark git commit: [DOC] Improvements to Python docs.

2015-03-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 5e04f4518 -> 3db08444b [DOC] Improvements to Python docs. Author: Reynold Xin r...@databricks.com Closes #5238 from rxin/pyspark-docs and squashes the following commits: c285951 [Reynold Xin] Reset deprecation warning. 8c1031e

spark git commit: aggregateMessages example in graphX doc

2015-03-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9ce12aaf2 -> e7d8ae444 aggregateMessages example in graphX doc Examples illustrating difference between legacy mapReduceTriplets usage and aggregateMessages usage has type issues on the reduce for both operators. Being just an example-

spark git commit: [SPARK-5949] HighlyCompressedMapStatus needs more classes registered w/ kryo

2015-03-03 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 8446ad0eb -> 9a0b75cdd [SPARK-5949] HighlyCompressedMapStatus needs more classes registered w/ kryo https://issues.apache.org/jira/browse/SPARK-5949 Author: Imran Rashid iras...@cloudera.com Closes #4877 from

spark git commit: [SPARK-5949] HighlyCompressedMapStatus needs more classes registered w/ kryo

2015-03-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6c20f3529 -> 1f1fccc5c [SPARK-5949] HighlyCompressedMapStatus needs more classes registered w/ kryo https://issues.apache.org/jira/browse/SPARK-5949 Author: Imran Rashid iras...@cloudera.com Closes #4877 from

spark git commit: SPARK-5984: Fix TimSort bug causes ArrayOutOfBoundsException

2015-02-28 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 aa39460d4 -> 317694ccf SPARK-5984: Fix TimSort bug causes ArrayOutOfBoundsException Fix TimSort bug which causes a ArrayOutOfBoundsException. Using the proposed fix here

spark git commit: SPARK-5984: Fix TimSort bug causes ArrayOutOfBoundsException

2015-02-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 86fcdaef6 -> 643300a6e SPARK-5984: Fix TimSort bug causes ArrayOutOfBoundsException Fix TimSort bug which causes a ArrayOutOfBoundsException. Using the proposed fix here

spark git commit: SPARK-5930 [DOCS] Documented default of spark.shuffle.io.retryWait is confusing

2015-02-25 Thread rxin
Repository: spark Updated Branches: refs/heads/master f84c799ea -> 7d8e6a2e4 SPARK-5930 [DOCS] Documented default of spark.shuffle.io.retryWait is confusing Clarify default max wait in spark.shuffle.io.retryWait docs CC andrewor14 Author: Sean Owen so...@cloudera.com Closes #4769 from
