spark git commit: [SPARK-2321] Several progress API improvements / refactorings

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master cbddac236 -> 40eb8b6ef [SPARK-2321] Several progress API improvements / refactorings This PR refactors / extends the status API introduced in #2696. - Change StatusAPI from a mixin trait to a class. Before, the new status API methods wer
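For context, a minimal sketch of how the class-based status API is typically driven from a SparkContext. The tracker accessor and method names below reflect the API as it shipped and are assumptions with respect to this particular commit, not its exact diff:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // brings countAsync into scope in this release

object StatusTrackerSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("status-sketch").setMaster("local[2]"))

    sc.setJobGroup("demo-group", "status API demo")
    sc.parallelize(1 to 100000, 8).map(_ * 2).countAsync() // fire off a job without blocking

    // Poll the class-based tracker instead of status methods mixed into SparkContext.
    // Counts may still be zero if polled before the job is scheduled.
    val tracker = sc.statusTracker
    for (jobId <- tracker.getJobIdsForGroup("demo-group");
         jobInfo <- tracker.getJobInfo(jobId);
         stageId <- jobInfo.stageIds();
         stageInfo <- tracker.getStageInfo(stageId)) {
      println(s"stage $stageId: ${stageInfo.numCompletedTasks()}/${stageInfo.numTasks()} tasks complete")
    }

    sc.stop()
  }
}
```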

spark git commit: [SPARK-2321] Several progress API improvements / refactorings

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.2 c044e1241 -> 9eac5fee6 [SPARK-2321] Several progress API improvements / refactorings This PR refactors / extends the status API introduced in #2696. - Change StatusAPI from a mixin trait to a class. Before, the new status API methods

spark git commit: Added contains(key) to Metadata

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 60969b033 -> cbddac236 Added contains(key) to Metadata Add contains(key) to org.apache.spark.sql.catalyst.util.Metadata to test the existence of a key. Otherwise, Class Metadata's get methods may throw NoSuchElement exception if the key d
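A hedged sketch of the guard this enables; the keys and builder usage are illustrative only (the class lives in org.apache.spark.sql.catalyst.util in this release):

```scala
import org.apache.spark.sql.catalyst.util.{Metadata, MetadataBuilder}

// Build a small Metadata instance, then probe it safely.
val meta: Metadata = new MetadataBuilder()
  .putLong("maxLength", 128L)
  .build()

// Without contains(key), the typed get methods throw NoSuchElementException on a missing key.
val comment: String =
  if (meta.contains("comment")) meta.getString("comment") else "<none>"

val maxLength: Long =
  if (meta.contains("maxLength")) meta.getLong("maxLength") else -1L
```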

spark git commit: Added contains(key) to Metadata

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.2 37716b795 -> c044e1241 Added contains(key) to Metadata Add contains(key) to org.apache.spark.sql.catalyst.util.Metadata to test the existence of a key. Otherwise, Class Metadata's get methods may throw NoSuchElement exception if the k

spark git commit: [SPARK-4260] Httpbroadcast should set connection timeout.

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 861223ee5 -> 60969b033 [SPARK-4260] Httpbroadcast should set connection timeout. Httpbroadcast sets read timeout but doesn't set connection timeout. Author: Kousuke Saruta Closes #3122 from sarutak/httpbroadcast-timeout and squashes the
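The underlying issue is plain java.net.URLConnection configuration: a read timeout alone does not bound a hung TCP connect. A minimal sketch of setting both timeouts, with a placeholder URL and timeout value rather than the ones chosen by the patch:

```scala
import java.net.URL

val timeoutMs = 60 * 1000 // placeholder value, not the one used by the patch
val connection = new URL("http://driver-host:49152/broadcast_0").openConnection()

// Set both: connect timeout bounds the TCP handshake, read timeout bounds a stalled read.
connection.setConnectTimeout(timeoutMs)
connection.setReadTimeout(timeoutMs)

val in = connection.getInputStream()
```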

spark git commit: [SPARK-4260] Httpbroadcast should set connection timeout.

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.2 29a6da372 -> 37716b795 [SPARK-4260] Httpbroadcast should set connection timeout. Httpbroadcast sets read timeout but doesn't set connection timeout. Author: Kousuke Saruta Closes #3122 from sarutak/httpbroadcast-timeout and squashes

spark git commit: [SPARK-4363][Doc] Update the Broadcast example

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master dba140582 -> 861223ee5 [SPARK-4363][Doc] Update the Broadcast example Author: zsxwing Closes #3226 from zsxwing/SPARK-4363 and squashes the following commits: 8109914 [zsxwing] Update the Broadcast example Project: http://git-wip-us.ap
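For reference, the canonical broadcast usage from the programming guide; this is a hedged sketch of the pattern the doc describes, not necessarily the exact text of the change:

```scala
val broadcastVar = sc.broadcast(Array(1, 2, 3))

// Access the broadcast data through .value inside tasks,
// rather than capturing the original local variable in the closure.
val sums = sc.parallelize(1 to 10).map(i => i + broadcastVar.value.sum).collect()
```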

spark git commit: [SPARK-4363][Doc] Update the Broadcast example

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.2 e27fa40ed -> 29a6da372 [SPARK-4363][Doc] Update the Broadcast example Author: zsxwing Closes #3226 from zsxwing/SPARK-4363 and squashes the following commits: 8109914 [zsxwing] Update the Broadcast example (cherry picked from commit

spark git commit: [SPARK-4379][Core] Change Exception to SparkException in checkpoint

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.2 306e68cf0 -> e27fa40ed [SPARK-4379][Core] Change Exception to SparkException in checkpoint It's better to change to SparkException. However, it's a breaking change since it will change the exception type. Author: zsxwing Closes #324

spark git commit: [SPARK-4379][Core] Change Exception to SparkException in checkpoint

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7fe08b43c -> dba140582 [SPARK-4379][Core] Change Exception to SparkException in checkpoint It's better to change to SparkException. However, it's a breaking change since it will change the exception type. Author: zsxwing Closes #3241 fr

spark git commit: [SPARK-4415] [PySpark] JVM should exit after Python exit

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 303a4e4d2 -> 7fe08b43c [SPARK-4415] [PySpark] JVM should exit after Python exit When JVM is started in a Python process, it should exit once the stdin is closed. test: add spark.driver.memory in conf/spark-defaults.conf ``` daviesdm:~/wo
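One way to implement "exit when stdin closes" is a daemon thread that blocks on System.in; a minimal sketch under that assumption (the thread name and exit code are illustrative, not the patch's exact code):

```scala
// Illustrative only: watch stdin and shut the JVM down when the Python parent goes away.
val stdinWatcher = new Thread("wait-for-python-stdin") {
  override def run(): Unit = {
    try {
      // read() returns -1 once the parent process closes the pipe or dies.
      while (System.in.read() != -1) {}
    } finally {
      System.exit(0)
    }
  }
}
stdinWatcher.setDaemon(true)
stdinWatcher.start()
```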

spark git commit: [SPARK-4415] [PySpark] JVM should exit after Python exit

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 118c89c28 -> 306e68cf0 [SPARK-4415] [PySpark] JVM should exit after Python exit When JVM is started in a Python process, it should exit once the stdin is closed. test: add spark.driver.memory in conf/spark-defaults.conf ``` daviesdm:

spark git commit: [SPARK-4404]SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-proc...

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 c425e31ad -> 118c89c28 [SPARK-4404] SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process ends https://issues.apache.org/jira/browse/SPARK-4404 When we have spark.driver.extra* or spark.driver.memory in S

spark git commit: [SPARK-4404]SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-proc...

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master ad42b2832 -> 303a4e4d2 [SPARK-4404] SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process ends https://issues.apache.org/jira/browse/SPARK-4404 When we have spark.driver.extra* or spark.driver.memory in SPARK

spark git commit: SPARK-4214. With dynamic allocation, avoid outstanding requests for more...

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 ef39ec419 -> c425e31ad SPARK-4214. With dynamic allocation, avoid outstanding requests for more executors than pending tasks need. WIP. Still need to add and fix tests. Author: Sandy Ryza Closes #3204 from sryza/sandy-spark-4
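The idea reduces to capping the requested total at what the pending tasks can actually keep busy; a hedged sketch of that arithmetic with hypothetical names, not the ExecutorAllocationManager's actual fields:

```scala
// Hypothetical names; illustrates the cap, not the actual ExecutorAllocationManager code.
def executorsToRequest(pendingTasks: Int, tasksPerExecutor: Int,
                       runningExecutors: Int, outstandingRequests: Int): Int = {
  // Executors that the pending tasks could actually occupy.
  val maxNeeded = math.ceil(pendingTasks.toDouble / tasksPerExecutor).toInt
  // Never leave more requests outstanding than the shortfall.
  math.max(0, maxNeeded - runningExecutors - outstandingRequests)
}

// e.g. 10 pending tasks, 4 cores per executor, 1 running, 1 already requested -> ask for 1 more.
val toRequest = executorsToRequest(pendingTasks = 10, tasksPerExecutor = 4,
                                   runningExecutors = 1, outstandingRequests = 1)
```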

spark git commit: SPARK-4214. With dynamic allocation, avoid outstanding requests for more...

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 37482ce5a -> ad42b2832 SPARK-4214. With dynamic allocation, avoid outstanding requests for more executors than pending tasks need. WIP. Still need to add and fix tests. Author: Sandy Ryza Closes #3204 from sryza/sandy-spark-4214

spark git commit: [SPARK-4412][SQL] Fix Spark's control of Parquet logging.

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 63ca3af66 -> 37482ce5a [SPARK-4412][SQL] Fix Spark's control of Parquet logging. The Spark ParquetRelation.scala code makes the assumption that the parquet.Log class has already been loaded. If ParquetRelation.enableLogForwarding executes
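A hedged sketch of the kind of fix this implies: force parquet.Log to load before reconfiguring its java.util.logging setup, so that the class's own static initialization cannot run afterwards and undo the change. This illustrates the failure mode, not necessarily the exact patch:

```scala
import java.util.logging.{Level, Logger}

// Force the class to initialize so its static logging setup has already run
// before we reconfigure the JUL logger it installs handlers on.
Class.forName("parquet.Log")

val parquetLogger = Logger.getLogger("parquet") // assumed logger name for illustration
parquetLogger.getHandlers.foreach(parquetLogger.removeHandler)
parquetLogger.setUseParentHandlers(true) // let log4j-backed forwarding pick it up
parquetLogger.setLevel(Level.INFO)
```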

spark git commit: [SPARK-4412][SQL] Fix Spark's control of Parquet logging.

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 aa5d8e57c -> ef39ec419 [SPARK-4412][SQL] Fix Spark's control of Parquet logging. The Spark ParquetRelation.scala code makes the assumption that the parquet.Log class has already been loaded. If ParquetRelation.enableLogForwarding execu

spark git commit: [SPARK-4365][SQL] Remove unnecessary filter call on records returned from parquet library

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 7f242dc29 -> aa5d8e57c [SPARK-4365][SQL] Remove unnecessary filter call on records returned from parquet library Since the parquet library has been updated, we no longer need to filter the records returned from the parquet library for null r

spark git commit: [SPARK-4365][SQL] Remove unnecessary filter call on records returned from parquet library

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f76b96837 -> 63ca3af66 [SPARK-4365][SQL] Remove unnecessary filter call on records returned from parquet library Since the parquet library has been updated, we no longer need to filter the records returned from the parquet library for null recor

spark git commit: [SPARK-4386] Improve performance when writing Parquet files.

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 1cac30083 -> 7f242dc29 [SPARK-4386] Improve performance when writing Parquet files. If you profile the writing of a Parquet file, the single worst time consuming call inside of org.apache.spark.sql.parquet.MutableRowWriteSupport.write

spark git commit: [SPARK-4386] Improve performance when writing Parquet files.

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 0c7b66bd4 -> f76b96837 [SPARK-4386] Improve performance when writing Parquet files. If you profile the writing of a Parquet file, the single worst time consuming call inside of org.apache.spark.sql.parquet.MutableRowWriteSupport.write is

spark git commit: [SPARK-4322][SQL] Enables struct fields as sub expressions of grouping fields

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 680bc0619 -> 1cac30083 [SPARK-4322][SQL] Enables struct fields as sub expressions of grouping fields While resolving struct fields, the resulted `GetField` expression is wrapped with an `Alias` to make it a named expression. Assume `a`
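The kind of query this enables, as a hedged sketch; the table and struct field names are made up:

```scala
// Hypothetical schema: `record` is a struct column with an `n` field.
// Before this fix, grouping by a struct field while also selecting it could fail
// to resolve, because each resolution wrapped the GetField in a fresh Alias.
val grouped = sqlContext.sql(
  """SELECT record.n, COUNT(*)
    |FROM events
    |GROUP BY record.n""".stripMargin)
```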

spark git commit: [SPARK-4322][SQL] Enables struct fields as sub expressions of grouping fields

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4b4b50c9e -> 0c7b66bd4 [SPARK-4322][SQL] Enables struct fields as sub expressions of grouping fields While resolving struct fields, the resulted `GetField` expression is wrapped with an `Alias` to make it a named expression. Assume `a` is

spark git commit: [SQL] Don't shuffle code generated rows

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f805025e8 -> 4b4b50c9e [SQL] Don't shuffle code generated rows When sort-based shuffle and code gen are on, we were trying to ship the code-generated rows during a shuffle. This doesn't work because the classes don't exist on the other si

spark git commit: [SQL] Don't shuffle code generated rows

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 e35672e7e -> 680bc0619 [SQL] Don't shuffle code generated rows When sort-based shuffle and code gen are on, we were trying to ship the code-generated rows during a shuffle. This doesn't work because the classes don't exist on the othe

spark git commit: [SQL] Minor cleanup of comments, errors and override.

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 576688aa2 -> e35672e7e [SQL] Minor cleanup of comments, errors and override. Author: Michael Armbrust Closes #3257 from marmbrus/minorCleanup and squashes the following commits: d8b5abc [Michael Armbrust] Use interpolation. 2fdf903 [

spark git commit: [SQL] Minor cleanup of comments, errors and override.

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e47c38763 -> f805025e8 [SQL] Minor cleanup of comments, errors and override. Author: Michael Armbrust Closes #3257 from marmbrus/minorCleanup and squashes the following commits: d8b5abc [Michael Armbrust] Use interpolation. 2fdf903 [Mich

spark git commit: [SPARK-4391][SQL] Configure parquet filters using SQLConf

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 0dd924178 -> 576688aa2 [SPARK-4391][SQL] Configure parquet filters using SQLConf This is more uniform with the rest of SQL configuration and allows it to be turned on and off without restarting the SparkContext. In this PR I also turn
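Presumably this means toggling filter push-down like any other SQLConf property at runtime; a sketch assuming the 1.2-era key name spark.sql.parquet.filterPushdown:

```scala
// Assumed key name; illustrates configuring through SQLConf rather than a JVM-wide flag.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
sqlContext.parquetFile("/data/events.parquet").registerTempTable("events")
val ok = sqlContext.sql("SELECT * FROM events WHERE status = 'ok'")

// ...and it can be switched off again without restarting the SparkContext.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "false")
```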

spark git commit: [SPARK-4391][SQL] Configure parquet filters using SQLConf

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master a0300ea32 -> e47c38763 [SPARK-4391][SQL] Configure parquet filters using SQLConf This is more uniform with the rest of SQL configuration and allows it to be turned on and off without restarting the SparkContext. In this PR I also turn of

spark git commit: [SPARK-4390][SQL] Handle NaN cast to decimal correctly

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 5b63158ac -> 0dd924178 [SPARK-4390][SQL] Handle NaN cast to decimal correctly Author: Michael Armbrust Closes #3256 from marmbrus/NanDecimal and squashes the following commits: 4c3ba46 [Michael Armbrust] fix style d360f83 [Michael Ar

spark git commit: [SPARK-4390][SQL] Handle NaN cast to decimal correctly

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5930f64bf -> a0300ea32 [SPARK-4390][SQL] Handle NaN cast to decimal correctly Author: Michael Armbrust Closes #3256 from marmbrus/NanDecimal and squashes the following commits: 4c3ba46 [Michael Armbrust] fix style d360f83 [Michael Armbru

spark git commit: [SPARK-4062][Streaming]Add ReliableKafkaReceiver in Spark Streaming Kafka connector

2014-11-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master 0cbdb01e1 -> 5930f64bf [SPARK-4062][Streaming]Add ReliableKafkaReceiver in Spark Streaming Kafka connector Add ReliableKafkaReceiver in Kafka connector to prevent data loss if WAL in Spark Streaming is enabled. Details and design doc can
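The reliable receiver is picked up through the existing KafkaUtils entry point once the streaming write ahead log is on; a hedged sketch, where the WAL conf key is the 1.2-era name and should be treated as an assumption:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("reliable-kafka-sketch")
  // Assumed key: enables the write ahead log that makes the reliable receiver useful.
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(2))
ssc.checkpoint("/tmp/kafka-wal-checkpoint") // the WAL needs a checkpoint directory

// Placeholder ZooKeeper quorum, consumer group, and topic map.
val lines = KafkaUtils.createStream(ssc, "zk-host:2181", "consumer-group", Map("events" -> 1))
lines.map(_._2).count().print()

ssc.start()
ssc.awaitTermination()
```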

spark git commit: [SPARK-4062][Streaming]Add ReliableKafkaReceiver in Spark Streaming Kafka connector

2014-11-14 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.2 f8810b6a5 -> 5b63158ac [SPARK-4062][Streaming]Add ReliableKafkaReceiver in Spark Streaming Kafka connector Add ReliableKafkaReceiver in Kafka connector to prevent data loss if WAL in Spark Streaming is enabled. Details and design doc

spark git commit: [SPARK-4333][SQL] Correctly log number of iterations in RuleExecutor

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 d90ddf12b -> f8810b6a5 [SPARK-4333][SQL] Correctly log number of iterations in RuleExecutor When the iteration loop of RuleExecutor breaks early, the number of iterations should be (iteration - 1), not (iteration), because the log looks like "Fixed point reach

spark git commit: [SPARK-4333][SQL] Correctly log number of iterations in RuleExecutor

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f5f757e4e -> 0cbdb01e1 [SPARK-4333][SQL] Correctly log number of iterations in RuleExecutor When the iteration loop of RuleExecutor breaks early, the number of iterations should be (iteration - 1), not (iteration), because the log looks like "Fixed point reached f

spark git commit: SPARK-4375. no longer require -Pscala-2.10

2014-11-14 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 4bdeeb7d2 -> d90ddf12b SPARK-4375. no longer require -Pscala-2.10 It seems like the winds might have moved away from this approach, but wanted to post the PR anyway because I got it working and to show what it would look like. Author:

spark git commit: SPARK-4375. no longer require -Pscala-2.10

2014-11-14 Thread pwendell
Repository: spark Updated Branches: refs/heads/master bbd8f5bee -> f5f757e4e SPARK-4375. no longer require -Pscala-2.10 It seems like the winds might have moved away from this approach, but wanted to post the PR anyway because I got it working and to show what it would look like. Author: San

spark git commit: [SPARK-4245][SQL] Fix containsNull of the result ArrayType of CreateArray expression.

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 51b053a31 -> 4bdeeb7d2 [SPARK-4245][SQL] Fix containsNull of the result ArrayType of CreateArray expression. The `containsNull` of the result `ArrayType` of `CreateArray` should be `true` only if the list of children is empty or there exists
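Reading the description literally, the rule amounts to the following predicate; this is a sketch, not the actual Catalyst code:

```scala
import org.apache.spark.sql.catalyst.expressions.Expression

// Illustrative only: an empty array has to be typed with containsNull = true,
// and otherwise containsNull should hold exactly when some child expression is nullable.
def resultContainsNull(children: Seq[Expression]): Boolean =
  children.isEmpty || children.exists(_.nullable)
```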

spark git commit: [SPARK-4245][SQL] Fix containsNull of the result ArrayType of CreateArray expression.

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master ade72c436 -> bbd8f5bee [SPARK-4245][SQL] Fix containsNull of the result ArrayType of CreateArray expression. The `containsNull` of the result `ArrayType` of `CreateArray` should be `true` only if the list of children is empty or there exists null

[2/2] spark git commit: [SPARK-4239] [SQL] support view in HiveQl

2014-11-14 Thread marmbrus
[SPARK-4239] [SQL] support view in HiveQl Views like CREATE VIEW view3(valoo) TBLPROPERTIES ("fear" = "factor") AS SELECT upper(value) FROM src WHERE key=86; are still not supported, because the text in the metastore for this view is like select \`_c0\` as \`valoo\` from (select upper(\`src\`.\`v

[1/2] spark git commit: [SPARK-4239] [SQL] support view in HiveQl

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 e7f957437 -> 51b053a31 http://git-wip-us.apache.org/repos/asf/spark/blob/51b053a3/sql/hive/src/test/resources/golden/view_cast-8-2cc0c576f0a008abf5bdf3308d500869 -- diff

[2/2] spark git commit: [SPARK-4239] [SQL] support view in HiveQl

2014-11-14 Thread marmbrus
[SPARK-4239] [SQL] support view in HiveQl Views like CREATE VIEW view3(valoo) TBLPROPERTIES ("fear" = "factor") AS SELECT upper(value) FROM src WHERE key=86; are still not supported, because the text in the metastore for this view is like select \`_c0\` as \`valoo\` from (select upper(\`src\`.\`v

[1/2] spark git commit: [SPARK-4239] [SQL] support view in HiveQl

2014-11-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master c258db9ed -> ade72c436 http://git-wip-us.apache.org/repos/asf/spark/blob/ade72c43/sql/hive/src/test/resources/golden/view_cast-8-2cc0c576f0a008abf5bdf3308d500869 -- diff --g

spark git commit: Update failed assert text to match code in SizeEstimatorSuite

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 88278241e -> e7f957437 Update failed assert text to match code in SizeEstimatorSuite Author: Jeff Hammerbacher Closes #3242 from hammer/patch-1 and squashes the following commits: f88d635 [Jeff Hammerbacher] Update failed assert text

spark git commit: Update failed assert text to match code in SizeEstimatorSuite

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 156cf -> c258db9ed Update failed assert text to match code in SizeEstimatorSuite Author: Jeff Hammerbacher Closes #3242 from hammer/patch-1 and squashes the following commits: f88d635 [Jeff Hammerbacher] Update failed assert text to

spark git commit: [SPARK-4313][WebUI][Yarn] Fix link issue of the executor thread dump page in yarn-cluster mode

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 204eaf165 -> 88278241e [SPARK-4313][WebUI][Yarn] Fix link issue of the executor thread dump page in yarn-cluster mode In yarn-cluster mode, the Web UI is running behind a yarn proxy server. Some features (or bugs?) of the yarn proxy server

spark git commit: [SPARK-4313][WebUI][Yarn] Fix link issue of the executor thread dump page in yarn-cluster mode

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 5c265ccde -> 156cf [SPARK-4313][WebUI][Yarn] Fix link issue of the executor thread dump page in yarn-cluster mode In yarn-cluster mode, the Web UI is running behind a yarn proxy server. Some features (or bugs?) of the yarn proxy server wil

spark git commit: SPARK-3663 Document SPARK_LOG_DIR and SPARK_PID_DIR

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 0c56a039a -> 5c265ccde SPARK-3663 Document SPARK_LOG_DIR and SPARK_PID_DIR These descriptions are from the header of spark-daemon.sh Author: Andrew Ash Closes #2518 from ash211/SPARK-3663 and squashes the following commits: 058b257 [And

spark git commit: SPARK-3663 Document SPARK_LOG_DIR and SPARK_PID_DIR

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 d579b3989 -> 204eaf165 SPARK-3663 Document SPARK_LOG_DIR and SPARK_PID_DIR These descriptions are from the header of spark-daemon.sh Author: Andrew Ash Closes #2518 from ash211/SPARK-3663 and squashes the following commits: 058b257

spark git commit: [Spark Core] SPARK-4380 Edit spilling log from MB to B

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 3014803ea -> d579b3989 [Spark Core] SPARK-4380 Edit spilling log from MB to B https://issues.apache.org/jira/browse/SPARK-4380 Author: Hong Shen Closes #3243 from shenh062326/spark_change and squashes the following commits: 4653378

spark git commit: [Spark Core] SPARK-4380 Edit spilling log from MB to B

2014-11-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master abd581752 -> 0c56a039a [Spark Core] SPARK-4380 Edit spilling log from MB to B https://issues.apache.org/jira/browse/SPARK-4380 Author: Hong Shen Closes #3243 from shenh062326/spark_change and squashes the following commits: 4653378 [Hon

spark git commit: [SPARK-4398][PySpark] specialize sc.parallelize(xrange)

2014-11-14 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 3219271f4 -> 3014803ea [SPARK-4398][PySpark] specialize sc.parallelize(xrange) `sc.parallelize(range(1 << 20), 1).count()` may take 15 seconds to finish and the rdd object stores the entire list, making task size very large. This PR a

spark git commit: [SPARK-4398][PySpark] specialize sc.parallelize(xrange)

2014-11-14 Thread meng
Repository: spark Updated Branches: refs/heads/master 77e845ca7 -> abd581752 [SPARK-4398][PySpark] specialize sc.parallelize(xrange) `sc.parallelize(range(1 << 20), 1).count()` may take 15 seconds to finish and the rdd object stores the entire list, making task size very large. This PR adds

spark git commit: Revert "[SPARK-2703][Core]Make Tachyon related unit tests execute without deploying a Tachyon system locally."

2014-11-14 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 39257ca1b -> 3219271f4 Revert "[SPARK-2703][Core]Make Tachyon related unit tests execute without deploying a Tachyon system locally." This reverts commit c127ff8c87fc4f3aa6f09697928832dc6d37cc0f. Project: http://git-wip-us.apache.org

spark git commit: [SPARK-4394][SQL] Data Sources API Improvements

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.2 f1e7d1c2c -> 39257ca1b [SPARK-4394][SQL] Data Sources API Improvements This PR adds two features to the data sources API: - Support for pushing down `IN` filters - The ability for relations to optionally provide information about thei
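A hedged sketch of what the first feature looks like from a data source's side: matching a pushed-down IN filter alongside an existing equality filter. The class and field names follow the public org.apache.spark.sql.sources API and should be treated as assumptions against this specific commit:

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, In}

// Illustrative only: turn a handful of pushed-down filters into a WHERE fragment
// for some hypothetical external store.
def toWhereClause(filters: Array[Filter]): String =
  filters.collect {
    case EqualTo(attr, value) => s"$attr = '$value'"
    case In(attr, values)     => values.map(v => s"'$v'").mkString(s"$attr IN (", ", ", ")")
  }.mkString(" AND ")
```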

spark git commit: [SPARK-4394][SQL] Data Sources API Improvements

2014-11-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master e421072da -> 77e845ca7 [SPARK-4394][SQL] Data Sources API Improvements This PR adds two features to the data sources API: - Support for pushing down `IN` filters - The ability for relations to optionally provide information about their `

spark git commit: [SPARK-3722][Docs]minor improvement and fix in docs

2014-11-14 Thread tgraves
Repository: spark Updated Branches: refs/heads/master 825709a0b -> e421072da [SPARK-3722][Docs]minor improvement and fix in docs https://issues.apache.org/jira/browse/SPARK-3722 Author: WangTao Closes #2579 from WangTaoTheTonic/docsWork and squashes the following commits: 6f91cec [WangTao]