git commit: [SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH instead of -Djava.library.path

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 123425807 -> cd739bd75 [SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH instead of -Djava.library.path - [X] Standalone - [X] YARN - [X] Mesos - [X] Mac OS X - [X] Linux - [ ] Windows This is an alternative implementation to #1031 Author:
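The distinction behind this change can be illustrated outside Spark: `-Djava.library.path` only affects the JVM it is passed to, while `LD_LIBRARY_PATH` is part of the process environment and is inherited by child processes. A minimal sketch of launching a child with the native-library directory on `LD_LIBRARY_PATH` (the directory and command here are hypothetical, not Spark's actual launcher code):

```python
import os

def env_with_native_libs(native_lib_dir):
    """Build a child-process environment that prepends a native-library
    directory to LD_LIBRARY_PATH, instead of passing -Djava.library.path.
    The directory name is illustrative only."""
    env = dict(os.environ)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = native_lib_dir + (os.pathsep + existing if existing else "")
    return env  # pass as: subprocess.Popen(["java", ...], env=env)

env = env_with_native_libs("/opt/hadoop/lib/native")
print(env["LD_LIBRARY_PATH"].split(os.pathsep)[0])
```

Because the variable lives in the environment rather than in a JVM flag, any process the JVM forks (e.g. an executor) sees the same library path for free.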

git commit: [SPARK-4102] Remove unused ShuffleReader.stop() method.

2014-10-30 Thread kayousterhout
Repository: spark Updated Branches: refs/heads/master cd739bd75 -> 6db315746 [SPARK-4102] Remove unused ShuffleReader.stop() method. This method is not implemented by the only subclass (HashShuffleReader), nor is it ever called. While the use of Scala's fancy ??? was pretty exciting, the

git commit: [SPARK-4130][MLlib] Fixing libSVM parser bug with extra whitespace

2014-10-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 6db315746 -> c7ad08520 [SPARK-4130][MLlib] Fixing libSVM parser bug with extra whitespace This simple patch filters out extra whitespace entries. Author: Joseph E. Gonzalez joseph.e.gonza...@gmail.com Author: Joey
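The bug class is easy to reproduce: splitting a libSVM line on a single space yields empty tokens whenever the line contains doubled or trailing whitespace, and those empty strings break the `index:value` split. A minimal sketch of the fix (an illustrative parser, not MLlib's actual code):

```python
def parse_libsvm_line(line):
    """Parse one libSVM-format line ("label index:value index:value ..."),
    filtering out the empty tokens left behind by extra whitespace."""
    tokens = [t for t in line.strip().split(" ") if t]  # drop "" entries
    label = float(tokens[0])
    features = {}
    for item in tokens[1:]:
        idx, value = item.split(":")
        features[int(idx)] = float(value)
    return label, features

# Doubled and trailing spaces no longer produce empty feature entries.
print(parse_libsvm_line("1  3:0.5   7:2.0 "))
```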

git commit: SPARK-4111 [MLlib] add regression metrics

2014-10-30 Thread meng
Repository: spark Updated Branches: refs/heads/master c7ad08520 -> d9327192e SPARK-4111 [MLlib] add regression metrics Add RegressionMetrics.scala as regression metrics used for evaluation and corresponding test case RegressionMetricsSuite.scala. Author: Yanbo Liang yanboha...@gmail.com
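The metrics such an evaluator typically exposes follow from standard definitions. This plain-Python sketch mirrors those definitions (MSE, RMSE, MAE, R²) rather than MLlib's actual `RegressionMetrics` API:

```python
import math

def regression_metrics(predictions, labels):
    """Compute standard regression evaluation metrics from paired
    prediction/label sequences."""
    n = len(labels)
    errors = [p - y for p, y in zip(predictions, labels)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(labels) / n
    ss_tot = sum((y - mean_y) ** 2 for y in labels)
    # R^2 = 1 - SS_res / SS_tot; undefined when labels are constant
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

m = regression_metrics([2.5, 0.0, 2.0, 8.0], [3.0, -0.5, 2.0, 7.0])
print(round(m["mse"], 4), round(m["mae"], 4))  # -> 0.375 0.5
```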

git commit: [SPARK-4028][Streaming] ReceivedBlockHandler interface to abstract the functionality of storage of received data

2014-10-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master d9327192e -> 234de9232 [SPARK-4028][Streaming] ReceivedBlockHandler interface to abstract the functionality of storage of received data As part of the initiative to prevent data loss on streaming driver failure, this JIRA tracks the

git commit: [SPARK-4027][Streaming] WriteAheadLogBackedBlockRDD to read received data either from BlockManager or WAL in HDFS

2014-10-30 Thread tdas
Repository: spark Updated Branches: refs/heads/master 234de9232 -> fb1fbca20 [SPARK-4027][Streaming] WriteAheadLogBackedBlockRDD to read received data either from BlockManager or WAL in HDFS As part of the initiative of preventing data loss on streaming driver failure, this sub-task implements a
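The recovery path these two streaming changes describe amounts to: write each received block both to the fast in-memory block store and to a durable write-ahead log, then read from the block store when possible and fall back to the log after a failure. A toy sketch of that pattern, using plain dicts as stand-ins for BlockManager and the WAL files in HDFS (all names hypothetical):

```python
class WalBackedStore:
    """Toy model: blocks live in a fast in-memory store, and every block
    is also written to a write-ahead log that survives executor loss."""
    def __init__(self):
        self.block_manager = {}  # stand-in for BlockManager (volatile)
        self.wal = {}            # stand-in for WAL segments in HDFS (durable)

    def store(self, block_id, data):
        self.wal[block_id] = data           # durable write first
        self.block_manager[block_id] = data

    def read(self, block_id):
        if block_id in self.block_manager:  # fast path
            return self.block_manager[block_id]
        return self.wal[block_id]           # recovery path after failure

s = WalBackedStore()
s.store("input-0-1", [1, 2, 3])
del s.block_manager["input-0-1"]  # simulate losing the in-memory copy
print(s.read("input-0-1"))        # -> [1, 2, 3]
```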

git commit: [SPARK-4078] New FsPermission instance w/o FsPermission.createImmutable in eventlog

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master fb1fbca20 -> 9142c9b80 [SPARK-4078] New FsPermission instance w/o FsPermission.createImmutable in eventlog By default, Spark builds its package against Hadoop 1.0.4 version. In that version, it has an FsPermission bug (see [HADOOP-7629]

git commit: [SPARK-3319] [SPARK-3338] Resolve Spark submit config paths

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 9142c9b80 -> 24c512925 [SPARK-3319] [SPARK-3338] Resolve Spark submit config paths The bulk of this PR is comprised of tests. All changes in functionality are made in `SparkSubmit.scala` (~20 lines). **SPARK-3319.** There is currently a

git commit: [SPARK-4138][SPARK-4139] Improve dynamic allocation settings

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 24c512925 -> 26f092d4e [SPARK-4138][SPARK-4139] Improve dynamic allocation settings This should be merged after #2746 (SPARK-3795). **SPARK-4138**. If the user sets both the number of executors and `spark.dynamicAllocation.enabled`, we
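The conflict described here (an explicit executor count set alongside dynamic allocation) is a config-precedence question. A hedged sketch of one possible resolution policy, with the keys real but the policy and defaults illustrative rather than the patch's actual behavior:

```python
def resolve_executor_config(conf):
    """Resolve an explicit executor count against dynamic allocation.
    Policy here is illustrative: the explicit setting wins and the
    conflict is flagged."""
    explicit = conf.get("spark.executor.instances")
    dynamic = conf.get("spark.dynamicAllocation.enabled") == "true"
    if explicit is not None and dynamic:
        return int(explicit), "conflict: explicit count overrides dynamic allocation"
    if dynamic:
        return None, "dynamic allocation"
    return int(explicit or 2), "static"  # hypothetical default of 2

print(resolve_executor_config({"spark.executor.instances": "8",
                               "spark.dynamicAllocation.enabled": "true"}))
```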

git commit: [Minor] A few typos in comments and log messages

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 26f092d4e -> 5231a3f22 [Minor] A few typos in comments and log messages Author: Andrew Or andrewo...@gmail.com Author: Andrew Or and...@databricks.com Closes #3021 from andrewor14/typos and squashes the following commits: daaf417 [Andrew

git commit: [SPARK-4155] Consolidate usages of driver

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 5231a3f22 -> 9334d6996 [SPARK-4155] Consolidate usages of driver We use "driver" everywhere. Let's not do that. Author: Andrew Or and...@databricks.com Closes #3020 from andrewor14/consolidate-driver and squashes the following commits:

git commit: Minor style hot fix after #2711

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 9334d6996 -> 849b43ec0 Minor style hot fix after #2711 I had planned to fix this when I merged it but I forgot to. witgo Author: Andrew Or and...@databricks.com Closes #3018 from andrewor14/command-utils-style and squashes the following

git commit: [SPARK-4153][WebUI] Update the sort keys for HistoryPage

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 849b43ec0 -> d34505783 [SPARK-4153][WebUI] Update the sort keys for HistoryPage Sort Started, Completed, Duration and Last Updated by time. Author: zsxwing zsxw...@gmail.com Closes #3014 from zsxwing/SPARK-4153 and squashes the following

git commit: [SPARK-3661] Respect spark.*.memory in cluster mode

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master d34505783 -> 2f5454381 [SPARK-3661] Respect spark.*.memory in cluster mode This also includes minor re-organization of the code. Tested locally in both client and deploy modes. Author: Andrew Or and...@databricks.com Author: Andrew Or

git commit: SPARK-1209 [CORE] SparkHadoop{MapRed, MapReduce}Util should not use package org.apache.hadoop

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 2f5454381 -> 68cb69daf SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce}Util should not use package org.apache.hadoop (This is just a look at what completely moving the classes would look like. I know Patrick flagged that as maybe not OK,

git commit: [SPARK-4120][SQL] Join of multiple tables with syntax like SELECT .. FROM T1, T2, T3.. does not work in SparkSQL

2014-10-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 68cb69daf -> 9b6ebe33d [SPARK-4120][SQL] Join of multiple tables with syntax like SELECT .. FROM T1,T2,T3.. does not work in SparkSQL Right now it works only for 2 tables, like the query below. sql(SELECT * FROM records1 as a,records2 as b
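The comma-separated FROM syntax in question is standard SQL (an implicit cross join that a WHERE clause turns into an equi-join), so it can be demonstrated independently of SparkSQL with Python's built-in sqlite3; the table names and data here are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for name in ("t1", "t2", "t3"):
    cur.execute(f"CREATE TABLE {name} (k INTEGER, v TEXT)")
    cur.execute(f"INSERT INTO {name} VALUES (1, '{name}')")

# Comma-separated FROM lists three relations; the WHERE clause makes
# the implicit cross join an equi-join across all of them.
rows = cur.execute(
    "SELECT a.v, b.v, c.v FROM t1 AS a, t2 AS b, t3 AS c "
    "WHERE a.k = b.k AND b.k = c.k").fetchall()
print(rows)  # -> [('t1', 't2', 't3')]
```

The fix being described is to make the SparkSQL parser accept an arbitrary number of relations in that list, not just two.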

git commit: [SPARK-3968][SQL] Use parquet-mr filter2 api

2014-10-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 9b6ebe33d -> 2e35e2429 [SPARK-3968][SQL] Use parquet-mr filter2 api The parquet-mr project has introduced a new filter api (https://github.com/apache/incubator-parquet-mr/pull/4), along with several fixes. It can also eliminate entire
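The "eliminate entire" phrase refers to row-group skipping: Parquet stores per-row-group min/max statistics in the file footer, so a predicate can rule out whole row groups without reading them. A toy sketch of the idea (this is not the parquet-mr filter2 API, just the underlying pruning logic):

```python
def row_groups_to_read(groups, column, lower=None, upper=None):
    """Keep only row groups whose [min, max] statistics for `column`
    can overlap the predicate range. `groups` is a list of dicts like
    {"id": 0, "stats": {"x": (min, max)}} -- a stand-in for footers."""
    kept = []
    for g in groups:
        lo, hi = g["stats"][column]
        if lower is not None and hi < lower:
            continue  # whole group is below the predicate: skip it
        if upper is not None and lo > upper:
            continue  # whole group is above the predicate: skip it
        kept.append(g)
    return kept

groups = [{"id": 0, "stats": {"x": (0, 9)}},
          {"id": 1, "stats": {"x": (10, 19)}},
          {"id": 2, "stats": {"x": (20, 29)}}]
print([g["id"] for g in row_groups_to_read(groups, "x", lower=12, upper=25)])  # -> [1, 2]
```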

git commit: Revert SPARK-1209 [CORE] SparkHadoop{MapRed, MapReduce}Util should not use package org.apache.hadoop

2014-10-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 2e35e2429 -> 26d31d15f Revert SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce}Util should not use package org.apache.hadoop This reverts commit 68cb69daf3022e973422e496ccf827ca3806ff30. Project:

git commit: HOTFIX: Clean up build in network module.

2014-10-30 Thread adav
Repository: spark Updated Branches: refs/heads/master 26d31d15f -> 0734d0932 HOTFIX: Clean up build in network module. This is currently breaking the package build for some people (including me). This patch does some general clean-up which also fixes the current issue. - Uses consistent

git commit: [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API

2014-10-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 0734d0932 -> 872fc669b [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API Create several helper functions to call MLlib Java API, convert the arguments to Java type and convert return value to Python object

git commit: [SPARK-3250] Implement Gap Sampling optimization for random sampling

2014-10-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 872fc669b -> ad3bd0dff [SPARK-3250] Implement Gap Sampling optimization for random sampling More efficient sampling, based on Gap Sampling optimization: http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/
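Gap sampling replaces a per-element Bernoulli coin flip with drawing the *gap* between successive kept elements from a geometric distribution, which is much cheaper when the sampling fraction is small. A minimal sketch of the technique (a standalone illustration, not Spark's `GapSamplingIterator`):

```python
import math
import random

def gap_sample_indices(n, fraction, seed=42):
    """Return sampled indices in [0, n) by drawing geometric gaps
    instead of flipping a coin for every element."""
    rng = random.Random(seed)
    out = []
    i = -1
    while True:
        u = rng.random() or 1e-12  # guard against log(0)
        # Geometric gap: how many elements to skip before the next keep.
        gap = int(math.log(u) / math.log(1.0 - fraction)) if fraction < 1.0 else 0
        i += gap + 1
        if i >= n:
            return out
        out.append(i)

idx = gap_sample_indices(1000, 0.01)
print(len(idx), idx == sorted(idx))
```

For fraction 0.01 this touches roughly 1% of the index range instead of generating 1000 random numbers, which is where the speedup comes from.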