[jira] [Commented] (SPARK-1600) flaky "recovery with file input stream" test in streaming.CheckpointSuite

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247902#comment-14247902 ] Josh Rosen commented on SPARK-1600: --- I'm planning to address this as part of https://gi

[jira] [Commented] (SPARK-4790) Flaky test in ReceivedBlockTrackerSuite: "block addition, block to batch allocation, and cleanup with write ahead log"

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247900#comment-14247900 ] Josh Rosen commented on SPARK-4790: --- /cc [~hshreedharan], you might want to take a look

[jira] [Created] (SPARK-4860) Improve performance of sample() and takeSample() on SchemaRDD

2014-12-15 Thread Davies Liu (JIRA)
Davies Liu created SPARK-4860: - Summary: Improve performance of sample() and takeSample() on SchemaRDD Key: SPARK-4860 URL: https://issues.apache.org/jira/browse/SPARK-4860 Project: Spark Issue

[jira] [Updated] (SPARK-4841) Batch serializer bug in PySpark's RDD.zip

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4841: -- Fix Version/s: 1.3.0 Labels: backport-needed (was: ) I've merged https://github.com/apache/s

[jira] [Updated] (SPARK-4792) Add some checks and messages on making local dir

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4792: -- Assignee: meiyoula > Add some checks and messages on making local dir >

[jira] [Resolved] (SPARK-4792) Add some checks and messages on making local dir

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4792. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 3635 [https://github.com/

[jira] [Commented] (SPARK-4859) Improve StreamingListenerBus

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247801#comment-14247801 ] Apache Spark commented on SPARK-4859: - User 'zsxwing' has created a pull request for t

[jira] [Created] (SPARK-4859) Improve StreamingListenerBus

2014-12-15 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-4859: --- Summary: Improve StreamingListenerBus Key: SPARK-4859 URL: https://issues.apache.org/jira/browse/SPARK-4859 Project: Spark Issue Type: Improvement Co

[jira] [Commented] (SPARK-4858) Add an option to turn off a progress bar in spark-shell

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247722#comment-14247722 ] Apache Spark commented on SPARK-4858: - User 'maropu' has created a pull request for th

[jira] [Created] (SPARK-4858) Add an option to turn off a progress bar in spark-shell

2014-12-15 Thread Takeshi Yamamuro (JIRA)
Takeshi Yamamuro created SPARK-4858: --- Summary: Add an option to turn off a progress bar in spark-shell Key: SPARK-4858 URL: https://issues.apache.org/jira/browse/SPARK-4858 Project: Spark I

[jira] [Commented] (SPARK-4838) StackOverflowError when serialization task

2014-12-15 Thread Hong Shen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247676#comment-14247676 ] Hong Shen commented on SPARK-4838: -- This is the whole stack. All we can know is it throw f

[jira] [Commented] (SPARK-4855) Python tests for hypothesis testing

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247665#comment-14247665 ] Apache Spark commented on SPARK-4855: - User 'jbencook' has created a pull request for

[jira] [Commented] (SPARK-4838) StackOverflowError when serialization task

2014-12-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247662#comment-14247662 ] Sean Owen commented on SPARK-4838: -- [~shenhong] This stack trace is very large but does n

[jira] [Commented] (SPARK-4844) SGD should support custom sampling.

2014-12-15 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247652#comment-14247652 ] Guoqiang Li commented on SPARK-4844: Sorry, I mean, all of the data need to be seriali

[jira] [Commented] (SPARK-4838) StackOverflowError when serialization task

2014-12-15 Thread Hong Shen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247629#comment-14247629 ] Hong Shen commented on SPARK-4838: -- Here is the SQL; it contains 2928 partitions. {code:ti

[jira] [Commented] (SPARK-3702) Standardize MLlib classes for learners, models

2014-12-15 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247582#comment-14247582 ] Joseph K. Bradley commented on SPARK-3702: -- I'm canceling my WIP PR for this sinc

[jira] [Created] (SPARK-4857) Add Executor Events to SparkListener

2014-12-15 Thread Kostas Sakellis (JIRA)
Kostas Sakellis created SPARK-4857: -- Summary: Add Executor Events to SparkListener Key: SPARK-4857 URL: https://issues.apache.org/jira/browse/SPARK-4857 Project: Spark Issue Type: Improvemen

[jira] [Commented] (SPARK-4857) Add Executor Events to SparkListener

2014-12-15 Thread Kostas Sakellis (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247578#comment-14247578 ] Kostas Sakellis commented on SPARK-4857: I'll work on this. > Add Executor Events

[jira] [Commented] (SPARK-4856) Null & empty string should not be considered as StringType at begining in Json schema inferring

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247571#comment-14247571 ] Apache Spark commented on SPARK-4856: - User 'chenghao-intel' has created a pull reques

[jira] [Updated] (SPARK-4856) Null & empty string should not be considered as StringType at begining in Json schema inferring

2014-12-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4856: - Description: We have data like: {code:java} TestSQLContext.sparkContext.parallelize( """{"ip":"27.31.100

[jira] [Updated] (SPARK-4856) Null & empty string should not be considered as StringType at begining in Json schema inferring

2014-12-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4856: - Description: We have data like: {panel} TestSQLContext.sparkContext.parallelize( """{"ip":"27.31.100.29"

[jira] [Created] (SPARK-4856) Null & empty string should not be considered as StringType at begining in Json schema inferring

2014-12-15 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4856: Summary: Null & empty string should not be considered as StringType at begining in Json schema inferring Key: SPARK-4856 URL: https://issues.apache.org/jira/browse/SPARK-4856

[jira] [Updated] (SPARK-4814) Enable assertions in SBT, Maven tests / AssertionError from Hive's LazyBinaryInteger

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4814: -- Target Version/s: 1.0.3 (was: 1.3.0) Assignee: Sean Owen Labels: backport-need

[jira] [Updated] (SPARK-4814) Enable assertions in SBT, Maven tests / AssertionError from Hive's LazyBinaryInteger

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4814: -- Fix Version/s: 1.2.1 1.1.2 1.3.0 > Enable assertions in SBT, Maven

[jira] [Created] (SPARK-4855) Python tests for hypothesis testing

2014-12-15 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-4855: Summary: Python tests for hypothesis testing Key: SPARK-4855 URL: https://issues.apache.org/jira/browse/SPARK-4855 Project: Spark Issue Type: Test

[jira] [Closed] (SPARK-2980) Python support for chi-squared test

2014-12-15 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley closed SPARK-2980. Resolution: Duplicate Fix Version/s: 1.2.0 Assignee: Davies Liu > Python sup

[jira] [Commented] (SPARK-2980) Python support for chi-squared test

2014-12-15 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247505#comment-14247505 ] Joseph K. Bradley commented on SPARK-2980: -- Duplicated by later JIRA which has be

[jira] [Updated] (SPARK-4793) way to find assembly jar is too strict

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4793: -- Labels: backport-needed (was: ) > way to find assembly jar is too strict >

[jira] [Resolved] (SPARK-4323) Utils#fetchFile method should close lock file certainly

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4323. --- Resolution: Not a Problem Resolving this as "Not a Problem" since the pull request was closed after so

[jira] [Updated] (SPARK-4232) Truncate table not works when specific the table from non-current database session

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4232: -- Target Version/s: 1.1.2, 1.2.1 (was: 1.1.0) Fix Version/s: (was: 1.1.2)

[jira] [Updated] (SPARK-4151) Add string operation function trim, ltrim, rtrim, length to support SparkSql (HiveQL)

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4151: -- Target Version/s: 1.1.2 (was: 1.1.0) Fix Version/s: (was: 1.1.2) (w

[jira] [Commented] (SPARK-4846) When the vocabulary size is large, Word2Vec may yield "OutOfMemoryError: Requested array size exceeds VM limit"

2014-12-15 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247487#comment-14247487 ] Joseph K. Bradley commented on SPARK-4846: -- I agree with [~srowen] that the curre

[jira] [Updated] (SPARK-4265) Better extensibility for TaskEndReason

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4265: -- Fix Version/s: (was: 1.1.2) Removing the "Fix Version/s" field, since we reserve that for versions w

[jira] [Updated] (SPARK-4320) JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4320: -- Target Version/s: 1.1.2, 1.2.1 Fix Version/s: (was: 1.1.2) (was: 1.2

[jira] [Updated] (SPARK-4355) OnlineSummarizer doesn't merge mean correctly

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4355: -- Fix Version/s: (was: 1.1.2) 1.1.1 > OnlineSummarizer doesn't merge mean correctly

[jira] [Updated] (SPARK-2823) GraphX jobs throw IllegalArgumentException

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-2823: -- Target Version/s: 1.0.3, 1.3.0, 1.1.2, 1.2.1 > GraphX jobs throw IllegalArgumentException >

[jira] [Updated] (SPARK-2823) GraphX jobs throw IllegalArgumentException

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-2823: -- Fix Version/s: (was: 1.1.2) (was: 1.0.3) (was: 1.2.0)

[jira] [Commented] (SPARK-2823) GraphX jobs throw IllegalArgumentException

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247480#comment-14247480 ] Josh Rosen commented on SPARK-2823: --- I've removed the "Fix Versions" from this JIRA beca

[jira] [Updated] (SPARK-4148) PySpark's sample uses the same seed for all partitions

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4148: -- Fix Version/s: (was: 1.1.2) 1.1.1 > PySpark's sample uses the same seed for all p

[jira] [Updated] (SPARK-3987) NNLS generates incorrect result

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3987: -- Fix Version/s: (was: 1.1.2) 1.1.1 > NNLS generates incorrect result > ---

[jira] [Updated] (SPARK-4006) Spark Driver crashes whenever an Executor is registered twice

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4006: -- Fix Version/s: (was: 1.1.2) 1.1.1 > Spark Driver crashes whenever an Executor is

[jira] [Updated] (SPARK-3901) Add SocketSink capability for Spark metrics

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3901: -- Fix Version/s: (was: 1.1.2) > Add SocketSink capability for Spark metrics >

[jira] [Updated] (SPARK-785) ClosureCleaner not invoked on most PairRDDFunctions

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-785: - Fix Version/s: (was: 1.1.1) 1.1.2 > ClosureCleaner not invoked on most PairRDDFuncti

[jira] [Updated] (SPARK-4006) Spark Driver crashes whenever an Executor is registered twice

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4006: -- Fix Version/s: (was: 1.1.1) 1.1.2 > Spark Driver crashes whenever an Executor is

[jira] [Updated] (SPARK-2823) GraphX jobs throw IllegalArgumentException

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-2823: -- Fix Version/s: (was: 1.1.1) 1.1.2 > GraphX jobs throw IllegalArgumentException >

[jira] [Updated] (SPARK-3987) NNLS generates incorrect result

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3987: -- Fix Version/s: (was: 1.1.1) 1.1.2 > NNLS generates incorrect result > ---

[jira] [Updated] (SPARK-4148) PySpark's sample uses the same seed for all partitions

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4148: -- Fix Version/s: (was: 1.1.1) 1.1.2 > PySpark's sample uses the same seed for all p

[jira] [Updated] (SPARK-4355) OnlineSummarizer doesn't merge mean correctly

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4355: -- Fix Version/s: (was: 1.1.1) 1.1.2 > OnlineSummarizer doesn't merge mean correctly

[jira] [Updated] (SPARK-4265) Better extensibility for TaskEndReason

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4265: -- Fix Version/s: (was: 1.1.1) 1.1.2 > Better extensibility for TaskEndReason >

[jira] [Updated] (SPARK-4151) Add string operation function trim, ltrim, rtrim, length to support SparkSql (HiveQL)

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4151: -- Fix Version/s: (was: 1.1.1) 1.1.2 > Add string operation function trim, ltrim, rt

[jira] [Updated] (SPARK-3901) Add SocketSink capability for Spark metrics

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3901: -- Fix Version/s: (was: 1.1.1) 1.1.2 > Add SocketSink capability for Spark metrics >

[jira] [Updated] (SPARK-4320) JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4320: -- Fix Version/s: (was: 1.1.1) 1.1.2 > JavaPairRDD should supply a saveAsNewHadoopDa

[jira] [Updated] (SPARK-4232) Truncate table not works when specific the table from non-current database session

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4232: -- Fix Version/s: (was: 1.1.1) 1.1.2 > Truncate table not works when specific the ta

[jira] [Updated] (SPARK-785) ClosureCleaner not invoked on most PairRDDFunctions

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-785: - Fix Version/s: 1.3.0 1.0.3 0.9.3 1.1.1 > ClosureCl

[jira] [Updated] (SPARK-785) ClosureCleaner not invoked on most PairRDDFunctions

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-785: - Labels: backport-needed (was: ) > ClosureCleaner not invoked on most PairRDDFunctions > --

[jira] [Updated] (SPARK-785) ClosureCleaner not invoked on most PairRDDFunctions

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-785: - Target Version/s: 1.2.1 Assignee: Sean Owen > ClosureCleaner not invoked on most PairRDDFunctio

[jira] [Commented] (SPARK-785) ClosureCleaner not invoked on most PairRDDFunctions

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247473#comment-14247473 ] Josh Rosen commented on SPARK-785: -- I've merged https://github.com/apache/spark/pull/3690

[jira] [Commented] (SPARK-4501) Create build/mvn to automatically download maven/zinc/scalac

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247448#comment-14247448 ] Apache Spark commented on SPARK-4501: - User 'brennonyork' has created a pull request f

[jira] [Commented] (SPARK-1216) Add a OneHotEncoder for handling categorical features

2014-12-15 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247414#comment-14247414 ] Joseph K. Bradley commented on SPARK-1216: -- (Addressing old comments I just saw n

[jira] [Updated] (SPARK-4841) Batch serializer bug in PySpark's RDD.zip

2014-12-15 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-4841: - Priority: Blocker (was: Major) > Batch serializer bug in PySpark's RDD.zip >

[jira] [Commented] (SPARK-4510) Add k-medoids Partitioning Around Medoids (PAM) algorithm

2014-12-15 Thread Fan Jiang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247389#comment-14247389 ] Fan Jiang commented on SPARK-4510: -- We propose including a user-configurable parameter that

[jira] [Resolved] (SPARK-4668) Fix documentation typos

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4668. Resolution: Fixed Fix Version/s: 1.2.0 Assignee: Ryan Williams > Fix documen

[jira] [Updated] (SPARK-1037) the name of findTaskFromList & findTask in TaskSetManager.scala is confusing

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-1037: -- Assignee: Ilya Ganelin (was: Josh Rosen) > the name of findTaskFromList & findTask in TaskSetManager.sc

[jira] [Resolved] (SPARK-1037) the name of findTaskFromList & findTask in TaskSetManager.scala is confusing

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-1037. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 3665 [https://github.com/

[jira] [Assigned] (SPARK-1037) the name of findTaskFromList & findTask in TaskSetManager.scala is confusing

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-1037: - Assignee: Josh Rosen > the name of findTaskFromList & findTask in TaskSetManager.scala is confusi

[jira] [Updated] (SPARK-1037) the name of findTaskFromList & findTask in TaskSetManager.scala is confusing

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-1037: -- Description: the name of these two functions is confusing though in the comments the author said that

[jira] [Commented] (SPARK-4841) Batch serializer bug in PySpark's RDD.zip

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247376#comment-14247376 ] Apache Spark commented on SPARK-4841: - User 'davies' has created a pull request for th

[jira] [Resolved] (SPARK-4826) Possible flaky tests in WriteAheadLogBackedBlockRDDSuite: "java.lang.IllegalStateException: File exists and there is no append support!"

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4826. --- Resolution: Fixed Fix Version/s: 1.2.1 1.3.0 Issue resolved by pull request

[jira] [Assigned] (SPARK-4826) Possible flaky tests in WriteAheadLogBackedBlockRDDSuite: "java.lang.IllegalStateException: File exists and there is no append support!"

2014-12-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-4826: - Assignee: Josh Rosen (was: Tathagata Das) > Possible flaky tests in WriteAheadLogBackedBlockRDDS

[jira] [Commented] (SPARK-4834) Spark fails to clean up cache / lock files in local dirs

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247348#comment-14247348 ] Apache Spark commented on SPARK-4834: - User 'vanzin' has created a pull request for th

[jira] [Updated] (SPARK-4494) IDFModel.transform() add support for single vector

2014-12-15 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-4494: - Assignee: Yu Ishikawa > IDFModel.transform() add support for single vector > -

[jira] [Resolved] (SPARK-4494) IDFModel.transform() add support for single vector

2014-12-15 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-4494. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 3603 [https://githu

[jira] [Commented] (SPARK-4510) Add k-medoids Partitioning Around Medoids (PAM) algorithm

2014-12-15 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247269#comment-14247269 ] Xiangrui Meng commented on SPARK-4510: -- The N^2 factor was what I was worried about.

[jira] [Commented] (SPARK-4605) Proposed Contribution: Spark Kernel to enable interactive Spark applications

2014-12-15 Thread Chip Senkbeil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247235#comment-14247235 ] Chip Senkbeil commented on SPARK-4605: -- [~rdhyee], the short answer is no. The Spark

[jira] [Commented] (SPARK-4849) Pass partitioning information (distribute by) to In-memory caching

2014-12-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247184#comment-14247184 ] Michael Armbrust commented on SPARK-4849: - The trick here will be to make sure tha

[jira] [Commented] (SPARK-2121) Not fully cached when there is enough memory in ALS

2014-12-15 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247167#comment-14247167 ] sam commented on SPARK-2121: I've tried increasing spark.yarn.executor.memoryOverhead but I ge

[jira] [Commented] (SPARK-2121) Not fully cached when there is enough memory in ALS

2014-12-15 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247148#comment-14247148 ] sam commented on SPARK-2121: I seem to be getting this problem too, I have a job that should b

[jira] [Updated] (SPARK-4841) Batch serializer bug in PySpark's RDD.zip

2014-12-15 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-4841: - Assignee: Davies Liu > Batch serializer bug in PySpark's RDD.zip > ---

[jira] [Commented] (SPARK-4854) Custom UDTF with Lateral View throws ClassNotFound exception in Spark SQL CLI

2014-12-15 Thread Shenghua Wan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247087#comment-14247087 ] Shenghua Wan commented on SPARK-4854: - Both are ClassNotFound for custom UDTF. However

[jira] [Created] (SPARK-4854) Custom UDTF with Lateral View throws ClassNotFound exception in Spark SQL CLI

2014-12-15 Thread Shenghua Wan (JIRA)
Shenghua Wan created SPARK-4854: --- Summary: Custom UDTF with Lateral View throws ClassNotFound exception in Spark SQL CLI Key: SPARK-4854 URL: https://issues.apache.org/jira/browse/SPARK-4854 Project: Sp

[jira] [Resolved] (SPARK-4810) Failed to run collect

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4810. Resolution: Invalid > Failed to run collect > - > > Key:

[jira] [Commented] (SPARK-4826) Possible flaky tests in WriteAheadLogBackedBlockRDDSuite: "java.lang.IllegalStateException: File exists and there is no append support!"

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247082#comment-14247082 ] Apache Spark commented on SPARK-4826: - User 'JoshRosen' has created a pull request for

[jira] [Commented] (SPARK-4810) Failed to run collect

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247081#comment-14247081 ] Patrick Wendell commented on SPARK-4810: Actually can I suggest we move this to th

[jira] [Commented] (SPARK-4826) Possible flaky tests in WriteAheadLogBackedBlockRDDSuite: "java.lang.IllegalStateException: File exists and there is no append support!"

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247043#comment-14247043 ] Patrick Wendell commented on SPARK-4826: I pushed a hotfix disabling these tests,

[jira] [Commented] (SPARK-4837) NettyBlockTransferService does not abide by spark.blockManager.port config option

2014-12-15 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247030#comment-14247030 ] Andrew Ash commented on SPARK-4837: --- Ok that's fair -- a release note and targeting 1.2.

[jira] [Updated] (SPARK-4837) NettyBlockTransferService does not abide by spark.blockManager.port config option

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4837: --- Target Version/s: 1.2.1 > NettyBlockTransferService does not abide by spark.blockManager.port

[jira] [Commented] (SPARK-4837) NettyBlockTransferService does not abide by spark.blockManager.port config option

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246996#comment-14246996 ] Patrick Wendell commented on SPARK-4837: Hey [~aash] because there is a work aroun

[jira] [Resolved] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-15 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-4740. Resolution: Fixed Fix Version/s: 1.2.0 > Netty's network throughput is about 1/2 of NIO's in

[jira] [Updated] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-15 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-4740: --- Description: When testing current spark master (1.3.0-snapshot) with spark-perf (sort-by-key, aggrega

[jira] [Created] (SPARK-4853) Automatically adjust the number of connections between two peers to achieve good performance

2014-12-15 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-4853: -- Summary: Automatically adjust the number of connections between two peers to achieve good performance Key: SPARK-4853 URL: https://issues.apache.org/jira/browse/SPARK-4853

[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-15 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246935#comment-14246935 ] Reynold Xin commented on SPARK-4740: Thanks for the analysis - it looks like your resu

[jira] [Commented] (SPARK-1442) Add Window function support

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246910#comment-14246910 ] Apache Spark commented on SPARK-1442: - User 'liancheng' has created a pull request for

[jira] [Commented] (SPARK-3619) Upgrade to Mesos 0.21 to work around MESOS-1688

2014-12-15 Thread Jing Dong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246831#comment-14246831 ] Jing Dong commented on SPARK-3619: -- [~tnachen] Will this be released with Spark 1.2.0? I

[jira] [Commented] (SPARK-4852) Hive query plan deserialization failure caused by shaded hive-exec jar file when generating golden answers

2014-12-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246799#comment-14246799 ] Cheng Lian commented on SPARK-4852: --- Lowered priority to Minor since this issue only aff

[jira] [Updated] (SPARK-4852) Hive query plan deserialization failure caused by shaded hive-exec jar file when generating golden answers

2014-12-15 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-4852: -- Priority: Minor (was: Major) > Hive query plan deserialization failure caused by shaded hive-exec jar f

[jira] [Created] (SPARK-4852) Hive query plan deserialization failure caused by shaded hive-exec jar file when generating golden answers

2014-12-15 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-4852: - Summary: Hive query plan deserialization failure caused by shaded hive-exec jar file when generating golden answers Key: SPARK-4852 URL: https://issues.apache.org/jira/browse/SPARK-4852

[jira] [Commented] (SPARK-4547) OOM when making bins in BinaryClassificationMetrics

2014-12-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246794#comment-14246794 ] Apache Spark commented on SPARK-4547: - User 'srowen' has created a pull request for th

[jira] [Commented] (SPARK-4844) SGD should support custom sampling.

2014-12-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246749#comment-14246749 ] Sean Owen commented on SPARK-4844: -- No, it definitely does not. See {{PartitionwiseSample

[jira] [Commented] (SPARK-4844) SGD should support custom sampling.

2014-12-15 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246738#comment-14246738 ] Guoqiang Li commented on SPARK-4844: The main reason is that {{RDD.sample}} is not eff

[jira] [Commented] (SPARK-4846) When the vocabulary size is large, Word2Vec may yield "OutOfMemoryError: Requested array size exceeds VM limit"

2014-12-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246732#comment-14246732 ] Sean Owen commented on SPARK-4846: -- But being lazy doesn't really change whether it is se
