[jira] [Created] (SPARK-4760) ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size for tables created from Parquet files

2014-12-05 Thread Jianshi Huang (JIRA)
Jianshi Huang created SPARK-4760: Summary: ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size for tables created from Parquet files Key: SPARK-4760 URL:

[jira] [Created] (SPARK-4761) With JDBC server, set Kryo as default serializer and disable reference tracking

2014-12-05 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-4761: -- Summary: With JDBC server, set Kryo as default serializer and disable reference tracking Key: SPARK-4761 URL: https://issues.apache.org/jira/browse/SPARK-4761

[jira] [Created] (SPARK-4762) Add support for tuples in where in clause query

2014-12-05 Thread Yash Datta (JIRA)
Yash Datta created SPARK-4762: - Summary: Add support for tuples in where in clause query Key: SPARK-4762 URL: https://issues.apache.org/jira/browse/SPARK-4762 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4762) Add support for tuples in 'where in' clause query

2014-12-05 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-4762: -- Summary: Add support for tuples in 'where in' clause query (was: Add support for tuples in where in

[jira] [Commented] (SPARK-4762) Add support for tuples in 'where in' clause query

2014-12-05 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235280#comment-14235280 ] Yash Datta commented on SPARK-4762: --- Already created a PR for the hive parser Add

[jira] [Commented] (SPARK-4734) [Streaming]limit the file Dstream size for each batch

2014-12-05 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235304#comment-14235304 ] 宿荣全 commented on SPARK-4734: [~srowen] I am very sorry that I can't describe the suggestion

[jira] [Created] (SPARK-4763) All-pairs shortest paths algorithm

2014-12-05 Thread Ankur Dave (JIRA)
Ankur Dave created SPARK-4763: - Summary: All-pairs shortest paths algorithm Key: SPARK-4763 URL: https://issues.apache.org/jira/browse/SPARK-4763 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Zhang, Liye (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhang, Liye updated SPARK-4740: --- Attachment: TestRunner sort-by-key - Thread dump for executor 1_files (Nio-48 cores per node).zip

[jira] [Commented] (SPARK-4763) All-pairs shortest paths algorithm

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235324#comment-14235324 ] Apache Spark commented on SPARK-4763: - User 'ankurdave' has created a pull request for

[jira] [Commented] (SPARK-3717) DecisionTree, RandomForest: Partition by feature

2014-12-05 Thread SUMANTH B B N (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235437#comment-14235437 ] SUMANTH B B N commented on SPARK-3717: -- [~josephkb][~manishamde][~codedeft]

[jira] [Resolved] (SPARK-4748) PySpark can't read data in HDFS in YARN mode

2014-12-05 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastián Ramírez resolved SPARK-4748. -- Resolution: Invalid I don't know what was happening, but once I restarted the cluster

[jira] [Commented] (SPARK-4734) [Streaming]limit the file Dstream size for each batch

2014-12-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235493#comment-14235493 ] Sean Owen commented on SPARK-4734: -- Well, here you have the apparent problem that you

[jira] [Commented] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235509#comment-14235509 ] Sean Owen commented on SPARK-4759: -- Can you dump the thread state with kill -QUIT and

[jira] [Created] (SPARK-4764) Ensure that files are fetched atomically

2014-12-05 Thread JIRA
Christophe PRÉAUD created SPARK-4764: Summary: Ensure that files are fetched atomically Key: SPARK-4764 URL: https://issues.apache.org/jira/browse/SPARK-4764 Project: Spark Issue Type:

[jira] [Commented] (SPARK-4764) Ensure that files are fetched atomically

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235525#comment-14235525 ] Apache Spark commented on SPARK-4764: - User 'preaudc' has created a pull request for

[jira] [Commented] (SPARK-4761) With JDBC server, set Kryo as default serializer and disable reference tracking

2014-12-05 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235604#comment-14235604 ] Cheng Lian commented on SPARK-4761: --- The JDBC Thrift server is started by

[jira] [Commented] (SPARK-4761) With JDBC server, set Kryo as default serializer and disable reference tracking

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235606#comment-14235606 ] Apache Spark commented on SPARK-4761: - User 'liancheng' has created a pull request for

[jira] [Updated] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Zhang, Liye (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhang, Liye updated SPARK-4740: --- Attachment: Spark-perf Test Report 16 Cores per Executor.pdf Hi [~rxin] [~adav], the difference

[jira] [Commented] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Davis Shepherd (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235651#comment-14235651 ] Davis Shepherd commented on SPARK-4759: ---

[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Davis Shepherd (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235651#comment-14235651 ] Davis Shepherd edited comment on SPARK-4759 at 12/5/14 3:48 PM:

[jira] [Comment Edited] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Davis Shepherd (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235651#comment-14235651 ] Davis Shepherd edited comment on SPARK-4759 at 12/5/14 3:52 PM:

[jira] [Updated] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Davis Shepherd (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davis Shepherd updated SPARK-4759: -- Description: The attached test class runs two identical jobs that perform some iterative

[jira] [Commented] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235673#comment-14235673 ] Sean Owen commented on SPARK-4759: -- I don't see a deadlock here, so maybe that's not the

[jira] [Commented] (SPARK-3147) Implement A/B testing

2014-12-05 Thread Yu Ishikawa (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235672#comment-14235672 ] Yu Ishikawa commented on SPARK-3147: Hi [~mengxr], I agree that A/B testing is widely

[jira] [Commented] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Davis Shepherd (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235703#comment-14235703 ] Davis Shepherd commented on SPARK-4759: --- Fair enough. As far as I can work out, it

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235758#comment-14235758 ] Sandy Ryza commented on SPARK-3655: --- Hey [~koert], I think the transform that would most

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235766#comment-14235766 ] koert kuipers commented on SPARK-3655: -- something that takes in an ordering, and

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235795#comment-14235795 ] Sandy Ryza commented on SPARK-3655: --- The repartitionAndSortWithinPartitions approach

[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235813#comment-14235813 ] Aaron Davidson commented on SPARK-4740: --- I think we have ourselves a winner. NIO is

[jira] [Resolved] (SPARK-4761) With JDBC server, set Kryo as default serializer and disable reference tracking

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4761. Resolution: Fixed With JDBC server, set Kryo as default serializer and disable reference

[jira] [Created] (SPARK-4765) Add GC back to default metrics

2014-12-05 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-4765: -- Summary: Add GC back to default metrics Key: SPARK-4765 URL: https://issues.apache.org/jira/browse/SPARK-4765 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-4765) Add GC back to default metrics

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235936#comment-14235936 ] Apache Spark commented on SPARK-4765: - User 'kayousterhout' has created a pull request

[jira] [Commented] (SPARK-4737) Prevent serialization errors from ever crashing the DAG scheduler

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235939#comment-14235939 ] Patrick Wendell commented on SPARK-4737: [~marmbrus] this is a good idea. I'm sure

[jira] [Comment Edited] (SPARK-4737) Prevent serialization errors from ever crashing the DAG scheduler

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235939#comment-14235939 ] Patrick Wendell edited comment on SPARK-4737 at 12/5/14 7:19 PM:

[jira] [Created] (SPARK-4766) ML Estimator Params should subclass Transformer Params

2014-12-05 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-4766: Summary: ML Estimator Params should subclass Transformer Params Key: SPARK-4766 URL: https://issues.apache.org/jira/browse/SPARK-4766 Project: Spark

[jira] [Commented] (SPARK-4501) Create build/mvn to automatically download maven/zinc/scalac

2014-12-05 Thread Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235960#comment-14235960 ] Ryan Williams commented on SPARK-4501: -- Would this ideally just fork out to some

[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235986#comment-14235986 ] Reynold Xin commented on SPARK-4740: Looking at the nio stacktrace it does confirm one

[jira] [Created] (SPARK-4767) Add support for launching in a specified placement group to spark ec2 scripts.

2014-12-05 Thread holdenk (JIRA)
holdenk created SPARK-4767: -- Summary: Add support for launching in a specified placement group to spark ec2 scripts. Key: SPARK-4767 URL: https://issues.apache.org/jira/browse/SPARK-4767 Project: Spark

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-12-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235999#comment-14235999 ] Josh Rosen commented on SPARK-3633: --- [~stephen], Do you know if the hosts that failed

[jira] [Updated] (SPARK-4005) handle message replies in receive instead of in the individual private methods

2014-12-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4005: -- Assignee: Zhang, Liye handle message replies in receive instead of in the individual private methods

[jira] [Resolved] (SPARK-4005) handle message replies in receive instead of in the individual private methods

2014-12-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4005. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 2853

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236015#comment-14236015 ] koert kuipers commented on SPARK-3655: -- should there be a foldLeft that does not

[jira] [Commented] (SPARK-4414) SparkContext.wholeTextFiles Doesn't work with S3 Buckets

2014-12-05 Thread Marc Millstone (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236020#comment-14236020 ] Marc Millstone commented on SPARK-4414: --- Is there any update to this ticket? We are

[jira] [Commented] (SPARK-4767) Add support for launching in a specified placement group to spark ec2 scripts.

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236028#comment-14236028 ] Apache Spark commented on SPARK-4767: - User 'holdenk' has created a pull request for

[jira] [Created] (SPARK-4768) Add Support For Impala Encoded Timestamp (INT96)

2014-12-05 Thread Pat McDonough (JIRA)
Pat McDonough created SPARK-4768: Summary: Add Support For Impala Encoded Timestamp (INT96) Key: SPARK-4768 URL: https://issues.apache.org/jira/browse/SPARK-4768 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4768) Add Support For Impala Encoded Timestamp (INT96)

2014-12-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4768: Target Version/s: 1.3.0 Add Support For Impala Encoded Timestamp (INT96)

[jira] [Created] (SPARK-4769) CTAS does not work when reading from temporary tables

2014-12-05 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-4769: --- Summary: CTAS does not work when reading from temporary tables Key: SPARK-4769 URL: https://issues.apache.org/jira/browse/SPARK-4769 Project: Spark

[jira] [Created] (SPARK-4770) spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN

2014-12-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4770: - Summary: spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN Key: SPARK-4770 URL: https://issues.apache.org/jira/browse/SPARK-4770

[jira] [Commented] (SPARK-4770) spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236090#comment-14236090 ] Apache Spark commented on SPARK-4770: - User 'sryza' has created a pull request for

[jira] [Updated] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4759: --- Priority: Critical (was: Major) Deadlock in complex spark job.

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236131#comment-14236131 ] Sandy Ryza commented on SPARK-3655: --- foldLeft only conceptually makes sense when applied

[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236134#comment-14236134 ] Apache Spark commented on SPARK-4740: - User 'rxin' has created a pull request for this

[jira] [Updated] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-4740: --- Assignee: Reynold Xin Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236133#comment-14236133 ] Reynold Xin commented on SPARK-4740: I submitted a WIP PR:

[jira] [Closed] (SPARK-2754) Document standalone-cluster mode now that it's working

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-2754. Resolution: Duplicate Document standalone-cluster mode now that it's working

[jira] [Reopened] (SPARK-4506) Update documentation to clarify whether standalone-cluster mode is now officially supported

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-4506: -- Update documentation to clarify whether standalone-cluster mode is now officially supported

[jira] [Closed] (SPARK-4506) Update documentation to clarify whether standalone-cluster mode is now officially supported

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4506. Resolution: Fixed Fix Version/s: (was: 1.1.2) 1.1.1 Target

[jira] [Closed] (SPARK-4506) Update documentation to clarify whether standalone-cluster mode is now officially supported

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4506. Resolution: Fixed Fix Version/s: 1.1.2 1.2.0 This was actually fixed by

[jira] [Created] (SPARK-4771) Document standalone --supervise feature

2014-12-05 Thread Andrew Or (JIRA)
Andrew Or created SPARK-4771: Summary: Document standalone --supervise feature Key: SPARK-4771 URL: https://issues.apache.org/jira/browse/SPARK-4771 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-4761) With JDBC server, set Kryo as default serializer and disable reference tracking

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4761: --- Fix Version/s: 1.2.0 With JDBC server, set Kryo as default serializer and disable reference

[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236162#comment-14236162 ] Patrick Wendell commented on SPARK-4740: I'd like to escalate this to a 1.2

[jira] [Updated] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4740: --- Priority: Blocker (was: Major) Netty's network throughput is about 1/2 of NIO's in

[jira] [Updated] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4740: --- Target Version/s: 1.2.0 Netty's network throughput is about 1/2 of NIO's in spark-perf

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-12-05 Thread Stephen Haberman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236163#comment-14236163 ] Stephen Haberman commented on SPARK-3633: - Hi Josh, Yes, it was GC issues;

[jira] [Commented] (SPARK-4362) Make prediction probability available in NaiveBayesModel

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236165#comment-14236165 ] Apache Spark commented on SPARK-4362: - User 'alanctgardner' has created a pull request

[jira] [Created] (SPARK-4772) Accumulators leak memory, both temporarily and permanently

2014-12-05 Thread Nathan Kronenfeld (JIRA)
Nathan Kronenfeld created SPARK-4772: Summary: Accumulators leak memory, both temporarily and permanently Key: SPARK-4772 URL: https://issues.apache.org/jira/browse/SPARK-4772 Project: Spark

[jira] [Updated] (SPARK-4761) With JDBC server, set Kryo as default serializer and disable reference tracking

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4761: --- Component/s: SQL With JDBC server, set Kryo as default serializer and disable reference

[jira] [Commented] (SPARK-4772) Accumulators leak memory, both temporarily and permanently

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236181#comment-14236181 ] Apache Spark commented on SPARK-4772: - User 'nkronenfeld' has created a pull request

[jira] [Commented] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236257#comment-14236257 ] Patrick Wendell commented on SPARK-4759: Thanks [~dgshep] a ton for creating a

[jira] [Updated] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4759: --- Assignee: Andrew Or Deadlock in complex spark job. --

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236267#comment-14236267 ] koert kuipers commented on SPARK-3655: -- [~sandyr] i updated pullreq to include

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236278#comment-14236278 ] Sandy Ryza commented on SPARK-3655: --- Thanks Koert, will take a look soon. Can we

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236287#comment-14236287 ] Patrick Wendell commented on SPARK-3655: +1 to Sandy's comment. I think

[jira] [Commented] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236301#comment-14236301 ] Andrew Or commented on SPARK-4759: -- Hey just wanted to let you know that I am able to

[jira] [Created] (SPARK-4773) CTAS Doesn't Use the Current Schema

2014-12-05 Thread David Ross (JIRA)
David Ross created SPARK-4773: - Summary: CTAS Doesn't Use the Current Schema Key: SPARK-4773 URL: https://issues.apache.org/jira/browse/SPARK-4773 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-4774) Make HiveFromSpark example more portable

2014-12-05 Thread Kostas Sakellis (JIRA)
Kostas Sakellis created SPARK-4774: -- Summary: Make HiveFromSpark example more portable Key: SPARK-4774 URL: https://issues.apache.org/jira/browse/SPARK-4774 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-4759) Deadlock in complex spark job.

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4759: - Affects Version/s: 1.3.0 1.2.0 Deadlock in complex spark job.

[jira] [Commented] (SPARK-4774) Make HiveFromSpark example more portable

2014-12-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236438#comment-14236438 ] Apache Spark commented on SPARK-4774: - User 'ksakellis' has created a pull request for

[jira] [Updated] (SPARK-4774) Make HiveFromSpark example more portable

2014-12-05 Thread Kostas Sakellis (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostas Sakellis updated SPARK-4774: --- Component/s: SQL Make HiveFromSpark example more portable

[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4759: - Summary: Deadlock in complex spark job in local mode with multiple cores (was: Deadlock in complex spark

[jira] [Updated] (SPARK-4759) Deadlock in complex spark job in local mode

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4759: - Summary: Deadlock in complex spark job in local mode (was: Deadlock in complex spark job.) Deadlock in

[jira] [Resolved] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-12-05 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li resolved SPARK-3625. Resolution: Won't Fix In some cases, the RDD.checkpoint does not work

[jira] [Commented] (SPARK-4737) Prevent serialization errors from ever crashing the DAG scheduler

2014-12-05 Thread Matt Cheah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236494#comment-14236494 ] Matt Cheah commented on SPARK-4737: --- Forgot to mention that I'm actively working on this

[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-12-05 Thread Jacky Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236501#comment-14236501 ] Jacky Li commented on SPARK-4001: - Sure, Xiangrui. I will update it on next Monday while

[jira] [Commented] (SPARK-4759) Deadlock in complex spark job in local mode with multiple cores

2014-12-05 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236517#comment-14236517 ] Andrew Or commented on SPARK-4759: -- Quick update, I was only able to reproduce this in

[jira] [Commented] (SPARK-4734) [Streaming]limit the file Dstream size for each batch

2014-12-05 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236519#comment-14236519 ] 宿荣全 commented on SPARK-4734: [~srowen] [~srowen] I think that I still do not describe the

[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236533#comment-14236533 ] Reynold Xin commented on SPARK-4740: [~jerryshao] [~liyezhang556520] I understand it's

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-05 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236540#comment-14236540 ] Nicholas Chammas commented on SPARK-3431: - Here's an example failure I don't

[jira] [Comment Edited] (SPARK-3431) Parallelize execution of tests

2014-12-05 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234702#comment-14234702 ] Nicholas Chammas edited comment on SPARK-3431 at 12/6/14 3:11 AM:

[jira] [Commented] (SPARK-4740) Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey

2014-12-05 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236552#comment-14236552 ] Saisai Shao commented on SPARK-4740: I will test it on my 24 cores and 12 HDDs cluster

[jira] [Created] (SPARK-4775) Possible problem in a simple join? Getting duplicate rows and missing rows

2014-12-05 Thread Stephen Boesch (JIRA)
Stephen Boesch created SPARK-4775: - Summary: Possible problem in a simple join? Getting duplicate rows and missing rows Key: SPARK-4775 URL: https://issues.apache.org/jira/browse/SPARK-4775 Project:

[jira] [Commented] (SPARK-4775) Possible problem in a simple join? Getting duplicate rows and missing rows

2014-12-05 Thread Stephen Boesch (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236568#comment-14236568 ] Stephen Boesch commented on SPARK-4775: --- Here is the abridged output Testing

[jira] [Commented] (SPARK-4775) Possible problem in a simple join? Getting duplicate rows and missing rows

2014-12-05 Thread Stephen Boesch (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236567#comment-14236567 ] Stephen Boesch commented on SPARK-4775: --- How do I attach files here? Since I am

[jira] [Commented] (SPARK-4775) Possible problem in a simple join? Getting duplicate rows and missing rows

2014-12-05 Thread Stephen Boesch (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236577#comment-14236577 ] Stephen Boesch commented on SPARK-4775: --- Here is the same logic for mysql: note the