[jira] [Resolved] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2553. -- Resolution: Fixed Target Version/s: 1.1.0 > CoGroupedRDD unnecessarily allocates a Tu

[jira] [Updated] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2553: - Priority: Minor (was: Major) > CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key > -

[jira] [Updated] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2553: - Assignee: Sandy Ryza > CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key > --

[jira] [Resolved] (SPARK-2570) ClassCastException from HiveFromSpark(examples)

2014-07-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-2570. Resolution: Fixed Fix Version/s: 1.0.2 1.1.0 > ClassCastException from Hi

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066098#comment-14066098 ] Davies Liu commented on SPARK-2494: --- The PR for this issue: https://github.com/apache/sp

[jira] [Commented] (SPARK-1764) EOF reached before Python server acknowledged

2014-07-17 Thread nigel (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066099#comment-14066099 ] nigel commented on SPARK-1764: -- Hi; Never used yarn. Doesn't happen on standalone. > EOF r

[jira] [Updated] (SPARK-2543) Resizable serialization buffers for kryo

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2543: --- Assignee: Koert Kuipers > Resizable serialization buffers for kryo >

[jira] [Updated] (SPARK-2411) Standalone Master - direct users to turn on event logs

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2411: --- Fix Version/s: 1.1.0 > Standalone Master - direct users to turn on event logs > -

[jira] [Updated] (SPARK-2411) Standalone Master - direct users to turn on event logs

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2411: --- Assignee: Andrew Or > Standalone Master - direct users to turn on event logs > --

[jira] [Commented] (SPARK-2411) Standalone Master - direct users to turn on event logs

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065934#comment-14065934 ] Patrick Wendell commented on SPARK-2411: Fixed by: https://github.com/apache/spark

[jira] [Commented] (SPARK-1458) Expose sc.version in PySpark

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065929#comment-14065929 ] Patrick Wendell commented on SPARK-1458: Isn't it possible to just have the python

[jira] [Updated] (SPARK-2571) Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies

2014-07-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-2571: -- Description: In BlockStoreShuffleFetcher, we set the shuffle metrics for a task to include inf

[jira] [Created] (SPARK-2571) Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies

2014-07-17 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-2571: - Summary: Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies Key: SPARK-2571 URL: https://issues.apache.org/jira/browse/SPARK-2571

[jira] [Commented] (SPARK-2570) ClassCastException from HiveFromSpark(examples)

2014-07-17 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065905#comment-14065905 ] Cheng Hao commented on SPARK-2570: -- https://github.com/apache/spark/pull/1475 > ClassCas

[jira] [Created] (SPARK-2570) ClassCastException from HiveFromSpark(examples)

2014-07-17 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-2570: Summary: ClassCastException from HiveFromSpark(examples) Key: SPARK-2570 URL: https://issues.apache.org/jira/browse/SPARK-2570 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-2569) Customized UDFs in hive not running with Spark SQL

2014-07-17 Thread jacky hung (JIRA)
jacky hung created SPARK-2569: - Summary: Customized UDFs in hive not running with Spark SQL Key: SPARK-2569 URL: https://issues.apache.org/jira/browse/SPARK-2569 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-2299) Consolidate various stageIdTo* hash maps

2014-07-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-2299. Resolution: Fixed Fix Version/s: 1.1.0 > Consolidate various stageIdTo* hash maps >

[jira] [Created] (SPARK-2568) RangePartitioner should go through the data only once

2014-07-17 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2568: -- Summary: RangePartitioner should go through the data only once Key: SPARK-2568 URL: https://issues.apache.org/jira/browse/SPARK-2568 Project: Spark Issue Type: B

[jira] [Updated] (SPARK-2567) Resubmitted stage sometimes remains as active stage in the web UI

2014-07-17 Thread Masayoshi TSUZUKI (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masayoshi TSUZUKI updated SPARK-2567: - Attachment: SPARK-2567.png > Resubmitted stage sometimes remains as active stage in the w

[jira] [Created] (SPARK-2567) Resubmitted stage sometimes remains as active stage in the web UI

2014-07-17 Thread Masayoshi TSUZUKI (JIRA)
Masayoshi TSUZUKI created SPARK-2567: Summary: Resubmitted stage sometimes remains as active stage in the web UI Key: SPARK-2567 URL: https://issues.apache.org/jira/browse/SPARK-2567 Project: Spar

[jira] [Commented] (SPARK-2564) ShuffleReadMetrics.totalBlocksFetched is redundant

2014-07-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065826#comment-14065826 ] Sandy Ryza commented on SPARK-2564: --- https://github.com/apache/spark/pull/1474 > Shuffl

[jira] [Created] (SPARK-2566) Update ShuffleWriteMetrics as data is written

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2566: - Summary: Update ShuffleWriteMetrics as data is written Key: SPARK-2566 URL: https://issues.apache.org/jira/browse/SPARK-2566 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-2565) Update ShuffleReadMetrics as blocks are fetched

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2565: - Summary: Update ShuffleReadMetrics as blocks are fetched Key: SPARK-2565 URL: https://issues.apache.org/jira/browse/SPARK-2565 Project: Spark Issue Type: Improveme

[jira] [Created] (SPARK-2564) ShuffleReadMetrics.totalBlocksFetched is redundant

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2564: - Summary: ShuffleReadMetrics.totalBlocksFetched is redundant Key: SPARK-2564 URL: https://issues.apache.org/jira/browse/SPARK-2564 Project: Spark Issue Type: Improv

[jira] [Resolved] (SPARK-2534) Avoid pulling in the entire RDD or PairRDDFunctions in various operators

2014-07-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-2534. Resolution: Fixed Fix Version/s: 1.0.2 1.1.0 > Avoid pulling in the entir

[jira] [Commented] (SPARK-2491) When an OOM is thrown,the executor does not stop properly.

2014-07-17 Thread Kousuke Saruta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065757#comment-14065757 ] Kousuke Saruta commented on SPARK-2491: --- Hi [~gq] I found the issue related to you r

[jira] [Commented] (SPARK-2563) Make number of connection retries configurable

2014-07-17 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065735#comment-14065735 ] Shivaram Venkataraman commented on SPARK-2563: -- https://github.com/apache/spa

[jira] [Created] (SPARK-2563) Make number of connection retries configurable

2014-07-17 Thread Shivaram Venkataraman (JIRA)
Shivaram Venkataraman created SPARK-2563: Summary: Make number of connection retries configurable Key: SPARK-2563 URL: https://issues.apache.org/jira/browse/SPARK-2563 Project: Spark

[jira] [Commented] (SPARK-1764) EOF reached before Python server acknowledged

2014-07-17 Thread Timothy Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065709#comment-14065709 ] Timothy Chen commented on SPARK-1764: - I'm not sure how this is related to Mesos, is t

[jira] [Commented] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException

2014-07-17 Thread Timothy Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065706#comment-14065706 ] Timothy Chen commented on SPARK-1702: - The PR is merged and closed already, is this st

[jira] [Updated] (SPARK-2454) Separate driver spark home from executor spark home

2014-07-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2454: - Description: The driver may not always share the same directory structure as the executors. It makes lit

[jira] [Commented] (SPARK-872) Should revive offer after tasks finish in Mesos fine-grained mode

2014-07-17 Thread Timothy Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065703#comment-14065703 ] Timothy Chen commented on SPARK-872: I'm not quite understanding your statement where M

[jira] [Updated] (SPARK-2365) Add IndexedRDD, an efficient updatable key-value store

2014-07-17 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-2365: -- Attachment: 2014-07-07-IndexedRDD-design-review.pdf Slides explaining the motivation, design, and perfo

[jira] [Comment Edited] (SPARK-2365) Add IndexedRDD, an efficient updatable key-value store

2014-07-17 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065694#comment-14065694 ] Ankur Dave edited comment on SPARK-2365 at 7/17/14 10:31 PM: -

[jira] [Commented] (SPARK-1458) Expose sc.version in PySpark

2014-07-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065693#comment-14065693 ] Nicholas Chammas commented on SPARK-1458: - Perhaps that could also be some kind of

[jira] [Created] (SPARK-2562) Add Date datatype support to Spark SQL

2014-07-17 Thread Zongheng Yang (JIRA)
Zongheng Yang created SPARK-2562: Summary: Add Date datatype support to Spark SQL Key: SPARK-2562 URL: https://issues.apache.org/jira/browse/SPARK-2562 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065671#comment-14065671 ] Matthew Farrellee commented on SPARK-2494: -- thank you. i've confirmed this: {cod

[jira] [Commented] (SPARK-2470) Fix PEP 8 violations

2014-07-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065655#comment-14065655 ] Reynold Xin commented on SPARK-2470: That PR only covers a small fraction of the chang

[jira] [Resolved] (SPARK-1215) Clustering: Index out of bounds error

2014-07-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-1215. -- Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1468 [https://gith

[jira] [Resolved] (SPARK-2525) Remove as many compilation warning messages as possible in Spark SQL

2014-07-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2525. - Resolution: Fixed > Remove as many compilation warning messages as possible in Spark SQL

[jira] [Commented] (SPARK-2554) CountDistinct and SumDistinct should do partial aggregation

2014-07-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065621#comment-14065621 ] Michael Armbrust commented on SPARK-2554: - I think this may be hard to do if there

[jira] [Created] (SPARK-2561) Repartitioning a SchemaRDD breaks resolution

2014-07-17 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-2561: --- Summary: Repartitioning a SchemaRDD breaks resolution Key: SPARK-2561 URL: https://issues.apache.org/jira/browse/SPARK-2561 Project: Spark Issue Type:

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Ken Carlile (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065549#comment-14065549 ] Ken Carlile commented on SPARK-2282: Awesome. I was afraid we were trying to chase dow

[jira] [Commented] (SPARK-2542) Exit Code Class should be renamed and placed package properly

2014-07-17 Thread Kousuke Saruta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065528#comment-14065528 ] Kousuke Saruta commented on SPARK-2542: --- PR: https://github.com/apache/spark/pull/14

[jira] [Updated] (SPARK-2560) Create Spark SQL syntax reference

2014-07-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-2560: Description: Does Spark SQL support {{LEN()}}? How about {{LIMIT}}? And what about {{MY FA

[jira] [Created] (SPARK-2560) Create Spark SQL syntax reference

2014-07-17 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-2560: --- Summary: Create Spark SQL syntax reference Key: SPARK-2560 URL: https://issues.apache.org/jira/browse/SPARK-2560 Project: Spark Issue Type: Documentati

[jira] [Commented] (SPARK-2501) Handle stage re-submissions properly in the UI

2014-07-17 Thread Masayoshi TSUZUKI (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065353#comment-14065353 ] Masayoshi TSUZUKI commented on SPARK-2501: -- Yes, this ticket covers it. I think t

[jira] [Updated] (SPARK-2528) spark-ec2 security group permissions are too open

2014-07-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-2528: Description: {{spark-ec2}} configures EC2 security groups with ports [open to the world |

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065317#comment-14065317 ] Davies Liu commented on SPARK-2494: --- The tip version already handles hash of None, but it

[jira] [Updated] (SPARK-2447) Add common solution for sending upsert actions to HBase (put, deletes, and increment)

2014-07-17 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2447: - Assignee: Ted Malaska > Add common solution for sending upsert actions to HBase (put, deletes, an

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065306#comment-14065306 ] Aaron Davidson commented on SPARK-2282: --- This problem is kinda silly because we're a

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Ken Carlile (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065272#comment-14065272 ] Ken Carlile commented on SPARK-2282: So I've tried a few different things at this poin

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065267#comment-14065267 ] Matthew Farrellee commented on SPARK-2494: -- i'm trying to reproduce using the tip

[jira] [Commented] (SPARK-2501) Handle stage re-submissions properly in the UI

2014-07-17 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065266#comment-14065266 ] Shivaram Venkataraman commented on SPARK-2501: -- I don't know if this issue wi

[jira] [Commented] (SPARK-2316) StorageStatusListener should avoid O(blocks) operations

2014-07-17 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065256#comment-14065256 ] Shivaram Venkataraman commented on SPARK-2316: -- I'd just like to add that in

[jira] [Commented] (SPARK-2256) pyspark: .take doesn't work ... sometimes ...

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065249#comment-14065249 ] Matthew Farrellee commented on SPARK-2256: -- maybe there's an issue in the platfor

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065240#comment-14065240 ] Davies Liu commented on SPARK-2494: --- This bug only happens in cluster mode, so it can n

[jira] [Commented] (SPARK-2256) pyspark: .take doesn't work ... sometimes ...

2014-07-17 Thread Ángel Álvarez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065230#comment-14065230 ] Ángel Álvarez commented on SPARK-2256: -- I've tried using local and master spark in st

[jira] [Created] (SPARK-2559) Add A Link to Download the Application Events Log for Offline Analysis

2014-07-17 Thread Pat McDonough (JIRA)
Pat McDonough created SPARK-2559: Summary: Add A Link to Download the Application Events Log for Offline Analysis Key: SPARK-2559 URL: https://issues.apache.org/jira/browse/SPARK-2559 Project: Spark

[jira] [Commented] (SPARK-2083) Allow local task to retry after failure.

2014-07-17 Thread Bill Havanki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065143#comment-14065143 ] Bill Havanki commented on SPARK-2083: - Pull request available: https://github.com/apac

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065121#comment-14065121 ] Aaron Davidson commented on SPARK-2282: --- This problem does look identical. I think I

[jira] [Commented] (SPARK-1458) Expose sc.version in PySpark

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065094#comment-14065094 ] Matthew Farrellee commented on SPARK-1458: -- [~pwendell] if you're ok with having

[jira] [Updated] (SPARK-2558) Mention --queue argument in YARN documentation

2014-07-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2558: - Labels: Starter (was: ) > Mention --queue argument in YARN documentation >

[jira] [Created] (SPARK-2558) Mention --queue argument in YARN documentation

2014-07-17 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2558: Summary: Mention --queue argument in YARN documentation Key: SPARK-2558 URL: https://issues.apache.org/jira/browse/SPARK-2558 Project: Spark Issue Type: Doc

[jira] [Commented] (SPARK-2256) pyspark: .take doesn't work ... sometimes ...

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065066#comment-14065066 ] Matthew Farrellee commented on SPARK-2256: -- are you using a local master, mesos,

[jira] [Commented] (SPARK-1662) PySpark fails if python class is used as a data container

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065059#comment-14065059 ] Matthew Farrellee commented on SPARK-1662: -- [~nrchandan] and [~pwendell] - i reco

[jira] [Updated] (SPARK-2256) pyspark: .take doesn't work ... sometimes ...

2014-07-17 Thread Ángel Álvarez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ángel Álvarez updated SPARK-2256: - Attachment: A_test.zip I've tried with different files and sizes ... but I can't figure out the r

[jira] [Commented] (SPARK-1670) PySpark Fails to Create SparkContext Due To Debugging Options in conf/java-opts

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065047#comment-14065047 ] Matthew Farrellee commented on SPARK-1670: -- SPARK-2313 is the root cause of this.

[jira] [Commented] (SPARK-2021) External hashing in PySpark

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065040#comment-14065040 ] Matthew Farrellee commented on SPARK-2021: -- [~matei][~prashant_] what do you mean

[jira] [Commented] (SPARK-2523) Potential Bugs if SerDe is not the identical among partitions and table

2014-07-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065035#comment-14065035 ] Yin Huai commented on SPARK-2523: - I see. Although we are using the right SerDe to deseria

[jira] [Commented] (SPARK-2256) pyspark: .take doesn't work ... sometimes ...

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065032#comment-14065032 ] Matthew Farrellee commented on SPARK-2256: -- [~angel2014] i've tried this using a

[jira] [Updated] (SPARK-2523) Potential Bugs if SerDe is not the identical among partitions and table

2014-07-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-2523: Target Version/s: 1.1.0 > Potential Bugs if SerDe is not the identical among partitions and table > ---

[jira] [Commented] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065029#comment-14065029 ] Ye Xianjin commented on SPARK-2557: --- Github pr: https://github.com/apache/spark/pull/146

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Ken Carlile (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065025#comment-14065025 ] Ken Carlile commented on SPARK-2282: A little more info: Nodes are running Scientific

[jira] [Commented] (SPARK-2470) Fix PEP 8 violations

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065014#comment-14065014 ] Matthew Farrellee commented on SPARK-2470: -- [~prashant_][~rxin] it looks like the

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065006#comment-14065006 ] Matthew Farrellee commented on SPARK-2494: -- [~davies] will you provide an example

[jira] [Commented] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065001#comment-14065001 ] Ye Xianjin commented on SPARK-2557: --- I will send a pr for this. > createTaskScheduler s

[jira] [Created] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
Ye Xianjin created SPARK-2557: - Summary: createTaskScheduler should be consistent between local and local-n-failures Key: SPARK-2557 URL: https://issues.apache.org/jira/browse/SPARK-2557 Project: Spark

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Ken Carlile (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064996#comment-14064996 ] Ken Carlile commented on SPARK-2282: So we've just given this a try with a 32 node clu

[jira] [Commented] (SPARK-2156) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks

2014-07-17 Thread DjvuLee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064965#comment-14064965 ] DjvuLee commented on SPARK-2156: I see this fixed in the spark branch-0.9 in the github, b

[jira] [Updated] (SPARK-2491) When an OOM is thrown,the executor does not stop properly.

2014-07-17 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-2491: --- Summary: When an OOM is thrown,the executor does not stop properly. (was: When an OOM is thrown,the

[jira] [Created] (SPARK-2556) Multiple SparkContexts can coexist in one process

2014-07-17 Thread YanTang Zhai (JIRA)
YanTang Zhai created SPARK-2556: --- Summary: Multiple SparkContexts can coexist in one process Key: SPARK-2556 URL: https://issues.apache.org/jira/browse/SPARK-2556 Project: Spark Issue Type: Imp

[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, configuration spark.scheduler.minRegisteredExecutorsRatio was introduced, but it o

[jira] [Commented] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064743#comment-14064743 ] Zhihui commented on SPARK-2555: --- I submitted a PR https://github.com/apache/spark/pull/1462 >

[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, configuration spark.scheduler.minRegisteredExecutorsRatio was introduced, but it o

[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, > Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos > mo

[jira] [Created] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
Zhihui created SPARK-2555: - Summary: Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. Key: SPARK-2555 URL: https://issues.apache.org/jira/browse/SPARK-2555 Project: Spark

[jira] [Commented] (SPARK-2492) KafkaReceiver minor changes to align with Kafka 0.8

2014-07-17 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064721#comment-14064721 ] Saisai Shao commented on SPARK-2492: Hi TD, Also I did some experiments on the previ

[jira] [Updated] (SPARK-2551) Cleanup FilteringParquetRowInputFormat

2014-07-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-2551: -- Issue Type: Improvement (was: Bug) > Cleanup FilteringParquetRowInputFormat >

[jira] [Created] (SPARK-2554) CountDistinct and SumDistinct should do partial aggregation

2014-07-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-2554: - Summary: CountDistinct and SumDistinct should do partial aggregation Key: SPARK-2554 URL: https://issues.apache.org/jira/browse/SPARK-2554 Project: Spark Issue Ty

[jira] [Commented] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064709#comment-14064709 ] Sandy Ryza commented on SPARK-2553: --- https://github.com/apache/spark/pull/1461 > CoGrou

[jira] [Created] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2553: - Summary: CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key Key: SPARK-2553 URL: https://issues.apache.org/jira/browse/SPARK-2553 Project: Spark Issue

[jira] [Updated] (SPARK-2476) Have sbt-assembly include runtime dependencies in jar

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2476: --- Priority: Minor (was: Major) > Have sbt-assembly include runtime dependencies in jar > -

[jira] [Commented] (SPARK-2497) @DeveloperApi tag does not suppress MIMA warnings

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064703#comment-14064703 ] Patrick Wendell commented on SPARK-2497: I wonder if maybe the MIMA exclude genera

[jira] [Commented] (SPARK-2119) Reading Parquet InputSplits dominates query execution time when reading off S3

2014-07-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064701#comment-14064701 ] Cheng Lian commented on SPARK-2119: --- Agree. Created SPARK-2551 for removing those hacks

[jira] [Updated] (SPARK-2552) Stabilize the computation of logistic function in pyspark

2014-07-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-2552: - Labels: Starter (was: ) > Stabilize the computation of logistic function in pyspark > --

[jira] [Created] (SPARK-2552) Stabilize the computation of logistic function in pyspark

2014-07-17 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-2552: Summary: Stabilize the computation of logistic function in pyspark Key: SPARK-2552 URL: https://issues.apache.org/jira/browse/SPARK-2552 Project: Spark Issue

[jira] [Updated] (SPARK-2552) Stabilize the computation of logistic function in pyspark

2014-07-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-2552: - Description: exp(1000) throws an error in python. For logistic function, we can use either 1 / (
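
The description above is cut off in this listing, so the exact formulas it proposes are not shown here. As an illustration only, the following is a minimal sketch of one common way to keep the logistic function from overflowing in plain Python: choose the algebraically equivalent form whose exponent is never positive. The function names are hypothetical and are not taken from the issue or from pyspark.

```python
import math


def naive_logistic(x):
    # Direct formula; math.exp overflows when -x is large (e.g. x = -1000).
    return 1.0 / (1.0 + math.exp(-x))


def stable_logistic(x):
    # Hypothetical helper, not the pyspark implementation.
    # Pick the form whose exponent is <= 0 so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in [0, 1)
    return z / (1.0 + z)


if __name__ == "__main__":
    for value in (-1000.0, -1.0, 0.0, 1.0, 1000.0):
        print(value, stable_logistic(value))
```

Both branches compute the same mathematical function; only the branch taken differs, so results for moderate inputs match the direct formula.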

[jira] [Resolved] (SPARK-2423) Clean up SparkSubmit for readability

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2423. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1349 [https://

[jira] [Created] (SPARK-2551) Cleanup FilteringParquetRowInputFormat

2014-07-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-2551: - Summary: Cleanup FilteringParquetRowInputFormat Key: SPARK-2551 URL: https://issues.apache.org/jira/browse/SPARK-2551 Project: Spark Issue Type: Bug Comp
