[jira] [Resolved] (SPARK-6404) Call broadcast() in each interval for spark streaming programs.

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6404. Resolution: Invalid I'm closing the issue because broadcast variables are immutable, so you

[jira] [Updated] (SPARK-6403) Launch master as spot instance on EC2

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6403: --- Fix Version/s: (was: 1.2.1) Launch master as spot instance on EC2

[jira] [Updated] (SPARK-6403) Launch master as spot instance on EC2

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6403: --- Target Version/s: (was: 1.2.1) Launch master as spot instance on EC2

[jira] [Commented] (SPARK-6401) Unable to load a old API input format in Spark streaming

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371699#comment-14371699 ] Patrick Wendell commented on SPARK-6401: If this is a matter of just adding

[jira] [Updated] (SPARK-6414) Spark driver failed with NPE on job cancelation

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6414: --- Description: When a job group is cancelled, we scan through all jobs to determine which
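The SPARK-6414 description is cut off here, so the following is only a sketch under an assumption: that the NPE arises when the scan over active jobs reads the job-group property from a job whose properties object is null (for example, a job submitted without any local properties). `ToyJob` and `JobGroupCancel` are hypothetical names, not Spark's DAGScheduler code; the key `spark.jobGroup.id` is the one `SparkContext.setJobGroup` stores the group id under. Wrapping the possibly-null reference in `Option` makes the scan null-safe.

```scala
import java.util.Properties

// Hypothetical stand-in for an active job. `properties` may be null,
// e.g. for a job submitted without any local properties (assumption).
final case class ToyJob(jobId: Int, properties: Properties)

object JobGroupCancel {
  // Key under which SparkContext.setJobGroup records the group id.
  val JobGroupKey = "spark.jobGroup.id"

  // Select the jobs belonging to `groupId` without dereferencing null.
  def jobsInGroup(jobs: Seq[ToyJob], groupId: String): Seq[ToyJob] =
    jobs.filter { job =>
      Option(job.properties).exists(_.getProperty(JobGroupKey) == groupId)
    }

  def main(args: Array[String]): Unit = {
    val grouped = new Properties()
    grouped.setProperty(JobGroupKey, "nightly-etl")

    val jobs = Seq(ToyJob(1, grouped), ToyJob(2, null), ToyJob(3, new Properties()))
    // A naive `job.properties.getProperty(...)` would throw on job 2.
    println(jobsInGroup(jobs, "nightly-etl").map(_.jobId)) // List(1)
  }
}
```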

[jira] [Updated] (SPARK-6414) Spark driver failed with NPE on job cancelation

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6414: --- Affects Version/s: 1.3.0 Spark driver failed with NPE on job cancelation

[jira] [Updated] (SPARK-6414) Spark driver failed with NPE on job cancelation

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6414: --- Priority: Critical (was: Major) Spark driver failed with NPE on job cancelation

[jira] [Commented] (SPARK-5081) Shuffle write increases

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372266#comment-14372266 ] Patrick Wendell commented on SPARK-5081: Hey @cbbetz - the last movement

[jira] [Updated] (SPARK-6415) Spark Streaming fail-fast: Stop scheduling jobs when a batch fails, and kills the app

2015-03-19 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6415: --- Component/s: Streaming Spark Streaming fail-fast: Stop scheduling jobs when a batch fails

[jira] [Updated] (SPARK-6362) Broken pipe error when training a RandomForest on a union of two RDDs

2015-03-16 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6362: --- Component/s: PySpark Broken pipe error when training a RandomForest on a union of two RDDs

Re: enum-like types in Spark

2015-03-16 Thread Patrick Wendell
{ private[this] case object _MemoryOnly extends StorageLevel final val MemoryOnly: StorageLevel = _MemoryOnly private[this] case object _DiskOnly extends StorageLevel final val DiskOnly: StorageLevel = _DiskOnly } On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell pwend...@gmail.com wrote: I
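The snippet above is cut off at the start. The sketch below fills in the pattern it appears to show, as an assumption: a sealed base type whose concrete values are `private[this]` case objects, exposed only through `final val`s typed as the base type and defined in an object with the same name as the type (the convention Aaron Davidson asks for later in this thread). `StorageLevel` here is illustrative and is not Spark's actual `org.apache.spark.storage.StorageLevel` class.

```scala
// Illustrative enum-like type; not Spark's real StorageLevel.
sealed abstract class StorageLevel

// Values live in an object with the same name as the type.
object StorageLevel {
  // The concrete case objects are hidden; callers only ever see the base
  // type, so the individual case-object types stay out of the public API.
  private[this] case object _MemoryOnly extends StorageLevel
  final val MemoryOnly: StorageLevel = _MemoryOnly

  private[this] case object _DiskOnly extends StorageLevel
  final val DiskOnly: StorageLevel = _DiskOnly
}

object StorageLevelDemo {
  // Stable-identifier patterns are compared with ==, so the compiler cannot
  // prove this match exhaustive; a fallback case is kept.
  def describe(level: StorageLevel): String = level match {
    case StorageLevel.MemoryOnly => "memory only"
    case StorageLevel.DiskOnly   => "disk only"
    case other                   => s"unknown level: $other"
  }

  def main(args: Array[String]): Unit =
    println(describe(StorageLevel.DiskOnly)) // disk only
}
```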

Re: Wrong version on the Spark documentation page

2015-03-15 Thread Patrick Wendell
Cheng - what if you hold shift+refresh? For me the /latest link correctly points to 1.3.0 On Sun, Mar 15, 2015 at 10:40 AM, Cheng Lian lian.cs@gmail.com wrote: It's still marked as 1.2.1 here http://spark.apache.org/docs/latest/ But this page is updated (1.3.0)

[jira] [Commented] (SPARK-5310) Update SQL programming guide for 1.3

2015-03-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362757#comment-14362757 ] Patrick Wendell commented on SPARK-5310: [~lian cheng] and [~marmbrus

[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesnt work when Spark working dir is on a NFS mount

2015-03-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362687#comment-14362687 ] Patrick Wendell commented on SPARK-6313: [~joshrosen] changing default caching

[jira] [Updated] (SPARK-6313) Fetch File Lock file creation doesnt work when Spark working dir is on a NFS mount

2015-03-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6313: --- Target Version/s: 1.3.1 Fetch File Lock file creation doesnt work when Spark working dir

[jira] [Updated] (SPARK-4964) Exactly-once + WAL-free Kafka Support in Spark Streaming

2015-03-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4964: --- Assignee: Cody Koeninger Exactly-once + WAL-free Kafka Support in Spark Streaming

[jira] [Updated] (SPARK-6313) Fetch File Lock file creation doesnt work when Spark working dir is on a NFS mount

2015-03-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6313: --- Priority: Critical (was: Major) Fetch File Lock file creation doesnt work when Spark

[ANNOUNCE] Announcing Spark 1.3!

2015-03-13 Thread Patrick Wendell
Hi All, I'm happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is the fourth release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! Visit the release notes [1] to read about the new features, or

[jira] [Resolved] (SPARK-6311) ChiSqTest should check for too few counts

2015-03-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6311. Resolution: Duplicate ChiSqTest should check for too few counts

[jira] [Resolved] (SPARK-6310) ChiSqTest should check for too few counts

2015-03-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6310. Resolution: Duplicate ChiSqTest should check for too few counts

[jira] [Commented] (SPARK-5654) Integrate SparkR into Apache Spark

2015-03-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359166#comment-14359166 ] Patrick Wendell commented on SPARK-5654: I see the decision here as somewhat

Re: How to set per-user spark.local.dir?

2015-03-11 Thread Patrick Wendell
We don't support expressions or wildcards in that configuration. For each application, the local directories need to be constant. If you have users submitting different Spark applications, those can each set spark.local.dir. - Patrick On Wed, Mar 11, 2015 at 12:14 AM, Jianshi Huang

[jira] [Resolved] (SPARK-4924) Factor out code to launch Spark applications into a separate library

2015-03-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4924. Resolution: Fixed Fix Version/s: 1.4.0 Glad to finally have this in. Thanks for all

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Patrick Wendell
on yarn on hadoop 2.6 in cluster and client mode. Tom On Thursday, March 5, 2015 8:53 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4): https://git-wip

[jira] [Updated] (SPARK-6050) Spark on YARN does not work --executor-cores is specified

2015-03-09 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6050: --- Fix Version/s: (was: 1.4.0) Spark on YARN does not work --executor-cores is specified

[jira] [Commented] (SPARK-5134) Bump default Hadoop version to 2+

2015-03-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352268#comment-14352268 ] Patrick Wendell commented on SPARK-5134: Hey [~rdub] [~srowen], As part

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Patrick Wendell
We probably want to revisit the way we do binaries in general for 1.4+. IMO, something worth forking a separate thread for. I've been hesitating to add new binaries because people (understandably) complain if you ever stop packaging older ones, but on the other hand the ASF has complained that we

[jira] [Commented] (SPARK-1239) Don't fetch all map output statuses at each reducer during shuffles

2015-03-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352356#comment-14352356 ] Patrick Wendell commented on SPARK-1239: It would be helpful if any users who have

Re: Block Transfer Service encryption support

2015-03-08 Thread Patrick Wendell
I think that yes, longer term we want to have encryption of all communicated data. However Jeff, can you open a JIRA to discuss the design before opening a pull request (it's fine to link to a WIP branch if you'd like)? I'd like to better understand the performance and operational complexity of

[jira] [Comment Edited] (SPARK-5134) Bump default Hadoop version to 2+

2015-03-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352268#comment-14352268 ] Patrick Wendell edited comment on SPARK-5134 at 3/8/15 11:27 PM

[jira] [Commented] (SPARK-5134) Bump default Hadoop version to 2+

2015-03-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352341#comment-14352341 ] Patrick Wendell commented on SPARK-5134: [~shivaram] did it end up working alright

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Patrick Wendell
. There the vendors can add the latest downloads - for example when 1.4 is released, HDP can build a release of HDP Spark 1.4 bundle. Cheers k/ On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell pwend...@gmail.com wrote: We probably want to revisit the way we do binaries in general for 1.4+. IMO, something

[jira] [Updated] (SPARK-6189) Pandas to DataFrame conversion should check field names for periods

2015-03-07 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6189: --- Component/s: DataFrame Pandas to DataFrame conversion should check field names for periods

[jira] [Updated] (SPARK-6208) executor-memory does not work when using local cluster

2015-03-07 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6208: --- Issue Type: New Feature (was: Bug) executor-memory does not work when using local cluster

[jira] [Updated] (SPARK-4123) Show new dependencies added in pull requests

2015-03-07 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4123: --- Assignee: Brennon York Show new dependencies added in pull requests

[jira] [Commented] (SPARK-4123) Show new dependencies added in pull requests

2015-03-07 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351932#comment-14351932 ] Patrick Wendell commented on SPARK-4123: Hey [~boyork] sorry for the delay

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
Hey Sean, SPARK-5310 Update SQL programming guide for 1.3 SPARK-5183 Document data source API SPARK-6128 Update Spark Streaming Guide for Spark 1.3 For these, the issue is that they are documentation JIRAs, which don't need to be timed exactly with the release vote, since we can update the

[jira] [Updated] (SPARK-6154) Build error with Scala 2.11 for v1.3.0-rc2

2015-03-06 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6154: --- Component/s: (was: SQL) Build Build error with Scala 2.11 for v1.3.0

[jira] [Commented] (SPARK-6154) Build error with Scala 2.11 for v1.3.0-rc2

2015-03-06 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350915#comment-14350915 ] Patrick Wendell commented on SPARK-6154: Can you give the exact set of flags you

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
affects a subset of build profiles. On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, SPARK-5310 Update SQL programming guide for 1.3 SPARK-5183 Document data source API SPARK-6128 Update Spark Streaming Guide for Spark 1.3 For these, the issue

[jira] [Commented] (SPARK-6154) Build error with Scala 2.11 for v1.3.0-rc2

2015-03-06 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350975#comment-14350975 ] Patrick Wendell commented on SPARK-6154: Oh I remember now, we don't support

[jira] [Updated] (SPARK-5183) Document data source API

2015-03-06 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5183: --- Priority: Critical (was: Blocker) Document data source API

[jira] [Updated] (SPARK-5310) Update SQL programming guide for 1.3

2015-03-06 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5310: --- Priority: Critical (was: Blocker) Update SQL programming guide for 1.3

[jira] [Updated] (SPARK-6128) Update Spark Streaming Guide for Spark 1.3

2015-03-06 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6128: --- Priority: Critical (was: Blocker) Update Spark Streaming Guide for Spark 1.3

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
, Mar 6, 2015 at 9:17 PM, Patrick Wendell pwend...@gmail.com wrote: Sean, The docs are distributed and consumed in a fundamentally different way than Spark code itself. So we've always considered the deadline for doc changes to be when the release is finally posted. If there are small

[jira] [Updated] (SPARK-5345) Flaky test: o.a.s.deploy.history.FsHistoryProviderSuite

2015-03-06 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5345: --- Fix Version/s: 1.3.0 Flaky test: o.a.s.deploy.history.FsHistoryProviderSuite

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
I'll kick it off with a +1. On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4): https://git-wip-us.apache.org/repos/asf?p=spark.git

[jira] [Resolved] (SPARK-6182) spark-parent pom needs to be published for both 2.10 and 2.11

2015-03-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6182. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Sean Owen spark-parent

Re: Spark v1.2.1 failing under BigTop build in External Flume Sink (due to missing Netty library)

2015-03-05 Thread Patrick Wendell
You may need to add the -Phadoop-2.4 profile. When building our release packages for Hadoop 2.4 we use the following flags: -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn - Patrick On Thu, Mar 5, 2015 at 12:47 PM, Kelly, Jonathan jonat...@amazon.com wrote: I confirmed that this has nothing to

[jira] [Updated] (SPARK-6175) Executor log links are using internal addresses in EC2; display `:0` when ephemeral ports are used

2015-03-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6175: --- Priority: Blocker (was: Major) Executor log links are using internal addresses in EC2

[jira] [Updated] (SPARK-6141) Upgrade Breeze to 0.11 to fix convergence bug

2015-03-05 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6141: --- Fix Version/s: (was: 1.3.1) 1.3.0 Upgrade Breeze to 0.11 to fix

Re: enum-like types in Spark

2015-03-05 Thread Patrick Wendell
= _MemoryOnly private[this] case object _DiskOnly extends StorageLevel final val DiskOnly: StorageLevel = _DiskOnly } On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell pwend...@gmail.com wrote: I like #4 as well and agree with Aaron's suggestion. - Patrick On Wed, Mar 4, 2015

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-04 Thread Patrick Wendell
consider https://issues.apache.org/jira/browse/SPARK-6144 a serious regression from 1.2 (since it affects existing addFile() functionality if the URL is hdfs:...). Will test other parts separately. On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote

[jira] [Updated] (SPARK-6149) Spark SQL CLI doesn't work when compiled against Hive 12 with SBT because of runtime incompatibility issues caused by Guava 15

2015-03-04 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6149: --- Priority: Critical (was: Blocker) Spark SQL CLI doesn't work when compiled against Hive 12

[jira] [Commented] (SPARK-6149) Spark SQL CLI doesn't work when compiled against Hive 12 with SBT because of runtime incompatibility issues caused by Guava 15

2015-03-04 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347210#comment-14347210 ] Patrick Wendell commented on SPARK-6149: Since this only affects the sbt build

[jira] [Commented] (SPARK-5143) spark-network-yarn 2.11 depends on spark-network-shuffle 2.10

2015-03-04 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347567#comment-14347567 ] Patrick Wendell commented on SPARK-5143: Yes - good catch Sean. Curious

[jira] [Updated] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail

2015-03-04 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6144: --- Component/s: Spark Core When in cluster mode using ADD JAR with a hdfs:// sourced jar

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Patrick Wendell
Hey Mingyu, I think it's broken out separately so we can record the time taken to serialize the result. Once we've serialized it, the second serialization should be really simple since it's just wrapping something that has already been turned into a byte buffer. Do you see a specific issue
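To illustrate the reasoning in this reply, here is a minimal sketch, not Spark's executor code: the result value is serialized once (the expensive, timed step), and the resulting bytes are then wrapped together with the timing and serialized again. Because the second pass only copies an opaque byte array, its cost depends on the number of bytes rather than on the shape of the original object graph. Java serialization stands in for both Spark's configured serializer and its closure serializer, and `TwoStageSerialization` is a hypothetical name.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object TwoStageSerialization {
  // Java serialization stands in for both serializers (assumption).
  def javaSerialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  // Stage 1 serializes the (possibly large) value and records how long it took;
  // stage 2 serializes a pair that merely wraps the already-produced bytes.
  def serializeResult(value: AnyRef): (Array[Byte], Long) = {
    val start = System.nanoTime()
    val valueBytes = javaSerialize(value)
    val serializationNanos = System.nanoTime() - start
    val wrapped = javaSerialize((valueBytes, serializationNanos))
    (wrapped, serializationNanos)
  }

  def main(args: Array[String]): Unit = {
    val value = Vector.tabulate(100000)(i => s"record-$i")
    val (wrapped, nanos) = serializeResult(value)
    println(s"value serialized in ${nanos / 1000} us; wrapped size = ${wrapped.length} bytes")
  }
}
```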

Re: enum-like types in Spark

2015-03-04 Thread Patrick Wendell
I like #4 as well and agree with Aaron's suggestion. - Patrick On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson ilike...@gmail.com wrote: I'm cool with #4 as well, but make sure we dictate that the values should be defined within an object with the same name as the enumeration (like we do for

[jira] [Created] (SPARK-6182) spark-parent pom needs to be published for both 2.10 and 2.11

2015-03-04 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-6182: -- Summary: spark-parent pom needs to be published for both 2.10 and 2.11 Key: SPARK-6182 URL: https://issues.apache.org/jira/browse/SPARK-6182 Project: Spark

[jira] [Resolved] (SPARK-6149) Spark SQL CLI doesn't work when compiled against Hive 12 with SBT because of runtime incompatibility issues caused by Guava 15

2015-03-04 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6149. Resolution: Fixed Fix Version/s: 1.3.0 Spark SQL CLI doesn't work when compiled

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Patrick Wendell
for the serialized task result shouldn't account for the majority of memory footprint anyways, I'm okay with leaving it as is, then. Thanks, Mingyu On 3/4/15, 5:07 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mingyu, I think it's broken out separately so we can record the time taken

[jira] [Resolved] (SPARK-5143) spark-network-yarn 2.11 depends on spark-network-shuffle 2.10

2015-03-04 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5143. Resolution: Fixed Fix Version/s: 1.3.0 spark-network-yarn 2.11 depends on spark

[jira] [Commented] (SPARK-6149) Spark SQL CLI doesn't work when compiled against Hive 12 with SBT because of runtime incompatibility issues caused by Guava 15

2015-03-03 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346515#comment-14346515 ] Patrick Wendell commented on SPARK-6149: Yes - because of this I think simply

[jira] [Commented] (SPARK-6149) Spark SQL CLI doesn't work when compiled against Hive 12 with SBT because of runtime incompatibility issues caused by Guava 15

2015-03-03 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346519#comment-14346519 ] Patrick Wendell commented on SPARK-6149: To be more specific, I am suggesting

[VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 3af2687): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3af26870e5163438868c4eb2df88380a533bb232 The release files, including signatures, digests, etc.

[jira] [Updated] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail

2015-03-03 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6144: --- Target Version/s: 1.3.0 When in cluster mode using ADD JAR with a hdfs:// sourced jar

[jira] [Updated] (SPARK-6144) When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail

2015-03-03 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6144: --- Priority: Blocker (was: Major) When in cluster mode using ADD JAR with a hdfs:// sourced

[RESULT] [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-03-03 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: Hey All, Just a quick update on this thread. Issues have continued to trickle in. Not all of them are blocker level but enough to warrant another RC: I've been keeping the JIRA dashboard up and running with the latest status (sorry, long link): https

[jira] [Updated] (SPARK-6122) Upgrade Tachyon dependency to 0.6.0

2015-03-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6122: --- Assignee: Calvin Jia Upgrade Tachyon dependency to 0.6.0

[jira] [Resolved] (SPARK-6048) SparkConf.translateConfKey should not translate on set

2015-03-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6048. Resolution: Fixed Fix Version/s: 1.3.0 SparkConf.translateConfKey should

[jira] [Updated] (SPARK-6122) Upgrade Tachyon dependency to 0.6.0

2015-03-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6122: --- Assignee: (was: Patrick Wendell) Upgrade Tachyon dependency to 0.6.0

[jira] [Updated] (SPARK-6122) Upgrade Tachyon dependency to 0.6.0

2015-03-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6122: --- Target Version/s: 1.4.0 Upgrade Tachyon dependency to 0.6.0

[jira] [Updated] (SPARK-6122) Upgrade Tachyon dependency to 0.6.0

2015-03-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6122: --- Fix Version/s: (was: 1.3.0) Upgrade Tachyon dependency to 0.6.0

[jira] [Updated] (SPARK-6122) Upgrade Tachyon dependency to 0.6.0

2015-03-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6122: --- Assignee: Patrick Wendell Upgrade Tachyon dependency to 0.6.0

[jira] [Resolved] (SPARK-6066) Metadata in event log makes it very difficult for external libraries to parse event log

2015-03-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6066. Resolution: Fixed Fix Version/s: 1.3.0 Thanks Andrew and Marcelo for your work

Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Patrick Wendell
Yeah, calling it Hadoop 2 was a very bad naming choice (of mine!); this was back when CDH4 was the only real distribution available with some of the newer Hadoop APIs and packaging. To avoid surprising people who use this, I think it's best to keep v1 as the default. Overall, we try not to change

[jira] [Updated] (SPARK-6087) Provide actionable exception if Kryo buffer is not large enough

2015-03-01 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6087: --- Labels: starter (was: ) Provide actionable exception if Kryo buffer is not large enough

[jira] [Commented] (SPARK-6066) Metadata in event log makes it very difficult for external libraries to parse event log

2015-02-28 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341881#comment-14341881 ] Patrick Wendell commented on SPARK-6066: [~vanzin] - yes you are right (an early

[jira] [Updated] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators

2015-02-28 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6086: --- Component/s: Spark Core Exceptions in DAGScheduler.updateAccumulators

[jira] [Updated] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators

2015-02-28 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6086: --- Description: Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler

[jira] [Created] (SPARK-6087) Provide actionable exception if Kryo buffer is not large enough

2015-02-28 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-6087: -- Summary: Provide actionable exception if Kryo buffer is not large enough Key: SPARK-6087 URL: https://issues.apache.org/jira/browse/SPARK-6087 Project: Spark

[jira] [Updated] (SPARK-6087) Provide actionable exception if Kryo buffer is not large enough

2015-02-28 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6087: --- Description: Right now if you don't have a large enough Kryo buffer, you get a really
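The SPARK-6087 description is truncated here, so what follows is only a sketch of the general idea of an actionable buffer error, not the fix that was merged. It writes into a fixed-capacity `java.nio.ByteBuffer` standing in for the serializer's output buffer and turns the low-level `BufferOverflowException` into a message that states how many bytes were needed and which setting to raise. `BufferTooSmallException` and `ActionableBuffer` are hypothetical, and the key `spark.kryoserializer.buffer.max.mb` is assumed to be the Spark 1.3-era name of the Kryo maximum-buffer setting; check the configuration docs for your version.

```scala
import java.nio.{BufferOverflowException, ByteBuffer}

// Hypothetical exception type; not part of Spark.
final class BufferTooSmallException(message: String, cause: Throwable)
  extends RuntimeException(message, cause)

object ActionableBuffer {
  // Assumed Spark 1.3-era setting name; verify against your version's docs.
  val MaxBufferKey = "spark.kryoserializer.buffer.max.mb"

  // Copy `payload` into a buffer of fixed `capacity`, failing with a message
  // that tells the user what to change instead of a bare overflow.
  def writeAll(payload: Array[Byte], capacity: Int): ByteBuffer = {
    val buffer = ByteBuffer.allocate(capacity)
    try buffer.put(payload)
    catch {
      case e: BufferOverflowException =>
        throw new BufferTooSmallException(
          s"Serialization buffer overflow: required ${payload.length} bytes, " +
            s"but only ${buffer.remaining()} were available. " +
            s"To avoid this, increase $MaxBufferKey.",
          e)
    }
    buffer
  }

  def main(args: Array[String]): Unit = {
    try {
      writeAll(Array.fill[Byte](128)(1), capacity = 64)
      println("payload fit")
    } catch {
      case e: BufferTooSmallException => println(e.getMessage)
    }
  }
}
```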

[jira] [Resolved] (SPARK-5979) `--packages` should not exclude spark streaming assembly jars for kafka and flume

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5979. Resolution: Fixed Fix Version/s: 1.3.0 `--packages` should not exclude spark

[jira] [Resolved] (SPARK-6070) Yarn Shuffle Service jar packages too many dependencies

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6070. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Marcelo Vanzin Yarn

[jira] [Updated] (SPARK-5979) `--packages` should not exclude spark streaming assembly jars for kafka and flume

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5979: --- Assignee: Burak Yavuz `--packages` should not exclude spark streaming assembly jars

[jira] [Resolved] (SPARK-6032) Move ivy logging to System.err in --packages

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6032. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Burak Yavuz Move ivy

[jira] [Updated] (SPARK-6055) Memory leak in pyspark sql due to incorrect equality check

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6055: --- Summary: Memory leak in pyspark sql due to incorrect equality check (was: memory leak

[jira] [Commented] (SPARK-6048) SparkConf.translateConfKey should translate on get, not set

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340740#comment-14340740 ] Patrick Wendell commented on SPARK-6048: Okay I just talked to [~vanzin] offline

[jira] [Commented] (SPARK-6050) Spark on YARN does not work --executor-cores is specified

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339857#comment-14339857 ] Patrick Wendell commented on SPARK-6050: [~mrid...@yahoo-inc.com] thanks

[jira] [Commented] (SPARK-6066) Metadata in event log makes it very difficult for external libraries to parse event log

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340882#comment-14340882 ] Patrick Wendell commented on SPARK-6066: What if as a simple fix we do

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-27 Thread Patrick Wendell
I think we need to just update the docs, it is a bit unclear right now. At the time, we made it worded fairly sternly because we really wanted people to use --jars when we deprecated SPARK_CLASSPATH. But there are other types of deployments where there is a legitimate need to augment the classpath

[jira] [Updated] (SPARK-6066) Metadata in event log makes it very difficult for external libraries to parse event log

2015-02-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6066: --- Component/s: Spark Core Metadata in event log makes it very difficult for external libraries

[jira] [Comment Edited] (SPARK-6048) SparkConf.translateConfKey should translate on get, not set

2015-02-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339629#comment-14339629 ] Patrick Wendell edited comment on SPARK-6048 at 2/27/15 2:33 AM

[jira] [Commented] (SPARK-6048) SparkConf.translateConfKey should translate on get, not set

2015-02-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339629#comment-14339629 ] Patrick Wendell commented on SPARK-6048: Hey All, No options on which design we

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Patrick Wendell
This has been around for multiple versions of Spark, so I am a bit surprised to see it not working in your build. - Patrick On Wed, Feb 25, 2015 at 9:41 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Cody, What build command are you using? In any case, we can actually comment out

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Patrick Wendell
Hey Cody, What build command are you using? In any case, we can actually comment out the unused thing now in the root pom.xml. It existed just to ensure that at least one dependency was listed in the shade plugin configuration (otherwise, some work we do that requires the shade plugin does not

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-25 Thread Patrick Wendell
:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) any ideas on this? Tom On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote

Re: Add PredictionIO to Powered by Spark

2015-02-24 Thread Patrick Wendell
Added - thanks! I trimmed it down a bit to fit our normal description length. On Mon, Jan 5, 2015 at 8:24 AM, Thomas Stone tho...@prediction.io wrote: Please can we add PredictionIO to https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark PredictionIO http://prediction.io/
