[ANNOUNCE] Announcing Spark 1.6.0

2016-01-04 Thread Michael Armbrust
Hi All, Spark 1.6.0 is the seventh release on the 1.x line. This release includes patches from 248+ contributors! To download Spark 1.6.0 visit the downloads page. (It may take a while for all mirrors to update.) A huge thanks goes to all of the individuals and organizations involved in

[jira] [Resolved] (SPARK-12495) use true as default value for propagateNull in NewInstance

2015-12-30 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12495. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10443

Re: problem with reading source code - pull out nondeterministic expressions

2015-12-30 Thread Michael Armbrust
The goal here is to ensure that the non-deterministic value is evaluated only once, so the result won't change for a given row (i.e. when sorting). On Tue, Dec 29, 2015 at 10:57 PM, 汪洋 wrote: > Hi fellas, > I am new to spark and I have a newbie question. I am currently

[jira] [Created] (SPARK-12568) Add BINARY to Encoders

2015-12-29 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12568: Summary: Add BINARY to Encoders Key: SPARK-12568 URL: https://issues.apache.org/jira/browse/SPARK-12568 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074327#comment-15074327 ] Michael Armbrust commented on SPARK-6459: - We do special case that very common problem, but I bet

[jira] [Created] (SPARK-12564) Improve missing column AnalysisException

2015-12-29 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12564: Summary: Improve missing column AnalysisException Key: SPARK-12564 URL: https://issues.apache.org/jira/browse/SPARK-12564 Project: Spark Issue Type

[jira] [Created] (SPARK-12562) DataFrame.write.format("text") requires the column name to be called value

2015-12-29 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12562: Summary: DataFrame.write.format("text") requires the column name to be called value Key: SPARK-12562 URL: https://issues.apache.org/jira/browse/S

[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074279#comment-15074279 ] Michael Armbrust commented on SPARK-6459: - I don't think so, why do you say that? > Warn w

[jira] [Resolved] (SPARK-12441) Fixing missingInput in all Logical/Physical operators

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12441. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10393

Re: trouble understanding data frame memory usage "java.io.IOException: Unable to acquire memory"

2015-12-28 Thread Michael Armbrust
Unfortunately, in 1.5 we didn't force operators to spill when they ran out of memory, so there is not a lot you can do. It would be awesome if you could test with 1.6 and see if things are any better? On Mon, Dec 28, 2015 at 2:25 PM, Andy Davidson < a...@santacruzintegration.com> wrote: > I am using

[jira] [Resolved] (SPARK-12231) Failed to generate predicate Error when using dropna

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12231. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10388

[jira] [Updated] (SPARK-12231) Failed to generate predicate Error when using dropna

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12231: - Assignee: kevin yu > Failed to generate predicate Error when using dro

[jira] [Resolved] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-7727. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10174 [https

[jira] [Commented] (SPARK-12533) hiveContext.table() throws the wrong exception

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073052#comment-15073052 ] Michael Armbrust commented on SPARK-12533: -- Just call the table method with a table that doesn't

Re: partitioning json data in spark

2015-12-28 Thread Michael Armbrust
I don't think that's true (though if the docs are wrong we should fix that). In Spark 1.5 we converted JSON to go through the same code path as Parquet. On Mon, Dec 28, 2015 at 12:20 AM, Նարեկ Գալստեան wrote: > Well, I could try to do that, > but the *partitionBy* method is
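
A minimal sketch of partitioned JSON output with the DataFrame writer in Spark 1.5+, following what is described above; the partition columns and output path are hypothetical:

    // write JSON partitioned into directories, using the same code path as Parquet
    df.write
      .partitionBy("year", "month")   // hypothetical partition columns
      .json("/data/events_json")      // hypothetical output path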

[jira] [Updated] (SPARK-12319) Address endian specific problems surfaced in 1.6

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12319: - Target Version/s: 2.0.0 > Address endian specific problems surfaced in

[jira] [Commented] (SPARK-12319) Address endian specific problems surfaced in 1.6

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073099#comment-15073099 ] Michael Armbrust commented on SPARK-12319: -- Do you want to open a PR with your failing test case

[jira] [Updated] (SPARK-12287) Support UnsafeRow in MapPartitions/MapGroups/CoGroup

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12287: - Assignee: Xiao Li > Support UnsafeRow in MapPartitions/MapGroups/CoGr

[jira] [Resolved] (SPARK-12287) Support UnsafeRow in MapPartitions/MapGroups/CoGroup

2015-12-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12287. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10398

[jira] [Updated] (SPARK-11600) Spark MLlib 1.6 QA umbrella

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11600: - Target Version/s: 1.6.1 (was: 1.6.0) > Spark MLlib 1.6 QA umbre

[jira] [Updated] (SPARK-8447) Test external shuffle service with all shuffle managers

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-8447: Target Version/s: 1.6.1 (was: 1.6.0) > Test external shuffle service with all shuf

[jira] [Updated] (SPARK-11224) Flaky test: o.a.s.ExternalShuffleServiceSuite

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11224: - Target Version/s: 1.6.1 (was: 1.6.0) > Flaky test: o.a.s.ExternalShuffleServiceSu

[jira] [Updated] (SPARK-11266) Peak memory tests swallow failures

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11266: - Target Version/s: 1.6.1 (was: 1.6.0) > Peak memory tests swallow failu

[jira] [Updated] (SPARK-11607) Update MLlib website for 1.6

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11607: - Target Version/s: 1.6.1 (was: 1.6.0) > Update MLlib website for

[jira] [Updated] (SPARK-11603) ML 1.6 QA: API: Experimental, DeveloperApi, final, sealed audit

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11603: - Target Version/s: 1.6.1 (was: 1.6.0) > ML 1.6 QA: API: Experimental, Developer

[jira] [Updated] (SPARK-10680) Flaky test: network.RequestTimeoutIntegrationSuite.timeoutInactiveRequests

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10680: - Target Version/s: 1.6.1 (was: 1.6.0) > Flaky t

[jira] [Updated] (SPARK-12507) Update Streaming configurations for 1.6

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12507: - Target Version/s: 1.6.1 (was: 1.6.0) > Update Streaming configurations for

[jira] [Updated] (SPARK-12532) Join-key Pushdown via Predicate Transitivity

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12532: - Shepherd: Michael Armbrust > Join-key Pushdown via Predicate Transitiv

[jira] [Updated] (SPARK-12532) Join-key Pushdown via Predicate Transitivity

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12532: - Target Version/s: 2.0.0 > Join-key Pushdown via Predicate Transitiv

[jira] [Updated] (SPARK-12532) Join-key Pushdown via Predicate Transitivity

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12532: - Assignee: Xiao Li > Join-key Pushdown via Predicate Transitiv

Re: Passing parameters to spark SQL

2015-12-27 Thread Michael Armbrust
The only way to do this for SQL is through the JDBC driver. However, you can use literal values without lossy/unsafe string conversions by using the DataFrame API. For example, to filter: import org.apache.spark.sql.functions._ df.filter($"columnName" === lit(value)) On Sun, Dec 27, 2015 at
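
Expanded into a small sketch of the DataFrame alternative mentioned above; the table name, column name, and parameter value are hypothetical:

    import org.apache.spark.sql.functions._
    import sqlContext.implicits._              // for the $"..." column syntax

    val cutoff = "2015-12-27"                  // hypothetical parameter value
    val events = sqlContext.table("events")    // hypothetical table
    val filtered = events.filter($"eventDate" === lit(cutoff))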

[jira] [Updated] (SPARK-12505) Pushdown a Limit on top of an Outer-Join

2015-12-27 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12505: - Target Version/s: 2.0.0 > Pushdown a Limit on top of an Outer-J

[jira] [Created] (SPARK-12533) hiveContext.table() throws the wrong exception

2015-12-27 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12533: Summary: hiveContext.table() throws the wrong exception Key: SPARK-12533 URL: https://issues.apache.org/jira/browse/SPARK-12533 Project: Spark Issue

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Michael Armbrust
I'll kick the voting off with a +1. On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust <mich...@databricks.com> wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.6.0! > > The vote is open until Friday, December 25, 2015 at 18:00 UTC and pass

[VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 1.6.0! The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.6.0 [ ] -1 Do not release this package because

Re: Writing partitioned Avro data to HDFS

2015-12-22 Thread Michael Armbrust
You need to say .mode("append") if you want to append to existing data. On Tue, Dec 22, 2015 at 6:48 AM, Yash Sharma wrote: > Well you are right. Having a quick glance at the source[1] I see that the > path creation does not consider the partitions. > > It tries to create
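
A sketch of the append-mode write being suggested, assuming the external spark-avro package is on the classpath; the partition column and output path are hypothetical:

    df.write
      .mode("append")                          // append to existing data instead of failing
      .partitionBy("date")                     // hypothetical partition column
      .format("com.databricks.spark.avro")     // external spark-avro data source
      .save("/data/avro_out")                  // hypothetical output path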

[jira] [Resolved] (SPARK-12321) JSON format for logical/physical execution plans

2015-12-21 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12321. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10311

[jira] [Resolved] (SPARK-12398) Smart truncation of DataFrame / Dataset toString

2015-12-21 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12398. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10373

[jira] [Updated] (SPARK-12321) JSON format for logical/physical execution plans

2015-12-21 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12321: - Assignee: Wenchen Fan > JSON format for logical/physical execution pl

[jira] [Resolved] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-21 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12374. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10335

[jira] [Resolved] (SPARK-12150) numPartitions argument to sqlContext.range() should be optional

2015-12-21 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12150. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10335

Re: Expression/LogicalPlan dichotomy in Spark SQL Catalyst

2015-12-21 Thread Michael Armbrust
> > Why was the choice made in Catalyst to make LogicalPlan/QueryPlan and > Expression separate subclasses of TreeNode, instead of e.g. also make > QueryPlan inherit from Expression? > I think this is a pretty common way to model things (glancing at postgres it looks similar). Expression and

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-21 Thread Michael Armbrust
couple Stream Apps, all seem ok. >> >> On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust <mich...@databricks.com >> > wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 1.6.0! >>> >>> Th

Re: Joining DataFrames - Causing Cartesian Product

2015-12-18 Thread Michael Armbrust
This is fixed in Spark 1.6. On Fri, Dec 18, 2015 at 3:06 PM, Prasad Ravilla wrote: > Changing the equality check from “<=>” to “===” solved the problem. > Performance skyrocketed. > > I am wondering why “<=>” causes a performance degradation? > > val dates = new RetailDates() > val

Re: Is DataFrame.groupBy supposed to preserve order within groups?

2015-12-18 Thread Michael Armbrust
You need to use window functions to get this kind of behavior. Or use max and a struct ( http://stackoverflow.com/questions/13523049/hive-sql-find-the-latest-record) On Thu, Dec 17, 2015 at 11:55 PM, Timothée Carayol < timothee.cara...@gmail.com> wrote: > Hi all, > > I tried to do something
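
A sketch of the max-plus-struct approach from the linked answer (column names are hypothetical); because a struct sorts by its first field, putting the timestamp first makes max pick the latest row per group:

    import org.apache.spark.sql.functions._
    import sqlContext.implicits._

    // latest value per key without relying on groupBy preserving order
    val latest = df
      .groupBy($"key")
      .agg(max(struct($"ts", $"value")).as("latest"))
      .select($"key", $"latest.value".as("value"))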

[jira] [Resolved] (SPARK-12404) Ensure objects passed to StaticInvoke is Serializable

2015-12-18 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12404. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 10357

[jira] [Resolved] (SPARK-12397) Improve error messages for data sources when they are not found

2015-12-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12397. -- Resolution: Fixed Fix Version/s: 1.6.0 2.0.0 Issue resolved

[jira] [Resolved] (SPARK-12164) [SQL] Display the binary/encoded values

2015-12-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12164. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10215

[jira] [Updated] (SPARK-12164) [SQL] Display the binary/encoded values

2015-12-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12164: - Assignee: Xiao Li > [SQL] Display the binary/encoded val

[jira] [Resolved] (SPARK-12320) throw exception if the number of fields does not line up for Tuple encoder

2015-12-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12320. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10293

[jira] [Updated] (SPARK-12320) throw exception if the number of fields does not line up for Tuple encoder

2015-12-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12320: - Assignee: Wenchen Fan > throw exception if the number of fields does not line

[jira] [Resolved] (SPARK-11677) ORC filter tests all pass if filters are actually not pushed down.

2015-12-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-11677. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 9687

[VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 1.6.0! The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.6.0 [ ] -1 Do not release this package

[jira] [Updated] (SPARK-11677) ORC filter tests all pass if filters are actually not pushed down.

2015-12-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11677: - Assignee: Hyukjin Kwon > ORC filter tests all pass if filters are actually not pus

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Michael Armbrust
+1 On Wed, Dec 16, 2015 at 4:37 PM, Andrew Or wrote: > +1 > > Mesos cluster mode regression in RC2 is now fixed (SPARK-12345 > / PR10332 > ). > > Also tested on standalone

[jira] [Resolved] (SPARK-6936) SQLContext.sql() caused deadlock in multi-thread env

2015-12-16 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-6936. - Resolution: Fixed Fix Version/s: 1.5.0 > SQLContext.sql() caused deadlock in mu

Re: how to make a dataframe of Array[Doubles] ?

2015-12-15 Thread Michael Armbrust
You don't have to turn your array into a tuple, but you do need to have a product that wraps it (this is how we get names for the columns). case class MyData(data: Array[Double]) val df = Seq(MyData(Array(1.0, 2.0, 3.0, 4.0)), ...).toDF() On Mon, Dec 14, 2015 at 9:35 PM, Jeff Zhang
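
Slightly expanded into a runnable sketch, assuming a SQLContext named sqlContext is in scope:

    import sqlContext.implicits._

    case class MyData(data: Array[Double])

    val df = Seq(MyData(Array(1.0, 2.0, 3.0, 4.0)),
                 MyData(Array(5.0, 6.0))).toDF()
    df.printSchema()   // data: array<double>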

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-15 Thread Michael Armbrust
l:SPARK_VERSION: 1.6.0-SNAPSHOT > > On Mon, Dec 14, 2015 at 6:51 PM, Krishna Sankar <ksanka...@gmail.com> > wrote: > >> Guys, >>The sc.version gives 1.6.0-SNAPSHOT. Need to change to 1.6.0. Can you >> pl verify ? >> Cheers >>

[jira] [Resolved] (SPARK-12271) Improve error message for Dataset.as[] when the schema is incompatible.

2015-12-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12271. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10260

[jira] [Resolved] (SPARK-12236) JDBC filter tests all pass if filters are not really pushed down

2015-12-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12236. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10221

[jira] [Resolved] (SPARK-12274) WrapOption should not have type constraint for child

2015-12-14 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12274. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10263

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Michael Armbrust
wrote: > +1 tested SparkSQL and Streaming on some production sized workloads > > On Sat, Dec 12, 2015 at 4:16 PM, Mark Hamstra <m...@clearstorydata.com> > wrote: > >> +1 >> >> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust <mich...@databricks.com >>

Re: Concatenate a string to a Column of type string in DataFrame

2015-12-14 Thread Michael Armbrust
In earlier versions you should be able to use callUdf or callUDF (depending on which version) and call the hive function "concat". On Sun, Dec 13, 2015 at 3:05 AM, Yanbo Liang wrote: > Sorry, it was added since 1.5.0. > > 2015-12-13 2:07 GMT+08:00 Satish
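
A hedged sketch of calling the Hive "concat" function this way (the column name and literal are hypothetical; older releases may need a HiveContext, and the helper was named callUdf in Spark 1.4 and callUDF from 1.5 on):

    import org.apache.spark.sql.functions._
    import sqlContext.implicits._

    // concatenate a string literal onto an existing string column
    val withSuffix = df.select(callUDF("concat", $"name", lit("_2015")).as("nameWithSuffix"))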

Re: RuntimeException: Failed to check null bit for primitive int type

2015-12-14 Thread Michael Armbrust
Your code (at com.ctrip.ml.toolimpl.MetadataImpl$$anonfun$1.apply(MetadataImpl.scala:22)) needs to check isNullAt before calling getInt. This is because you cannot return null for a primitive value (Int). On Mon, Dec 14, 2015 at 3:40 AM, zml张明磊 wrote: > Hi, > > > >
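
A sketch of the null-safe pattern being described, with a hypothetical column position:

    // check for null before reading a primitive field from a Row
    val safeInts = df.map { row =>
      if (row.isNullAt(0)) None        // field 0 is null for this row
      else Some(row.getInt(0))         // safe to read the primitive value
    }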

Re: Kryo serialization fails when using SparkSQL and HiveContext

2015-12-14 Thread Michael Armbrust
You'll need to either turn off registration (spark.kryo.registrationRequired) or create a custom registrator (spark.kryo.registrator). http://spark.apache.org/docs/latest/configuration.html#compression-and-serialization On Mon, Dec 14, 2015 at 2:17 AM, Linh M. Tran wrote: >
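
A sketch of the two configuration options mentioned; the registrator class name is hypothetical:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "false")   // option 1: don't require registration
      // option 2: keep registration required and supply a custom registrator
      // .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")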

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
I'll kick off the voting with a +1. On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust <mich...@databricks.com> wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.6.0! > > The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and pass

[VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 1.6.0! The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.6.0 [ ] -1 Do not release this package because

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
<https://github.com/apache/spark/pull/10193>: > Element[W|w]iseProductExample.scala is not the same in the docs and the > actual file name. > > On Sat, Dec 12, 2015 at 6:39 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> I'll kick off the vot

Re: spark data frame write.mode("append") bug

2015-12-12 Thread Michael Armbrust
If you want to contribute to the project open a JIRA/PR: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Sat, Dec 12, 2015 at 3:13 AM, kali.tumm...@gmail.com < kali.tumm...@gmail.com> wrote: > Hi All, > > >

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
> > I'm surprised you're suggesting there's not a coupling between a release's > code and the docs for that release. If a release happens and some time > later docs come out, that has some effect on people's usage. > I'm only suggesting that we shouldn't delay testing of the actual bits, or wait

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
can be released a little bit > after the code artifacts with last minute fixes. But, the whole release can > just happen later too. Why wouldn't this be a valid reason to block the > release? > > On Sat, Dec 12, 2015 at 6:31 PM, Michael Armbrust <mich...@databricks.com> > w

Re: Window function in Spark SQL

2015-12-11 Thread Michael Armbrust
Can you change permissions on that directory so that hive can write to it? We start up a mini version of hive so that we can use some of its functionality. On Fri, Dec 11, 2015 at 12:47 PM, Sourav Mazumder < sourav.mazumde...@gmail.com> wrote: > In 1.5.x whenever I try to create a HiveContext

Re: is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-11 Thread Michael Armbrust
and explained that multiple contexts per JVM is not really > supported. So, via job server, how does one support multiple contexts in > DIFFERENT JVM's? I specify multiple contexts in the conf file and the > initialization of the subsequent contexts fail. > > > > On Fr

Re: Using TestHiveContext/HiveContext in unit tests

2015-12-11 Thread Michael Armbrust
Just use TestHive. It's a global singleton that you can share across test cases. It has a reset function if you want to clear out the state at the beginning of a test. On Fri, Dec 11, 2015 at 2:06 AM, Sahil Sareen wrote: > I'm trying to do this in unit tests: > > val
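
A minimal sketch of using the shared TestHive singleton inside a test, assuming the spark-hive test artifacts are on the test classpath:

    import org.apache.spark.sql.hive.test.TestHive

    TestHive.reset()                              // clear state left over from earlier tests
    val df = TestHive.sql("SELECT 1 AS x")
    assert(df.collect().head.getInt(0) == 1)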

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-10 Thread Michael Armbrust
We are getting close to merging patches for SPARK-12155 <https://issues.apache.org/jira/browse/SPARK-12155> and SPARK-12253 <https://issues.apache.org/jira/browse/SPARK-12253>. I'll be cutting RC2 shortly after that. Michael On Tue, Dec 8, 2015 at 10:31 AM, Michael Ar

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-10 Thread Michael Armbrust
Cutting RC2 now. On Thu, Dec 10, 2015 at 12:59 PM, Michael Armbrust <mich...@databricks.com> wrote: > We are getting close to merging patches for SPARK-12155 > <https://issues.apache.org/jira/browse/SPARK-12155> and SPARK-12253 > <https://issues.apache.org/jira/br

Re: SQL language vs DataFrame API

2015-12-09 Thread Michael Armbrust
com> wrote: > Hi, Michael, > > Does that mean SqlContext will be built on HiveQL in the near future? > > Thanks, > > Xiao Li > > > 2015-12-09 10:36 GMT-08:00 Michael Armbrust <mich...@databricks.com>: > >> I think that it is generally good to have p

Re: [Spark-1.5.2][Hadoop-2.6][Spark SQL] Cannot run queries in SQLContext, getting java.lang.NoSuchMethodError

2015-12-09 Thread Michael Armbrust
java.lang.NoSuchMethodError almost always means you have the wrong version of some library (different from what Spark was compiled with) on your classpath; in this case, the Jackson parser. On Wed, Dec 9, 2015 at 10:38 AM, Matheus Ramos wrote: > I have a Java
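
One way to confirm which jar a conflicting class was actually loaded from; a debugging sketch that prints the code-source location of the Jackson parser class, assuming jackson-core is on the classpath:

    // where did the Jackson classes come from? (null for bootstrap-classloader classes)
    val source = classOf[com.fasterxml.jackson.core.JsonFactory]
      .getProtectionDomain.getCodeSource
    println(if (source == null) "bootstrap classpath" else source.getLocation)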

[jira] [Commented] (SPARK-12231) Failed to generate predicate Error when using dropna

2015-12-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049127#comment-15049127 ] Michael Armbrust commented on SPARK-12231: -- It would also be helpful to know if this bug can

[jira] [Updated] (SPARK-12231) Failed to generate predicate Error when using dropna

2015-12-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12231: - Description: code to reproduce error # write.py {code} import pyspark sc

Re: Release data for spark 1.6?

2015-12-09 Thread Michael Armbrust
The release date is "as soon as possible". In order to make an Apache release we must present a release candidate and have 72 hours of voting by the PMC. As soon as there are no known bugs, the vote will pass and 1.6 will be released. In the meantime, I'd love support from the community

[jira] [Updated] (SPARK-12231) Failed to generate predicate Error when using dropna

2015-12-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12231: - Affects Version/s: 1.6.0 > Failed to generate predicate Error when using dro

[jira] [Updated] (SPARK-12231) Failed to generate predicate Error when using dropna

2015-12-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12231: - Target Version/s: 1.6.0 > Failed to generate predicate Error when using dro

Re: SQL language vs DataFrame API

2015-12-09 Thread Michael Armbrust
at 19:41, Xiao Li <gatorsm...@gmail.com> wrote: > >> That sounds great! When it is decided, please let us know and we can add >> more features and make it ANSI SQL compliant. >> >> Thank you! >> >> Xiao Li >> >> >> 2015-12-09 11:31

[jira] [Resolved] (SPARK-12201) add type coercion rule for greatest/least

2015-12-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12201. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 10196

[jira] [Resolved] (SPARK-12188) [SQL] Code refactoring and comment correction in Dataset APIs

2015-12-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12188. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 10184

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-08 Thread Michael Armbrust
>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time, >>> >>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1, >>> >>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.Of

[jira] [Resolved] (SPARK-12195) Adding BigDecimal, Date and Timestamp into Encoder

2015-12-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12195. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 10188

[jira] [Resolved] (SPARK-12069) Documentation update for Datasets

2015-12-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12069. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 10060

Re: Dataset and lambas

2015-12-07 Thread Michael Armbrust
ternal types. could you point me to the jiras, > if they exist already? i just tried to find them but had little luck. > best, koert > > On Sat, Dec 5, 2015 at 4:09 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> On Sat, Dec 5, 2015 at 9:42 AM, Koert Kuipe

Re: Dataset and lambas

2015-12-07 Thread Michael Armbrust
On Sat, Dec 5, 2015 at 3:27 PM, Deenar Toraskar wrote: > > On a similar note, what is involved in getting native support for some > user defined functions, so that they are as efficient as native Spark SQL > expressions? I had one particular one - an arraySum (element

[jira] [Commented] (SPARK-12045) Use joda's DateTime to replace Calendar

2015-12-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1504#comment-1504 ] Michael Armbrust commented on SPARK-12045: -- Our general policy for exceptions is that we return

[jira] [Resolved] (SPARK-11884) Drop multiple columns in the DataFrame API

2015-12-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-11884. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 9862

Re: SparkSQL API to insert DataFrame into a static partition?

2015-12-05 Thread Michael Armbrust
> > Follow-up question in this case: what is the cost of registering a temp > table? Is there a limit to the number of temp tables that can be registered > by Spark context? > It is pretty cheap: just an entry in an in-memory hashtable pointing to a query plan (similar to a view).
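
A sketch of the temp-table route for the thread's original question of inserting into a static partition; the table, partition, and column names are hypothetical, and the SQL insert assumes a HiveContext:

    // register the DataFrame as a temporary table (just a named query plan), then insert via SQL
    df.registerTempTable("staging")
    sqlContext.sql(
      "INSERT INTO TABLE events PARTITION (dt = '2015-12-05') SELECT * FROM staging")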

Re: Exception in thread "main" java.lang.IncompatibleClassChangeError:

2015-12-05 Thread Michael Armbrust
It seems likely you have conflicting versions of hadoop on your classpath. On Fri, Dec 4, 2015 at 2:52 PM, Prem Sure wrote: > Getting below exception while executing below program in eclipse. > any clue on whats wrong here would be helpful > > *public* *class* WordCount {

Re: Broadcasting a parquet file using spark and python

2015-12-05 Thread Michael Armbrust
e > it by myself (create a broadcast val and implement lookup by myself), but > it will make code super ugly. > > > > I hope we can have either API or hint to enforce the hashjoin (instead of > this suspicious autoBroadcastJoinThreshold parameter). Do we have any > ticket

Re: Dataset and lambas

2015-12-05 Thread Michael Armbrust
On Sat, Dec 5, 2015 at 9:42 AM, Koert Kuipers wrote: > hello all, > DataFrame internally uses a different encoding for values than what the > user sees. i assume the same is true for Dataset? > This is true. We encode objects in the tungsten binary format using code

Re: is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-04 Thread Michael Armbrust
On Fri, Dec 4, 2015 at 11:24 AM, Anfernee Xu wrote: > If multiple users are looking at the same data set, then it's a good choice > to share the SparkContext. > > But my use cases are different, users are looking at different data (I use > custom Hadoop InputFormat to load

Re: is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-04 Thread Michael Armbrust
To be clear, I don't think there is ever a compelling reason to create more than one SparkContext in a single application. The context is threadsafe and can launch many jobs in parallel from multiple threads. Even if there wasn't global state that made it unsafe to do so, creating more than one
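
A sketch of sharing one SparkContext across threads, as described above; the input paths are hypothetical and an existing sc is assumed:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration.Duration
    import scala.concurrent.ExecutionContext.Implicits.global

    // submit several jobs concurrently from different threads on the same context
    val futures = Seq("/data/a", "/data/b").map { path =>
      Future { sc.textFile(path).count() }
    }
    futures.foreach(f => println(Await.result(f, Duration.Inf)))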

Re: Spark SQL IN Clause

2015-12-04 Thread Michael Armbrust
The best way to run this today is probably to manually convert the query into a join, i.e., create a DataFrame that has all the numbers in it and join/outer-join it with the other table. This way you avoid parsing a gigantic string. On Fri, Dec 4, 2015 at 10:36 AM, Ted Yu
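
A sketch of the join rewrite being suggested; the DataFrame and column names are hypothetical:

    import sqlContext.implicits._

    // instead of a huge "id IN (1, 2, 3, ...)" string, join against a DataFrame of the values
    val ids = sc.parallelize(Seq(1, 2, 3)).map(Tuple1.apply).toDF("id")   // the IN-list values
    val filtered = bigTable.join(ids, "id")                               // inner join keeps matching rows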
