Re: Spark 1.6.1

2016-02-01 Thread Michael Armbrust
there other blockers for Spark 1.6.1 ? >> >> Thanks >> >> On Wed, Jan 13, 2016 at 5:39 PM, Michael Armbrust <mich...@databricks.com >> > wrote: >> >>> Hey All, >>> >>> While I'm not aware of any critical issues with 1.6.0, ther

[jira] [Commented] (SPARK-13083) Small spark sql queries get blocked if there is a long running query over a lot a partitions

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126905#comment-15126905 ] Michael Armbrust commented on SPARK-13083: -- You need to also ensure the queries are running

[jira] [Resolved] (SPARK-12989) Bad interaction between StarExpansion and ExtractWindowExpressions

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12989. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Resolved] (SPARK-13083) Small spark sql queries get blocked if there is a long running query over a lot a partitions

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13083. -- Resolution: Not A Problem Assignee: Michael Armbrust > Small spark sql quer

[jira] [Resolved] (SPARK-11780) Provide type aliases in org.apache.spark.sql.types for backwards compatibility

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-11780. -- Resolution: Fixed Fix Version/s: 1.6.1 Issue resolved by pull request 10915

[jira] [Updated] (SPARK-13087) Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13087: - Affects Version/s: 2.0.0 > Grouping by a complex expression may lead to incorr

[jira] [Updated] (SPARK-13122) Race condition in MemoryStore.unrollSafely() causes memory leak

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13122: - Target Version/s: 1.6.1 > Race condition in MemoryStore.unrollSafely() causes mem

[jira] [Commented] (SPARK-13087) Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127353#comment-15127353 ] Michael Armbrust commented on SPARK-13087: -- Here's a self-contained test case: {code} test

[jira] [Updated] (SPARK-12705) Sorting column can't be resolved if it's not in projection

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12705: - Assignee: Xiao Li > Sorting column can't be resolved if it's not in project

[jira] [Updated] (SPARK-10777) order by fails when column is aliased and projection includes windowed aggregate

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10777: - Assignee: Xiao Li > order by fails when column is aliased and projection inclu

[jira] [Created] (SPARK-13128) API for building arrays / lists encoders

2016-02-01 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13128: Summary: API for building arrays / lists encoders Key: SPARK-13128 URL: https://issues.apache.org/jira/browse/SPARK-13128 Project: Spark Issue Type

[jira] [Commented] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126678#comment-15126678 ] Michael Armbrust commented on SPARK-13101: -- /cc [~lian cheng] [~cloud_fan] > Dataset comp

[jira] [Updated] (SPARK-13091) Rewrite/Propagate constraints for Aliases

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13091: - Assignee: Sameer Agarwal > Rewrite/Propagate constraints for Alia

[jira] [Updated] (SPARK-13090) Add initial support for constraint propagation in SparkSQL

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13090: - Assignee: Sameer Agarwal > Add initial support for constraint propagation in Spark

[jira] [Updated] (SPARK-13092) Track constraints in ExpressionSet

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13092: - Assignee: Sameer Agarwal > Track constraints in Expression

[jira] [Created] (SPARK-13099) ccjlbr

2016-01-29 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13099: Summary: ccjlbr Key: SPARK-13099 URL: https://issues.apache.org/jira/browse/SPARK-13099 Project: Spark Issue Type: Bug Reporter: Michael

Re: Spark 2.0.0 release plan

2016-01-29 Thread Michael Armbrust
ark builds to > Scala > > 2.11 with Spark 2.0? > > > > Regards > > Deenar > > > > On 27 January 2016 at 19:55, Michael Armbrust <mich...@databricks.com> > > wrote: > >> > >> We do maintenance releases on demand when there is enough to justify >

[jira] [Updated] (SPARK-13087) Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13087: - Target Version/s: 1.6.1 > Grouping by a complex expression may lead to incorr

Re: Spark 1.6.1

2016-01-29 Thread Michael Armbrust
I think this is fixed in branch-1.6 already. If you can reproduce it there can you please open a JIRA and ping me? On Fri, Jan 29, 2016 at 12:16 PM, deenar < deenar.toras...@thinkreactive.co.uk> wrote: > Hi Michael > > The Dataset aggregators do not appear to support complex Spark-SQL types. I

[jira] [Commented] (SPARK-13094) Dataset Aggregators do not work with complex types

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124205#comment-15124205 ] Michael Armbrust commented on SPARK-13094: -- Sorry, I think I was unclear. When I said branch

[jira] [Updated] (SPARK-13094) Dataset Aggregators do not work with complex types

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13094: - Target Version/s: 1.6.1 > Dataset Aggregators do not work with complex ty

[jira] [Commented] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules

2016-01-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122016#comment-15122016 ] Michael Armbrust commented on SPARK-12725: -- Why don't we just add a flag to AttributeReference

[jira] [Resolved] (SPARK-12926) SQLContext to display warning message when non-sql configs are being set

2016-01-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12926. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10849

Re: Broadcast join on multiple dataframes

2016-01-28 Thread Michael Armbrust
Can you provide the analyzed and optimized plans (explain(true))? On Thu, Jan 28, 2016 at 12:26 PM, Srikanth wrote: > Hello, > > I have a use case where one large table has to be joined with several > smaller tables. > I've added broadcast hint for all small tables in the

Re: Spark 2.0.0 release plan

2016-01-27 Thread Michael Armbrust
We do maintenance releases on demand when there is enough to justify doing one. I'm hoping to cut 1.6.1 soon, but have not had time yet. On Wed, Jan 27, 2016 at 8:12 AM, Daniel Siegmann < daniel.siegm...@teamaol.com> wrote: > Will there continue to be monthly releases on the 1.6.x branch during

[jira] [Commented] (SPARK-8279) udf_round_3 test fails

2016-01-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117838#comment-15117838 ] Michael Armbrust commented on SPARK-8279: - I don't think it was a super principled decision

[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-01-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118433#comment-15118433 ] Michael Armbrust commented on SPARK-12988: -- Here are my thoughts after discussing with [~rxin

Re: NPE from sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply?

2016-01-26 Thread Michael Armbrust
That is a bug in generated code. It would be great if you could post a reproduction. On Tue, Jan 26, 2016 at 9:15 AM, Jacek Laskowski wrote: > Hi, > > Does this say anything to anyone? :) It's with Spark 2.0.0-SNAPSHOT > built today. Is this something I could fix myself in my

[jira] [Created] (SPARK-12987) Drop fails when columns contain quotes

2016-01-25 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12987: Summary: Drop fails when columns contain quotes Key: SPARK-12987 URL: https://issues.apache.org/jira/browse/SPARK-12987 Project: Spark Issue Type

[jira] [Updated] (SPARK-12987) Drop fails when columns contain dots

2016-01-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12987: - Summary: Drop fails when columns contain dots (was: Drop fails when columns contain

[jira] [Created] (SPARK-12988) Can't drop columns that contain dots

2016-01-25 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12988: Summary: Can't drop columns that contain dots Key: SPARK-12988 URL: https://issues.apache.org/jira/browse/SPARK-12988 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-12987) Drop fails when columns contain dots

2016-01-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12987: - Priority: Critical (was: Major) > Drop fails when columns contain d

Re: Trouble dropping columns from a DataFrame that has other columns with dots in their names

2016-01-25 Thread Michael Armbrust
Looks like you found a bug. I've filed them here: SPARK-12987 - Drop fails when columns contain dots SPARK-12988 - Can't drop columns that contain dots On Fri, Jan 22, 2016 at 3:18 PM, Joshua

[jira] [Created] (SPARK-12989) Bad interaction between StarExpansion and ExtractWindowExpressions

2016-01-25 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12989: Summary: Bad interaction between StarExpansion and ExtractWindowExpressions Key: SPARK-12989 URL: https://issues.apache.org/jira/browse/SPARK-12989 Project

Re: Datasets and columns

2016-01-25 Thread Michael Armbrust
The encoder is responsible for mapping your class onto some set of columns. Try running: datasetMyType.printSchema() On Mon, Jan 25, 2016 at 1:16 PM, Steve Lewis wrote: > assume I have the following code > > SparkConf sparkConf = new SparkConf(); > > JavaSparkContext

[jira] [Updated] (SPARK-12975) Throwing Exception when Bucketing Columns are part of Partitioning Columns

2016-01-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12975: - Assignee: Xiao Li > Throwing Exception when Bucketing Columns are part of Partition

[jira] [Resolved] (SPARK-12975) Throwing Exception when Bucketing Columns are part of Partitioning Columns

2016-01-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12975. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10891

Re: Datasets and columns

2016-01-25 Thread Michael Armbrust
he schema it looks like KRYO makes one column is there > a way to do a custom encoder with my own columns > On Jan 25, 2016 1:30 PM, "Michael Armbrust" <mich...@databricks.com> > wrote: > >> The encoder is responsible for mapping your class onto some set of >>

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
The analog to PairRDD is a GroupedDataset (created by calling groupBy), which offers similar functionality, but doesn't require you to construct new objects that are in the form of key/value pairs. It doesn't matter if they are complex objects, as long as you can create an encoder for them

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
the issue of looking at schools in > neighboring regions > > On Wed, Jan 20, 2016 at 10:43 AM, Michael Armbrust <mich...@databricks.com > > wrote: > >> The analog to PairRDD is a GroupedDataset (created by calling groupBy), >> which offers similar functionality, b

Re: Redundant common columns of nature full outer join

2016-01-20 Thread Michael Armbrust
If you use the join that takes USING columns it should automatically coalesce (take the non null value from) the left/right columns: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L405 On Tue, Jan 19, 2016 at 10:51 PM, Zhong Wang
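A minimal sketch of the coalescing behaviour described in the reply above, written in plain Java rather than against the Spark API. The class `UsingJoinSketch` and its methods are hypothetical names made up for illustration; the sketch only models the idea that a full outer join on a shared (USING) column yields one key column holding whichever side's value is non-null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java model of a full outer join on a shared key column.
// This is NOT Spark code; it only illustrates the coalescing semantics.
class UsingJoinSketch {

    // Returns the first non-null argument, like SQL COALESCE(left, right).
    static <T> T coalesce(T left, T right) {
        return left != null ? left : right;
    }

    // Full outer join of two key -> value tables.
    // Each output row is [key, leftValue, rightValue] with a single key column.
    static List<String[]> fullOuterJoin(Map<String, String> left,
                                        Map<String, String> right) {
        TreeSet<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String[]> rows = new ArrayList<>();
        for (String key : keys) {
            // Without coalescing there would be two key columns,
            // one of which is null whenever a side has no match.
            String leftKey = left.containsKey(key) ? key : null;
            String rightKey = right.containsKey(key) ? key : null;
            rows.add(new String[] {
                coalesce(leftKey, rightKey), left.get(key), right.get(key)
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> left = new HashMap<>();
        left.put("a", "1");
        left.put("b", "2");
        Map<String, String> right = new HashMap<>();
        right.put("b", "x");
        right.put("c", "y");
        for (String[] row : fullOuterJoin(left, right)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

In this model, unmatched rows keep their key in the single output key column and carry null only in the value column of the missing side.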

[jira] [Resolved] (SPARK-12816) Schema generation for type aliases does not work

2016-01-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12816. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10749

Re: Spark SQL -Hive transactions support

2016-01-19 Thread Michael Armbrust
We don't support Hive-style transactions. On Tue, Jan 19, 2016 at 11:32 AM, hnagar wrote: > Hive has transactions support since version 0.14. > > I am using Spark 1.6, and Hive 1.2.1, are transactions supported in Spark > SQL now. I tried in the Spark-Shell and it

Re: Spark Dataset doesn't have api for changing columns

2016-01-19 Thread Michael Armbrust
In Spark 2.0 we are planning to combine DataFrame and Dataset so that all the methods will be available on either class. On Tue, Jan 19, 2016 at 3:42 AM, Milad khajavi wrote: > Hi Spark users, > > when I want to map the result of count on groupBy, I need to convert the >

Re: Serializing DataSets

2016-01-18 Thread Michael Armbrust
What error? On Mon, Jan 18, 2016 at 9:01 AM, Simon Hafner <reactorm...@gmail.com> wrote: > And for deserializing, > `sqlContext.read.parquet("path/to/parquet").as[T]` and catch the > error? > > 2016-01-14 3:43 GMT+08:00 Michael Armbrust <mich...@databricks.

[jira] [Created] (SPARK-12841) UnresolvedException with cast

2016-01-15 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12841: Summary: UnresolvedException with cast Key: SPARK-12841 URL: https://issues.apache.org/jira/browse/SPARK-12841 Project: Spark Issue Type: Bug

Re: DataFrameWriter on partitionBy for parquet eat all RAM

2016-01-15 Thread Michael Armbrust
See here for some workarounds: https://issues.apache.org/jira/browse/SPARK-12546 On Thu, Jan 14, 2016 at 6:46 PM, Jerry Lam wrote: > Hi Arkadiusz, > > the partitionBy is not designed to have many distinct value the last time > I used it. If you search in the mailing list,

[jira] [Resolved] (SPARK-12813) Eliminate serialization for back to back operations

2016-01-14 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12813. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10747

Re: SQL UDF problem (with re to types)

2016-01-14 Thread Michael Armbrust
We automatically convert types for UDFs defined in Scala, but we can't do it in Java because the types are erased by the compiler. If you want to use double you should cast before calling the UDF. On Wed, Jan 13, 2016 at 8:10 PM, Raghu Ganti wrote: > So, when I try
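A small sketch of the type-erasure point made in the reply above, in plain Java with no Spark dependency. `ErasureSketch`, `HALVE`, `callErased` and `callWithCast` are hypothetical names for illustration only. It assumes a caller that, like a framework, sees only the erased function type, and shows why converting the argument before the call avoids the failure.

```java
import java.util.function.Function;

// Plain-Java illustration of type erasure; this is NOT Spark code.
class ErasureSketch {

    // A hypothetical "UDF" that expects a Double argument.
    static final Function<Double, Double> HALVE = x -> x / 2.0;

    // A caller that only sees the erased Function type cannot discover
    // that HALVE wants a Double, so it passes the argument through as-is.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object callErased(Function udf, Object arg) {
        return udf.apply(arg);
    }

    // Converting the argument to double before the call, analogous to
    // casting the column before invoking the UDF.
    static Object callWithCast(Function<Double, Double> udf, Number arg) {
        return udf.apply(arg.doubleValue());
    }

    public static void main(String[] args) {
        try {
            callErased(HALVE, 4); // an Integer reaches a function expecting Double
        } catch (ClassCastException e) {
            System.out.println("erased call failed: ClassCastException");
        }
        System.out.println(callWithCast(HALVE, 4));
    }
}
```

The design choice mirrored here is that the conversion happens on the caller's side, because nothing at runtime records the declared argument type.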

Re: SQL UDF problem (with re to types)

2016-01-14 Thread Michael Armbrust
type erasure is solved through proper generics > implementation in Java 1.8). > > On Thu, Jan 14, 2016 at 1:42 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> We automatically convert types for UDFs defined in Scala, but we can't do >> it in Java because

Re: How to make Dataset api as fast as DataFrame

2016-01-13 Thread Michael Armbrust
The focus of this release was to get the API out there and there are a lot of low-hanging performance optimizations. That said, there is likely always going to be some cost of materializing objects. Another note, any time you're comparing performance it's useful to include the output of explain so we

Re: Serializing DataSets

2016-01-13 Thread Michael Armbrust
Yeah, that's the best way for now (note the conversion is purely logical so there is no cost of calling toDF()). We'll likely be combining the classes in Spark 2.0 to remove this awkwardness. On Tue, Jan 12, 2016 at 11:20 PM, Simon Hafner wrote: > What's the proper way to

[jira] [Created] (SPARK-12813) Eliminate serialization for back to back operations

2016-01-13 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12813: Summary: Eliminate serialization for back to back operations Key: SPARK-12813 URL: https://issues.apache.org/jira/browse/SPARK-12813 Project: Spark

[jira] [Updated] (SPARK-11780) Provide type aliases in org.apache.spark.sql.types for backwards compatibility

2016-01-13 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11780: - Target Version/s: 1.6.1 > Provide type aliases in org.apache.spark.sql.ty

[jira] [Resolved] (SPARK-12478) Dataset fields of product types can't be null

2016-01-13 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12478. -- Resolution: Fixed Fix Version/s: 1.6.1 This is fixed in branch-1.6 now

Spark 1.6.1

2016-01-13 Thread Michael Armbrust
Hey All, While I'm not aware of any critical issues with 1.6.0, there are several corner cases that users are hitting with the Dataset API that are fixed in branch-1.6. As such I'm considering a 1.6.1 release. At the moment there are only two critical issues targeted for 1.6.1: - SPARK-12624 -

[jira] [Updated] (SPARK-12478) Dataset fields of product types can't be null

2016-01-13 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12478: - Fix Version/s: 2.0.0 > Dataset fields of product types can't be n

[jira] [Updated] (SPARK-12783) Dataset map serialization error

2016-01-13 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12783: - Summary: Dataset map serialization error (was: Dataset map) > Dataset map serializat

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Michael Armbrust
> > df1.as[TestCaseClass].map(_.toMyMap).show() //fails > > This looks like a bug. What is the error? It might be fixed in branch-1.6/master if you can test there. > Please advice on what I may be missing here? > > > Also for join, may I suggest to have a custom encoder / transformation to >

[jira] [Resolved] (SPARK-9843) Catalyst: Allow adding custom optimizers

2016-01-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-9843. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10210 [https

[jira] [Updated] (SPARK-12783) Dataset map

2016-01-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12783: - Assignee: Wenchen Fan Target Version/s: 1.6.1, 2.0.0 Priority

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Michael Armbrust
.scala:861) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1607) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProce

Re: [Spark SQL]: Issues with writing dataframe with Append Mode to Parquet

2016-01-12 Thread Michael Armbrust
There can be data loss when you are using the DirectOutputCommitter and speculation is turned on, so we disable it automatically. On Tue, Jan 12, 2016 at 1:11 PM, Jerry Lam wrote: > Hi spark users and developers, > > I wonder if the following observed behaviour is expected.

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-11 Thread Michael Armbrust
> > Also, while extracting a value into Dataset using as[U] method, how could > I specify a custom encoder/translation to case class (where I don't have > the same column-name mapping or same data-type mapping)? > There is no public API yet for defining your own encoders. You change the column

[jira] [Resolved] (SPARK-12758) Add note to Spark SQL Migration section about SPARK-11724

2016-01-11 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12758. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Commented] (SPARK-12714) Transforming Dataset with sequences of case classes to RDD causes Task Not Serializable exception

2016-01-11 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092591#comment-15092591 ] Michael Armbrust commented on SPARK-12714: -- Would you be able to test with {{branch-1.6}}? I

[jira] [Resolved] (SPARK-12696) Dataset serialization error

2016-01-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12696. -- Resolution: Fixed Fix Version/s: 1.6.1 Issue resolved by pull request 10650

[jira] [Updated] (SPARK-12704) we may repartition a relation even it's not needed

2016-01-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12704: - Issue Type: Improvement (was: Bug) > we may repartition a relation even it's not nee

[jira] [Commented] (SPARK-12704) we may repartition a relation even it's not needed

2016-01-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088638#comment-15088638 ] Michael Armbrust commented on SPARK-12704: -- I think this explanation might be clearer

[jira] [Created] (SPARK-12696) Dataset serialization error

2016-01-07 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12696: Summary: Dataset serialization error Key: SPARK-12696 URL: https://issues.apache.org/jira/browse/SPARK-12696 Project: Spark Issue Type: Bug

Re: Dataset throws: Task not serializable

2016-01-07 Thread Michael Armbrust
Were you running in the REPL? On Thu, Jan 7, 2016 at 10:34 AM, Michael Armbrust <mich...@databricks.com> wrote: > Thanks for providing a great description. I've opened > https://issues.apache.org/jira/browse/SPARK-12696 > > I'm actually getting a different error (running i

[jira] [Updated] (SPARK-12696) Dataset serialization error

2016-01-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12696: - Priority: Blocker (was: Major) > Dataset serialization er

Re: Dataset throws: Task not serializable

2016-01-07 Thread Michael Armbrust
Thanks for providing a great description. I've opened https://issues.apache.org/jira/browse/SPARK-12696 I'm actually getting a different error (running in notebooks though). Something seems wrong either way. > > *P.S* mapping by name with case classes doesn't work if the order of the > fields

[jira] [Updated] (SPARK-12696) Dataset serialization error

2016-01-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12696: - Target Version/s: 1.6.1 (was: 1.6.1, 2.0.0) > Dataset serialization er

[jira] [Resolved] (SPARK-11878) Eliminate distribute by in case group by is present with exactly the same grouping expressions

2016-01-06 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-11878. -- Resolution: Fixed Assignee: Yash Datta Fix Version/s: 2.0.0

Re: [Spark-SQL] Custom aggregate function for GrouppedData

2016-01-06 Thread Michael Armbrust
In Spark 1.6 GroupedDataset has mapGroups, which sounds like what you are looking for. You can also write a custom Aggregator

Re: problem with DataFrame df.withColumn() org.apache.spark.sql.AnalysisException: resolved attribute(s) missing

2016-01-06 Thread Michael Armbrust
> > I really appreciate your help. I The following code works. > Glad you got it to work! Is there a way this example can be added to the distribution to make it > easier for future java programmers? It look me a long time get to this > simple solution. > I'd welcome a pull request that added

Re: problem with DataFrame df.withColumn() org.apache.spark.sql.AnalysisException: resolved attribute(s) missing

2016-01-06 Thread Michael Armbrust
oh, and I think I installed jekyll using "gem install jekyll" On Wed, Jan 6, 2016 at 4:17 PM, Michael Armbrust <mich...@databricks.com> wrote: > from docs/ run: > > SKIP_API=1 jekyll serve --watch > > On Wed, Jan 6, 2016 at 4:12 PM, Andy Davidson < > a...@san

Re: problem with DataFrame df.withColumn() org.apache.spark.sql.AnalysisException: resolved attribute(s) missing

2016-01-06 Thread Michael Armbrust
amples are not rendering correctly. I am on a mac and using > https://itunes.apple.com/us/app/marked-2/id890031187?mt=12 > > I use a emacs or some other text editor to change the md. > > What tools do you use for editing viewing spark markdown files? > > Andy > > > &g

Re: Timeout connecting between workers after upgrade to 1.6

2016-01-06 Thread Michael Armbrust
Logs from the workers? On Wed, Jan 6, 2016 at 1:57 PM, Jeff Jones wrote: > I upgraded our Spark standalone cluster from 1.4.1 to 1.6.0 yesterday. We > are now seeing regular timeouts between two of the workers when making > connections. These workers and the same

[jira] [Resolved] (SPARK-12438) Add SQLUserDefinedType support for encoder

2016-01-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12438. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10390

Re: Spark 1.6 - Datasets and Avro Encoders

2016-01-05 Thread Michael Armbrust
You could try with the `Encoders.bean` method. It detects classes that have getters and setters. Please report back! On Tue, Jan 5, 2016 at 9:45 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > considering the new Datasets API, will there be Encoders defined for

[jira] [Resolved] (SPARK-12439) Fix toCatalystArray and MapObjects

2016-01-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12439. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10391

Re: problem with DataFrame df.withColumn() org.apache.spark.sql.AnalysisException: resolved attribute(s) missing

2016-01-05 Thread Michael Armbrust
> > I am trying to implement org.apache.spark.ml.Transformer interface in > Java 8. > My understanding is the pseudocode for transformers is something like > > @Override > > public DataFrame transform(DataFrame df) { > > 1. Select the input column > > 2. Create a new column > > 3. Append the

Re: Spark 1.6 - Datasets and Avro Encoders

2016-01-05 Thread Michael Armbrust
On Tue, Jan 5, 2016 at 1:31 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > I'll do, but if you want my two cents, creating a dedicated "optimised" > encoder for Avro would be great (especially if it's possible to do better > than plain AvroKeyValueOutputFormat with

[jira] [Updated] (SPARK-12439) Fix toCatalystArray and MapObjects

2016-01-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12439: - Assignee: Liang-Chi Hsieh (was: Apache Spark) > Fix toCatalystArray and MapObje

Re: How to concat few rows into a new column in dataframe

2016-01-05 Thread Michael Armbrust
This would also be possible with an Aggregator in Spark 1.6: https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html On Tue, Jan 5, 2016 at 2:59 PM, Ted Yu wrote: > Something like the following: > > val zeroValue =

[jira] [Resolved] (SPARK-12504) JDBC data source credentials are not masked in the data frame explain output.

2016-01-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12504. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10452

[jira] [Resolved] (SPARK-12421) Fix copy() method of GenericRow

2016-01-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12421. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10553

[jira] [Updated] (SPARK-12421) Fix copy() method of GenericRow

2016-01-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12421: - Assignee: Herman van Hovell > Fix copy() method of Generic

Re: Is Spark 1.6 released?

2016-01-04 Thread Michael Armbrust
I also wrote about it here: https://databricks.com/blog/2016/01/04/introducing-spark-datasets.html And put together a bunch of examples here: https://docs.cloud.databricks.com/docs/spark/1.6/index.html On Mon, Jan 4, 2016 at 12:02 PM, Annabel Melongo < melongo_anna...@yahoo.com.invalid> wrote:

[jira] [Updated] (SPARK-12512) WithColumn does not work on multiple column with special character

2016-01-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12512: - Assignee: Xiu (Joe) Guo (was: Apache Spark) > WithColumn does not work on multi

[jira] [Resolved] (SPARK-12512) WithColumn does not work on multiple column with special character

2016-01-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12512. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10500

Re: Is Spark 1.6 released?

2016-01-04 Thread Michael Armbrust
> > bq. In many cases, the current implementation of the Dataset API does not > yet leverage the additional information it has and can be slower than RDDs. > > Are the characteristics of cases above known so that users can decide which > API to use ? > Lots of back to back operations aren't great

[jira] [Resolved] (SPARK-12600) Remove deprecated methods in SQL / DataFrames

2016-01-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12600. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10559

Re: problem with DataFrame df.withColumn() org.apache.spark.sql.AnalysisException: resolved attribute(s) missing

2016-01-04 Thread Michael Armbrust
It's not really possible to convert an RDD to a Column. You can think of a Column as an expression that produces a single output given some set of input columns. If I understand your code correctly, I think this might be easier to express as a UDF: sqlContext.udf().register("stem", new

[jira] [Resolved] (SPARK-12568) Add BINARY to Encoders

2016-01-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12568. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10516

[ANNOUNCE] Announcing Spark 1.6.0

2016-01-04 Thread Michael Armbrust
Hi All, Spark 1.6.0 is the seventh release on the 1.x line. This release includes patches from 248+ contributors! To download Spark 1.6.0 visit the downloads page. (It may take a while for all mirrors to update.) A huge thanks goes to all of the individuals and organizations involved in
