Re: [vote] Apache Spark 3.0 RC3

2020-06-08 Thread Michael Armbrust
+1 (binding) On Mon, Jun 8, 2020 at 1:22 PM DB Tsai wrote: > +1 (binding) > > Sincerely, > > DB Tsai > -- > Web: https://www.dbtsai.com > PGP Key ID: 42E5B25A8F7A82C1 > > On Mon, Jun 8, 2020 at 1:03 PM Dongjoon Hyun > wrote: > > > > +1 > >

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Michael Armbrust
> > What I'd oppose is to just ban char for the native data sources, and do > not have a plan to address this problem systematically. > +1 > Just forget about padding, like what Snowflake and MySQL have done. > Document that char(x) is just an alias for string. And then move on. Almost > no work

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-11 Thread Michael Armbrust
Thank you for the discussion everyone! This vote passes. I'll work to get this posted on the website. +1 Michael Armbrust Sean Owen Jules Damji 大啊 Ismaël Mejía Wenchen Fan Matei Zaharia Gengliang Wang Takeshi Yamamuro Denny Lee Xiao Li Xingbo Jiang Takuya UESHIN Michael Heuer John Zhuge Reynol

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-06 Thread Michael Armbrust
I'll start off the vote with a strong +1 (binding). On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust wrote: > I propose to add the following text to Spark's Semantic Versioning policy > <https://spark.apache.org/versioning-policy.html> and adopt it as the > rubric

[VOTE] Amend Spark's Semantic Versioning Policy

2020-03-06 Thread Michael Armbrust
I propose to add the following text to Spark's Semantic Versioning policy and adopt it as the rubric that should be used when deciding to break APIs (even at major versions such as 3.0). I'll leave the vote open until Tuesday, March 10th at 2pm. A

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-27 Thread Michael Armbrust
Thanks for the discussion! A few responses: The decision needs to happen at api/config change time, otherwise the > deprecated warning has no purpose if we are never going to remove them. > Even if we never remove an API, I think deprecation warnings (when done right) can still serve a purpose. F

Re: Clarification on the commit protocol

2020-02-27 Thread Michael Armbrust
No, it is not. Although the commit protocol has mostly been superseded by Delta Lake, which is available as a separate open source project that works natively with Apache Spark. In contrast to the commit protocol, Delta can guarantee full ACID (rather than just partition level at

[Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-24 Thread Michael Armbrust
Hello Everyone, As more users have started upgrading to Spark 3.0 preview (including myself), there have been many discussions around APIs that have been broken compared with Spark 2.x. In many of these discussions, one of the rationales for breaking an API seems to be "Spark follows semantic vers

Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-21 Thread Michael Armbrust
This plan for evolving the TRIM function to be more standards compliant sounds much better to me than the original change to just switch the order. It pushes users in the right direction and cleans up our tech debt without silently breaking existing workloads. It means that programs won't return di

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-19 Thread Michael Armbrust
+1 (binding), we've tested this and it LGTM. On Thu, Apr 18, 2019 at 7:51 PM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.2. > > The vote is open until April 23 PST and passes if a majority +1 PMC votes > are cast, with > a minimum of 3 +1 vot

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
This is a small change and looks safe enough to me. I was just a > little surprised since I was expecting a correctness issue if this is > prompting a release. I'm definitely on the side of case-by-case judgments > on what to allow in patch releases and this looks fine. > > On

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
this behavior. Do you have a different proposal about how this should be handled? On Tue, Apr 16, 2019 at 4:23 PM Ryan Blue wrote: > Is this a bug fix? It looks like a new feature to me. > > On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust > wrote: > >> Hello All, >>

Spark 2.4.2

2019-04-16 Thread Michael Armbrust
Hello All, I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 I was wondering if it might make sense to follow up quickly with 2.4.2. Without this fix it's very hard to build a datasource that correctly handles partitioning w

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Michael Armbrust
> > Agree. Just curious, could you explain what do you mean by "negation"? > Does it mean applying retraction on aggregated? > Yeah exactly. Our current streaming aggregation assumes that the input is in append-mode and multiple aggregations break this.

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Michael Armbrust
Thanks for bringing up some possible future directions for streaming. Here are some thoughts: - I personally view all of the activity on Spark SQL also as activity on Structured Streaming. The great thing about building streaming on catalyst / tungsten is that continued improvement to these compon

Re: Sorting on a streaming dataframe

2018-04-30 Thread Michael Armbrust
performance as compared to implementing this > functionality inside the applications. > > Hemant > > On Thu, Apr 26, 2018 at 11:59 PM, Michael Armbrust > wrote: > >> The basic tenet of structured streaming is that a query should return the >> same answer in streami

Re: Sorting on a streaming dataframe

2018-04-26 Thread Michael Armbrust
The basic tenet of structured streaming is that a query should return the same answer in streaming or batch mode. We support sorting in complete mode because we have all the data and can sort it correctly and return the full answer. In update or append mode, sorting would only return a correct ans

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-26 Thread Michael Armbrust
+1 all our pipelines have been running the RC for several days now. On Mon, Feb 26, 2018 at 10:33 AM, Dongjoon Hyun wrote: > +1 (non-binding). > > Bests, > Dongjoon. > > > > On Mon, Feb 26, 2018 at 9:14 AM, Ryan Blue > wrote: > >> +1 (non-binding) >> >> On Sat, Feb 24, 2018 at 4:17 PM, Xiao Li

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Michael Armbrust
I'm -1 on any changes that aren't fixing major regressions from 2.2 at this point. Also, in any cases where it's possible we should be flipping new features off if they are still regressing, rather than continuing to attempt to fix them. Since it's experimental, I would support backporting the DataSo

Re: DataSourceV2: support for named tables

2018-02-02 Thread Michael Armbrust
I am definitely in favor of first-class / consistent support for tables and data sources. One thing that is not clear to me from this proposal is exactly what the interfaces are between: - Spark - A (The?) metastore - A data source If we pass in the table identifier is the data source then res

Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-02 Thread Michael Armbrust
> > So here are my recommendations for moving forward, with DataSourceV2 as a > starting point: > >1. Use well-defined logical plan nodes for all high-level operations: >insert, create, CTAS, overwrite table, etc. >2. Use rules that match on these high-level plan nodes, so that it >

Re: Max number of streams supported ?

2018-01-31 Thread Michael Armbrust
-dev +user > Similarly for structured streaming, Would there be any limit on number of > of streaming sources I can have ? > There is no fundamental limit, but each stream will have a thread on the driver that is doing coordination of execution. We comfortably run 20+ streams on a single cluste

Re: Spark error while trying to spark.read.json()

2017-12-19 Thread Michael Armbrust
- dev java.lang.AbstractMethodError almost always means that you have different libraries on the classpath than at compilation time. In this case I would check to make sure you have the correct version of Scala (and only have one version of Scala) on the classpath. On Tue, Dec 19, 2017 at 5:42 P

Re: Timeline for Spark 2.3

2017-12-19 Thread Michael Armbrust
Do people really need to be around for the branch cut (modulo the person cutting the branch)? 1st or 2nd doesn't really matter to me, but I am +1 kicking this off as soon as we enter the new year :) Michael On Tue, Dec 19, 2017 at 4:39 PM, Holden Karau wrote: > Sounds reasonable, although I'd

Re: queryable state & streaming

2017-12-08 Thread Michael Armbrust
https://issues.apache.org/jira/browse/SPARK-16738 I don't believe anyone is working on it yet. I think the most useful thing is to start enumerating requirements and use cases and then we can talk about how to build it. On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos < st.kontopou...@gmail.

Timeline for Spark 2.3

2017-11-09 Thread Michael Armbrust
According to the timeline posted on the website, we are nearing branch cut for Spark 2.3. I'd like to propose pushing this out towards mid to late December for a couple of reasons and would like to hear what people think. 1. I've done release management during the Thanksgiving / Christmas time be

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-06 Thread Michael Armbrust
+1 On Sat, Nov 4, 2017 at 11:02 AM, Xiao Li wrote: > +1 > > 2017-11-04 11:00 GMT-07:00 Burak Yavuz : > >> +1 >> >> On Fri, Nov 3, 2017 at 10:02 PM, vaquar khan >> wrote: >> >>> +1 >>> >>> On Fri, Nov 3, 2017 at 8:14 PM, Weichen Xu >>> wrote: >>> +1. On Sat, Nov 4, 2017 at 8:04 A

Re: Structured Stream equivalent of reduceByKey

2017-10-26 Thread Michael Armbrust
- dev I think you should be able to write an Aggregator. You probably want to run in update mode if you are looking for it to output any group that has changed in the batch. On Wed, Oct 25, 201
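The suggestion above can be sketched as follows. This is a minimal, illustrative Aggregator (the object name `SumByKey` and the sum-per-key semantics are my own stand-ins, not from the thread) shown on a batch Dataset; the same typed aggregation applies to a streaming Dataset in update mode:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Illustrative Aggregator: sums the Long half of (key, value) pairs.
// Plays the role reduceByKey served in the RDD API.
object SumByKey extends Aggregator[(String, Long), Long, Long] {
  def zero: Long = 0L
  def reduce(acc: Long, row: (String, Long)): Long = acc + row._2
  def merge(a: Long, b: Long): Long = a + b
  def finish(acc: Long): Long = acc
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Batch stand-in; with readStream + update output mode, each batch
// would emit only the keys whose running sum changed.
val ds = Seq(("a", 1L), ("a", 2L), ("b", 3L)).toDS()
val summed = ds.groupByKey(_._1).agg(SumByKey.toColumn.name("sum"))
summed.show()
```

In update mode the sink receives a row per changed group each trigger, which is what makes this a workable reduceByKey analogue for streams.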

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-14 Thread Michael Armbrust
h the > data) and initialize the custom sink with right batch id when application > re-starts. After this just ignore batch if current batchId <= > latestBatchId. > > Dmitry > > > 2017-09-13 22:12 GMT+03:00 Michael Armbrust : > >> I think the right way to look at th

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-13 Thread Michael Armbrust
ogic (which is > basically, just ignore intermediate data, re-read from Kafka and re-try > processing and load)? > > Dmitry > > > 2017-09-12 22:43 GMT+03:00 Michael Armbrust : > >> In the checkpoint directory there is a file /offsets/$batchId that holds >> the off

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-12 Thread Michael Armbrust
there some kind of > offset manager API that works as get-offset by batch id lookup table? > > Dmitry > > 2017-09-12 20:29 GMT+03:00 Michael Armbrust : > >> I think that we are going to have to change the Sink API as part of >> SPARK-20928 <https://issues-test.apache.

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-12 Thread Michael Armbrust
I think that we are going to have to change the Sink API as part of SPARK-20928, which is why I linked these tickets together. I'm still targeting an initial version for Spark 2.3 which should happen sometime towards the end of the year. Th

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2 read path

2017-09-07 Thread Michael Armbrust
+1 On Thu, Sep 7, 2017 at 9:32 AM, Ryan Blue wrote: > +1 (non-binding) > > Thanks for making the updates reflected in the current PR. It would be > great to see the doc updated before it is finally published though. > > Right now it feels like this SPIP is focused more on getting the basics > ri

Re: Increase Timeout or optimize Spark UT?

2017-08-23 Thread Michael Armbrust
I think we already set the number of partitions to 5 in tests? On Tue, Aug 22, 2017 at 3:25 PM, Maciej Szymkiewicz wrote: > Hi, > > From

Re: [SS] watermark, eventTime and "StreamExecution: Streaming query made progress"

2017-08-11 Thread Michael Armbrust
The point here is to tell you what watermark value was used when executing this batch. You don't know the new watermark until the batch is over and we don't want to do two passes over the data. In general the semantics of the watermark are designed to be conservative (i.e. just because data is ol

Re: Thoughts on release cadence?

2017-07-31 Thread Michael Armbrust
+1, should we update https://spark.apache.org/versioning-policy.html ? On Sun, Jul 30, 2017 at 3:34 PM, Reynold Xin wrote: > This is reasonable ... +1 > > > On Sun, Jul 30, 2017 at 2:19 AM, Sean Owen wrote: > >> The project had traditionally posted some guidance about upcoming >> releases. The

[ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-11 Thread Michael Armbrust
Hi all, Apache Spark 2.2.0 is the third release of the Spark 2.x line. This release removes the experimental tag from Structured Streaming. In addition, this release focuses on usability, stability, and polish, resolving over 1100 tickets. We'd like to thank our contributors and users for their c

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-07 Thread Michael Armbrust
This vote passes! I'll followup with the release on Monday. +1: Michael Armbrust (binding) Kazuaki Ishizaki Sean Owen (binding) Joseph Bradley (binding) Ricardo Almeida Herman van Hövell tot Westerflier (binding) Yanbo Liang Nick Pentreath (binding) Wenchen Fan (binding) Sameer Agarwal Denn

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-06-30 Thread Michael Armbrust
I'll kick off the vote with a +1. On Fri, Jun 30, 2017 at 6:44 PM, Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.2.0. The vote is open until Friday, July 7th, 2017 at 18:00 PST and > passes if a majority of at least 3 +1

[VOTE] Apache Spark 2.2.0 (RC6)

2017-06-30 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Friday, July 7th, 2017 at 18:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ...

Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-26 Thread Michael Armbrust
sion from 2.1 > > On Wed, Jun 21, 2017 at 1:43 PM, Nick Pentreath > wrote: > >> As before, release looks good, all Scala, Python tests pass. R tests fail >> with same issue in SPARK-21093 but it's not a blocker. >> >> +1 (binding) >> >> >>

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-21 Thread Michael Armbrust
nks for discussing them. I still feel they are very helpful; I >>>>>>>> particularly notice not having to spend a solid 2-3 weeks of time QAing >>>>>>>> (unlike in earlier Spark releases). One other point not mentioned &

Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-20 Thread Michael Armbrust
I will kick off the voting with a +1. On Tue, Jun 20, 2017 at 4:49 PM, Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.2.0. The vote is open until Friday, June 23rd, 2017 at 18:00 PST and > passes if a majority of at least 3 +1

[VOTE] Apache Spark 2.2.0 (RC5)

2017-06-20 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Friday, June 23rd, 2017 at 18:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ...

Re: cannot call explain or show on dataframe in structured streaming addBatch dataframe

2017-06-19 Thread Michael Armbrust
There is a little bit of weirdness to how we override the default query planner to replace it with an incrementalizing planner. As such, calling any operation that changes the query plan (such as a LIMIT) would cause it to revert to the batch planner and return the wrong answer. We should fix thi

Re: the scheme in stream reader

2017-06-19 Thread Michael Armbrust
The socket source can't know how to parse your data. I think the right thing would be for it to throw an exception saying that you can't set the schema here. Would you mind opening a JIRA ticket? If you are trying to parse data from something like JSON then you should use `from_json` on the value
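The `from_json` pattern being suggested looks roughly like this (a batch stand-in for the socket stream; the schema and field names are invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// The schema you expect inside each JSON payload (illustrative).
val schema = new StructType().add("name", StringType).add("age", IntegerType)

// A socket (or Kafka) source hands you a string `value` column; you
// parse it yourself rather than setting a schema on the source.
val raw = Seq("""{"name":"ann","age":3}""").toDF("value")
val parsed = raw
  .select(from_json($"value", schema).as("data"))
  .select("data.*")
parsed.show()
```

With a real stream you would swap the `Seq(...).toDF` for `spark.readStream.format("socket")...load()` and the same two `select`s apply unchanged.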

Re: Nested "struct" function call creates a compilation error in Spark SQL

2017-06-15 Thread Michael Armbrust
I can, what do > you think ? > > Regards, > > Olivier. > > > 2017-06-15 21:08 GMT+02:00 Michael Armbrust : > >> Which version of Spark? If it's recent I'd open a JIRA. >> >> On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot < >> o.girar...@late

Re: Nested "struct" function call creates a compilation error in Spark SQL

2017-06-15 Thread Michael Armbrust
Which version of Spark? If it's recent I'd open a JIRA. On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > when we create recursive calls to "struct" (up to 5 levels) for extending > a complex datastructure we end up with the following com

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Michael Armbrust
>>>>>>> above: I >>>>>>> think they serve as a very helpful reminder/training for the community >>>>>>> for >>>>>>> rigor in development. Since we instituted QA JIRAs, contributors have >>>>>>>

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
QA" JIRAs just represent that 'we will test things, in general', > then I think they're superfluous at best. These aren't used consistently, > and their intent isn't actionable (i.e. it sounds like no particular > testing resolves the JIRA). They signal s

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
eeds to > block the release; Joseph what's the status on those? > > On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST a

[VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ...

Re: [VOTE] Apache Spark 2.2.0 (RC3)

2017-06-02 Thread Michael Armbrust
should NEVER backport > non-bug-fix commits to an RC branch. Sorry again for the trouble! > > On Fri, Jun 2, 2017 at 2:40 PM, Michael Armbrust > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.2.0. The vote is open until Tues, Ju

[VOTE] Apache Spark 2.2.0 (RC3)

2017-06-02 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Tues, June 6th, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ...

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-06-02 Thread Michael Armbrust
ead "SQL >> TIMESTAMP semantics vs. SPARK-18350" which might impact Spark 2.2. Should >> we make a decision there before voting on the next RC for Spark 2.2? >> >> Thanks, >> Kostas >> >> On Tue, May 30, 2017 at 12:09 PM, Michael Armbrust < >>

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-05-30 Thread Michael Armbrust
arted cutting the new RC, I'm working on a documentation > PR right now I'm hoping we can get into Spark 2.2 as a migration note, even > if it's just a mention: https://issues.apache.org/jira/browse/SPARK-20888. > > Michael > > > On May 22, 2017, at 11:39 AM, Mi

Re: Running into the same problem as JIRA SPARK-19268

2017-05-24 Thread Michael Armbrust
-dev Have you tried clearing out the checkpoint directory? Can you also give the full stack trace? On Wed, May 24, 2017 at 3:45 PM, kant kodali wrote: > Even if I do simple count aggregation like below I get the same error as > https://issues.apache.org/jira/browse/SPARK-19268 > > Dataset df2

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-05-22 Thread Michael Armbrust
at 2.2, but they are >>> essentially all for documentation. >>> >>> Joseph >>> >>> On Thu, May 11, 2017 at 3:08 PM, Marcelo Vanzin >>> wrote: >>> >>>> Since you'll be creating a new RC, I'd wait until SPARK-2066

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-05-11 Thread Michael Armbrust
- > > [INFO] Total time: 16:51 min > [INFO] Finished at: 2017-05-09T17:51:04+09:00 > [INFO] Final Memory: 53M/514M > [INFO] > > [WARNING] The requested profile "hive"

[VOTE] Apache Spark 2.2.0 (RC2)

2017-05-04 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Tues, May 9th, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ... T

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-05-03 Thread Michael Armbrust
0:02 AM, Ryan Blue >>>>>> > wrote: >>>>>>> >>>>>>> I agree with Sean. Spark only pulls in parquet-avro for tests. For >>>>>>> execution, it implements the record materialization APIs in Parquet to >>>

Re: [ANNOUNCE] Apache Spark 2.1.1

2017-05-03 Thread Michael Armbrust
t; Specifically, the home page has a link (under Documentation menu) labeled > Latest Release (Spark 2.1.1), but when I click it, I get the 2.1.0 > documentation. > > Ofir Manor > > Co-Founder & CTO | Equalum > > Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

[ANNOUNCE] Apache Spark 2.1.1

2017-05-02 Thread Michael Armbrust
We are happy to announce the availability of Spark 2.1.1! Apache Spark 2.1.1 is a maintenance release, based on the branch-2.1 maintenance branch of Spark. We strongly recommend all 2.1.x users to upgrade to this stable release. To download Apache Spark 2.1.1 visit http://spark.apache.org/downloa

Re: Spark 2.2.0 or Spark 2.3.0?

2017-05-02 Thread Michael Armbrust
An RC for 2.2.0 was released last week. Please test. Note that update mode has been supported since 2.0. On Mon, May 1, 2017 at 10:43 PM, kant kodali wrote: > Hi All, > > If I understand the Spark standard release process c

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-05-01 Thread Michael Armbrust
This vote passes. Thanks to everyone for testing! I'll begin packaging the release. +1 Sean Owen (binding) Michael Armbrust (binding) Reynold Xin (binding) Tom Graves (binding) Dong Joon Hyun Holden Karau Vaquar Khan Kazuaki Ishizaki Denny Lee Felix Cheung -1 None On Fri, Apr 28, 2017

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-01 Thread Michael Armbrust
He's just suggesting that since the DataStreamWriter start() method can fill in an option named "path", we should make that a synonym for "topic". Then you could do something like. df.writeStream.format("kafka").start("topic") Seems reasonable if people don't think that is confusing. On Mon, May

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Michael Armbrust
ut an RC after those things are ready? > > On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and >> passes if a major

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-27 Thread Michael Armbrust
I'll also +1 On Thu, Apr 27, 2017 at 4:20 AM, Sean Owen wrote: > +1 , same result as with the last RC. All checks out for me. > > On Thu, Apr 27, 2017 at 1:29 AM Michael Armbrust > wrote: > >> Please vote on releasing the following candidate as Apache Spark version

[VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ... T

[VOTE] Apache Spark 2.1.1 (RC4)

2017-04-26 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.1.1. The vote is open until Sat, April 29th, 2017 at 18:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.1.1 [ ] -1 Do not release this package because ...

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Michael Armbrust
;> >>>>> On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan >>>>> wrote: >>>>> >>>>>> IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will >>>>>> only scan all table files only once, and write bac

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-21 Thread Michael Armbrust
f this problem, and specifically whether this is a problem in > the Spark codebase or not. I will report back when I have an answer to that > question. > > Michael > > > On Apr 18, 2017, at 11:59 AM, Michael Armbrust > wrote: > > Please vote on releasing the following

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-18 Thread Michael Armbrust
In case it wasn't obvious by the appearance of RC3, this vote failed. On Thu, Mar 30, 2017 at 4:09 PM, Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.1. The vote is open until Sun, April 2nd, 2017 at 16:30 PST and > passes

[VOTE] Apache Spark 2.1.1 (RC3)

2017-04-18 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.1.1 [ ] -1 Do not release this package because ...

branch-2.2 has been cut

2017-04-18 Thread Michael Armbrust
I just cut the release branch for Spark 2.2. If you are merging important bug fixes, please backport as appropriate. If you have doubts if something should be backported, please ping me. I'll follow with an RC later this week.

Re: 2.2 branch

2017-04-17 Thread Michael Armbrust
I'm going to cut branch-2.2 tomorrow morning. On Thu, Apr 13, 2017 at 11:02 AM, Michael Armbrust wrote: > Yeah, I was delaying until 2.1.1 was out and some of the hive questions > were resolved. I'll make progress on that by the end of the week. Lets > aim for 2.2 branch cu

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-14 Thread Michael Armbrust
hine is being >> used >> >>>> for packaging I can see if I can install pandoc on it (should be >> simple but >> >>>> I know the Jenkins cluster is a bit on the older side). >> >>>> >> >>>> On Tue, Apr 4, 2017 at 3:06 PM

Re: 2.2 branch

2017-04-13 Thread Michael Armbrust
Yeah, I was delaying until 2.1.1 was out and some of the hive questions were resolved. I'll make progress on that by the end of the week. Lets aim for 2.2 branch cut next week. On Thu, Apr 13, 2017 at 8:56 AM, Koert Kuipers wrote: > i see there is no 2.2 branch yet for spark. has this been pus

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-04 Thread Michael Armbrust
1:16 PM, Felix Cheung wrote: > -1 > sorry, found an issue with SparkR CRAN check. > Opened SPARK-20197 and working on fix. > > -- > *From:* holden.ka...@gmail.com on behalf of > Holden Karau > *Sent:* Friday, March 31, 2017 6:25:20 PM > *

[VOTE] Apache Spark 2.1.1 (RC2)

2017-03-30 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.1.1. The vote is open until Sun, April 2nd, 2017 at 16:30 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.1.1 [ ] -1 Do not release this package because ...

Re: Outstanding Spark 2.1.1 issues

2017-03-28 Thread Michael Armbrust
Asher Krim > Senior Software Engineer > > On Wed, Mar 22, 2017 at 7:44 PM, Michael Armbrust > wrote: > >> An update: I cut the tag for RC1 last night. Currently fighting with the >> release process. Will post RC1 once I get it working. >> >> On Tue, Mar 21, 2

Re: Outstanding Spark 2.1.1 issues

2017-03-22 Thread Michael Armbrust
good to start the RC >> process. >> >> On Tue, Mar 21, 2017 at 1:41 PM, Michael Armbrust > > wrote: >> >> Please speak up if I'm wrong, but none of these seem like critical >> regressions from 2.1. As such I'll start the RC process later today. >

Re: Outstanding Spark 2.1.1 issues

2017-03-21 Thread Michael Armbrust
Please speak up if I'm wrong, but none of these seem like critical regressions from 2.1. As such I'll start the RC process later today. On Mon, Mar 20, 2017 at 9:52 PM, Holden Karau wrote: > I'm not super sure it should be a blocker for 2.1.1 -- is it a regression? > Maybe we can get TDs input

Spark 2.2 Code-freeze - 3/20

2017-03-15 Thread Michael Armbrust
Hey Everyone, Just a quick announcement that I'm planning to cut the branch for Spark 2.2 this coming Monday (3/20). Please try and get things merged before then and also please begin retargeting of any issues that you don't think will make the release. Michael

Re: Should we consider a Spark 2.1.1 release?

2017-03-15 Thread Michael Armbrust
Hey Holden, Thanks for bringing this up! I think we usually cut patch releases when there are enough fixes to justify it. Sometimes just a few weeks after the release. I guess if we are at 3 months Spark 2.1.0 was a pretty good release :) That said, it is probably time. I was about to start th

Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-16 Thread Michael Armbrust
Thanks for your interest in Apache Spark Structured Streaming! There is nothing secret in that demo, though I did make some configuration changes in order to get the timing right (gotta have some dramatic effect :) ). Also I think the visualizations based on metrics output by the StreamingQueryLi

Re: benefits of code gen

2017-02-10 Thread Michael Armbrust
Function1 is specialized, but nullSafeEval is Any => Any, so that's still going to box in the non-codegened execution path. On Fri, Feb 10, 2017 at 1:32 PM, Koert Kuipers wrote: > based on that i take it that math functions would be primary beneficiaries > since they work on primitives. > > so i

Re: Structured Streaming. Dropping Duplicates

2017-02-07 Thread Michael Armbrust
nk",tableName2) > .option("checkpointLocation","checkpoint") > .start() > > > On Tue, Feb 7, 2017 at 7:24 PM, Michael Armbrust > wrote: > >> Read the JSON log of files that is in `/your/path/_spark_metadata` and >> only read files that

Re: Structured Streaming. Dropping Duplicates

2017-02-07 Thread Michael Armbrust
e case then how would I go about ensuring no duplicates? > > > Thanks again for the awesome support! > > Regards > Sam > On Tue, 7 Feb 2017 at 18:05, Michael Armbrust > wrote: > >> Sorry, I think I was a little unclear. There are two things at play here. >> &

Re: Structured Streaming. Dropping Duplicates

2017-02-07 Thread Michael Armbrust
cause I can see in the log > that its now polling for new changes, the latest offset is the right one > > After I kill it and relaunch it picks up that same file? > > > Sorry if I misunderstood you > > On Tue, Feb 7, 2017 at 5:20 PM, Michael Armbrust > wrote: > >

Re: Structured Streaming. Dropping Duplicates

2017-02-07 Thread Michael Armbrust
rk job using Ctrl+C. > When I rerun the stream it picks up "update 2" again > > Is this normal? isnt ctrl+c a failure? > > I would expect checkpointing to know that update 2 was already processed > > Regards > Sam > > On Tue, Feb 7, 2017 at 4:58 PM, Sam

Re: Structured Streaming. Dropping Duplicates

2017-02-07 Thread Michael Armbrust
Here a JIRA: https://issues.apache.org/jira/browse/SPARK-19497 We should add this soon. On Tue, Feb 7, 2017 at 8:35 AM, Sam Elamin wrote: > Hi All > > When trying to read a stream off S3 and I try and drop duplicates I get > the following error: > > Exception in thread "main" org.apache.spark.s

Re: specifing schema on dataframe

2017-02-05 Thread Michael Armbrust
-dev You can use withColumn to change the type after the data has been loaded. On Sat, Feb 4, 2017 at 6:22 AM, Sam Elamin wrote: > Hi
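The withColumn approach amounts to casting after load. A small batch sketch (the column names and data are invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Suppose the source inferred `id` as a string; re-type it after load
// by overwriting the column with a cast version of itself.
val df = Seq(("1", "a"), ("2", "b")).toDF("id", "label")
val typed = df.withColumn("id", $"id".cast(IntegerType))
typed.printSchema()
```

Passing an existing column name to `withColumn` replaces that column in place, so downstream code sees the corrected type under the original name.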

Re: [SQL][SPARK-14160] Maximum interval for o.a.s.sql.functions.window

2017-01-18 Thread Michael Armbrust
+1, we should just fix the error to explain why months aren't allowed and suggest that you manually specify some number of days. On Wed, Jan 18, 2017 at 9:52 AM, Maciej Szymkiewicz wrote: > Thanks for the response Burak, > > As any sane person I try to steer away from the objects which have both
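The workaround being endorsed — spelling the interval out in days — looks roughly like this (data and column names are invented; "30 days" is a stand-in for "1 month"):

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Months vary in length, so a window duration must be fixed-size;
// use an explicit number of days instead of "1 month".
val events = Seq(
  (Timestamp.valueOf("2017-01-01 00:00:00"), 1),
  (Timestamp.valueOf("2017-01-15 00:00:00"), 1)
).toDF("time", "v")
val counts = events.groupBy(window($"time", "30 days")).count()
counts.show(truncate = false)
```

Tumbling windows like this are aligned to fixed epoch-relative boundaries, which is exactly why a variable-length unit like a month can't be expressed.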

Re: StateStoreSaveExec / StateStoreRestoreExec

2017-01-03 Thread Michael Armbrust
You might also be interested in this: https://issues.apache.org/jira/browse/SPARK-19031 On Tue, Jan 3, 2017 at 3:36 PM, Michael Armbrust wrote: > I think we should add something similar to mapWithState in 2.2. It would > be great if you could add the description of your problem to this

Re: StateStoreSaveExec / StateStoreRestoreExec

2017-01-03 Thread Michael Armbrust
I think we should add something similar to mapWithState in 2.2. It would be great if you could add the description of your problem to this ticket: https://issues.apache.org/jira/browse/SPARK-19067 On Mon, Jan 2, 2017 at 2:05 PM, Jeremy Smith wrote: > I have a question about state tracking in St

Re: What is mainly different from a UDT and a spark internal type that ExpressionEncoder recognized?

2016-12-27 Thread Michael Armbrust
An encoder uses reflection to generate expressions that can extract data out of an object (by calling methods on the object) and encode its contents directly into the tungst

Re: Expand the Spark SQL programming guide?

2016-12-15 Thread Michael Armbrust
Pull requests would be welcome for any major missing features in the guide: https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md On Thu, Dec 15, 2016 at 11:48 AM, Jim Hughes wrote: > Hi Anton, > > I'd like to see this as well. I've been working on implementing > geospatial

Re: ability to provide custom serializers

2016-12-05 Thread Michael Armbrust
Lets start with a new ticket, link them and we can merge if the solution ends up working out for both cases. On Sun, Dec 4, 2016 at 5:39 PM, Erik LaBianca wrote: > Thanks Michael! > > On Dec 2, 2016, at 7:29 PM, Michael Armbrust > wrote: > > I would love to see somethi

Re: ability to provide custom serializers

2016-12-02 Thread Michael Armbrust
I would love to see something like this. The closest related ticket is probably https://issues.apache.org/jira/browse/SPARK-7768 (though maybe there are enough people using UDTs in their current form that we should just make a new ticket) A few thoughts: - even if you can do implicit search, we
