Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Sean Owen
(We wouldn't consider lack of an improvement to block a maintenance release. It's reasonable to raise this elsewhere as a big nice to have on 2.3.x in general) On Tue, Aug 14, 2018, 4:13 PM antonkulaga wrote: > -1 as https://issues.apache.org/jira/browse/SPARK-16406 does not seem to > be >

Re: sql compile failing with Zinc?

2018-08-14 Thread Sean Owen
If you're running zinc directly, you can give it more memory with -J-Xmx2g or whatever. If you're running ./build/mvn and letting it run zinc, we might need to increase the memory that it requests in the script. On Tue, Aug 14, 2018 at 2:56 PM Steve Loughran wrote: > Is anyone else getting the

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Sean Owen
Generally: if someone thinks correctness fix X should be backported further, I'd say just do it, if it's to an active release branch (see below). Anything that important has to outweigh most any other concern, like behavior changes. On Mon, Aug 13, 2018 at 11:08 AM Tom Graves wrote: > I'm not

CVE-2018-11770: Apache Spark standalone master, Mesos REST APIs not controlled by authentication

2018-08-13 Thread Sean Owen
Severity: Medium Vendor: The Apache Software Foundation Versions Affected: Spark versions from 1.3.0, running standalone master with REST API enabled, or running Mesos master with cluster mode enabled Description: From version 1.3.0 onward, Spark's standalone master exposes a REST API for job

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Sean Owen
I doubt the question is whether people want to take such issues seriously -- all else equal, of course everyone does. A JIRA label plus place in the release notes sounds like a good concrete step that isn't happening consistently now. That's a clear flag that at least one person believes issue X

Re: [R] discuss: removing lint-r checks for old branches

2018-08-10 Thread Sean Owen
Seems OK to proceed with shutting off lintr, as it was masking those. On Fri, Aug 10, 2018 at 6:01 PM shane knapp wrote: > ugh... R unit tests failed on both of these builds. > > https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94583/artifact/R/target/ > >

Re: SparkContext singleton get w/o create?

2018-08-07 Thread Sean Owen
Ah, python. How about SparkContext._active_spark_context then? On Tue, Aug 7, 2018 at 5:34 PM Andrew Melo wrote: > Hi Sean, > > On Tue, Aug 7, 2018 at 5:16 PM, Sean Owen wrote: > > Is SparkSession.getActiveSession what you're looking for? > > Perhaps -- though there'
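The get-without-create accessor under discussion can be sketched in plain Python; the class and names below are illustrative stand-ins, not pyspark's actual internals (pyspark tracks its context in `SparkContext._active_spark_context`).

```python
import threading

class Context:
    """Minimal sketch of a get-or-create singleton with a read-only accessor."""
    _active = None
    _lock = threading.Lock()

    @classmethod
    def get_or_create(cls):
        # creates the singleton on first use
        with cls._lock:
            if cls._active is None:
                cls._active = cls()
            return cls._active

    @classmethod
    def get_active(cls):
        # returns the existing instance, or None -- never creates one
        return cls._active

# Introspection tools can check for an existing context without starting one:
print(Context.get_active())         # None: nothing created yet
ctx = Context.get_or_create()
print(Context.get_active() is ctx)  # True: same instance is returned
```

This is the shape a Jupyter extension needs: a way to observe whether a context exists without triggering its creation.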

Re: SparkContext singleton get w/o create?

2018-08-07 Thread Sean Owen
Is SparkSession.getActiveSession what you're looking for? On Tue, Aug 7, 2018 at 5:11 PM Andrew Melo wrote: > Hello, > > One pain point with various Jupyter extensions [1][2] that provide > visual feedback about running spark processes is the lack of a public > API to introspect the web URL.

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-06 Thread Sean Owen
... and we still have a few snags with Scala 2.12 support at https://issues.apache.org/jira/browse/SPARK-25029 There is some hope of resolving it on the order of a week, so for the moment, seems worth holding 2.4 for. On Mon, Aug 6, 2018 at 2:37 PM Bryan Cutler wrote: > Hi All, > > I'd like to

Re: Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Sean Owen
easily if you set class `MyData(val i: Int) > extends Serializable `outside of the test suite. For some reason outers > (not removed) are capturing > the Scalatest stuff in 2.12. > > Let me know if we see the same failures. > > Stavros > > On Sun, Aug 5, 2018 at 5:10 PM, S

Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Sean Owen
Shane et al - could we get a test job in Jenkins to test the Scala 2.12 build? I don't think I have the access or expertise for it, though I could probably copy and paste a job. I think we just need to clone the, say, master Maven Hadoop 2.7 job, and add two steps: run

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-05 Thread Sean Owen
bmit and spark-shell. I feel > that in an IDE, it won’t be a huge problem because you just add it once, > but it is annoying for spark-submit. > > Matei > > > On Aug 4, 2018, at 2:19 PM, Sean Owen wrote: > > > > Hm OK I am crazy then. I think I never noticed it b

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Sean Owen
gt; > https://about.me/JacekLaskowski > Mastering Spark SQL https://bit.ly/mastering-spark-sql > Spark Structured Streaming https://bit.ly/spark-structured-streaming > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams > Follow me at https://twitter.com/jaceklaskowski > &g

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Sean Owen
there in the release but ... *nobody* noticed all this time? I guess maybe Spark-Kafka users may be using a vendor distro that does package these bits. On Sat, Aug 4, 2018 at 10:48 AM Sean Owen wrote: > I was debugging why a Kafka-based streaming app doesn't seem to find > Kafka-related integ

Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Sean Owen
I was debugging why a Kafka-based streaming app doesn't seem to find Kafka-related integration classes when run standalone from our latest 2.3.1 release, and noticed that there doesn't seem to be any Kafka-related jars from Spark in the distro. In jars/, I see: spark-catalyst_2.11-2.3.1.jar

Re: Review notification bot

2018-07-31 Thread Sean Owen
I haven't been pinged by this bot :( :) But I do like these comments on PRs, like https://github.com/apache/spark/pull/21925#issuecomment-409035244 Is the issue that @-mentions cause emails too? Is there any option to maybe only consider pinging someone if they've touched the code within the

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Sean Owen
In theory releases happen on a time-based cadence, so it's pretty much wrap up what's ready by the code freeze and ship it. In practice, the cadence slips frequently, and it's very much a negotiation about what features should push the code freeze out a few weeks every time. So, kind of a hybrid

Re: [SPARK-24950] issues running DateTimeUtilsSuite daysToMillis and millisToDays w/java 8 181-b13

2018-07-27 Thread Sean Owen
I posted on the JIRA -- looks like timezone definitions for Kiribati were fixed in the 2018d timezone data release, and that's the difference between the JDK release that works and does not. On Fri, Jul 27, 2018 at 1:23 PM shane knapp wrote: > hey everyone! > > i'm making great progress on

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-19 Thread Sean Owen
-spark that > we are distributing from the archive? > > On Thu, Jul 19, 2018 at 11:15 AM Sean Owen wrote: > >> Ideally, that list is updated with each release, yes. Non-current >> releases will now always download from archive.apache.org though. But we >> run into rate-l

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Sean Owen
What regression are you referring to here? A -1 vote really needs a rationale. On Thu, Jul 19, 2018 at 1:27 PM Xiao Li wrote: > I would first vote -1. > > I might find another regression caused by the analysis barrier. Will keep > you posted. > >

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-19 Thread Sean Owen
thing from Apache archive every test run. > > > -- > *From:* Marco Gaido > *Sent:* Monday, July 16, 2018 11:12 PM > *To:* Hyukjin Kwon > *Cc:* Sean Owen; dev > *Subject:* Re: Cleaning Spark releases from mirrors, and the flakiness of > HiveExter

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-16 Thread Sean Owen
be testing against 2.3.1, 2.2.2, 2.1.3, right? 2.0.x is now effectively EOL right? I can make that quick change too if everyone's amenable, in order to prevent more failures in this test from master. On Sun, Jul 15, 2018 at 3:51 PM Sean Owen wrote: > Yesterday I cleaned out old Spark releases f

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-16 Thread Sean Owen
2.3.2, I didn't meet the errors > you pasted here. I'm not sure how it happens. > > Sean Owen wrote on Mon, Jul 16, 2018 at 6:30 AM: > >> Looks good to me, with the following caveats. >> >> First see the discussion on >> https://issues.apache.org/jira/browse/SPARK-24813 ;

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-15 Thread Sean Owen
Looks good to me, with the following caveats. First see the discussion on https://issues.apache.org/jira/browse/SPARK-24813 ; the flaky HiveExternalCatalogVersionsSuite will probably fail all the time right now. That's not a regression and is a test-only issue, so don't think it must block the

Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-15 Thread Sean Owen
Yesterday I cleaned out old Spark releases from the mirror system -- we're supposed to only keep the latest release from active branches out on mirrors. (All releases are available from the Apache archive site.) Having done so I realized quickly that the HiveExternalCatalogVersionsSuite relies on

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-11 Thread Sean Owen
would you please help to > clarify? > > Thanks > Saisai > > > Xiao Li wrote on Mon, Jul 9, 2018 at 1:59 AM: > >> Three business days might be too short. Let us open the vote until the >> end of this Friday (July 13th)? >> >> Cheers, >> >> Xiao >> >>

CVE-2018-1334 Apache Spark local privilege escalation vulnerability

2018-07-11 Thread Sean Owen
Severity: High Vendor: The Apache Software Foundation Versions affected: Spark versions through 2.1.2 Spark 2.2.0 to 2.2.1 Spark 2.3.0 Description: In Apache Spark up to and including 2.1.2, 2.2.0 to 2.2.1, and 2.3.0, when using PySpark or SparkR, it's possible for a different local user to

CVE-2018-8024 Apache Spark XSS vulnerability in UI

2018-07-11 Thread Sean Owen
Severity: Medium Vendor: The Apache Software Foundation Versions Affected: Spark versions through 2.1.2 Spark 2.2.0 through 2.2.1 Spark 2.3.0 Description: In Apache Spark up to and including 2.1.2, 2.2.0 to 2.2.1, and 2.3.0, it's possible for a malicious user to construct a URL pointing to a

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Sean Owen
Just checking that the doc issue in https://issues.apache.org/jira/browse/SPARK-24530 is worked around in this release? This was pointed out as an example of a broken doc: https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression Here it is in

Re: [SPARK ML] Minhash integer overflow

2018-07-07 Thread Sean Owen
I think it probably still does its job; the hash value can just be negative. It is likely to be very slightly biased, though. Because the intent doesn't seem to be to allow the overflow, it's worth changing to use longs for the calculation. On Fri, Jul 6, 2018, 8:36 PM jiayuanm wrote: > Hi
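The overflow can be demonstrated outside Spark by emulating 32-bit Int arithmetic. 2038074743 is the prime modulus used by MinHashLSH; the coefficient values below are made up for illustration.

```python
import ctypes

PRIME = 2038074743  # MinHashLSH's prime modulus (coefficients below are illustrative)

def minhash_int(a, b, x):
    # Emulates Scala Int arithmetic: (1 + x) * a + b wraps at 32 bits,
    # and Scala's % keeps the sign of the (possibly negative) dividend.
    prod = ctypes.c_int32((1 + x) * a + b).value
    return prod % PRIME if prod >= 0 else -((-prod) % PRIME)

def minhash_long(a, b, x):
    # Widening to 64 bits (Scala Long) keeps the product exact and non-negative.
    return ((1 + x) * a + b) % PRIME

print(minhash_int(3000, 7, 10**6))   # negative: the 32-bit product wrapped around
print(minhash_long(3000, 7, 10**6))  # 961928264: exact and non-negative
```

The negative hash still orders elements consistently per hash function, which is why MinHash mostly keeps working, but the wrap-around slightly skews which element wins the min.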

Re: [SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-07-04 Thread Sean Owen
> lazy val breezeVector = > breeze.linalg.HashVector.apply(size)(indices.zip(values): _*) > > . > > private[spark] override def asBreeze: BV[Double] = breezeVector > > > > } > > > > I'm not sure this is the best approach so I t

Fwd: Beam's recent community development work

2018-07-02 Thread Sean Owen
Worth, I think, a read and consideration from Spark folks. I'd be interested in comments; I have a few reactions too. -- Forwarded message - From: Kenneth Knowles Date: Sat, Jun 30, 2018 at 1:15 AM Subject: Beam's recent community development work To: , , Griselda Cuevas <

Re: [SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-07-01 Thread Sean Owen
This could be a good optimization. But can it be done without changing any APIs or slowing anything else down? If so, this could be worth a pull request. On Sun, Jul 1, 2018 at 9:21 PM Vincent Wang wrote: > > Hi there, > > I'm using GBTClassifier do some classification jobs and find the

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Sean Owen
an OK solution? On Sun, Jul 1, 2018, 2:36 PM Reynold Xin wrote: > This wouldn’t be a problem with Scala 2.12 right? > > On Sun, Jul 1, 2018 at 12:23 PM Sean Owen wrote: > >> I see, transform() doesn't have the same overload that other methods do >> in order to support Java 8

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Sean Owen
f what I'm trying to accomplish In the unit > test here: > > https://github.com/void/spark/blob/java-transform/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java#L73 > > > On Sun, Jul 1, 2018 at 2:48 PM Sean Owen wrote: > >> Don't Java 8 lambdas let yo

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread Sean Owen
Don't Java 8 lambdas let you do this pretty immediately? Can you give an example here of what you want to do and how you are trying to do it? On Sun, Jul 1, 2018, 12:42 PM Ismael Carnales wrote: > Hi, > it would be nice to have an easier way to use the Dataset transform > method from Java than

Re: spark 2.3.1 with kafka spark-streaming-kafka-0-10 (java.lang.AbstractMethodError)

2018-07-01 Thread Sean Owen
Somewhere, you have mismatched versions of Spark on your classpath. On Sun, Jul 1, 2018, 9:01 AM Peter Liu wrote: > Hello there, > > I didn't get any response/help from the user list for the following > question and thought people on the dev list might be able to help?: > > I upgraded to spark

PSA: how to handle LICENSE and NOTICE

2018-06-30 Thread Sean Owen
There's a PR open to fully update our LICENSE and NOTICE file to fix some omissions and bring it in line with best practices: https://github.com/apache/spark/pull/21640#pullrequestreview-131808865 I've summarized some of the issues and links to key reading. It's worth a quick look to ensure we're

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Sean Owen
If it's easy enough to produce them, I agree you can just add them to the RC dir. On Thu, Jun 28, 2018 at 11:56 AM Marcelo Vanzin wrote: > I just noticed this RC is missing builds for hadoop 2.3 and 2.4, which > existed in the previous version: >

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Sean Owen
+1 from me too. On Wed, Jun 27, 2018 at 3:31 PM Tom Graves wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.2.2. > > The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a > majority +1 PMC votes are cast, with a minimum of 3 +1 votes. > >

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Sean Owen
+1 from me too for the usual reasons. On Tue, Jun 26, 2018 at 3:25 PM Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.3. > > The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a > majority +1 PMC votes are cast, with a

Re: LICENSE and NOTICE file content

2018-06-25 Thread Sean Owen
Yes the code in there is ALv2 licensed; appears to be either created for Spark or copied from Hive. Yes, irrespective of the policy issue, it's important to be able to recreate these JARs somehow, and I don't think we have the source in the repo for all of them (at least, the ones that originate

Re: LICENSE and NOTICE file content

2018-06-25 Thread Sean Owen
@legal-discuss, brief recap: In Spark's test source code and release, there are some JAR files which exist to test handling of JAR files. Example: TestSerDe.jar in https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/data/files Justin raises the legitimate question: these

Re: LICENSE and NOTICE file content

2018-06-23 Thread Sean Owen
On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean wrote: > Hi, > > NOTICE is not the right place for attribution, the license information > usually include attribution (via the copyright line) and that info should > go in LICENSE. It’s often thought that “attribution notice requirements” > need to

Re: LICENSE and NOTICE file content

2018-06-23 Thread Sean Owen
On Sat, Jun 23, 2018 at 4:47 AM Justin Mclean wrote: > See [1] it’s a good idea to have a different LICENSE and NOTICE for source > and binary (and lots of other projects do this). > Agree, this just never happened after I got the initial big overhaul of the LICENSE/NOTICE in place that got

Re: LICENSE and NOTICE file content

2018-06-23 Thread Sean Owen
On Thu, Jun 21, 2018 at 10:10 PM Justin Mclean wrote: > Now I'm not on your PMC, don’t know your projects history and there may be > valid reasons for the current LICENSE and NOTICE contents so take this as > some friendly advice, you can choose to ignore it or not act on it. Looking > at your

Re: Jenkins build errors

2018-06-23 Thread Sean Owen
Also confused about this one, as many builds succeed. One possible difference is that this failure is in the Hive tests, so are you building and testing with -Phive locally, where it works? That still does not explain the download failure. It could be a mirror problem, throttling, etc. But there again

Re: Jenkins build errors

2018-06-19 Thread Sean Owen
Those still appear to be env problems. I don't know why it is so persistent. Does it all pass locally? Retrigger tests again and see what happens. On Tue, Jun 19, 2018, 2:53 AM Petar Zecevic wrote: > > Thanks, but unfortunately, it died again. Now at pyspark tests: > > >

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Sean Owen
I think you would have to build with the 'hive' profile? But if so, that would have been true for a while now. On Thu, Jun 14, 2018 at 10:38 AM Li Jin wrote: > Hey all, > > I just did a clean checkout of github.com/apache/spark but failed to > start PySpark, this is what I did: > > git clone

Re: Shared variable in executor level

2018-06-14 Thread Sean Owen
Just use a singleton or static variable. It will be a simple per-JVM value that is therefore per-executor. On Thu, Jun 14, 2018 at 6:59 AM Nikodimos Nikolaidis wrote: > Hello community, > > I am working on a project in which statistics (like predicate > selectivity) are collected during
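Transliterated to Python for illustration (in Scala this would be a companion `object`, in Java a `static` field), the per-process pattern looks like this; the class and field names are hypothetical:

```python
class ExecutorStats:
    """Process-wide holder: one instance per worker process (per JVM in Scala)."""
    _instance = None

    @classmethod
    def get(cls):
        # lazily create the single shared instance for this process
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self.selectivity = {}  # e.g. predicate -> observed selectivity

# Every task running in the same process sees the same instance:
ExecutorStats.get().selectivity["age > 30"] = 0.4
print(ExecutorStats.get().selectivity)  # {'age > 30': 0.4}
```

Because each executor is one JVM, a static/singleton holder gives exactly executor-level sharing with no extra Spark machinery.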

Re: Scala 2.12 support

2018-06-07 Thread Sean Owen
When I updated for Scala 2.12, I was able to remove almost all the 2.11-2.12 differences. There are still already two source trees for 2.11 vs 2.12. I mean that if it's necessary to accommodate differences between the two, it's already set up for that, and there aren't a mess of differences to

Re: Scala 2.12 support

2018-06-06 Thread Sean Owen
If it means no change to 2.11 support, seems OK to me for Spark 2.4.0. The 2.12 support is separate and has never been mutually compatible with 2.11 builds anyway. (I also hope, suspect that the changes are minimal; tests are already almost entirely passing with no change to the closure cleaner

Re: Review notification bot

2018-06-06 Thread Sean Owen
Certainly I will frequently dig through 'git blame' to figure out who might be the right reviewer. Maybe that's automatable -- ping the person who last touched the most lines touched by the PR? There might be some false positives there. And I suppose the downside is being pinged forever for some

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Sean Owen
+1 from me with the same comments as in the last RC. On Fri, Jun 1, 2018 at 5:29 PM Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > Given that I expect at least a few people to be busy with Spark Summit next > week, I'm taking the

Re: [VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Sean Owen
Hm, that was merged two days ago, and you decided to revert it 2 hours ago. It sounds like this was maybe risky to put into 2.3.x during the RC phase, at least. You also don't seem certain whether there's a performance problem; how sure are you? These may all have been the right thing to do

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Sean Owen
+1 Same result for me as with RC1. On Tue, May 22, 2018 at 2:45 PM Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > The vote is open until Friday, May 25, at 20:00 UTC and passes if > at least 3 +1 PMC votes are

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread Sean Owen
+1 the release (otherwise) looks fine to me. Sigs and licenses are OK. Builds and passes tests on Debian with -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pkubernetes On Tue, May 15, 2018 at 4:00 PM Marcelo Vanzin wrote: > Please vote on releasing the following candidate

Re: parser error?

2018-05-14 Thread Sean Owen
I don't know anything about it directly, but seems like it would have been caused by https://github.com/apache/spark/commit/e3201e165e41f076ec72175af246d12c0da529cf The "?" in fromClause is what's generating the warning, and it may be ignorable. On Mon, May 14, 2018 at 12:38 AM Reynold Xin

Re: GLM Poisson Model - Deviance calculations

2018-04-19 Thread Sean Owen
I see, this was handled for binomial deviance by the 'ylogy' method, which computes y log (y / mu), defining this to be 0 when y = 0. It's not necessary to add a delta or anything; 0 is the limit as y goes to 0 so it's fine. The same change is appropriate for Poisson deviance. Gamma deviance
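The ylogy trick can be sketched as follows; this mirrors the idea in GeneralizedLinearRegression.ylogy but is a stand-alone illustration, not Spark's code.

```python
import math

def ylogy(y, mu):
    # y * log(y / mu), taking the limiting value 0 as y -> 0
    return 0.0 if y == 0.0 else y * math.log(y / mu)

def poisson_deviance(ys, mus):
    # Poisson unit deviance summed over observations:
    # 2 * sum( y*log(y/mu) - (y - mu) ), safe at y = 0
    return 2.0 * sum(ylogy(y, mu) - (y - mu) for y, mu in zip(ys, mus))

print(ylogy(0.0, 5.0))                            # 0.0: no log(0) blowup
print(poisson_deviance([0.0, 2.0], [1.0, 2.0]))   # 2.0: finite even with a zero count
```

Defining the y = 0 case as the limit is exact, so no epsilon/delta fudge term is needed, which is the point being made above.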

Re: time for Apache Spark 3.0?

2018-04-19 Thread Sean Owen
That certainly sounds beneficial, to maybe several other projects. If there's no downside and it takes away API issues, seems like a win. On Thu, Apr 19, 2018 at 5:28 AM Dean Wampler wrote: > I spoke with Martin Odersky and Lightbend's Scala Team about the known API >

Re: GLM Poisson Model - Deviance calculations

2018-04-18 Thread Sean Owen
GeneralizedLinearRegression.ylogy seems to handle this case; can you be more specific about where the log(0) happens? that's what should be fixed, right? if so, then a JIRA and PR are the right way to proceed. On Wed, Apr 18, 2018 at 2:37 PM svattig wrote: > In

Re: time for Apache Spark 3.0?

2018-04-05 Thread Sean Owen
On Wed, Apr 4, 2018 at 6:20 PM Reynold Xin wrote: > The primary motivating factor IMO for a major version bump is to support > Scala 2.12, which requires minor API breaking changes to Spark’s APIs. > Similar to Spark 2.0, I think there are also opportunities for other >

Re: Changing how we compute release hashes

2018-03-16 Thread Sean Owen
use the new format, or do we only want to do this for new releases? > > On Fri, Mar 16, 2018 at 1:50 PM Felix Cheung <felixcheun...@hotmail.com> > wrote: > >> +1 there >> >> -- >> *From:* Sean Owen <sro...@gmail.c

Re: Changing how we compute release hashes

2018-03-16 Thread Sean Owen
I think the issue with that is that OS X doesn't have "sha512sum". Both it and Linux have "shasum -a 512" though. On Fri, Mar 16, 2018 at 11:05 AM Felix Cheung wrote: > Instead of using gpg to create the sha512 hash file we could just change > to using sha512sum? That
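For a fully portable check, the same digest can also be computed directly with Python's hashlib, which agrees with both `sha512sum` (Linux) and `shasum -a 512` (Linux and macOS):

```python
import hashlib

def sha512_hex(data: bytes) -> str:
    # same hex digest that `shasum -a 512` / `sha512sum` would print
    return hashlib.sha512(data).hexdigest()

# NIST test vector for SHA-512("abc"); showing the first 16 hex chars:
print(sha512_hex(b"abc")[:16])  # ddaf35a193617aba
```

To verify a downloaded release artifact, hash the file bytes the same way and compare against the published .sha512 value.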

Re: Time Series Functionality with Spark

2018-03-12 Thread Sean Owen
(There was also https://github.com/sryza/spark-timeseries -- might be another point of reference for you.) On Mon, Mar 12, 2018 at 10:33 AM Li Jin wrote: > Hi All, > > This is Li Jin. We (me and my fellow colleagues at Two Sigma) have been > using Spark for time series

Re: Spark scala development in Sbt vs Maven

2018-03-05 Thread Sean Owen
Spark uses Maven as the primary build, but SBT works as well. It reads the Maven build to some extent. Zinc incremental compilation works with Maven (with the Scala plugin for Maven). Myself, I prefer Maven, for some of the reasons it is the main build in Spark: declarative builds end up being a

Re: Using bundler for Jekyll?

2018-03-02 Thread Sean Owen
If a header changes or news changes -- anything that causes a change on the common parts of pages -- yeah you'll get tons of modified files. No way around that as far as I know. However I've certainly observed differences that seem to be due to differing jekyll versions. To solve that I've always

Re: Please keep s3://spark-related-packages/ alive

2018-02-27 Thread Sean Owen
See http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-d3kbcqa49mib13-cloudfront-net-td22427.html -- it was 'retired', yes. Agree with all that, though they're intended for occasional individual use and not a case where performance and uptime matter. For that, I think you'd want to

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-23 Thread Sean Owen
Same result as last RC for me. +1 On Thu, Feb 22, 2018 at 4:23 PM Sameer Agarwal wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.0. The vote is open until Tuesday February 27, 2018 at 8:00:00 am UTC > and passes if a majority of at

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Sean Owen
Maybe I misunderstand, but I don't see any .iml file in the 4 results on that page? It looks reasonable. On Mon, Feb 19, 2018 at 8:02 PM Felix Cheung wrote: > Any idea with sql func docs search result returning broken links as below? > > *From:* Felix Cheung

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-18 Thread Sean Owen
+1 from me as last time, same outcome. I saw one test fail, but passed on a second run, so just seems flaky. - subscribing topic by name from latest offsets (failOnDataLoss: true) *** FAILED *** Error while stopping stream: query.exception() is not empty after clean stop:

Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Sean Owen
SPARK-23381 is probably not a blocker IMHO; it's a nice-to-have to make some returned values match an external implementation, for code that hasn't been published yet. However I think it's OK to add to the 2.3.0 release if there's going to be another RC. On Wed, Feb 14, 2018 at 10:49 PM Holden

Re: I Want to Help with MLlib Migration

2018-02-15 Thread Sean Owen
I don't think you can move or alter the class APis. There also isn't much value in copying the code. Maybe there are opportunities for moving some internal code. But in general I think all this has to wait. On Thu, Feb 15, 2018, 5:17 AM Yacine Mazari wrote: > Hi, > > I see

Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-13 Thread Sean Owen
+1 from me. Again, licenses and sigs look fine. I built the source distribution with "-Phive -Phadoop-2.7 -Pyarn -Pkubernetes" and all tests passed. Remaining issues for 2.3.0, none of which are a Blocker: SPARK-22797 Add multiple column support to PySpark Bucketizer SPARK-23083 Adding

Re: redundant decision tree model

2018-02-13 Thread Sean Owen
I think the simple pruning you have in mind was just never implemented. That sort of pruning wouldn't help much if the nodes maintained a distribution over classes, as those are rarely identical, but they just maintain a single class prediction. After training, I see no value in keeping those
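The post-training pruning described here, collapsing an internal node whose children are leaves with the same class prediction, can be sketched on a toy tree; the Node structure below is hypothetical, not Spark's internal Node classes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    prediction: Optional[int] = None      # set on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self):
        return self.left is None and self.right is None

def prune(node: Node) -> Node:
    # bottom-up: collapse any split whose two children predict the same class
    if node.is_leaf:
        return node
    node.left, node.right = prune(node.left), prune(node.right)
    if (node.left.is_leaf and node.right.is_leaf
            and node.left.prediction == node.right.prediction):
        return Node(prediction=node.left.prediction)
    return node

# A redundant split: both leaves predict class 1
tree = Node(left=Node(prediction=1), right=Node(prediction=1))
print(prune(tree).is_leaf)  # True: the useless split is collapsed
```

Working bottom-up matters: collapsing the deepest redundant splits first can make their parents redundant in turn.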

Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Sean Owen
mostly vestigial now. On Thu, Feb 8, 2018 at 3:57 PM Koert Kuipers <ko...@tresata.com> wrote: > CDH 5 is still based on hadoop 2.6 > > On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen <so...@cloudera.com> wrote: > >> Mostly just shedding the extra build complexity, and

Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Sean Owen
. On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin <r...@databricks.com> wrote: > Does it gain us anything to drop 2.6? > > > On Feb 8, 2018, at 10:50 AM, Sean Owen <so...@cloudera.com> wrote: > > > > At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both fairl

Drop the Hadoop 2.6 profile?

2018-02-08 Thread Sean Owen
At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both fairly old, and actually, not different from 2.7 with respect to Spark. That is, I don't know if we are actually maintaining anything here but a separate profile and 2x the number of test builds. The cost is, by the same token, low.

Re: Difficulties building spark-master with sbt

2018-02-07 Thread Sean Owen
The master SBT builds seem OK, like: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/ It looks like an issue between Windows, SBT, and your env I think. On Wed, Feb 7, 2018 at 5:12 PM ds wrote: > After

Re: no-reopen-closed?

2018-01-28 Thread Sean Owen
a ticket from now creating > a new ticket every time it is closed? > > > On Sat, Jan 27, 2018 at 8:41 PM, Sean Owen <so...@cloudera.com> wrote: > >> Yeah you'd have to create a new one. You could link the two. >> >> >> On Sat, Jan 27, 2018, 7:07 PM

Re: no-reopen-closed?

2018-01-27 Thread Sean Owen
create a new one, right? > > Thanks, > > Xiao > > > > 2018-01-27 17:02 GMT-08:00 Sean Owen <so...@cloudera.com>: > >> Yes this happened about 6 months ago when we had a person reopen a JIRA >> over and over despite being told not to. We changed the workf

Re: no-reopen-closed?

2018-01-27 Thread Sean Owen
Yes this happened about 6 months ago when we had a person reopen a JIRA over and over despite being told not to. We changed the workflow such that Closed can't become Reopened. I would not move anything to Closed unless you need it to be permanent for reasons like that. Resolved is the normal end

Re: What is "*** UNCHECKED ***"?

2018-01-26 Thread Sean Owen
Yeah sounds like some JIRA 'feature' or issue. It's not any particular bother. If it persists I'll ask INFRA, sure. On Fri, Jan 26, 2018 at 12:00 PM Reynold Xin <r...@databricks.com> wrote: > Examples? > > > On Fri, Jan 26, 2018 at 9:56 AM, Sean Owen <so...@cloudera.com>

Fwd: ***UNCHECKED*** [jira] [Resolved] (SPARK-23218) simplify ColumnVector.getArray

2018-01-26 Thread Sean Owen
This is an example of the "*** UNCHECKED ***" message I was talking about -- it's part of the email subject rather than JIRA. -- Forwarded message - From: Xiao Li (JIRA) Date: Fri, Jan 26, 2018 at 11:18 AM Subject: ***UNCHECKED*** [jira] [Resolved] (SPARK-23218)

What is "*** UNCHECKED ***"?

2018-01-26 Thread Sean Owen
I probably missed this, but what is the new "*** UNCHECKED ***" message in the subject line of some JIRAs?

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sean Owen
at 9:01 AM Sean Owen <so...@cloudera.com> wrote: > I'm not seeing that same problem on OS X and /usr/bin/tar. I tried > unpacking it with 'xvzf' and also unzipping it first, and it untarred > without warnings in either case. > > I am encountering errors while running the tests

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Sean Owen
I'm not seeing that same problem on OS X and /usr/bin/tar. I tried unpacking it with 'xvzf' and also unzipping it first, and it untarred without warnings in either case. I am encountering errors while running the tests, different ones each time, so am still figuring out whether there is a real

Re: Spark 3

2018-01-19 Thread Sean Owen
ain and see if we can do something with then. What's top > of everyone else's mind? > > On Jan 20, 2018 6:32 AM, "Sean Owen" <so...@cloudera.com> wrote: > >> Forking this thread to muse about Spark 3. Like Spark 2, I assume it >> would be more about making all those a

Spark 3

2018-01-19 Thread Sean Owen
curious if anyone had an opinion on whether to move on to Spark 3 next or just continue with 2.4 later this year. On Fri, Jan 19, 2018 at 11:13 AM Sean Owen <so...@cloudera.com> wrote: > Yeah, if users are using Kryo directly, they should be insulated from a > Spark-side change becaus

Re: Kryo 4 serialized form changes -- a problem?

2018-01-19 Thread Sean Owen
kryo in a minor upgrade in general. not that it > cannot be done. > > > > On Fri, Jan 19, 2018 at 8:55 AM, Sean Owen <so...@cloudera.com> wrote: > >> See: >> >> https://issues.apache.org/jira/browse/SPARK-23131 >> https://github.com/apache/spark

Kryo 4 serialized form changes -- a problem?

2018-01-19 Thread Sean Owen
See: https://issues.apache.org/jira/browse/SPARK-23131 https://github.com/apache/spark/pull/20301#issuecomment-358473199 I expected a major Kryo upgrade to be problematic, but it worked fine. It picks up a number of fixes: https://github.com/EsotericSoftware/kryo/releases/tag/kryo-parent-4.0.0

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-13 Thread Sean Owen
The signatures and licenses look OK. Except for the missing k8s package, the contents look OK. Tests look pretty good with "-Phive -Phadoop-2.7 -Pyarn" on Ubuntu 17.10, except that KafkaContinuousSourceSuite seems to hang forever. That was just fixed and needs to get into an RC? Aside from the

Re: Palantir replease under org.apache.spark?

2018-01-09 Thread Sean Owen
Just to follow up -- those are actually in a Palantir repo, not Central. Deploying to Central would be uncourteous, but this approach is legitimate and how it has to work for vendors to release distros of Spark etc. On Tue, Jan 9, 2018 at 11:43 AM Nan Zhu wrote: > Hi,

Re: Result obtained before the completion of Stages

2017-12-26 Thread Sean Owen
My guess is that either they haven't actually finished before the result and something about timestamps you're comparing is misleading, or else, you're looking at stages executing that are part of a later part of the program. On Tue, Dec 26, 2017 at 3:49 PM ckhari4u wrote: >

Re: Anyone know how to bypass tools.jar problem in JDK9 when mvn clean install SPARK code

2017-12-21 Thread Sean Owen
You need to run ./dev/change-scala-version.sh 2.12 first On Thu, Dec 21, 2017 at 4:38 PM Zhang, Liyun wrote: > Hi all: > > Now I am using JDK9 to compile Spark by (mvn clean install -DskipTests), > but exception was thrown > > > > [root@bdpe41 spark_source]# java

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-12-19 Thread Sean Owen
lishable, that is >>>>>> consumable by kube-spark images, and mesos-spark images, and likely any >>>>>> other community image whose primary purpose is running spark components. >>>>>> The kube-specific dockerfiles would be written "

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-12-19 Thread Sean Owen
> On Tue, Dec 19, 2017 at 11:45 AM, Sean Owen <so...@cloudera.com> wrote: > >> I think that's all correct, though the license of third party >> dependencies is actually a difficult and sticky part. The ASF couldn't make >> a software release including any GPL software f

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-12-19 Thread Sean Owen
>>>> "spark-base" image. >>>> >>>> Does this factorization sound reasonable to others? >>>> Cheers, >>>> Erik >>>> >>>> >>>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidhara

Re: [01/51] [partial] spark-website git commit: 2.2.1 generated doc

2017-12-17 Thread Sean Owen
/latest does not point to 2.2.1 yet. Not all the pieces are released yet, as I understand? On Sun, Dec 17, 2017 at 8:12 AM Jacek Laskowski wrote: > Hi, > > I saw the following commit, but I can't seem to see 2.2.1 as the version > in the header of the documentation pages under

Re: [RESULT][VOTE] Spark 2.2.1 (RC2)

2017-12-14 Thread Sean Owen
On the various access questions here -- what do you need to have that access? We definitely need to give you all necessary access if you're the release manager! On Thu, Dec 14, 2017 at 6:32 AM Felix Cheung wrote: > And I don’t have access to publish python. > > On Wed,
