Re: Maven

2018-11-20 Thread Sean Owen
Sure, if you published Spark artifacts in a local repo (even your local file system) as com.foo:spark-core_2.12, etc, just depend on those artifacts, not the org.apache ones. On Tue, Nov 20, 2018 at 3:21 PM Jack Kolokasis wrote: > > Hello, > > is there any way to use my local custom - Spark
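
A minimal sbt sketch of that setup; the groupId, version, and repository path here are illustrative, not from the thread:

    // resolve the rebuilt Spark artifacts from a file-based local repository
    resolvers += "local-spark" at "file:///path/to/local/repo"
    // depend on the custom groupId instead of org.apache.spark
    libraryDependencies += "com.foo" % "spark-core_2.12" % "2.4.0"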

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread Sean Owen
t the build config) > > basically, it runs 'dev/change-scala-version.sh 2.11' and builds w/mvn and > '-Pscala-2.11' > > i'll also disable the spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12 > build. > > On Tue, Nov 20, 2018 at 8:52 AM Sean Owen wrote: >> >> Ah r

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread Sean Owen
TODO list to move into the main > spark repo). > > and btw, the spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12 is *not* > a PR builder... ;) > > On Tue, Nov 20, 2018 at 8:20 AM Sean Owen wrote: > >> The one you set up to test 2.12 separately, >> spark-maste

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread Sean Owen
> but yes, it should just be a simple dev/change-scala-version.sh call in the > build step. > > shane > > On Tue, Nov 20, 2018 at 7:06 AM Sean Owen wrote: >> >> Shane, on your long list of TODOs, we still need to update the 2.12 PR >> builder to inst

Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread Sean Owen
Shane, on your long list of TODOs, we still need to update the 2.12 PR builder to instead test 2.11. Is that just a matter of editing Jenkins configuration that I can see and change? If so I'll just do it. Sean

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-20 Thread Sean Owen
17, 2018 at 7:28 PM Sean Owen wrote: > > I support dropping 2.11 support. My general logic is: > > - 2.11 is EOL, and is all the more EOL in the middle of next year when > Spark 3 arrives > - I haven't heard of a critical dependency that has no 2.12 counterpart > - 2.11 us

How to manually kick off an ASF -> github git sync

2018-11-19 Thread Sean Owen
I noticed the sync hasn't happened for about 2 days, and noticed https://issues.apache.org/jira/browse/INFRA-17269 and also noticed from there that we can trigger them manually, at http://selfserve.apache.org . Neat. I tried it just now; let's see.

Re: Jenkins down?

2018-11-19 Thread Sean Owen
Jenkins says it's shutting down; I assume shane needs to cycle it. Note also that the Apache - github sync looks like it is stuck; nothing has synced since yesterday: https://github.com/apache/spark/commits/master That might also be a factor in whatever you're observing. On Mon, Nov 19, 2018 at

CVE-2018-17190: Unsecured Apache Spark standalone executes user code

2018-11-18 Thread Sean Owen
Severity: Low Vendor: The Apache Software Foundation Versions Affected: All versions of Apache Spark Description: Spark's standalone resource manager accepts code to execute on a 'master' host, which then runs that code on 'worker' hosts. The master itself does not, by design, execute user code.

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-17 Thread Sean Owen
I support dropping 2.11 support. My general logic is: - 2.11 is EOL, and is all the more EOL in the middle of next year when Spark 3 arrives - I haven't heard of a critical dependency that has no 2.12 counterpart - 2.11 users can stay on 2.4.x, which will be notionally supported through, say, end

Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Sean Owen
You should find that 'surprisingly public' classes are there because of language technicalities. For example DummySerializerInstance is public because it's a Java class, and can't be used outside its package otherwise. Likewise I think MiMa just looks at bytecode, and private[spark] classes are
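
A sketch of the Scala technicality: package-private modifiers exist only at the language level, so the emitted bytecode is public, which is what bytecode-based tools like MiMa (and Java callers) see. The class name below is hypothetical:

    package org.apache.spark.util

    // visible only inside org.apache.spark to the Scala compiler,
    // but the emitted .class file is flagged public
    private[spark] class InternalHelper {
      def work(): Unit = ()
    }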

Re: time for Apache Spark 3.0?

2018-11-13 Thread Sean Owen
by adding the tag and proposed release notes. On Tue, Nov 13, 2018, 11:49 AM Matt Cheah wrote: The release-notes label on JIRA sounds good. Can we make it a point to > have that done retroactively now, and then moving forward? > > On 11/12/18, 4:01 PM, "Sean Owen" wrote: > >

Re: time for Apache Spark 3.0?

2018-11-12 Thread Sean Owen
My non-definitive takes -- I would personally like to remove all deprecated methods for Spark 3. I started by removing 'old' deprecated methods in that commit. Things deprecated in 2.4 are maybe less clear as to whether they should be removed. Everything's fair game for removal or change in a major

Re: On Java 9+ support, Cleaners, modules and the death of reflection

2018-11-12 Thread Sean Owen
MaxDirectMemorySize if it's an issue. Or do nothing if you don't actually run up against the MaxDirectMemorySize limit, which seems to default to equal the size of the JVM heap. On Thu, Nov 8, 2018 at 12:46 PM Sean Owen wrote: > > I think this is a key thread, perhaps one of the only big problems, > f

Re: Spark Utf 8 encoding

2018-11-09 Thread Sean Owen
That doesn't necessarily look like a Spark-related issue. Your terminal seems to be displaying the glyph with a question mark because the font lacks that symbol, maybe? On Fri, Nov 9, 2018 at 7:17 PM lsn24 wrote: > > Hello, > > Per the documentation default character encoding of spark is UTF-8.

On Java 9+ support, Cleaners, modules and the death of reflection

2018-11-08 Thread Sean Owen
I think this is a key thread, perhaps one of the only big problems, for Java 9+ support: https://issues.apache.org/jira/browse/SPARK-24421?focusedCommentId=16680169&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16680169 We basically can't access a certain method

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-08 Thread Sean Owen
change the alternative Scala version > to 2.13 and drop 2.11 if we just want to support two Scala versions at > one time. > > Thanks. > > Sincerely, > > DB Tsai > -- > Web: https://www.dbtsai.com > PGP Key ID: 0x

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-07 Thread Sean Owen
lowing "exclude Scala 2.13". Is there something inherent in making > 2.12 the default Scala version in Spark 3.0 that would prevent us from > supporting the option of building with 2.13? > > On Tue, Nov 6, 2018 at 5:48 PM Sean Owen wrote: >> >> That's possible her

Re: How to know all the issues resolved for 2.4.0?

2018-11-07 Thread Sean Owen
Use the Fix Version instead. Target Version is only used occasionally to mark that a JIRA is intended for a release. It isn't set on most of them that are rapidly created and resolved. There is some explanation of the few Resolution statuses that are used consistently, in

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread Sean Owen
That's possible here, sure. The issue is: would you exclude Scala 2.13 support in 3.0 for this, if it were otherwise ready to go? I think it's not a hard rule that something has to be deprecated previously to be removed in a major release. The notice is helpful, sure, but there are lots of ways to

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread Sean Owen
I think we should make Scala 2.12 the default in Spark 3.0. I would also prefer to drop Scala 2.11 support in 3.0. In theory, not dropping 2.11 support means we'd support Scala 2.11 for years, the lifetime of Spark 3.x. In practice, we could drop 2.11 support in a 3.1.0 or 3.2.0 release, kind

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Sean Owen
__ > > From: Wenchen Fan > > Sent: Tuesday, November 6, 2018 8:51 AM > > To: Felix Cheung > > Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman > > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0 > > > >

Re: Java 11 support

2018-11-06 Thread Sean Owen
ems below. > > > > > From: Felix Cheung > Sent: Tuesday, November 6, 2018 8:57 AM > To: Wenchen Fan > Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0 > >

Re: Removing non-deprecated R methods that were deprecated in Python, Scala?

2018-11-06 Thread Sean Owen
Sounds good, remove in 3.1? I can update accordingly. On Tue, Nov 6, 2018, 10:46 AM Reynold Xin wrote: Maybe deprecate and remove in next version? It is bad to just remove a > method without deprecation notice. > > On Tue, Nov 6, 2018 at 5:44 AM Sean Owen wrote: > >> See https:/

Removing non-deprecated R methods that were deprecated in Python, Scala?

2018-11-06 Thread Sean Owen
See https://github.com/apache/spark/pull/22921#discussion_r230568058 Methods like toDegrees, toRadians, approxCountDistinct were 'renamed' in Spark 2.1: deprecated, and replaced with an identical method with different name. However, these weren't actually deprecated in SparkR. Is it an oversight
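The rename-by-deprecation pattern described, sketched with simplified signatures rather than the actual Spark source:

    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions

    object MathFuncs {
      // the old name stays, deprecated, and forwards to the new one
      @deprecated("Use degrees", "2.1.0")
      def toDegrees(e: Column): Column = degrees(e)

      def degrees(e: Column): Column = functions.degrees(e)
    }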

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-05 Thread Sean Owen
What can we do to get the release through? Is there any way to circumvent these tests or otherwise hack it? Or does it need a maintenance release? On Mon, Nov 5, 2018 at 8:53 PM Felix Cheung wrote: > > FYI. SparkR submission failed. It seems to detect Java 11 correctly with > vignettes but not

Removing deprecated items in Spark 3

2018-11-01 Thread Sean Owen
I took a pass at removing most of the older deprecated items in Spark. For discussion: https://github.com/apache/spark/pull/22921

Re: python lint is broken on master branch

2018-10-31 Thread Sean Owen
Maybe a pycodestyle or flake8 version issue? On Wed, Oct 31, 2018 at 7:43 AM Wenchen Fan wrote: > > The Jenkins job spark-master-lint keeps failing. The error message is > flake8.exceptions.FailedToLoadPlugin: Flake8 failed to load plugin > "pycodestyle.break_after_binary_operator" due to

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Sean Owen
+1 Same result as in RC4 from me, and the issues I know of that were raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11. These items are still targeted to 2.4.0; Xiangrui, I assume these should just be untargeted now, or resolved? SPARK-25584 Document libsvm data source in doc site

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Sean Owen
to provide a thrift/JDBC interface to Spark. Just to > be clear I am not proposing to remove the thriftserver in 3.0, but maybe it > is something we could evaluate in the long term. > > Thanks, > Marco > > > On Fri, Oct 26, 2018 at 7:07 PM Sean Owen > wrote:

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Sean Owen
n. > > PS. For Hadoop, let's have another thread if needed. I expect another long > story. :) > > > On Fri, Oct 26, 2018 at 7:11 AM Sean Owen wrote: > >> Here's another thread to start considering, and I know it's been raised >> before. >> What version(s) of

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-26 Thread Sean Owen
This is all merged to master/2.4. AFAIK there aren't any items I'm monitoring that are needed for 2.4. On Thu, Oct 25, 2018 at 6:54 PM Sean Owen wrote: > Yep, we're going to merge a change to separate the k8s tests into a > separate profile, and fix up the Scala 2.12 thing. While non-cr

Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Sean Owen
Here's another thread to start considering, and I know it's been raised before. What version(s) of Hive should Spark 3 support? If at least we know it won't include Hive 0.x, could we go ahead and remove those tests from master? It might significantly reduce the run time and flakiness. It seems

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Sean Owen
ought to be entirely safe, and avoid two of the issues we >>> identified. >>> >> >> Besides disabling it, when someone wants to run the tests with 2.12 he >> should be able to do so. So propagating the Scala profile still makes sense >> but it is not related to the r

Re: KryoSerializer Implementation - Not using KryoPool

2018-10-25 Thread Sean Owen
to open a PR but, pardon my ignorance, how would I go about > doing that properly? Do I need to open a JIRA issue first? Also how would I > demonstrate performance gains? Do you guys use something like ScalaMeter? > > Thanks for your help! > > On Wed, Oct 24, 2018 at 2:37 PM Sean

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Sean Owen
: > > Hopefully, this will not delay RC5. Since this is not a blocker ticket, RC5 > will start if all the blocker tickets are resolved. > > Thanks, > > Xiao > > Sean Owen 于2018年10月25日周四 上午8:44写道: >> >> Yes, I agree, and perhaps you are best placed to do that for

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Sean Owen
On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson wrote: >>> >>> >>> I would be comfortable making the integration testing manual for now. A >>> JIRA for ironing out how to make it reliable for automatic as a goal for >>> 3.0 seems like a good idea. >

Re: What's a blocker?

2018-10-25 Thread Sean Owen
What does "PMC members aren't saying its a block for reasons other then the actual impact the jira has" mean that isn't already widely agreed? Likewise "Committers and PMC members should not be saying its not a blocker because they personally or their company doesn't care about this feature or

Re: KryoSerializer Implementation - Not using KryoPool

2018-10-24 Thread Sean Owen
I don't know; possibly just because it wasn't available whenever Kryo was first used in the project. Skimming the code, the KryoSerializerInstance looks like a wrapper that provides a Kryo object to do work. It already maintains a 'pool' of just 1 instance. Is the point that KryoSerializer can
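For context, a minimal sketch of the KryoPool API under discussion, assuming the Kryo 3.x/4.x pool classes in com.esotericsoftware.kryo.pool; configuration details are illustrative:

    import com.esotericsoftware.kryo.Kryo
    import com.esotericsoftware.kryo.pool.{KryoFactory, KryoPool}

    val factory = new KryoFactory {
      override def create(): Kryo = {
        val kryo = new Kryo()
        // register classes, set options, etc.
        kryo
      }
    }
    // a soft-reference pool of reusable Kryo instances
    val pool = new KryoPool.Builder(factory).softReferences().build()
    val kryo = pool.borrow()
    try {
      // serialize/deserialize with this instance
    } finally pool.release(kryo)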

What's a blocker?

2018-10-24 Thread Sean Owen
Shifting this to dev@. See the PR https://github.com/apache/spark/pull/22144 for more context. There will be no objective, complete definition of blocker, or even regression or correctness issue. Many cases are clear, some are not. We can draw up more guidelines, and feel free to open PRs against

CVE-2018-11804: Apache Spark build/mvn runs zinc, and can expose information from build machines

2018-10-24 Thread Sean Owen
Severity: Low Vendor: The Apache Software Foundation Versions Affected: 1.3.x release branch and later, including master Description: Spark's Apache Maven-based build includes a convenience script, 'build/mvn', that downloads and runs a zinc server to speed up compilation. This server will

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Sean Owen
Hm, so you're trying to build a source release from a binary release? I don't think that needs to work nor do I expect it to for reasons like this. They just have fairly different things. On Tue, Oct 23, 2018 at 7:04 PM Dongjoon Hyun wrote: > > Ur, Wenchen. > > Source distribution seems to fail

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Sean Owen
To be clear I'm currently +1 on this release, with much commentary. OK, the explanation for kubernetes tests makes sense. Yes I think we need to propagate the scala-2.12 build profile to make it work. Go for it, if you have a lead on what the change is. This doesn't block the release as it's an

Re: Documentation of boolean column operators missing?

2018-10-23 Thread Sean Owen
r for >> PySpark. Indeed, I don’t think such a thing is possible in PySpark. (e.g. >> (col('age') > 0).and(...)) >> >> I can file a ticket about this, but I’m just making sure I’m not missing >> something obvious. >> >> >> On Tue, Oct 23, 2018 at

Re: Documentation of boolean column operators missing?

2018-10-23 Thread Sean Owen
Those should all be Column functions, really, and I see them at http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column On Tue, Oct 23, 2018, 12:27 PM Nicholas Chammas wrote: > I can’t seem to find any documentation of the &, |, and ~ operators for > PySpark
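For illustration, the Scala Column methods and operators these PySpark operators map to (column names are made up):

    import org.apache.spark.sql.functions.col

    val adult   = col("age") > 18
    val named   = col("name").isNotNull
    val both    = adult && named   // equivalently adult.and(named); & in PySpark
    val either  = adult || named   // equivalently adult.or(named);  | in PySpark
    val negated = !adult           // equivalently functions.not(adult); ~ in PySpark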

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Sean Owen
/22805#issuecomment-432304622 for a change that might address this kind of thing.) On Tue, Oct 23, 2018 at 11:05 AM Sean Owen wrote: > > Yeah, that's maybe the issue here. This is a source release, not a git > checkout, and it still needs to work in this context. > > I just added -Pk

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Sean Owen
Yeah, that's maybe the issue here. This is a source release, not a git checkout, and it still needs to work in this context. I just added -Pkubernetes to my build and didn't do anything else. I think the ideal is that "mvn -P... -P... install" works from a source release; that's a good

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Sean Owen
; release and publish doc to spark website later. Can anyone confirm? > > On Tue, Oct 23, 2018 at 8:30 AM Sean Owen wrote: >> >> This is what I got from a straightforward build of the source distro >> here ... really, ideally, it builds as-is from source. You're saying >

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-22 Thread Sean Owen
)/spark-2.4.0-SNAPSHOT-bin-test.tgz > cd resource-managers/kubernetes/integration-tests > ./dev/dev-run-integration-tests.sh --image-tag $SPARK_K8S_IMAGE_TAG > --spark-tgz $TGZ_PATH --image-repo $DOCKER_USERNAME > > Stavros > > On Tue, Oct 23, 2018 at 1:54 AM, Sean Owen wrote: >

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-22 Thread Sean Owen
Provisionally looking good to me, but I had a few questions. We have these open for 2.4, but I presume they aren't actually going to be in 2.4 and should be untargeted: SPARK-25507 Update documents for the new features in 2.4 release SPARK-25179 Document the features that require Pyarrow 0.10

Re: [MLlib] PCA Aggregator

2018-10-19 Thread Sean Owen
/questions/45240556/perform-pca-on-each-group-of-a-groupby-in-pyspark > > This isn't really big enough to warrant its own library--it's just a > single class. But if you think it's better to publish it externally I can > certainly do that. > > thanks again, > --Matt > > > On Fri, Oct

Re: [MLlib] PCA Aggregator

2018-10-19 Thread Sean Owen
It's OK to open a JIRA though I generally doubt any new functionality will be added. This might be viewed as a small worthwhile enhancement, haven't looked at it. It's always more compelling if you can sketch the use case for it and why it is more meaningful in spark than outside it. There is

Re: some doubt on code understanding

2018-10-17 Thread Sean Owen
"/" is integer division, so "x / y * y" is not x, but more like the biggest multiple of y that's <= x. On Wed, Oct 17, 2018 at 11:25 AM Sandeep Katta wrote: > > Hi Guys, > > I am trying to understand structured streaming code flow by doing so I came > across below code flow > > def

Starting to make changes for Spark 3 -- what can we delete?

2018-10-16 Thread Sean Owen
There was already agreement to delete deprecated things like Flume and Kafka 0.8 support in master. I've got several more on my radar, and wanted to highlight them and solicit general opinions on where we should accept breaking changes. For example how about removing accumulator v1?

Re: Remove Flume support in 3.0.0?

2018-10-11 Thread Sean Owen
Spark (eg > via Kafka), so I believe people would not be affected so much by a removal. > > (Non-voting, just my opinion) > > > On 10.10.2018 at 22:31, Sean Owen wrote: > > > Marcelo makes an argument that Flume support should be removed in > > 3.0.0 at h

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-11 Thread Sean Owen
for changing ScalaUDF again before the release. Have a look, anyone familiar with catalyst. On Wed, Oct 10, 2018 at 3:00 PM Sean Owen wrote: > > +1. I tested the source build against Scala 2.12 and common build > profiles. License and sigs look OK. > > No blockers; one critical:

Re: Remove Flume support in 3.0.0?

2018-10-10 Thread Sean Owen
le remaining dstream > connector in Spark, though. (If you ignore kinesis which we can't ship > in binary form or something like that?) > On Wed, Oct 10, 2018 at 1:32 PM Sean Owen wrote: > > > > Marcelo makes an argument that Flume support should be removed in > > 3.0.0 a

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-10 Thread Sean Owen
+1. I tested the source build against Scala 2.12 and common build profiles. License and sigs look OK. No blockers; one critical: SPARK-25378 ArrayData.toArray(StringType) assume UTF8String in 2.4 I think this one is "won't fix" though? not trying to restore the behavior? Other items open for

Remove Flume support in 3.0.0?

2018-10-10 Thread Sean Owen
Marcelo makes an argument that Flume support should be removed in 3.0.0 at https://issues.apache.org/jira/browse/SPARK-25598 I tend to agree. Is there an argument that it needs to be supported, and can this move to Bahir if so?

Re: Docker image to build Spark/Spark doc

2018-10-10 Thread Sean Owen
You can just build it with Maven or SBT as in the docs. I don't know of a docker image but there isn't much to package. On Wed, Oct 10, 2018, 1:10 AM assaf.mendelson wrote: > Hi all, > I was wondering if there was a docker image to build spark and/or spark > documentation > > The idea would be

Re: Random sampling in tests

2018-10-08 Thread Sean Owen
comment why we did it that way. >> >> 2. Use a seed and log the seed, so any test failures can be reproduced >> deterministically. For this one, it'd be better to pick the seed from a seed >> environmental variable. If the env variable is not set, set to a random seed. >>

Random sampling in tests

2018-10-08 Thread Sean Owen
Recently, I've seen 3 pull requests that try to speed up a test suite that tests a bunch of cases by randomly choosing different subsets of cases to test on each Jenkins run. There's disagreement about whether this is a good approach to improving test runtime. Here's a discussion on one that was
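
A sketch of the log-the-seed approach discussed in the thread, with a hypothetical SPARK_TEST_SEED environment variable and illustrative names:

    import scala.util.Random

    val allCases = (1 to 100).toList  // stand-in for the full test matrix
    // take the seed from the environment if set, else pick one at random,
    // and always log it so a failing run can be reproduced
    val seed: Long = sys.env.get("SPARK_TEST_SEED").map(_.toLong)
      .getOrElse(new Random().nextLong())
    println(s"Using seed $seed for test case sampling")
    val sampled = new Random(seed).shuffle(allCases).take(10)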

Re: On Scala 2.12.7

2018-10-01 Thread Sean Owen
needed. On Mon, Oct 1, 2018 at 10:20 PM Wenchen Fan wrote: > > SGTM then. Is there anything we need to do to pick up the 2.12.7 upgrade? > like updating Jenkins config? > > On Tue, Oct 2, 2018 at 10:53 AM Sean Owen wrote: >> >> I tested both ways, and it actually works f

Re: On Scala 2.12.7

2018-10-01 Thread Sean Owen
I tested both ways, and it actually works fine. It calls into question whether there's really a fix we need with 2.12.7, but, I hear two informed opinions (Darcy and the scala release notes) that it was relevant. As we have no prior 2.12 support, I guess my feeling was indeed to get this update in

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-09-30 Thread Sean Owen
.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > > Regards, > Jacek Laskowski > > https://about.me/JacekLaskowski >

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Sean Owen
+1, with comments: There are 5 critical issues for 2.4, and no blockers: SPARK-25378 ArrayData.toArray(StringType) assume UTF8String in 2.4 SPARK-25325 ML, Graph 2.4 QA: Update user guide for new features & APIs SPARK-25319 Spark MLlib, GraphX 2.4 QA umbrella SPARK-25326 ML, Graph 2.4 QA:

On Scala 2.12.7

2018-09-28 Thread Sean Owen
I'm forking the discussion about Scala 2.12.7 from the 2.4.0 RC vote thread. 2.12.7 was released yesterday, and, is even labeled as fixing Spark 2.4 compatibility! https://www.scala-lang.org/news/2.12.7 We should look into it, yes. Darcy identified, and they fixed, this issue:

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Sean Owen
Go ahead and file a JIRA to update to 2.12.7 with these details. We'll assess whether it is a blocker. On Fri, Sep 28, 2018 at 12:09 PM Darcy Shen wrote: > > I agree it is a non-important Spark bug. I mean the Option and String > comparison. The bug is easy to fix and obvious to confirm. If the

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Sean Owen
I don't even know how or if this manifests as a bug. The code is indeed incorrect and the 2.12 compiler flags it. We fixed a number of these in SPARK-25398. While I want to get this into 2.4 if we have another RC, I don't see evidence this is a blocker. It is not specific to Scala 2.12. Using

Re: Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe

2018-09-26 Thread Sean Owen
Is this not just a case of floating-point literals not being exact? This is expressed in Python, not SQL. On Wed, Sep 26, 2018 at 12:46 AM Meethu Mathew wrote: > Hi all, > > I tried the following code and the output was not as expected. > > schema = StructType([StructField('Id', StringType(),
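
A quick Scala demonstration of the same effect, independent of Spark:

    val sum = 0.1 + 0.2
    println(sum)           // 0.30000000000000004, not 0.3
    println(sum == 0.3)    // false
    println(0.1f == 0.1)   // false: the Float literal widens to a different Double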

Re: 2.4.0 Blockers, Critical, etc

2018-09-21 Thread Sean Owen
completing the release... > > > > From: Wenchen Fan > Sent: Friday, September 21, 2018 12:02 AM > To: Sean Owen > Cc: Spark dev list > Subject: Re: 2.4.0 Blockers, Critical, etc > > Sean thanks for checking them! > > I made one pass and re-targeted/closed some of them. Mos

2.4.0 Blockers, Critical, etc

2018-09-20 Thread Sean Owen
Because we're into 2.4 release candidates, I thought I'd look at what's still open and targeted at 2.4.0. I presume the Blockers are the usual umbrellas that don't themselves block anything, but, confirming, there is nothing left to do there? I think that's mostly a question for Joseph and

Re: [DISCUSS] upper/lower of special characters

2018-09-19 Thread Sean Owen
I don't have the details in front of me, but I recall we explicitly overhauled locale-sensitive toUpper and toLower in the code for this exact situation. The current behavior should be on purpose. I believe user data strings are handled in a case sensitive way but things like reserved words in SQL
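The classic example of such locale sensitivity, in Scala; pinning internal case mappings to a fixed locale such as Locale.ROOT is the usual way to keep reserved-word handling stable across machines:

    import java.util.Locale

    // in a Turkish locale, "i" uppercases to dotted capital İ (U+0130)
    println("i".toUpperCase(Locale.forLanguageTag("tr")))  // İ
    // a fixed locale gives the same result everywhere
    println("i".toUpperCase(Locale.ROOT))                  // I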

Re: ***UNCHECKED*** Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-19 Thread Sean Owen
> Thank you, Saisai. >>> >>> Bests, >>> Dongjoon. >>> >>> On Mon, Sep 17, 2018 at 6:48 PM Saisai Shao wrote: >>>> >>>> +1 from my own side. >>>> >>>> Thanks >>>> Saisai >>>> >>

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-17 Thread Sean Owen
+1. Licenses and sigs check out as in previous 2.3.x releases. A build from source with most profiles passed for me. On Mon, Sep 17, 2018 at 8:17 AM Saisai Shao wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.3.2. > > The vote is open until September 21

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-16 Thread Sean Owen
: > > Ah I missed the Scala 2.12 build. Do you mean we should publish a Scala 2.12 > build this time? Current for Scala 2.11 we have 3 builds: with hadoop 2.7, > with hadoop 2.6, without hadoop. Shall we do the same thing for Scala 2.12? > > On Mon, Sep 17, 2018 at 11:14 A

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-16 Thread Sean Owen
A few preliminary notes: Wenchen, for some weird reason, when I hit your key in gpg --import, it asks for a passphrase. When I skip it, it's fine; gpg can still verify the signature. No issue there really. The staging repo gives a 404:

Re: Branch 2.4 is cut

2018-09-07 Thread Sean Owen
placeholder tag that > we can change later? > > On Fri, Sep 7, 2018 at 8:15 AM Wenchen Fan wrote: > >> I've reached out to Shane, but he is busy recently. I'll figure it out with >> Josh soon. Will post an update to this thread later. >> >> Thanks, >> Wenchen >>

Re: Branch 2.4 is cut

2018-09-07 Thread Sean Owen
I'll try and update you later. Thanks! >> >> On Thu, Sep 6, 2018 at 9:44 PM Sean Owen wrote: >> >>> BTW it does appear the Scala 2.12 build works now: >>> >>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hado

Re: Spark JIRA tags clarification and management

2018-09-06 Thread Sean Owen
I believe 'starter' is still the standard tag for simple issues for newcomers. On Thu, Sep 6, 2018 at 8:46 PM Hyukjin Kwon wrote: > Does anyone know if we still use starter or newbie tags as well? >

Re: time for Apache Spark 3.0?

2018-09-06 Thread Sean Owen
I think this doesn't necessarily mean 3.0 is coming soon (thoughts on timing? 6 months?) but simply next. Do you mean you'd prefer that change to happen before 3.x? If it's a significant change, it seems reasonable for a major version bump rather than minor. Is the concern that tying it to 3.0 means

Re: Branch 2.4 is cut

2018-09-06 Thread Sean Owen
BTW it does appear the Scala 2.12 build works now: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/229/ Let's try also producing a 2.12 build with this release. The machinery should be there in the release scripts, but let me

Re: code freeze and branch cut for Apache Spark 2.4

2018-09-05 Thread Sean Owen
(I slipped https://github.com/apache/spark/pull/22340 in for Scala 2.12. Maybe it really is the last one. In any event, yes go ahead with a 2.4 RC) On Wed, Sep 5, 2018 at 8:14 PM Wenchen Fan wrote: > The repartition correctness bug fix is merged. The Scala 2.12 PRs > mentioned in this thread

Re: Select top (100) percent equivalent in spark

2018-09-04 Thread Sean Owen
Sort and take head(n)? On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri wrote: > Dear Spark dev, anything equivalent in spark ? >
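
In DataFrame API terms, that suggestion is roughly the following sketch (the DataFrame and column name are illustrative):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    // equivalent in spirit to SELECT TOP (100) ... ORDER BY score DESC
    def top100(df: DataFrame): DataFrame =
      df.orderBy(col("score").desc).limit(100)

    // or head(n) to collect the top rows to the driver:
    // val rows = df.orderBy(col("score").desc).head(100)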

Re: Jenkins automatic disabling service - who and why?

2018-09-03 Thread Sean Owen
I'm not sure if anything changed. What is the particular issue here? Looks like for some reason Jenkins first asked for admin approval well after the discussion started. Nobody asked for a test after that. Can you not trigger it from the web app UI? On Mon, Sep 3, 2018, 1:54 AM Hyukjin Kwon

Re: Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-08-31 Thread Sean Owen
There are some builds there, but they're not recent: https://people.apache.org/~pwendell/spark-nightly/ We can either get the jobs running again, or just knock this on the head and remove it. Anyone know how to get it running again and want to? I have a feeling Shane knows if anyone. Or does

Re: TimSort bug

2018-08-31 Thread Sean Owen
TL;DR - We already had the fix from SPARK-5984. The delta from the current JDK implementation to Spark's looks actually inconsequential. No action required AFAICT. On Fri, Aug 31, 2018 at 12:30 PM Sean Owen wrote: > I looked into this, because it sure sounds like a similar issue from a

Re: [discuss] replacing SPIP template with Heilmeier's Catechism?

2018-08-31 Thread Sean Owen
Looks good. From the existing template at https://spark.apache.org/improvement-proposals.html I might keep points about design sketch, API, and non goals. And we don't need a cost section. On Fri, Aug 31, 2018, 1:23 PM Reynold Xin wrote: > I helped craft the current SPIP template >

Re: TimSort bug

2018-08-31 Thread Sean Owen
I looked into this, because it sure sounds like a similar issue from a few years ago that was fixed in https://issues.apache.org/jira/browse/SPARK-5984 The change in that JIRA actually looks almost identical to the change mentioned in the JDK bug:

Re: mllib + SQL

2018-08-31 Thread Sean Owen
My $0.02 -- this isn't worthwhile. Yes, there are ML-in-SQL tools. I'm thinking of MADlib for example. I think these hold over from days when someone's only interface to a data warehouse was SQL, and so there had to be SQL-language support for invoking ML jobs. There was no programmatic

Re: Upgrade SBT to the latest

2018-08-31 Thread Sean Owen
Certainly worthwhile. I think this should target Spark 3, which should come after 2.4, which is itself already just about ready to test and release. On Fri, Aug 31, 2018 at 8:16 AM Darcy Shen wrote: > > SBT 1.x is ready for a long time. > > We may spare some time upgrading sbt for Spark. > > An

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-30 Thread Sean Owen
I know it's famous last words, but we really might be down to the last fix: https://github.com/apache/spark/pull/22264 More a question of making tests happy at this point I think than fundamental problems. My goal is to make sure we can release a usable, but beta-quality, 2.12 release of Spark in

Update to Kryo 4 for Spark 2.4?

2018-08-30 Thread Sean Owen
I wanted to call any interested eyes to this discussion: https://github.com/apache/spark/pull/22179

Re: [VOTE] SPIP: Executor Plugin (SPARK-24918)

2018-08-28 Thread Sean Owen
another stage runs on > executor B? > > It's just way more complicated, if possible at all, to write this > feature by asking everybody to change their application code. > > > On Tue, Aug 28, 2018 at 9:39 AM, Sean Owen wrote: > > I should be able to force a class to init

Re: [VOTE] SPIP: Executor Plugin (SPARK-24918)

2018-08-28 Thread Sean Owen
Still +0 on the idea, as I am still not sure it does much over simple JVM mechanisms like a class init. More comments on the JIRA. I can't say it's a bad idea though, so would not object to it. On Tue, Aug 28, 2018 at 8:50 AM Imran Rashid wrote: > There has been discussion on jira & the PR, all
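
A minimal sketch of the class-init alternative referred to: a singleton object's body runs once per executor JVM on first reference (names here are hypothetical):

    object ExecutorSetup {
      // this body runs once per JVM, the first time the object is referenced
      println("per-executor setup runs here")
    }

    // referencing it inside a task forces initialization on every executor:
    // rdd.foreachPartition { _ => ExecutorSetup; ... }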

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Sean Owen
Seems OK to me. The style is pretty standard Scala style anyway. My guidance is always to follow the code around the code you're changing. On Thu, Aug 23, 2018 at 8:14 PM Hyukjin Kwon wrote: > Hi all, > > I usually follow https://github.com/databricks/scala-style-guide for > Apache Spark's

Proposing an 18-month maintenance period for feature branches

2018-08-15 Thread Sean Owen
This was mentioned in another thread, but I wanted to highlight this proposed change to our release policy: https://github.com/apache/spark-website/pull/136 Right now we don't have any guidance about how long feature branches get maintenance releases. I'm proposing 18 months as a guide -- not a

Re: Naming policy for packages

2018-08-15 Thread Sean Owen
g top > Databricks (> https://github.com/databricks?utf8=%E2%9C%93&q=spark&type=&language=) or > RStudio (sparklyr) contributions, violate the trademark? > > > Sent with ProtonMail <https://protonmail.com> Secure Email. > > ‐‐‐ Original Message ‐‐‐ > On August 15, 2018 5:51

Re: Naming policy for packages

2018-08-15 Thread Sean Owen
>> On Wed, Aug 15, 2018 at 7:19 AM Simon Dirmeier >> wrote: >> >>> Hey, >>> thanks for clearing that up. >>> Imho this is somewhat unfortunate, because package names that contain >>> "spark", somewhat promote and advertise Apache Spark, righ

Re: Naming policy for packages

2018-08-15 Thread Sean Owen
You raise a great point, and we were just discussing this. The page is old and contains many projects that were listed before the trademarks were being enforced. Some have renamed themselves. We will update the page and remove stale or noncompliant projects and ask those that need to change to do
