Re: Github mirroring is running behind

2014-07-28 Thread Reynold Xin
Hi devs, I don't know if this is going to help, but if you can watch & vote on the ticket, it might help ASF INFRA prioritize and triage it faster: https://issues.apache.org/jira/browse/INFRA-8116 Please do. Thanks! On Mon, Jul 28, 2014 at 5:41 PM, Patrick Wendell wrote: > https://issues.apa

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Henry Saputra
NOTICE and LICENSE files look good Hashes and sigs look good No executable in the source distribution Compile source and run standalone +1 - Henry On Fri, Jul 25, 2014 at 4:08 PM, Tathagata Das wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.0.2. > > This

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Xiangrui Meng
+1 Tested basic spark-shell and pyspark operations and MLlib examples on a Mac. On Mon, Jul 28, 2014 at 8:29 PM, Mubarak Seyed wrote: > +1 (non-binding) > > Tested this on Mac OS X. > > > On Mon, Jul 28, 2014 at 6:52 PM, Andrew Or wrote: > >> +1 Tested on standalone and yarn clusters >> >> >> 2

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Mubarak Seyed
+1 (non-binding) Tested this on Mac OS X. On Mon, Jul 28, 2014 at 6:52 PM, Andrew Or wrote: > +1 Tested on standalone and yarn clusters > > > 2014-07-28 14:59 GMT-07:00 Tathagata Das : > > > Let me add my vote as well. > > Did some basic tests by running simple projects with various Spark > >

Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-28 Thread qingyang li
hi, haoyuan, thanks for replying. 2014-07-21 16:29 GMT+08:00 Haoyuan Li : > Qingyang, > > Aha. Got it. > > 800MB data is pretty small. Loading from Tachyon does have a bit of extra > overhead. But it will have more benefit when the data size is larger. Also, > if you store the table in Tachyon,

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Andrew Or
+1 Tested on standalone and yarn clusters 2014-07-28 14:59 GMT-07:00 Tathagata Das : > Let me add my vote as well. > Did some basic tests by running simple projects with various Spark > modules. Tested checksums. > > +1 > > On Sun, Jul 27, 2014 at 4:52 PM, Matei Zaharia > wrote: > > +1 > > > >

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
bq. Either way its unclear to if there is any reason to use reflection to support multiple versions, instead of just upgrading to Hive 0.13.0 Which Spark release would this Hive upgrade take place ? I agree it is cleaner to upgrade Hive dependency vs. introducing reflection. Cheers On Mon, Jul

Github mirroring is running behind

2014-07-28 Thread Patrick Wendell
https://issues.apache.org/jira/browse/INFRA-8116 Just a heads up, the GitHub mirroring is running behind. You can follow that JIRA to keep up to date on the fix. In the meantime, you can use the Apache git itself: https://git-wip-us.apache.org/repos/asf/spark.git Some people have reported issue

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Steve Nunez
The larger goal is to get a clean compile & test in the environment I have to use. As near as I can tell, tests fail in parquet because parquet was only added in Hive 0.13. There could well be issues in later meta-stores, but one thing at a time... - SteveN On 7/28/14, 17:22, "Michael A

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Michael Armbrust
A few things: - When we upgrade to Hive 0.13.0, Patrick will likely republish the hive-exec jar just as we did for 0.12.0 - Since we have to tie into some pretty low level APIs it is unsurprising that the code doesn't just compile out of the box against 0.13.0 - ScalaReflection is for determinin

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
I was looking for a class where reflection-related code should reside. I found this but don't think it is the proper class for bridging differences between hive 0.12 and 0.13.1: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala Cheers On Mon, Jul 28, 2014 at 3:41 P

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
After manually copying hive 0.13.1 jars to local maven repo, I got the following errors when building spark-hive_2.10 module : [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182: type mismatch; found : String required: Array[String] [ERROR] va

Re: package/assemble with local spark

2014-07-28 Thread Reynold Xin
You can use publish-local in sbt. If you want to be more careful, you can give Spark a different version number and use that version number in your app. On Mon, Jul 28, 2014 at 4:33 AM, Larry Xiao wrote: > Hi, > > How do you package an app with modified spark? > > It seems sbt would resolve t

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Tathagata Das
Let me add my vote as well. Did some basic tests by running simple projects with various Spark modules. Tested checksums. +1 On Sun, Jul 27, 2014 at 4:52 PM, Matei Zaharia wrote: > +1 > > Tested this on Mac OS X. > > Matei > > On Jul 25, 2014, at 4:08 PM, Tathagata Das > wrote: > >> Please vot

Re: Can I translate the documentations of Spark in Japanese?

2014-07-28 Thread Nicholas Chammas
On Mon, Jul 28, 2014 at 12:48 AM, Patrick Wendell wrote: > I'd be interested to know what other projects > do about this situation! > I know some projects get translations crowdsourced via one website or other. Googling real quick, it appears there are a few sites that offer homes for this kind

Jenkins Documentation Build

2014-07-28 Thread DB Tsai
Hi Patrick, I started to work on the documentation about my work in spark. Since it has lots of dependencies to get the document build setup locally, it will be nice that people are able to verify/preview the document build for each PR. Is it possible to build the doc in Jenkins, and have a link p

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Steve Nunez
So, do we have a short-term fix until Hive 0.14 comes out? Perhaps adding the hive-exec jar to the spark-project repo? It doesn't look like there's a release date schedule for 0.14. On 7/28/14, 10:50, "Cheng Lian" wrote: >Exactly, forgot to mention Hulu team also made changes to cope with thos

Re: 'Proper' Build Tool

2014-07-28 Thread Patrick Wendell
Yeah for packagers we officially recommend using maven. Spark's dependency graph is very complicated and Maven and SBT use different conflict resolution strategies, so we've opted to officially support Maven. SBT is still around though and it's used more often by day-to-day developers. - Patrick

Re: Can I translate the documentations of Spark in Japanese?

2014-07-28 Thread giwa
Hi Yu, I could help translating Spark documentation to Japanese. Please let me know if you need. Best, Ken On Mon, Jul 28, 2014 at 1:03 AM, Yu Ishikawa [via Apache Spark Developers List] wrote: > Hello Patrick, > > Thank you for your replying. > I checked some other projects in terms of i18n

Re: 'Proper' Build Tool

2014-07-28 Thread Stephen Boesch
Hi Steve, I had the opportunity to ask this question at the Summit to Andrew Orr. He mentioned that with 1.0 the recommended build tool is maven. sbt is however still supported. You will notice that the dependencies are now completely handled within the maven pom.xml: the SparkBuild.scala

'Proper' Build Tool

2014-07-28 Thread Steve Nunez
Gents, It seems that until recently, building via sbt was a documented process in the 0.9 overview: http://spark.apache.org/docs/0.9.0/ The section on building mentions using sbt/sbt assembly. However in the latest overview: http://spark.apache.org/docs/latest/index.html There's no mention of b

Re: VertexPartition and ShippableVertexPartition

2014-07-28 Thread Ankur Dave
On Mon, Jul 28, 2014 at 4:29 AM, Larry Xiao wrote: > On 7/28/14, 3:41 PM, shijiaxin wrote: > >> There is a VertexPartition in the EdgePartition,which is created by >> >> EdgePartitionBuilder.toEdgePartition. >> >> and There is also a ShippableVertexPartition in the VertexRDD. >> >> These two Part

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Cheng Lian
Exactly, forgot to mention Hulu team also made changes to cope with those incompatibility issues, but they said that’s relatively easy once the re-packaging work is done. On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell wrote: > I've heard from Cloudera that there were hive internal changes bet

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Mark Hamstra
Getting and maintaining our own branch in the main asf hive repo is a non-starter or isn't workable? On Mon, Jul 28, 2014 at 10:17 AM, Patrick Wendell wrote: > Yeah so we need a model for this (Mark - do you have any ideas?). I > did this in a personal github repo. I just did it quickly because

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
I've heard from Cloudera that there were hive internal changes between 0.12 and 0.13 that required code re-writing. Over time it might be possible for us to integrate with hive using API's that are more stable (this is the domain of Michael/Cheng/Yin more than me!). It would be interesting to see w

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
Yeah so we need a model for this (Mark - do you have any ideas?). I did this in a personal github repo. I just did it quickly because dependency issues were blocking the 1.0 release: https://github.com/pwendell/hive/tree/branch-0.12-shaded-protobuf I think what we want is to have a semi official

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Cheng Lian
AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL against Hive 0.13 (or 0.13.1?) successfully. Basically they also re-packaged Hive 0.13 as the Spark team did. The slides of the talk haven't been released yet though. On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu wrote: > Owen

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Mark Hamstra
Where and how is that fork being maintained? I'm not seeing an obviously correct branch or tag in the main asf hive repo & github mirror. On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell wrote: > It would be great if the hive team can fix that issue. If not, we'll > have to continue forking ou

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
Owen helped me find this: https://issues.apache.org/jira/browse/HIVE-7423 I guess this means that for Hive 0.14, Spark should be able to directly pull in hive-exec-core.jar Cheers On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell wrote: > It would be great if the hive team can fix that issue.

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
It would be great if the hive team can fix that issue. If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu wrote: > Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still >

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still uber jar. Right now I am facing the following error building against Hive 0.13.1 : [ERROR] Failed to execute goal on project spark-hive_2.10: Could not resolve dependencies for project org.apache.spark:spark-hive_2.10:jar:

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Sean Owen
Yes, it is published. As of previous versions, at least, hive-exec included all of its dependencies *in its artifact*, making it unusable as-is because it contained copies of dependencies that clash with versions present in other artifacts, and can't be managed with Maven mechanisms. I am not sure
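
The clash described above can be illustrated with a short Python sketch. It is not taken from the thread and is not how Spark or Hive handle the problem; it only shows how an artifact that bundles its dependencies can end up carrying the same class path as a separately managed jar. All jar names, class paths, and the helpers `make_jar` and `find_class_clashes` are hypothetical, invented for illustration.

```python
import io
import zipfile
from collections import defaultdict


def make_jar(entries):
    """Build an in-memory jar (a jar is a zip archive) from {path: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, data in entries.items():
            zf.writestr(path, data)
    buf.seek(0)
    return buf


def find_class_clashes(jars):
    """Return {class_path: [jar names]} for classes found in more than one jar."""
    owners = defaultdict(list)
    for name, jar in jars.items():
        with zipfile.ZipFile(jar) as zf:
            for entry in zf.namelist():
                if entry.endswith(".class"):
                    owners[entry].append(name)
    return {path: names for path, names in owners.items() if len(names) > 1}


# Hypothetical contents: the "uber" jar bundles its own copy of a dependency.
uber_jar = make_jar({
    "org/example/exec/Driver.class": b"exec",
    "com/example/proto/Message.class": b"old copy bundled in the uber jar",
})
# The same dependency also arrives as a normal, separately versioned jar.
proto_jar = make_jar({
    "com/example/proto/Message.class": b"newer copy",
})

clashes = find_class_clashes({"uber.jar": uber_jar, "proto.jar": proto_jar})
print(clashes)
```

Because the bundled copy lives inside the uber jar rather than being declared as a dependency, a build tool has no declared version to exclude or override, which is the limitation the message points at.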

Re: Fraud management system implementation

2014-07-28 Thread Sandy Ryza
+user list bcc: dev list It's definitely possible to implement credit fraud management using Spark. A good start would be using some of the supervised learning algorithms that Spark provides in MLLib (logistic regression or linear SVMs). Spark doesn't have any HMM implementation right now. Sean

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
hive-exec (as of 0.13.1) is published here: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar Should a JIRA be opened so that dependency on hive-metastore can be replaced by dependency on hive-exec ? Cheers On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen wrote:

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Sean Owen
The reason for org.spark-project.hive is that Spark relies on hive-exec, but the Hive project does not publish this artifact by itself, only with all its dependencies as an uber jar. Maybe that's been improved. If so, you need to point at the new hive-exec and perhaps sort out its dependencies manu

Re: Fraud management system implementation

2014-07-28 Thread Nicholas Chammas
This sounds more like a user list question. This is the dev list, where people discuss things related to contributing code and such to Spark. On Mon, Jul 28, 2014 at 10:15 AM, jitendra shelar < jitendra.shelar...@gmail.com> wrote: > Hi, > > I am new to s

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
I found 0.13.1 artifacts in maven: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar However, Spark uses groupId of org.spark-project.hive, not org.apache.hive Can someone tell me how it is supposed to work ? Cheers On Mon, Jul 28, 2014 at 7:44 AM, Stev

Working Formula for Hive 0.13?

2014-07-28 Thread Steve Nunez
I saw a note earlier, perhaps on the user list, that at least one person is using Hive 0.13. Anyone got a working build configuration for this version of Hive? Regards, - Steve

Fraud management system implementation

2014-07-28 Thread jitendra shelar
Hi, I am new to spark. I am learning spark and scala. I had some queries. 1) Can somebody please tell me if it is possible to implement credit card fraud management system using spark? 2) If yes, can somebody please guide me how to proceed. 3) Shall I prefer Scala or Java for this implementation

package/assemble with local spark

2014-07-28 Thread Larry Xiao
Hi, How do you package an app with modified spark? It seems sbt would resolve the dependencies, and use the official spark release. Thank you! Larry

Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Sean Owen
Right, the scenario is, for example, that a class is added in release 2.5.0, but has been back-ported to a 2.4.1-based release. 2.4.1 isn't missing anything from 2.4.1. But a version of "2.4.1" doesn't tell you whether or not the class is there reliably. By the way, I just found there is already s
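
One common response to the back-port scenario above is to probe for the class itself instead of inferring its presence from a version string. The Python sketch below illustrates that general idea only. It is not what the thread proposes and not Spark's or Hadoop's mechanism, and the helper name `has_class` is hypothetical. On the JVM the analogous probe would attempt to load the class by name, for example with `Class.forName`, and handle the failure.

```python
import importlib


def has_class(module_name, class_name):
    """Return True if module_name imports cleanly and exposes class_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is absent, so the class cannot be present.
        return False
    return hasattr(module, class_name)


# Probe for the feature directly; no version string is consulted.
print(has_class("collections", "OrderedDict"))
print(has_class("collections", "NoSuchClass"))
print(has_class("no_such_module_for_this_example", "Anything"))
```

The probe gives the same answer whether the class arrived through a new release or a back-port, which is exactly the case a version comparison cannot distinguish.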

Re: No such file or directory errors running tests

2014-07-28 Thread Sean Owen
On Mon, Jul 28, 2014 at 3:35 AM, Stephen Boesch wrote: > mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package > mvn -Pyarn -Phadoop-2.3 -Phive test Yes, it's unintuitive for Maven, since package always happens after test, which kind of makes sense in general. I suppose we could bind the gene
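
The ordering point above can be made concrete with a toy Python model of Maven's default lifecycle. It is a simplified sketch, not Maven's implementation: the phase list is abridged to the main phases, and `phases_run` is a hypothetical helper. It shows only that invoking a phase runs every earlier phase and none of the later ones, so `mvn test` never reaches `package`.

```python
# Abridged list of Maven default-lifecycle phases, in execution order.
PHASES = ["validate", "compile", "test", "package", "verify", "install", "deploy"]


def phases_run(target):
    """Return the phases executed when `target` is invoked, in order."""
    return PHASES[: PHASES.index(target) + 1]


print(phases_run("test"))
print(phases_run("package"))
```

In this model, `package` includes `test` unless tests are skipped, while `test` alone stops before any artifact is built, which matches the two-step sequence quoted in the message.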

Re: new JDBC server test cases seems failed ?

2014-07-28 Thread Cheng Lian
Noticed that Nan’s PR is not related to SQL, but the JDBC test suites got executed. Then I checked PRs of all those Jenkins builds that failed because of the JDBC suites, it turns out that none of them touched SQL code. The JDBC code is only contained in the assembly file when the hive-thriftse