Re: Akka usage in Spark

2014-08-21 Thread Mayur Rustagi
The stream receiver seems to leverage actor receivers: http://spark.apache.org/docs/0.8.1/streaming-custom-receivers.html. But the Spark system doesn't lend itself to a messaging kind of structure; it's more of a DAG kind. Just curious: are you looking for the actor subsystem to act on messages or just l
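
For context, a minimal actor-receiver sketch in the spirit of that doc, using the receiver API names from the 1.x line (the 0.8.1 page uses slightly different trait names, so treat this as illustrative, not definitive):

    import akka.actor.{Actor, Props}
    import org.apache.spark.streaming.receiver.ActorHelper

    // Every message this actor receives is pushed into the DStream via store().
    class EchoReceiver extends Actor with ActorHelper {
      def receive = {
        case s: String => store(s)
      }
    }

    // Wiring it up, given an existing StreamingContext ssc:
    // val lines = ssc.actorStream[String](Props[EchoReceiver], "echo")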

[SNAPSHOT] Snapshot2 of Spark 1.1 has been posted

2014-08-21 Thread Patrick Wendell
Hi All, I've packaged and published a snapshot release of Spark 1.1 for testing. This is very close to RC1 and we are distributing it for testing. Please test this and report any issues on this thread. The tag of this release is v1.1.0-snapshot1 (commit e1535ad3): https://git-wip-us.apache.org/r

Re: [SNAPSHOT] Snapshot2 of Spark 1.1 has been posted

2014-08-21 Thread Patrick Wendell
The docs for this release are also available here: http://people.apache.org/~pwendell/spark-1.1.0-snapshot2-docs/ On Thu, Aug 21, 2014 at 1:12 AM, Patrick Wendell wrote: > Hi All, > > I've packaged and published a snapshot release of Spark 1.1 for testing. > This is very close to RC1 and we ar

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Sean Owen
Maven is just telling you that there is no version 1.1.0 of yarn-parent, and indeed, it has not been released. To build the branch you would need to "mvn install" to compile and make local copies of the artifacts available along the way. (You may have these for 1.1.0-SNAPSHOT locally already.) Use Mave

Spark Contribution

2014-08-21 Thread Maisnam Ns
Hi, can someone help me with some links on how to contribute to Spark? Regards, mns

Kinesis streaming integration in upcoming 1.1

2014-08-21 Thread Aniket Bhatnagar
Hi everyone, I started looking at the Kinesis integration and it looks promising. However, I feel it can be improved. Here are my thoughts: 1. It assumes that AWS credentials are provided by DefaultAWSCredentialsProviderChain and there is no way to change that behavior. I would have liked to have
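
For reference, the 1.1 integration's entry point looks roughly like the sketch below (stream name and endpoint are made up). Credentials are resolved internally via DefaultAWSCredentialsProviderChain, with no parameter to override them, which is the behavior being questioned here:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kinesis.KinesisUtils
    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

    val ssc = new StreamingContext(new SparkConf().setAppName("kinesis-demo"), Seconds(2))

    // No credentials argument here: the KCL worker falls back to the
    // default provider chain (env vars, system properties, instance profile, ...).
    val stream = KinesisUtils.createStream(
      ssc,
      "myStream",                                 // Kinesis stream name (illustrative)
      "https://kinesis.us-east-1.amazonaws.com",  // endpoint URL
      Seconds(10),                                // checkpoint interval
      InitialPositionInStream.LATEST,
      StorageLevel.MEMORY_AND_DISK_2)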

Re: Spark SQL Query and join different data sources.

2014-08-21 Thread chutium
As far as I know, HQL queries try to find the schema info of all the tables in the query from the Hive metastore, so it is not possible to join tables from sqlContext using hiveContext.hql. But this should work: hiveContext.hql("select ...").regAsTable("a") sqlContext.jsonFile("xxx").regAsTable("b")
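
Spelled out a bit more (a sketch; regAsTable above is shorthand for what the 1.0/1.1 API called registerAsTable, and the query, file, and table names are made up):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new SQLContext(sc)    // sc: an existing SparkContext
    val hiveContext = new HiveContext(sc)

    // Register a Hive query result under one name...
    hiveContext.hql("SELECT key, value FROM src").registerAsTable("a")
    // ...and a JSON-derived SchemaRDD under another.
    sqlContext.jsonFile("/path/to/xxx.json").registerAsTable("b")
    // Open question in this thread: can any single context now join a and b?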

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Mridul Muralidharan
Weird that Patrick did not face this while creating the RC. Essentially, the yarn-alpha pom.xml has not been updated properly in the 1.1 branch. Just change the version to '1.1.1-SNAPSHOT' in yarn/alpha/pom.xml (to make it the same as every other pom). Regards, Mridul On Thu, Aug 21, 2014 at 5:09 AM, Ch

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Chester @work
Do we have Jenkins tests for these? Should be pretty easy to set up just to test the basic build. Sent from my iPhone > On Aug 21, 2014, at 6:45 AM, Mridul Muralidharan wrote: > > Weird that Patrick did not face this while creating the RC. > Essentially, the yarn-alpha pom.xml has not been updated prope

RE: Spark SQL Query and join different data sources.

2014-08-21 Thread Yan Zhou.sc
I doubt it will work as expected. Note that hiveContext.hql("select ...").regAsTable("a") will create a SchemaRDD before registering the SchemaRDD with the (Hive) catalog; while sqlContext.jsonFile("xxx").regAsTable("b") will create a SchemaRDD before registering the SchemaRDD with the SparkSQL catalo
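
In other words, the two registrations land in two different catalogs, so no single context sees both tables. One way around that (a sketch, relying on the fact that HiveContext extends SQLContext and can read JSON too) is to keep everything in one context:

    // Register both tables with the same HiveContext, so its catalog holds both.
    hiveContext.hql("SELECT key, value FROM src").registerAsTable("a")
    hiveContext.jsonFile("/path/to/xxx.json").registerAsTable("b")

    // Now one query can join them (column names are illustrative).
    val joined = hiveContext.hql("SELECT a.value, b.value FROM a JOIN b ON a.key = b.key")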

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Chester Chen
Mridul, thanks for the suggestion. I just updated the build today and changed yarn/alpha/pom.xml to 1.1.1-SNAPSHOT, and then the command worked. I will create a JIRA and PR for it. Chester On Thu, Aug 21, 2014 at 8:03 AM, Chester @work wrote: > Do we have Jenkins tests for these

Re: Lost executor on YARN ALS iterations

2014-08-21 Thread Nishkam Ravi
Can someone from Databricks test and commit this PR? This is not a complete solution, but would provide some relief. https://github.com/apache/spark/pull/1391 Thanks, Nishkam On Wed, Aug 20, 2014 at 12:39 AM, Sandy Ryza wrote: > Hi Debasish, > > The fix is to raise spark.yarn.executor.memoryOv

Re: Spark Contribution

2014-08-21 Thread Henry Saputra
The Apache Spark wiki on how to contribute should be a great place to start: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark - Henry On Thu, Aug 21, 2014 at 3:25 AM, Maisnam Ns wrote: > Hi, > > Can someone help me with some links on how to contribute to Spark? > > Regards >

PARSING_ERROR from kryo

2014-08-21 Thread npanj
Hi All, I am getting PARSING_ERROR while running my job on code checked out up to commit db56f2df1b8027171da1b8d2571d1f2ef1e103b6. I am running this job on EC2. Any idea if there is something wrong with my config? Here is my config: -- .set("spark.executor.extraJavaOptions", "-XX:+UseC
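
For comparison, a typical Kryo setup on the 1.x line looks something like this (a sketch; the registrator class is hypothetical and unrelated to whatever is truncated from the config above):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional: register classes up front via a custom registrator
      // (com.example.MyKryoRegistrator is a made-up name).
      .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")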

Re: Spark Contribution

2014-08-21 Thread Nicholas Chammas
We should add this link to the README on GitHub, btw. On Thursday, August 21, 2014, Henry Saputra wrote: > The Apache Spark wiki on how to contribute should be a great place to > start: > https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark > > - Henry > > On Thu, Aug 21, 2014 at 3:25 AM,

Spark-JobServer moving to a new location

2014-08-21 Thread Evan Chan
Dear community, Wow, I remember when we first open-sourced the job server, at the first Spark Summit in December. Since then, more and more of you have started using it and contributing to it. It is awesome to see! If you are not familiar with the Spark Job Server, it is a REST API for managin

Re: Lost executor on YARN ALS iterations

2014-08-21 Thread Debasish Das
Sandy, I put spark.yarn.executor.memoryOverhead 1024 in spark-defaults.conf, but I don't see the variable under Spark properties on the web UI's Environment tab. Does it need to go in spark-env.sh? Thanks. Deb On Wed, Aug 20, 2014 at 12:39 AM, Sandy Ryza wrote: > Hi Debasish, > > The fix is to
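
For what it's worth, the setting shapes the YARN container request, so it must be known at submit time; spark-defaults.conf should work, and it can also be set programmatically before the SparkContext is created (a sketch; the value is in MB and the app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("ALS")
      // Extra off-heap headroom requested per executor container, in MB.
      .set("spark.yarn.executor.memoryOverhead", "1024")
    val sc = new SparkContext(conf)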

saveAsTextFile makes no progress without caching RDD

2014-08-21 Thread jerryye
Hi, cross-posting this from the users list. I'm running on branch-1.1 and trying to do a simple transformation on a relatively small dataset of 64GB, and saveAsTextFile essentially hangs, with tasks stuck in running mode, with the following code: // Stalls with tasks running for over an hour with n

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-21 Thread jerryye
Bump. I'm seeing the same issue with branch-1.1. Caching the RDD before running saveAsTextFile gets things running, but the job stalls two-thirds of the way through by using too much memory.
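
The workaround described here, as a sketch (paths and the transformation are stand-ins; sc is an existing SparkContext):

    val transformed = sc.textFile("s3n://bucket/input").map(_.toLowerCase)
    // Materialize the RDD first; without this, the save reportedly hangs.
    transformed.cache()
    transformed.count()  // forces evaluation into the cache
    transformed.saveAsTextFile("s3n://bucket/output")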

Storage Handlers in Spark SQL

2014-08-21 Thread Niranda Perera
Hi, I have been playing around with Spark for the past few days, evaluating the possibility of migrating from Hive/Hadoop to Spark (Spark SQL). I am working on the WSO2 Business Activity Monitor (WSO2 BAM, https://docs.wso2.com/display/BAM241/WSO2+Business+Activity+Monitor+Documentation ) w

RE: Spark SQL Query and join different data sources.

2014-08-21 Thread alexliu68
Presto is so far good at joining different sources/databases. I tried a simple join query in Spark SQL and it fails with the following errors: val a = cql("select test.a from test JOIN test1 on test.a = test1.a") a: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:104 == Query

Too late to contribute for 1.1.0?

2014-08-21 Thread Evan Chan
I'm hoping to get in some doc enhancements and small bug fixes for Spark SQL. Also possibly a small new API to list the tables in sqlContext. Oh, and to get in the doc page I had talked about before: a list of community Spark projects. Thanks, Evan -
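
A hypothetical shape for the table-listing API mentioned above (nothing like this existed on sqlContext at the time; purely illustrative):

    // Sketch: enumerate the temp tables registered with this context.
    val tables: Seq[String] = sqlContext.tableNames()
    tables.foreach(println)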

Re: Too late to contribute for 1.1.0?

2014-08-21 Thread Reynold Xin
I believe doc changes can go in anytime (because we can just publish new versions of the docs). Critical bug fixes can still go in too. On Thu, Aug 21, 2014 at 11:43 PM, Evan Chan wrote: > I'm hoping to get in some doc enhancements and small bug fixes for Spark > SQL. > > Also possibly a small ne