Re: Spark SQL

2014-09-14 Thread Burak Yavuz
Hi, I'm not a master of Spark SQL, but from what I understand, the problem is that you're trying to access an RDD inside an RDD here: val xyz = file.map(line => extractCurRate(sqlContext.sql("select rate ... and here: xyz = file.map(line => extractCurRate(sqlContext.sql("select rate
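One common way around this, sketched below under the assumption that the rates table is small enough to collect on the driver: materialize the lookup result once, broadcast it, and do a local lookup inside map() instead of nesting sqlContext.sql(). The query, column names, and record layout here are illustrative, not taken from the original code:

  // Hypothetical sketch: collect the small "rates" result, then broadcast it.
  val rates: Map[String, Double] = sqlContext
    .sql("SELECT currency, rate FROM rates")          // hypothetical query
    .map(row => (row.getString(0), row.getDouble(1)))
    .collect()
    .toMap
  val ratesBc = sc.broadcast(rates)

  val xyz = file.map { line =>
    val currency = line.split(",")(0)                 // assumed record layout
    ratesBc.value.getOrElse(currency, 1.0)            // local lookup, no nested RDD
  }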

Broadcast error

2014-09-14 Thread Chengi Liu
Hi, I am trying to create an RDD out of a large matrix. sc.parallelize suggests to use broadcast. But when I do sc.broadcast(data) I get this error: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/common/usg/spark/1.0.2/python/pyspark/context.py", line 370, in

Re: Broadcast error

2014-09-14 Thread Chengi Liu
Specifically, the error I see when I try to operate on the RDD created by the sc.parallelize method: org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast variables for large
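For reference, spark.akka.frameSize is specified in MB in Spark 1.x, so the 10485760-byte default corresponds to 10. A Scala sketch of the two options the error message hints at (the app name, frame size, and stand-in data are illustrative only; the same conf keys apply from PySpark):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("frame-size-example")      // hypothetical app name
    .set("spark.akka.frameSize", "100")    // workaround: raise the limit (MB)
  val sc = new SparkContext(conf)

  // Preferred fix: broadcast large read-only data so the serialized task
  // closure itself stays small.
  val bigTable = Array.fill(1000000)(1.0)  // stand-in for the large matrix
  val bigTableBc = sc.broadcast(bigTable)
  val sample = sc.parallelize(0 until 10).map(i => bigTableBc.value(i))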

File operations on spark

2014-09-14 Thread rapelly kartheek
Hi, I am trying to perform read/write file operations in Spark by creating a Writable object, but I am not able to write to a file. The data concerned is not an RDD. Can someone please tell me how to perform read/write file operations on non-RDD data in Spark? Regards karthik
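Non-RDD data on the driver can be written with ordinary JVM or Hadoop IO; Spark is only needed for RDD operations. A minimal sketch using the Hadoop FileSystem API (the namenode URI and output path are hypothetical):

  import java.net.URI
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Write plain driver-side data to HDFS without involving an RDD.
  val fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration())
  val out = fs.create(new Path("/tmp/non-rdd-output.txt"))  // hypothetical path
  try {
    out.writeBytes("some non-RDD data\n")
  } finally {
    out.close()
  }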

Driver fail with out of memory exception

2014-09-14 Thread richiesgr
Hi, I've written a job (I think not very complicated, only one reduceByKey) and the driver JVM always hangs with an OOM, killing the worker of course. How can I know what is running on the driver and what is running on the worker, and how do I debug the memory problem? I've already used the --driver-memory 4g param to

Re: Driver fail with out of memory exception

2014-09-14 Thread Akhil Das
Try increasing the number of partitions while doing a reduceByKey() http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.api.java.JavaPairRDD Thanks Best Regards On Sun, Sep 14, 2014 at 5:11 PM, richiesgr richie...@gmail.com wrote: Hi I've written a job (I think not very
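For reference, reduceByKey takes an explicit partition count as a second argument; more partitions mean smaller per-task state. A minimal sketch (the data and the count of 200 are only illustrative, to be tuned for the real workload):

  // Toy pair RDD; the partition count is the knob being suggested above.
  val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, 1L))
  val counts = pairs.reduceByKey(_ + _, 200)  // 200 output partitions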

object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, I have tried to run HBaseTest.scala, but I got the following errors; any ideas on how to fix them? Q1) scala> package org.apache.spark.examples <console>:1: error: illegal start of definition package org.apache.spark.examples Q2) scala> import

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread Ted Yu
Spark examples builds against hbase 0.94 by default. If you want to run against 0.98, see: SPARK-1297 https://issues.apache.org/jira/browse/SPARK-1297 Cheers On Sun, Sep 14, 2014 at 7:36 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I have tried to to run

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, Thanks!! I tried to apply the patches; both spark-1297-v2.txt and spark-1297-v4.txt are good here, but not spark-1297-v5.txt: $ patch -p1 -i spark-1297-v4.txt patching file examples/pom.xml $ patch -p1 -i spark-1297-v5.txt can't find file to patch at input line 5 Perhaps you used the wrong -p or --strip

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread Ted Yu
spark-1297-v5.txt is a level 0 patch. Please apply spark-1297-v5.txt with -p0. Cheers On Sun, Sep 14, 2014 at 8:06 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks!! I tried to apply the patches; both spark-1297-v2.txt and spark-1297-v4.txt are good here, but not

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, Thanks! patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml Hunk #1 FAILED at 45. Hunk #2 FAILED at 110. 2 out of 2 hunks FAILED -- saving rejects to file examples/pom.xml.rej Still got errors. Regards Arthur On 14 Sep, 2014, at 11:33

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, My bad. Tried again, worked. patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml Thanks! Arthur On 14 Sep, 2014, at 11:38 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks! patch -p0 -i spark-1297-v5.txt

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread Ted Yu
I applied the patch on the master branch without rejects. If you use Spark 1.0.2, use the pom.xml attached to the JIRA. On Sun, Sep 14, 2014 at 8:38 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, Thanks! patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, I applied the patch. 1) patched $ patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml 2) Compilation result [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread Ted Yu
Take a look at bin/run-example Cheers On Sun, Sep 14, 2014 at 9:15 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote: Hi, I applied the patch. 1) patched $ patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml 2)

Re: Dependency Problem with Spark / ScalaTest / SBT

2014-09-14 Thread Dean Wampler
Can you post your whole SBT build file(s)? Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Wed, Sep 10, 2014 at 6:48

Re: Dependency Problem with Spark / ScalaTest / SBT

2014-09-14 Thread Dean Wampler
Sorry, I meant any *other* SBT files. However, what happens if you remove the line: exclude("org.eclipse.jetty.orbit", "javax.servlet") dean Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com
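For context, an exclude like that normally hangs off the Spark dependency in build.sbt; a sketch of the shape (the version number is illustrative, not prescriptive):

  // build.sbt sketch -- version is illustrative only.
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2" exclude("org.eclipse.jetty.orbit", "javax.servlet")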

failed to run SimpleApp locally on macbook

2014-09-14 Thread Gary Zhao
Hello, I'm new to Spark and I couldn't make SimpleApp run on my MacBook. I feel it's related to network configuration. Could anyone take a look? Thanks. 14/09/14 10:10:36 INFO Utils: Fetching http://10.63.93.115:59005/jars/simple-project_2.11-1.0.jar to

Re: HBase 0.96+ with Spark 1.0+

2014-09-14 Thread Reinis Vicups
I did actually try Sean's suggestion just before I posted for the first time in this thread. I got an error when doing this and thought that I was not understanding what Sean was suggesting. Now I have re-attempted your suggestions with spark 1.0.0-cdh5.1.0, hbase 0.98.1-cdh5.1.0 and hadoop
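A sketch of the usual dependency shape for this combination in build.sbt; the versions are copied from the message above, and the excludes are the usual servlet-API conflict suspects rather than anything confirmed in this thread:

  // build.sbt sketch -- excludes are the common servlet-API clash fix, unverified here.
  libraryDependencies ++= Seq(
    "org.apache.hbase" % "hbase-common" % "0.98.1-cdh5.1.0",
    "org.apache.hbase" % "hbase-client" % "0.98.1-cdh5.1.0"
      exclude("org.mortbay.jetty", "servlet-api-2.5"),
    "org.apache.hbase" % "hbase-server" % "0.98.1-cdh5.1.0"
      exclude("org.mortbay.jetty", "servlet-api-2.5")
  )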

Re: Broadcast error

2014-09-14 Thread Chengi Liu
How? Example, please. Also, if I am running this in the pyspark shell, how do I configure spark.akka.frameSize? On Sun, Sep 14, 2014 at 7:43 AM, Akhil Das ak...@sigmoidanalytics.com wrote: When the data size is huge, you are better off using the TorrentBroadcastFactory. Thanks Best Regards On

Re: compiling spark source code

2014-09-14 Thread Matei Zaharia
I've seen the "file name too long" error when compiling on an encrypted Linux file system -- some of them have a limit on file name lengths. If you're on Linux, can you try compiling inside /tmp instead? Matei On September 13, 2014 at 10:03:14 PM, Yin Huai (huaiyin@gmail.com) wrote: Can you

Re: Broadcast error

2014-09-14 Thread Chengi Liu
And when I use sparksubmit script, I get the following error: py4j.protocol.Py4JJavaError: An error occurred while calling o26.trainKMeansModel. : org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up. at

Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-09-14 Thread Brad Miller
Hi Andrew, I agree with Nicholas. That was a nice, concise summary of the meaning of the locality customization options, indicators and default Spark behaviors. I haven't combed through the documentation end-to-end in a while, but I'm also not sure that information is presently represented
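For readers hunting for the knobs under discussion, the locality fallback delays live in the spark.locality.wait family. A sketch with example values (milliseconds in Spark 1.x; the uniform 3000 ms is only illustrative):

  import org.apache.spark.SparkConf

  // Standard locality configuration keys; values here are examples only.
  val conf = new SparkConf()
    .set("spark.locality.wait", "3000")         // base wait before dropping a level
    .set("spark.locality.wait.process", "3000") // PROCESS_LOCAL -> NODE_LOCAL
    .set("spark.locality.wait.node", "3000")    // NODE_LOCAL -> RACK_LOCAL
    .set("spark.locality.wait.rack", "3000")    // RACK_LOCAL -> ANY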

Re: spark-1.1.0 with make-distribution.sh problem

2014-09-14 Thread Patrick Wendell
Yeah that issue has been fixed by adding better docs, it just didn't make it in time for the release: https://github.com/apache/spark/blob/branch-1.1/make-distribution.sh#L54 On Thu, Sep 11, 2014 at 11:57 PM, Zhanfeng Huo huozhanf...@gmail.com wrote: resolved: ./make-distribution.sh --name

Alternative to spark.executor.extraClassPath ?

2014-09-14 Thread innowireless TaeYun Kim
Hi, in the Spark Configuration document, spark.executor.extraClassPath is described as a backwards-compatibility option. It also says that users typically should not need to set this option. Now, I must add a classpath to the executor environment (as well as to the driver in the future, but for
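A hedged sketch of the two usual routes, with a hypothetical jar path: set the key anyway (it still works despite the backwards-compatibility wording), or ship the jar with the application via spark.jars / the --jars flag so it lands on the executor classpath:

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    // Option 1: set the key directly; path is hypothetical.
    .set("spark.executor.extraClassPath", "/opt/libs/custom.jar")
    // Option 2: ship the jar with the app; it is added to the executor classpath.
    .set("spark.jars", "/opt/libs/custom.jar")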

Re: Re: spark-1.1.0 with make-distribution.sh problem

2014-09-14 Thread Zhanfeng Huo
Thank you very much. It is helpful for end users. Zhanfeng Huo From: Patrick Wendell Date: 2014-09-15 10:19 To: Zhanfeng Huo CC: user Subject: Re: spark-1.1.0 with make-distribution.sh problem Yeah that issue has been fixed by adding better docs, it just didn't make it in time for the

PathFilter for newAPIHadoopFile?

2014-09-14 Thread Eric Friedman
Hi, I have a directory structure with parquet+avro data in it. There are a couple of administrative files (.foo and/or _foo) that I need to ignore when processing this data; otherwise Spark tries to read them as containing parquet content, which they do not. How can I set a PathFilter on the
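One approach, sketched under the assumption of the Hadoop 2 mapreduce API: register a PathFilter through the job configuration that newAPIHadoopFile receives. The filter class below is hypothetical; FileInputFormat.setInputPathFilter is standard Hadoop:

  import org.apache.hadoop.fs.{Path, PathFilter}
  import org.apache.hadoop.mapreduce.Job
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

  // Skip administrative files such as .foo and _foo.
  class VisibleFilesOnly extends PathFilter {
    override def accept(p: Path): Boolean = {
      val name = p.getName
      !name.startsWith(".") && !name.startsWith("_")
    }
  }

  val job = Job.getInstance(sc.hadoopConfiguration)
  FileInputFormat.setInputPathFilter(job, classOf[VisibleFilesOnly])
  // Then pass job.getConfiguration as the conf argument of sc.newAPIHadoopFile.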

Re: Broadcast error

2014-09-14 Thread Davies Liu
Hey Chengi, What's the version of Spark you are using? It has big improvements to broadcast in 1.1; could you try it? On Sun, Sep 14, 2014 at 8:29 PM, Chengi Liu chengi.liu...@gmail.com wrote: Any suggestions? I am really blocked on this one. On Sun, Sep 14, 2014 at 2:43 PM, Chengi Liu

Re: Broadcast error

2014-09-14 Thread Chengi Liu
I am using spark 1.0.2. This is my work cluster, so I can't set up a new version readily. But right now, I am not using broadcast: conf = SparkConf().set("spark.executor.memory", "32G").set("spark.akka.frameSize", "1000") sc = SparkContext(conf = conf) rdd = sc.parallelize(matrix, 5) from

Re: Broadcast error

2014-09-14 Thread Chengi Liu
And the thing is, the code runs just fine if I reduce the number of rows in my data. On Sun, Sep 14, 2014 at 8:45 PM, Chengi Liu chengi.liu...@gmail.com wrote: I am using spark 1.0.2. This is my work cluster, so I can't set up a new version readily. But right now, I am not using broadcast:

Re: Use Case of mutable RDD - any ideas around will help.

2014-09-14 Thread Evan Chan
SPARK-1671 looks really promising. Note that even right now, you don't need to un-cache the existing table. You can do something like this: newAdditionRdd.registerTempTable("table2") sqlContext.cacheTable("table2") val unionedRdd = sqlContext.table("table1").unionAll(sqlContext.table("table2")) When