Re: Starting sparkthrift server

2015-03-23 Thread Denny Lee
In that case, can you use the configurations to specify the folders? I'm wondering if this is actually Hive in play here and somehow /tmp/spark-events is being specified as the log location for Hive? On Mon, Mar 23, 2015 at 2:00 PM Anubhav Agarwal wrote: > When I start spark-shell (for example) it do
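The configurations in question would be the event-log settings; a minimal Scala sketch, with a hypothetical HDFS directory standing in for /tmp/spark-events:

    import org.apache.spark.{SparkConf, SparkContext}

    // Redirect the event log away from the default /tmp/spark-events.
    // hdfs:///user/spark/events is an illustrative path, not a verified one.
    val conf = new SparkConf()
      .setAppName("event-log-example")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///user/spark/events")
    val sc = new SparkContext(conf)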

Re: Review request for SPARK-6112:Provide OffHeap support through HDFS RAM_DISK

2015-03-23 Thread Zhan Zhang
Thanks Reynold. I agree with opening another JIRA to unify the block storage API. I have uploaded the design doc to SPARK-6479 as well. Thanks. Zhan Zhang On Mar 23, 2015, at 4:03 PM, Reynold Xin wrote: I created a ticket to separate the API refactoring from the

Re: Review request for SPARK-6112:Provide OffHeap support through HDFS RAM_DISK

2015-03-23 Thread Reynold Xin
I created a ticket to separate the API refactoring from the implementation. It would be great to have these as two separate patches to make them easier to review (similar to the way we are doing the RPC refactoring -- first introducing an internal RPC API, porting Akka to it, and then adding an alternative implem

Re: Spark-thriftserver Issue

2015-03-23 Thread Zhan Zhang
Probably the port is already in use by another process, e.g., Hive. You can change the port with something like the following: ./sbin/start-thriftserver.sh --master yarn --executor-memory 512m --hiveconf hive.server2.thrift.port=10001 Thanks. Zhan Zhang On Mar 23, 2015, at 12:01 PM, Neil Dev

Review request for SPARK-6112:Provide OffHeap support through HDFS RAM_DISK

2015-03-23 Thread Zhan Zhang
Hi Folks, I am planning to implement HDFS off-heap support for Spark, and I have uploaded the design doc for off-heap support through HDFS RAM disk to JIRA SPARK-6112. Please review it and provide your feedback if you are interested. https://issues.apache.org/jira/browse/SPARK-6112 Thank

Re: Shuffle Spill Memory and Shuffle Spill Disk

2015-03-23 Thread Bijay Pathak
It looks like this is not the right place for this question, so I have sent it to the user group. Thank you, Bijay On Mon, Mar 23, 2015 at 2:25 PM, Bijay Pathak wrote: > Hello, > > I am running TeraSort on > 100GB of data. The final metrics I am getting

Re: enum-like types in Spark

2015-03-23 Thread Imran Rashid
Well, perhaps I overstated things a little. I wouldn't call it the "official" solution, just a recommendation in the never-ending debate (and the recommendation from folks with their hands on Scala itself). Even if we do get this fixed in scaladoc eventually -- as it's not in the current versions,

Shuffle Spill Memory and Shuffle Spill Disk

2015-03-23 Thread Bijay Pathak
Hello, I am running TeraSort on 100GB of data. The final metrics I am getting on Shuffle Spill are: Shuffle Spill(Memory): 122.5 GB Shuffle Spill(Disk): 3.4 GB What's the difference and relation between these two metrics? Do these mean 122.5 GB was s

Re: enum-like types in Spark

2015-03-23 Thread Reynold Xin
If scaladoc can show the Java enum types, I do think the best way is then just Java enum types. On Mon, Mar 23, 2015 at 2:11 PM, Patrick Wendell wrote: > If the official solution from the Scala community is to use Java > enums, then it seems strange they aren't generated in scaladoc? Maybe > we

Re: enum-like types in Spark

2015-03-23 Thread Patrick Wendell
If the official solution from the Scala community is to use Java enums, then it seems strange they aren't generated in scaladoc? Maybe we can just fix that w/ Typesafe's help and then we can use them. On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen wrote: > Yeah the fully realized #4, which gets back t

Re: enum-like types in Spark

2015-03-23 Thread Aaron Davidson
The only issue I knew of with Java enums was that they do not appear in the Scala documentation. On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen wrote: > Yeah the fully realized #4, which gets back the ability to use it in > switch statements (? in Scala but not Java?) does end up being kind of > hug

Spark Executor resources

2015-03-23 Thread Zoltán Zvara
Let's say I'm an Executor instance in a Spark system. Who started me, and where, when I run on a worker node supervised by (a) Mesos, (b) YARN? I suppose I'm the only Executor on a worker node for a given framework scheduler (driver). If I'm an Executor instance, who is the closest object to me

Re: Starting sparkthrift server

2015-03-23 Thread Anubhav Agarwal
When I start spark-shell (for example) it does not write to the /tmp/spark-events folder. It remains empty. I have even tried it after giving that folder rwx permission for user, group and others. Neil's colleague, Anu On Mon, Mar 23, 2015 at 4:50 PM, Denny Lee wrote: > When you say the job has

Fwd: hadoop input/output format advanced control

2015-03-23 Thread Koert Kuipers
See email below. Reynold suggested I send it to dev instead of user. -- Forwarded message -- From: Koert Kuipers Date: Mon, Mar 23, 2015 at 4:36 PM Subject: hadoop input/output format advanced control To: "u...@spark.apache.org" Currently it's pretty hard to control the Hadoop In
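For context, the per-job control that exists today is passing a Configuration into the read call; a rough sketch, where the property and path are illustrative:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Clone the context's Hadoop conf and override a setting for this RDD only.
    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")

    val lines = sc.newAPIHadoopFile(
      "hdfs:///data/input",            // illustrative path
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      hadoopConf)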

Re: Starting sparkthrift server

2015-03-23 Thread Denny Lee
When you say the job has access, do you mean that when you run spark-submit or spark-shell (for example), it is able to write to the /tmp/spark-events folder? On Mon, Mar 23, 2015 at 1:02 PM Neil Dev wrote: > we are running this right now as root user and the folder /tmp/spark-events > was manu

Re: enum-like types in Spark

2015-03-23 Thread Sean Owen
Yeah the fully realized #4, which gets back the ability to use it in switch statements (? in Scala but not Java?) does end up being kind of huge. I confess I'm swayed a bit back to Java enums, seeing what it involves. The hashCode() issue can be 'solved' with the hash of the String representation.

Re: enum-like types in Spark

2015-03-23 Thread Imran Rashid
I've just switched some of my code over to the new format, and I just want to make sure everyone realizes what we are getting into. I went from 10 lines as java enums https://github.com/squito/spark/blob/fef66058612ebf225e58dd5f5fea6bae1afd5b31/core/src/main/java/org/apache/spark/status/api/Stage
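For anyone skimming the thread, here is a rough sketch of the two styles being weighed, using a hypothetical StageStatus rather than the actual patch:

    // Java enum style: roughly one line per value, shows up in javadoc,
    // and is usable in switch statements:
    //   public enum StageStatus { ACTIVE, COMPLETE, FAILED, PENDING }

    // Scala enum-like style (the fully realized #4), noticeably more code:
    sealed abstract class StageStatus extends Serializable
    object StageStatus {
      case object Active extends StageStatus
      case object Complete extends StageStatus
      case object Failed extends StageStatus
      case object Pending extends StageStatus
    }

    // What the Scala side buys: exhaustiveness-checked pattern matching.
    def label(s: StageStatus): String = s match {
      case StageStatus.Active   => "running"
      case StageStatus.Complete => "complete"
      case StageStatus.Failed   => "failed"
      case StageStatus.Pending  => "pending"
    }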

Re: Starting sparkthrift server

2015-03-23 Thread Neil Dev
We are running this right now as the root user, and the folder /tmp/spark-events was manually created; the job has access to this folder. On Mon, Mar 23, 2015 at 3:38 PM, Denny Lee wrote: > It appears that you are running the thrift-server using the spark-events > account but the /tmp/spark-events

Re: Starting sparkthrift server

2015-03-23 Thread Denny Lee
It appears that you are running the thrift-server using the spark-events account, but the /tmp/spark-events folder doesn't exist or the user running thrift-server does not have access to it. Have you been able to run Hive using the spark-events user so that the /tmp/spark-events folder has been

Spark-thriftserver Issue

2015-03-23 Thread Neil Dev
Hi, I am having an issue starting spark-thriftserver. I'm running Spark 1.3.0 with Hadoop 2.4.0. I would like to be able to change its port too, so I can have hive-thriftserver as well as spark-thriftserver running at the same time. Starting sparkthrift server:- sudo ./start-thriftserver.sh --master s

Starting sparkthrift server

2015-03-23 Thread Neil Dev
Hi, I am having issues starting spark-thriftserver. I'm running Spark 1.3.0 with Hadoop 2.4.0. I would like to be able to change its port too, so I can have hive-thriftserver as well as spark-thriftserver running at the same time. Starting sparkthrift server:- sudo ./start-thriftserver.sh --maste

Re: Directly broadcasting (sort of) RDDs

2015-03-23 Thread Sean Owen
Since RDDs aren't designed as random-access maps, and are basically bits of bookkeeping that make sense only on the driver, I think the realization of something like this in Spark would realistically be "collect RDD to local data structure" if anything. It sounds like you're looking for a distribu
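A minimal sketch of that realization, with made-up RDDs: collect the small pair RDD into a local map and broadcast it, so workers can do map-side lookups without a shuffle:

    // Assumes the lookup side fits comfortably in driver and executor memory.
    val small = sc.parallelize(Seq(1 -> "a", 2 -> "b"))   // illustrative data
    val big = sc.parallelize(1 to 1000000)

    val lookup: Map[Int, String] = small.collectAsMap().toMap
    val bcLookup = sc.broadcast(lookup)

    // Worker-side "lookup" against the broadcast copy.
    val joined = big.map(k => (k, bcLookup.value.get(k)))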

Re: Directly broadcasting (sort of) RDDs

2015-03-23 Thread Guillaume Pitel
Not far, but not exactly. The RDD could be too big to fit in memory. The idea is more like a worker-side rdd.lookup() with a local cache. Guillaume In a sentence, is this the idea of collecting an RDD to memory on each executor directly? On Sun, Mar 22, 2015 at 10:56 PM, Sandy Ryza wrote: Hi G

Re: lower&upperBound not working/spark 1/3

2015-03-23 Thread Marek Wiewiorka
OK, thanks Michael. I will do another series of tests to confirm this and then report an issue. Regards, Marek 2015-03-22 22:19 GMT+01:00 Michael Armbrust: > I have not heard this reported yet, but your invocation looks correct to > me. Can you open a JIRA? > > On Sun, Mar 22, 2015 at 8:39 AM,
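For reference, the kind of invocation under discussion (the Spark 1.3 JDBC source); the connection details here are placeholders, not Marek's actual settings:

    // Splits the read on "id" into 16 range-partitioned queries between the bounds.
    val df = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:postgresql://host:5432/db",
      "dbtable" -> "records",
      "partitionColumn" -> "id",
      "lowerBound" -> "1",
      "upperBound" -> "1000000",
      "numPartitions" -> "16"))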

Re: Spark scheduling, data locality

2015-03-23 Thread Hui WANG
Hello Zoltan, I'm a Spark beginner, but I think that the locality preferences should be prepared before the tasks are sent. One important element of an RDD is the metadata on the scheme and location of its partitions. Tasks created in the driver program should be based on this information. I'm al
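Those per-partition preferences are visible through the public RDD API; a small sketch (the input path is illustrative):

    val rdd = sc.textFile("hdfs:///data/input")
    rdd.partitions.foreach { p =>
      // preferredLocations returns the hosts the scheduler favors for this
      // partition, e.g. the HDFS block locations for a file-based RDD.
      println(s"partition ${p.index}: ${rdd.preferredLocations(p).mkString(", ")}")
    }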

Spark Sql with python udf fail

2015-03-23 Thread lonely Feb
Hi all, I tried to migrate some Hive jobs to spark-sql. When I ran a SQL job with a Python UDF I got an exception: java.lang.ArrayIndexOutOfBoundsException: 9 at org.apache.spark.sql.catalyst.expressions.GenericRow.apply(Row.scala:142) at org.apache.spark.sql.catalyst.expressions.B