Re: SparkLauncher is blocked until main process is killed.

2015-10-29 Thread Jey Kottalam
Could you please provide the jstack output? That would help the devs identify the blocking operation more easily. On Thu, Oct 29, 2015 at 6:54 PM, 陈宇航 wrote: > I tried to use SparkLauncher (org.apache.spark.launcher.SparkLauncher) to > submit a Spark Streaming job, however, in my test, the Spark
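
For reference, a thread dump of the stuck launcher JVM can be captured with the standard JDK tools (the pid below is hypothetical):

    jps -l                       # find the pid of the JVM that invoked SparkLauncher
    jstack 12345 > jstack.txt    # dump all thread stacks for that pid to a file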

Re: Running Spark 1.4.1 without Hadoop

2015-06-29 Thread Jey Kottalam
rgetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) > at java.lang

Re: Running Spark 1.4.1 without Hadoop

2015-06-29 Thread Jey Kottalam
> > Regards, > Sourav > > On Mon, Jun 29, 2015 at 1:13 PM, Jey Kottalam wrote: > >> Hi Sourav, >> >> The error seems to be caused by the fact that your URL starts with >> "file://" instead of "file:///". >> >> Also, I believ
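
To illustrate the fix, a minimal Scala sketch (the path is made up): "file://tmp/input.txt" would parse "tmp" as a hostname, whereas a local path needs an empty authority, i.e. three slashes.

    val rdd = sc.textFile("file:///tmp/input.txt")  // local file: empty authority, then the path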

Re: Running Spark 1.4.1 without Hadoop

2015-06-29 Thread Jey Kottalam
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)

Re: Running Spark 1.4.1 without Hadoop

2015-06-29 Thread Jey Kottalam
Actually, Hadoop InputFormats can still be used to read from and write to "file://", "s3n://", and similar schemes. You just won't be able to read/write to HDFS without installing Hadoop and setting up an HDFS cluster. To summarize: Sourav, you can use any of the prebuilt packages (i.e. anything othe
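
As a minimal Scala sketch of that setup (bucket and paths are assumptions, and s3n:// additionally requires AWS credentials to be configured):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("no-hdfs-demo"))
    // Hadoop InputFormats back textFile(), so these schemes work with no HDFS cluster.
    val local = sc.textFile("file:///tmp/data.txt")
    val remote = sc.textFile("s3n://my-bucket/data")
    println(local.count() + remote.count())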

Re: Get ImportError when I run pyspark with ipython=1

2015-02-26 Thread Jey Kottalam
Hi Sourabh, could you try it with the stable 2.4 version of IPython? On Thu, Feb 26, 2015 at 8:54 PM, sourabhguha wrote: > > > I get the above error when I try to run pyspark with the ipython option. I > do not ge
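
For the Spark 1.x pyspark launcher, the IPython shell is selected with an environment variable, as in:

    IPYTHON=1 ./bin/pyspark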

Re: reduceByKey vs countByKey

2015-02-24 Thread Jey Kottalam
Hi Sathish, The current implementation of countByKey uses reduceByKey: https://github.com/apache/spark/blob/v1.2.1/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L332 It seems that countByKey is mostly deprecated: https://issues.apache.org/jira/browse/SPARK-3994 -Jey On Tue, Fe
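
A runnable Scala sketch of the same pattern (sample data made up):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits, needed on Spark 1.2.x

    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    // Equivalent of pairs.countByKey(), spelled out with reduceByKey:
    val counts = pairs.mapValues(_ => 1L).reduceByKey(_ + _).collect().toMap
    println(counts)  // e.g. Map(a -> 2, b -> 1)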

Extracting values from a Collection

2014-11-21 Thread Jey Kottalam
Hi Sanjay, These are instances of the standard Scala collection type "Set", and its documentation can be found by googling the phrase "scala set". Hope that helps, -Jey On Fri, Nov 21, 2014 at 10:41 AM, Sanjay Subramanian wrote: > hey guys > > names.txt > = > 1,paul > 2,john > 3,george
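
For example, using the names from the thread:

    val names: Set[String] = Set("paul", "john", "george")
    names.foreach(println)            // iterate over every element
    val asList = names.toList         // convert to a List when order/indexing is needed
    println(names.contains("john"))   // membership test: true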

Re: Is it possible to use Parquet with Dremel encoding

2014-09-25 Thread Jey Kottalam
Hi Matthes, You may find the following blog post relevant: http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/ Hope that helps, -Jey On Thu, Sep 25, 2014 at 5:05 PM, matthes wrote: > Hi again! > > At the moment I am trying to use Parquet and I want to keep the data in > memory in an effi

Re: RDD pipe example. Is this a bug or a feature?

2014-09-19 Thread Jey Kottalam
of data -> spark -> downstream system not implemented in spark > > After seeing your example code I now understand how the stdin and stdout get > configured. > > It seems like pipe() does not work the way I want. I guess I could open a > socket and write to the downstream

Re: RDD pipe example. Is this a bug or a feature?

2014-09-19 Thread Jey Kottalam
Hi Andy, That's a feature -- you'll have to print out the return value from collect() if you want the contents to show up on stdout. Probably something like this: for (Iterator<String> iter = rdd.pipe(pwd + "/src/main/bin/RDDPipe.sh").collect().iterator(); iter.hasNext();) System.out.println(iter.next());
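
For comparison, the Scala equivalent is a one-liner (same script path as in the thread):

    rdd.pipe(pwd + "/src/main/bin/RDDPipe.sh").collect().foreach(println)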

Re: MLlib - Possible to use SVM with Radial Basis Function kernel rather than Linear Kernel?

2014-09-18 Thread Jey Kottalam
Hi Aris, A simple approach to gaining some of the benefits of an RBF kernel is to add synthetic features to your training set. For example, if your original data consists of 3-dimensional vectors [x, y, z], you could compute a new 9-dimensional feature vector containing [x, y, z, x^2, y^2, z^2, xy
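
A minimal Scala sketch of that expansion (function name and sample values are made up):

    // Expand [x, y, z] into degree-2 features: originals, squares, pairwise products.
    def expand(v: Array[Double]): Array[Double] = {
      val squares = v.map(x => x * x)
      val crossTerms = for {
        i <- v.indices
        j <- v.indices if i < j
      } yield v(i) * v(j)
      v ++ squares ++ crossTerms
    }

    println(expand(Array(1.0, 2.0, 3.0)).mkString(", "))
    // 1.0, 2.0, 3.0, 1.0, 4.0, 9.0, 2.0, 3.0, 6.0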

Re: EC2 instances missing SSD drives randomly?

2014-08-19 Thread Jey Kottalam
I think you have to explicitly list the ephemeral disks in the device map when launching the EC2 instance. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html On Tue, Aug 19, 2014 at 11:54 AM, Andras Barjak wrote: > Hi, > > Using the spark 1.0.1 ec2 script I lau
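
For example, with the AWS CLI (other run-instances arguments elided; device names vary by instance type):

    aws ec2 run-instances ... --block-device-mappings \
      '[{"DeviceName":"/dev/sdb","VirtualName":"ephemeral0"},
        {"DeviceName":"/dev/sdc","VirtualName":"ephemeral1"}]'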

Re: Anaconda Spark AMI

2014-07-03 Thread Jey Kottalam
Hi Ben, Has the PYSPARK_PYTHON environment variable been set in spark/conf/spark-env.sh to the path of the new python binary? FYI, there's a /root/copy-dirs script that can be handy when updating files on an already-running cluster. You'll want to restart the spark cluster for the changes to take
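
For example, a line like this in spark/conf/spark-env.sh (the Anaconda install path is an assumption):

    export PYSPARK_PYTHON=/opt/anaconda/bin/python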

Re: Executors not utilized properly.

2014-06-17 Thread Jey Kottalam
Hi Abhishek, > Where mapreduce is taking 2 mins, spark is taking 5 min to complete the job. Interesting. Could you tell us more about your program? A "code skeleton" would certainly be helpful. Thanks! -Jey On Tue, Jun 17, 2014 at 3:21 PM, abhiguruvayya wrote: > I did try creating more part

Re: Local file being referenced in mapper function

2014-05-30 Thread Jey Kottalam
Hi Rahul, Marcelo's explanation is correct. Here's a possible approach to your program, in pseudo-Python: # connect to Spark cluster sc = SparkContext(...) # load input data input_data = load_xls(file("input.xls")) input_rows = input_data['Sheet1'].rows # create RDD on cluster input_rdd = sc.parallelize(input_rows)

Re: help

2014-04-25 Thread Jey Kottalam
Sorry, but I don't know where Cloudera puts the executor log files. Maybe their docs give the correct path? On Fri, Apr 25, 2014 at 12:32 PM, Joe L wrote: > Hi, thank you for your reply, but I could not find it. It says that there is no such > file or directory

Re: help

2014-04-25 Thread Jey Kottalam
Try taking a look at the stderr logs of the executor "app-20140426030946-0004/8". This should be in the $SPARK_HOME/work directory of the corresponding machine. Hope that helps, -Jey On Fri, Apr 25, 2014 at 11:17 AM, Joe L wrote: > I need someone's help please I am getting the following error. >
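
On that machine, the file would be at a path along the lines of:

    cat $SPARK_HOME/work/app-20140426030946-0004/8/stderr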