Re: Spark Job not using all nodes in cluster

2015-05-20 Thread Shailesh Birari
…May 2015 13:16, Shailesh Birari sbirar...@gmail.com wrote: Hi, I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB of RAM. I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB in size. Total data is around 16 GB. The Hadoop block size is 256 MB. My …

Spark Job not using all nodes in cluster

2015-05-19 Thread Shailesh Birari
Hi, I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB of RAM. I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB in size. Total data is around 16 GB. The Hadoop block size is 256 MB. My application reads these files with sc.textFile() (or sc.jsonFile()) …
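
The preview cuts off before the resolution, but one common remedy for this symptom is worth sketching: reading hundreds of thousands of ~1 KB files yields an enormous number of tiny partitions, and coalescing them into a few per core gives the scheduler sensibly sized tasks across the whole cluster. A minimal Scala sketch, not necessarily this thread's fix; the HDFS path and partition count are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object SmallJsonFiles {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("small-json-files"))
        // ~600,000 x ~1 KB files produce roughly one tiny partition per file.
        val raw = sc.textFile("hdfs:///data/json/*")
        // Collapse them into a couple of partitions per core
        // (4 nodes x 4 cores = 16 task slots); coalesce avoids a full shuffle.
        val balanced = raw.coalesce(32)
        println(balanced.count())
        sc.stop()
      }
    }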

Spark SQL self-join with aggregate

2015-03-19 Thread Shailesh Birari
Hello, I want to use Spark SQL to aggregate some columns of the data. E.g., I have huge data with columns: time, src, dst, val1, val2. I want to calculate sum(val1) and sum(val2) for all unique pairs of src and dst. I tried forming the SQL query SELECT a.time, a.src, a.dst, …
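
For the record, the aggregation described needs no self-join; a plain GROUP BY covers it. A sketch against the Spark 1.3 API, where the input path, reading via jsonFile, and the table name "events" are assumptions, while the column names come from the post:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("src-dst-sums"))
    val sqlContext = new SQLContext(sc)
    // Columns: time, src, dst, val1, val2 (as described in the post).
    val df = sqlContext.jsonFile("hdfs:///data/events")
    df.registerTempTable("events")
    // One row per unique (src, dst) pair, both sums computed in a single pass.
    val sums = sqlContext.sql(
      "SELECT src, dst, SUM(val1) AS sum_val1, SUM(val2) AS sum_val2 " +
      "FROM events GROUP BY src, dst")
    sums.show()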

Re: Spark 1.2 – How to change Default (Random) port ….

2015-03-15 Thread Shailesh Birari
Hi SM, apologies for the delayed response. No, the issue is with Spark 1.2.0; there is a bug in it. Spark recently made the 1.3.0 release, so it might be fixed there. I am not planning to test it soon, maybe after some time; you can try it. Regards, Shailesh

Re: Spark 1.2 – How to change Default (Random) port ….

2015-01-26 Thread Shailesh Birari
… On Sun, Jan 25, 2015 at 6:28 PM, Shailesh Birari sbirar...@gmail.com wrote: Can anyone please let me know? I don't want to open all ports on the network, so I am interested in the property by which I can configure this new port. Shailesh

Re: Spark 1.2 – How to change Default (Random) port ….

2015-01-25 Thread Shailesh Birari
Can anyone please let me know? I don't want to open all ports on the network, so I am interested in the property by which I can configure this new port. Shailesh

Spark 1.2 – How to change Default (Random) port ….

2015-01-21 Thread Shailesh Birari
Hello, recently I upgraded my setup from Spark 1.1 to Spark 1.2. I have a 4-node Ubuntu Spark cluster. With Spark 1.1, I used to write a Spark Scala program in Eclipse on my Windows development host and submit the job to the Ubuntu cluster from Eclipse (the Windows machine). As on my network not all …
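
For anyone landing on this thread: in the Spark 1.x line the otherwise random listener ports can be pinned through configuration, which keeps the firewall surface small. A sketch using the networking properties documented for Spark 1.1/1.2; the port numbers themselves are arbitrary placeholders:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("fixed-ports")
      .set("spark.driver.port",       "51800") // driver <-> executor traffic
      .set("spark.fileserver.port",   "51801") // driver's HTTP file server
      .set("spark.broadcast.port",    "51802") // HTTP broadcast server
      .set("spark.blockManager.port", "51803") // block managers

Only these ports (plus the standard master/worker ports) then need to be opened between the development host and the cluster.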

Spark 1.2 - com/google/common/base/Preconditions java.lang.NoClassDefFoundError

2015-01-20 Thread Shailesh Birari
Hello, I recently upgraded my setup from Spark 1.1 to Spark 1.2. My existing applications are working fine on the Ubuntu cluster. But when I try to execute a Spark MLlib application from Eclipse (Windows node), it gives a java.lang.NoClassDefFoundError: com/google/common/base/Preconditions exception.

Re: Spark 1.2 - com/google/common/base/Preconditions java.lang.NoClassDefFoundError

2015-01-20 Thread Shailesh Birari
…version from the past few years). On Tue, Jan 20, 2015 at 6:16 PM, Shailesh Birari sbirar...@gmail.com wrote: Hi Frank, it's a normal Eclipse project where I added the Scala and Spark libraries as user libraries. Though I am not attaching any Hadoop libraries, in my application code I have …

Re: Spark 1.2 - com/google/common/base/Preconditions java.lang.NoClassDefFoundError

2015-01-20 Thread Shailesh Birari
…but if you can pull up your build's dependency tree, you will likely find com.google.guava being brought in by one of your dependencies. Regards, Frank Austin Nothaft fnoth...@berkeley.edu fnoth...@eecs.berkeley.edu 202-340-0466. On Jan 20, 2015, at 5:13 PM, Shailesh Birari sbirar…

Re: Spark 1.2 - com/google/common/base/Preconditions java.lang.NoClassDefFoundError

2015-01-20 Thread Shailesh Birari
…like you are mixing versions of Spark then, with some that still refer to unshaded Guava. Make sure you are not packaging Spark with your app and that you don't have other versions lying around. On Tue, Jan 20, 2015 at 11:55 PM, Shailesh Birari sbirar...@gmail.com wrote: Hello, I recently …
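
The shape of the fix being discussed: keep exactly one copy of Spark (the cluster's, which shades Guava) on the classpath, and supply Guava yourself when launching outside spark-submit. A hedged build.sbt sketch; the Guava version is an assumption based on what the Spark 1.2 build used:

    libraryDependencies ++= Seq(
      // "provided": compile against Spark but keep it out of the application
      // JAR, so no second, unshaded-Guava copy of Spark rides along.
      "org.apache.spark" %% "spark-core"  % "1.2.0" % "provided",
      "org.apache.spark" %% "spark-mllib" % "1.2.0" % "provided",
      // When running from an IDE (no spark-submit classpath), add Guava
      // directly so com.google.common.base.Preconditions resolves.
      "com.google.guava" % "guava" % "14.0.1"
    )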

Re: Spark SQL takes unexpected time

2014-11-03 Thread Shailesh Birari
Yes, I am using Spark 1.1.0 and have used rdd.registerTempTable(). I tried adding sqlContext.cacheTable(), but it took 59 seconds (more than before). I also tried changing the schema to use the Long data type in some fields, but the conversion seems to take more time. Is there any way to specify an index?
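
One nuance that may explain the 59 seconds: cacheTable(), like RDD.cache(), is lazy, so the first query after it pays the scan-and-populate cost and only subsequent queries benefit. A sketch of how one would measure it; the table name is assumed, sc is an existing SparkContext, and rdd is the SchemaRDD from the post:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    rdd.registerTempTable("records")
    sqlContext.cacheTable("records") // lazy: nothing is materialized yet
    // First run scans HDFS and populates the in-memory columnar cache.
    sqlContext.sql("SELECT COUNT(*) FROM records").collect()
    // Second run should be served from the cache; time this one.
    sqlContext.sql("SELECT COUNT(*) FROM records").collect()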

Spark SQL takes unexpected time

2014-11-02 Thread Shailesh Birari
Hello, I have written a Spark SQL application which reads data from HDFS and runs queries on it. The data size is around 2 GB (30 million records). The schema and query I am running are as below. The query takes 5+ seconds to execute. I tried adding …

Re: Submitting Spark job on Unix cluster from dev environment (Windows)

2014-10-29 Thread Shailesh Birari
Thanks. By setting the driver host to the Windows machine and specifying some ports (driver, fileserver, broadcast, etc.), it worked perfectly. I need to specify those ports as not all ports are open on my machine. For the driver host name, I was assuming Spark should get it, as in the case of Linux we are not …
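
For later readers, the working configuration described above looks roughly like this; the host name, master URL, and port numbers are placeholders. The driver runs on the Windows machine, so executors must be able to reach back to it on known ports:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("submit-from-windows")
      .setMaster("spark://ubuntu-master:7077")
      // Executors connect back to the driver, so advertise the Windows host
      // explicitly instead of letting Spark guess it.
      .set("spark.driver.host",     "windows-dev-box")
      .set("spark.driver.port",     "51800")
      .set("spark.fileserver.port", "51801")
      .set("spark.broadcast.port",  "51802")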

Re: Submitting Spark job on Unix cluster from dev environment (Windows)

2014-10-28 Thread Shailesh Birari
Can anyone please help me here?

Re: Submitting Spark application through code

2014-10-28 Thread Shailesh Birari
Yes, this is doable. I am submitting the Spark job using JavaSparkContext spark = new JavaSparkContext(sparkMaster, "app name", System.getenv("SPARK_HOME"), new String[] {"application JAR"}); To run this, first you have to create the application JAR and in the above API specify its …
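
The same Spark 1.x constructor in Scala, for completeness; every value below is a placeholder:

    import org.apache.spark.SparkContext

    // Master URL, application name, SPARK_HOME on the cluster nodes, and the
    // application JAR(s) to ship to the executors.
    val sc = new SparkContext(
      "spark://master-host:7077",
      "MyApp",
      System.getenv("SPARK_HOME"),
      Seq("/path/to/my-app.jar"))

The JAR must be built before the context is created, since its path is handed to Spark so the executors can load the application classes.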

Re: Submitting Spark job on cluster from dev environment

2014-10-27 Thread Shailesh Birari
Hello, I am able to submit a job to the Spark cluster from a Windows desktop, but the executors are not able to run. When I check the Spark UI (which is on Windows, as the driver is there), it shows me JAVA_HOME, CLASS_PATH, and other environment variables related to Windows. I tried setting …

Re: Submitting Spark job on cluster from dev environment

2014-10-27 Thread Shailesh Birari
Some more updates: I tried setting spark.driver.host to the Spark master node and spark.driver.port to 51800 (an available open port), but it fails with a bind error. I was hoping it would start the driver on the supplied host:port, and as it's a Unix node there should not be any issue. Can …

Spark Streaming - How to write RDDs in the same directory?

2014-10-21 Thread Shailesh Birari
Hello. Spark 1.1.0, Hadoop 2.4.1. I have written a Spark Streaming application and I am getting a FileAlreadyExistsException from rdd.saveAsTextFile(outputFolderPath). Here, briefly, is what I am trying to do: my application creates a text file stream using the Java streaming context. The input file is …
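
The usual way out of the FileAlreadyExistsException, and presumably the direction of the reply below: saveAsTextFile refuses to overwrite an existing directory, so write each batch to its own subdirectory keyed by the batch time. A sketch with assumed paths and batch interval; DStream.saveAsTextFiles(prefix) achieves the same thing with a generated time suffix:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("stream-to-hdfs")
    val ssc = new StreamingContext(conf, Seconds(30))
    val lines = ssc.textFileStream("hdfs:///input/stream")
    // Give every batch its own output directory, named by its timestamp,
    // so no two batches ever target the same path.
    lines.foreachRDD { (rdd, time) =>
      rdd.saveAsTextFile(s"hdfs:///output/stream/batch-${time.milliseconds}")
    }
    ssc.start()
    ssc.awaitTermination()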

Re: Spark Streaming - How to write RDDs in the same directory?

2014-10-21 Thread Shailesh Birari
Thanks Sameer for the quick reply. I will try to implement it. Shailesh

Re: java.lang.OutOfMemoryError while running SVD MLlib example

2014-09-25 Thread Shailesh Birari
Hi Xiangrui, after setting the SVD k to a smaller value (200) it is working. Thanks, Shailesh
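
For context, the "smaller value" is the k argument to computeSVD, i.e. how many singular values/vectors to request. A sketch against the MLlib 1.1 API; the matrix dimensions are placeholders and sc is an existing SparkContext:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Random double data, matching the original example's description.
    val rows = sc.parallelize(
      Seq.fill(10000)(Vectors.dense(Array.fill(1000)(math.random))))
    val mat = new RowMatrix(rows)
    // Requesting only the top k = 200 singular values, rather than a k near
    // the full matrix width, is what resolved the OOM in this thread.
    val svd = mat.computeSVD(200, computeU = true)
    println(s"singular values computed: ${svd.s.size}")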

Re: java.lang.OutOfMemoryError while running SVD MLlib example

2014-09-24 Thread Shailesh Birari
Note: the data is random numbers (double). Any suggestions/pointers will be highly appreciated. Thanks, Shailesh