JetS3T settings in Spark

2014-12-30 Thread durga
I am not sure how I can pass a jets3t.properties file to spark-submit; the --files option does not seem to work. Can someone please help me? My production Spark jobs sporadically hang when reading S3 files. Thanks, -D
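For reference, the file in question is a plain Java properties file. A hedged example of jets3t.properties contents follows; the key names come from JetS3T's configuration documentation and the values are illustrative, not recommendations:

    # connection and retry tuning for JetS3T's HTTP client (illustrative values)
    httpclient.connection-timeout-ms=60000
    httpclient.socket-timeout-ms=60000
    httpclient.retry-max=5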

Re: JetS3T settings in Spark

2014-12-30 Thread durga katakam
…in a Maven or SBT project, and check that it makes it into the JAR using jar tf yourfile.jar. Matei
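A minimal sketch of the suggestion above, assuming an SBT project layout (Maven's src/main/resources works the same way): place the file where the build copies resources onto the classpath, then confirm it was packaged.

    # put the properties file on the classpath root of the assembled jar:
    #   src/main/resources/jets3t.properties
    # then verify it made it into the JAR, as suggested above:
    jar tf yourfile.jar | grep jets3t.properties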

Re: S3 files, Spark job hangs up

2014-12-23 Thread durga
Hi All, It seems the problem is a little more complicated. When the job hangs while reading an S3 file, even killing the unix process that started the job does not kill the Spark job; it stays hung. Now the questions are: How do I find the Spark job by its name? How do I kill the…
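One option on the Spark API side (a sketch, assuming Spark 1.x and that you control the submitting code): tag the work with a job group, which gives you a name to cancel by, without hunting for processes.

    // tag everything submitted from this thread with a cancellable group id;
    // interruptOnCancel = true asks Spark to interrupt the running task threads
    sc.setJobGroup("s3-read", "reading s3 input", interruptOnCancel = true)
    val data = sc.textFile("s3n://bucket/path/*.json")
    println(data.count())

    // from another thread (e.g. a watchdog timer), kill the whole group by name
    sc.cancelJobGroup("s3-read")

In standalone mode the running application can also be found by name, and killed, from the Master web UI (port 8080 by default).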

Re: S3 files, Spark job hangs up

2014-12-22 Thread durga katakam
…because I open a few hundred files on S3 to read from one node. It just blocks without error until it times out later.

Re: Spark in Standalone mode

2014-12-22 Thread durga
Please check the Spark and Hadoop versions in your mvn build as well as in your local Spark setup. If the Hadoop versions do not match, you can get this issue. Thanks, -D
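A hedged build.sbt sketch of what "matching versions" means in practice; the version numbers here are illustrative and should mirror whatever your standalone cluster actually runs (the Maven equivalent is the same coordinates in pom.xml):

    libraryDependencies ++= Seq(
      // spark-core marked provided so the cluster's own Spark jars are used at runtime
      "org.apache.spark"  %% "spark-core"    % "1.1.1" % "provided",
      // hadoop-client pinned to the same Hadoop line the cluster was built against
      "org.apache.hadoop" %  "hadoop-client" % "2.4.0"
    )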

Re: java.sql.SQLException: No suitable driver found

2014-12-21 Thread durga
Hi All, I tried building combined.jar in a shell script. It works when I use spark-shell, but with spark-submit it is the same issue. Help is highly appreciated. Thanks, -D
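A hedged sketch of the usual root cause: "No suitable driver" means the MySQL driver class was never registered with java.sql.DriverManager in the JVM that opens the connection, even when the jar is present. Forcing the class to load before connecting registers it (the helper name and URL below are hypothetical):

    import java.sql.{Connection, DriverManager}

    def getConnection(url: String, user: String, password: String): Connection = {
      // loading the class runs its static initializer, which registers the driver
      Class.forName("com.mysql.jdbc.Driver")
      DriverManager.getConnection(url, user, password)
    }

    val conn = getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "secret")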

Re: java.sql.SQLException: No suitable driver found

2014-12-21 Thread durga
One more question: how would I submit additional jars to a spark-submit job? I used the --jars option, but it does not seem to work, as explained earlier. Thanks for the help, -D

S3 files, Spark job hangs up

2014-12-21 Thread durga
Hi All, I am facing a strange issue sporadically: occasionally my Spark job hangs while reading S3 files. It does not throw an exception or make any progress; it just hangs there. Is this a known issue? Please let me know how I could solve it. Thanks, -D
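A hedged mitigation sketch rather than a root-cause fix: speculative execution re-launches tasks that run far slower than their peers, which can unstick jobs whose S3 reads occasionally hang (property names as in Spark 1.x; the values are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("s3-reader")
      .set("spark.speculation", "true")            // re-run suspiciously slow tasks
      .set("spark.speculation.interval", "1000")   // ms between slow-task checks
      .set("spark.speculation.multiplier", "2")    // "slow" = 2x the median task time
    val sc = new SparkContext(conf)
    val lines = sc.textFile("s3n://bucket/path/*.json")
    println(lines.count())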

java.sql.SQLException: No suitable driver found

2014-12-19 Thread durga
Hi, I am facing an issue with the MySQL jar and spark-submit. I am not running in yarn mode.

    spark-submit --jars $(echo mysql-connector-java-5.1.34-bin.jar | tr ' ' ',') --class com.abc.bcd.GetDBSomething myjar.jar abc bcd

Any help is really appreciated. Thanks, -D
14/12/19 23:42:10 INFO…
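A hedged sketch of an invocation that usually resolves this (not a verified fix for this exact job): --jars ships the connector to the executors, but the driver JVM's own classpath also needs it for DriverManager lookups, hence --driver-class-path as well.

    spark-submit \
      --class com.abc.bcd.GetDBSomething \
      --jars mysql-connector-java-5.1.34-bin.jar \
      --driver-class-path mysql-connector-java-5.1.34-bin.jar \
      myjar.jar abc bcd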

Re: S3 globbing

2014-12-17 Thread durga katakam
…wrote: Did you try something like:

    // Get the last hour
    val d = System.currentTimeMillis() - 3600 * 1000
    val ex = "abc_" + d.toString().substring(0, 7) + "*.json"

Thanks, Best Regards

S3 globbing

2014-12-16 Thread durga
Hi All, I need help with a pattern for my sc.textFile(). I have lots of files named with an epoch-millisecond timestamp, e.g. abc_1418759383723.json. Now I need to consume the last one hour of files using that epoch timestamp. I tried a couple of options; nothing seems to work for me. If any…
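One workable approach, sketched below: sc.textFile() takes Hadoop glob patterns (not full regexes), but it also accepts a comma-separated list of paths, so you can compute the small set of timestamp prefixes that cover the last hour and glob each one. Assumptions: 13-digit epoch-millisecond names like abc_1418759383723.json, all in one directory; the bucket and path are placeholders.

    val now = System.currentTimeMillis()
    val anHourAgo = now - 3600L * 1000
    // a 7-digit prefix of a 13-digit epoch fixes the timestamp to a ~16.7 minute
    // window (10^6 ms), so the last hour is covered by a handful of prefixes
    val prefixes = ((anHourAgo to now by 1000000L) :+ now)
      .map(_.toString.take(7)).distinct
    val paths = prefixes.map(p => s"s3n://bucket/dir/abc_$p*.json").mkString(",")
    val rdd = sc.textFile(paths)  // textFile accepts comma-separated glob paths

Note this matches whole prefix windows, so slightly more than the hour; filter on the exact file timestamps afterwards if the boundary matters.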

Spark exception while reading different inputs

2014-08-20 Thread durga
Hi, I am using the program below in spark-shell to load and filter data from the data sets. I get exceptions if I run the program multiple times; if I restart the shell it works fine. 1) Please let me know what I am doing wrong. 2) Also, is there a way to make the program better…

Re: How could I start a new Spark cluster with Hadoop 2.0.2

2014-07-23 Thread durga
Thanks Akhil

Re: How could I start a new Spark cluster with Hadoop 2.0.2

2014-07-23 Thread durga
Hi, it seems I can only give --hadoop-major-version=2, and it picks 2.0.0. How could I tell it to use 2.0.2? Is there a --hadoop-minor-version variable I can use? Thanks, D.
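For what it's worth, a hedged sketch of the launch command (assuming the spark-ec2 script bundled with Spark 1.0.1): the script only distinguishes major Hadoop lines, so there is no minor-version knob and an exact 2.0.2 build is not selectable this way.

    # --hadoop-major-version takes a major line (e.g. 1 or 2), not a point release
    ./spark-ec2 -k mykeypair -i mykeypair.pem \
      --hadoop-major-version=2 \
      launch my-cluster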

persistent HDFS instance for cluster restarts/destroys

2014-07-23 Thread durga
Hi All, I have a question. At my company we are planning to use the spark-ec2 scripts to create a cluster. I understand that persistent HDFS keeps HDFS data available across cluster restarts. The question is: 1) What happens if I destroy and re-create the cluster, do I lose the data? a) If I…
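A hedged sketch of the spark-ec2 lifecycle as it relates to this question (assuming the 1.x scripts and EBS-backed persistent HDFS): stop/start preserves persistent-hdfs data, while destroy terminates the instances, so anything not copied out (for example to S3) is gone.

    ./spark-ec2 -k mykeypair -i mykeypair.pem --ebs-vol-size=100 launch my-cluster
    ./spark-ec2 stop my-cluster                        # persistent-hdfs data survives
    ./spark-ec2 -k mykeypair -i mykeypair.pem start my-cluster
    ./spark-ec2 destroy my-cluster                     # terminates nodes; data is lost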

Re: persistent HDFS instance for cluster restarts/destroys

2014-07-23 Thread durga
Thanks Mayur. Is there any documentation/readme with a step-by-step process for adding or deleting nodes? Thanks, D.

RE: Joining by timestamp.

2014-07-22 Thread durga
Thanks Chen

How could I start a new Spark cluster with Hadoop 2.0.2

2014-07-22 Thread durga
Hi, I am trying to create a Spark cluster using the spark-ec2 script under the spark-1.0.1 directory. 1) I noticed that it always creates Hadoop version 1.0.4. Is there a way I can override that? I would like to have Hadoop 2.0.2. 2) I also want to install Oozie alongside it. Are there any scripts available…

Joining by timestamp.

2014-07-21 Thread durga
Hi, I have a peculiar problem. I have two data sets (large ones).

Data set 1: ((timestamp), Iterable[Any]) = {
  (2014-07-10T00:02:45.045+, ArrayBuffer((2014-07-10T00:02:45.045+, 98.4859, 22)))
  (2014-07-10T00:07:32.618+, ArrayBuffer((2014-07-10T00:07:32.618+, 75.4737, 22)))
}
Data set 2: …
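A minimal sketch of an exact-timestamp join with the core API, using hypothetical sample rows shaped like the two datasets above (Spark 1.x; the pair-RDD functions come in via the SparkContext._ import):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    val sc = new SparkContext(new SparkConf().setAppName("join-by-ts").setMaster("local[2]"))
    val ds1 = sc.parallelize(Seq(
      ("2014-07-10T00:02:45.045", Seq(98.4859, 22.0)),
      ("2014-07-10T00:07:32.618", Seq(75.4737, 22.0))))
    val ds2 = sc.parallelize(Seq(
      ("2014-07-10T00:02:45.045", "sensor-a"),
      ("2014-07-10T00:09:00.000", "sensor-b")))
    // join keeps only exactly-matching timestamp keys;
    // leftOuterJoin keeps every ds1 row, with None where ds2 has no partner
    ds1.join(ds2).collect().foreach(println)
    ds1.leftOuterJoin(ds2).collect().foreach(println)

If the timestamps only match approximately, the usual trick is to map both sides to a truncated key (for example, the timestamp rounded to the minute) before joining.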

RE: Joining by timestamp.

2014-07-21 Thread durga
Hi Chen, I am new to Spark as well as Spark SQL. Could you please explain how I would create a table and run a query on top of it? That would be super helpful. Thanks, D.
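A hedged sketch of the Spark SQL flow Chen is likely referring to, as it looked on Spark 1.0.x (registerAsTable was renamed registerTempTable in 1.1; the file name and schema below are hypothetical):

    import org.apache.spark.sql.SQLContext

    case class Reading(ts: String, value: Double)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

    val readings = sc.textFile("readings.csv")
      .map(_.split(","))
      .map(r => Reading(r(0), r(1).toDouble))
    readings.registerAsTable("readings")  // registerTempTable on Spark 1.1+

    val recent = sqlContext.sql(
      "SELECT ts, value FROM readings WHERE ts >= '2014-07-10T00:00:00'")
    recent.collect().foreach(println)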

RE: Joining by timestamp.

2014-07-21 Thread durga
Hi Chen, thank you very much for your reply. I do not understand how I can do the join using the Spark API. If you have time, could you please write some code? Thanks again, D.

Re: Java null pointer exception while saving a Hadoop file

2014-07-19 Thread durga
Thanks for the reply. I am trying to save a huge file, in my case 60 GB. I think l.toSeq is going to collect all the data into the driver, where I don't have that much space. Is there any possibility of using something like a MultipleOutputFormat class for a large file? Thanks, Durga
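On the MultipleOutputFormat question: yes, the old mapred API's MultipleTextOutputFormat can be plugged into saveAsHadoopFile so that each partition streams its records straight to per-key files, never collecting to the driver. A hedged sketch on a pair RDD of (key, line) strings; the class name, sample data, and output path are hypothetical:

    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._

    // route each record into a subdirectory named after its key
    class KeyBasedOutput extends MultipleTextOutputFormat[Any, Any] {
      override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
        key.toString + "/" + name
    }

    val pairs = sc.parallelize(Seq(("a", "line1"), ("b", "line2"), ("a", "line3")))
    pairs
      .partitionBy(new HashPartitioner(8))  // group keys without collecting to the driver
      .saveAsHadoopFile("hdfs:///out", classOf[String], classOf[String],
        classOf[KeyBasedOutput])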

Java null pointer exception while saving a Hadoop file

2014-07-18 Thread durga
…it, but for larger files I am getting a heap-space error. I think it is due to take. Can someone please help me with this? Thanks, Durga

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    val conf = new SparkConf()
      .setMaster(master)
      .setAppName(appName)
      .set("spark.cores.max", numCores…