I am not sure how I can pass a jets3t.properties file to spark-submit.
The --files option does not seem to work.
Can someone please help me? My production Spark jobs sporadically hang
when reading S3 files.
Thanks,
-D
Put the jets3t.properties file somewhere that gets packaged into your application JAR, e.g. under src/main/resources in a Maven or SBT project, and
check that it makes it into the JAR using jar tf yourfile.jar.
Matei
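A minimal sketch of that check, assuming an SBT/Maven-style layout (the JAR name is illustrative):

# place the file where the build tool packages resources
src/main/resources/jets3t.properties
# after building, confirm it made it into the assembly JAR
jar tf yourfile.jar | grep jets3t.properties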
Hi All,
It seems the problem is a little more complicated.
If the job hangs while reading an S3 file, then even if I kill the Unix process
that started the job, the Spark job itself is not killed; it stays hung.
Now the questions are:
How do I find the Spark job by its name?
How do I kill the hung job?
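One possible approach on a standalone cluster, assuming the driver was launched in cluster deploy mode (the master URL and driver ID are placeholders; the driver ID is visible in the master web UI on port 8080):

./bin/spark-class org.apache.spark.deploy.Client kill spark://<master>:7077 <driverId>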
because I open a few hundred files on S3 to read
from one node. It just blocks without any error until it times out later.
Please check the Spark version and Hadoop version in your mvn build as well as
in your local Spark setup. If the Hadoop versions do not match, you might get this issue.
Thanks,
-D
Hi All,
I tried to build a combined.jar in a shell script. It works when I use
spark-shell, but with spark-submit I hit the same issue.
Help is highly appreciated.
Thanks
-D
One more question.
How would I submit additional JARs to a spark-submit job? I used the --jars
option, but it does not seem to work, as I explained earlier.
Thanks for the help,
-D
Hi All,
I am facing a strange issue sporadically: occasionally my Spark job hangs
while reading S3 files. It does not throw an exception or make any
progress; it just hangs there.
Is this a known issue? Please let me know how I could solve it.
Thanks,
-D
Hi, I am facing an issue with the MySQL JAR and spark-submit.
I am not running in YARN mode.
spark-submit --jars $(echo mysql-connector-java-5.1.34-bin.jar | tr ' ' ',')
--class com.abc.bcd.GetDBSomething myjar.jar abc bcd
Any help is really appreciated.
Thanks,
-D
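A hedged variant that sometimes helps: --jars ships the connector to the executors, but the driver may also need it on its own classpath (the JAR name is the one from the command above):

spark-submit \
  --jars mysql-connector-java-5.1.34-bin.jar \
  --driver-class-path mysql-connector-java-5.1.34-bin.jar \
  --class com.abc.bcd.GetDBSomething \
  myjar.jar abc bcd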
Did you try something like:
// Get the epoch-millis timestamp from one hour ago
val d = System.currentTimeMillis() - 3600 * 1000
val ex = "abc_" + d.toString.substring(0, 7) + "*.json"
Thanks
Best Regards
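A minimal sketch of using that glob with sc.textFile in spark-shell (the bucket path is illustrative; note that keeping only the first 7 digits of a 13-digit epoch-millis name wildcards the last 6 digits, roughly a 16-minute window, so a couple of prefixes may be needed to cover a full hour):

val d = System.currentTimeMillis() - 3600 * 1000
val pattern = "s3n://my-bucket/abc_" + d.toString.substring(0, 7) + "*.json"
val lastHour = sc.textFile(pattern)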
Hi All,
I need help with a regex in my sc.textFile().
I have lots of files named with an epoch-millisecond timestamp,
e.g. abc_1418759383723.json.
Now I need to consume the last hour's files using the epoch timestamp as
mentioned above.
I tried a couple of options, but nothing seems to work for me.
If anyone can help, I would appreciate it.
Hi,
I am using the program below in spark-shell to load and filter data from
the data sets. I get exceptions if I run the program multiple
times; if I restart the shell, it works fine.
1) Please let me know what I am doing wrong.
2) Also, is there a way to make the program better?
Thanks Akhil
Hi,
It seems I can only pass --hadoop-major-version=2, and it picks 2.0.0.
How can I tell it to use 2.0.2?
Is there a --hadoop-minor-version variable I can use?
Thanks,
D.
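For reference, a typical spark-ec2 launch exposes only the major-version switch (a sketch; the key pair, identity file, and cluster name are illustrative):

./spark-ec2 --key-pair=mykey --identity-file=mykey.pem \
  --hadoop-major-version=2 launch my-cluster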
Hi All,
I have a question.
For my company, we are planning to use the spark-ec2 scripts to create a cluster
for us.
I understand that persistent HDFS keeps the HDFS data available across cluster
restarts.
The question is:
1) What happens if I destroy and re-create the cluster? Do I lose the data?
a) If I
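For reference, the spark-ec2 lifecycle subcommands look like this (the cluster name is illustrative); stop/start is the path intended to preserve persistent-HDFS data, while destroy terminates the instances:

./spark-ec2 stop my-cluster
./spark-ec2 start my-cluster
./spark-ec2 destroy my-cluster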
Thanks Mayur.
Is there any documentation/README with a step-by-step process for
adding or deleting nodes?
Thanks,
D.
Thanks Chen
Hi,
I am trying to create a Spark cluster using the spark-ec2 file under the spark-1.0.1
directory.
1) I noticed that it always creates Hadoop version 1.0.4. Is there a way
I can override that? I would like to have Hadoop 2.0.2.
2) I also want to install Oozie along with it. Are there any scripts available
for that?
Hi,
I have a peculiar problem.
I have two large data sets.
Data set 1:
((timestamp), Iterable[Any]) = {
(2014-07-10T00:02:45.045+,ArrayBuffer((2014-07-10T00:02:45.045+,98.4859,22)))
(2014-07-10T00:07:32.618+,ArrayBuffer((2014-07-10T00:07:32.618+,75.4737,22)))
}
DataSet2:
Hi Chen,
I am new to Spark as well as Spark SQL. Could you please explain how
I would create a table and run a query on top of it? That would be super
helpful.
Thanks,
D.
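A minimal sketch for Spark 1.x SQL in spark-shell, assuming a case-class schema (all names and values here are illustrative, not from the thread):

import org.apache.spark.sql.SQLContext

case class Reading(ts: String, value: Double)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // Spark 1.x implicit for case-class RDDs

val rdd = sc.parallelize(Seq(Reading("2014-07-10T00:02:45.045+0000", 98.4859)))
rdd.registerTempTable("readings")
// run SQL over the registered table
sqlContext.sql("SELECT ts, value FROM readings WHERE value > 90").collect().foreach(println)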
Hi Chen,
Thank you very much for your reply. I do not think I understand how I can do
the join using the Spark API. If you have time, could you please write some
code?
Thanks again,
D.
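A minimal sketch of a key-based join on timestamp strings, assuming spark-shell's sc (the sample values and field layout are illustrative):

// two pair RDDs keyed by the same timestamp strings
val ds1 = sc.parallelize(Seq(("2014-07-10T00:02:45.045+0000", 98.4859)))
val ds2 = sc.parallelize(Seq(("2014-07-10T00:02:45.045+0000", 22)))
// join keeps pairs whose keys match in both data sets
val joined = ds1.join(ds2)  // RDD[(String, (Double, Int))]
joined.collect().foreach(println)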
Thanks for the reply.
I am trying to save a huge file; in my case it is 60 GB. I think l.toSeq is
going to collect all the data into the driver, where I do not have that much
space. Is there any way to use something like a MultipleOutputFormat
class for a large file?
Thanks,
Durga
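A sketch of writing a keyed RDD to one output file per key without collecting to the driver, via Hadoop's MultipleTextOutputFormat (the output path and sample data are illustrative; this assumes spark-shell's sc):

import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

class KeyedOutput extends MultipleTextOutputFormat[Text, Text] {
  // route each record to a file named after its key
  override def generateFileNameForKeyValue(key: Text, value: Text, name: String): String =
    key.toString
}

val pairs = sc.parallelize(Seq(("2014-07-10", "98.4859"), ("2014-07-11", "75.4737")))
  .map { case (k, v) => (new Text(k), new Text(v)) }
pairs.saveAsHadoopFile("s3n://my-bucket/output", classOf[Text], classOf[Text], classOf[KeyedOutput])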
it, but for larger files I am getting a heap-space
error. I am thinking it is due to take(). Can someone please help me
with this.
Thanks,
Durga
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

// build the configuration for the job
val conf = new SparkConf()
  .setMaster(master)
  .setAppName(appName)
  .set("spark.cores.max", numCores.toString)
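A natural follow-up, assuming master, appName, and numCores are defined in the enclosing script:

val sc = new SparkContext(conf)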