Spark on Kubernetes

2024-04-29 Thread Tarun raghav
Respected Sir/Madam, I am Tarunraghav. I have a query regarding Spark on Kubernetes. We have an EKS cluster, within which we have Spark installed in the pods. We set the executor memory to 1GB and the executor instances to 2, and I have also set dynamic allocation to true. So when I try to read a
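The settings described in the question above can be expressed as Spark configuration properties. A minimal sketch in spark-defaults.conf style, using only the values mentioned in the question (the shuffle-tracking line is an assumption on my part: on Kubernetes, dynamic allocation typically also requires it, since no external shuffle service is available there):

```
spark.executor.memory                            1g
spark.executor.instances                         2
spark.dynamicAllocation.enabled                  true
# Usually also needed for dynamic allocation on Kubernetes:
spark.dynamicAllocation.shuffleTracking.enabled  true
```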

Re: newbie question about RDD

2016-11-21 Thread Raghav
Sorry, I forgot to ask: how can I use the Spark context here? I have the HDFS directory path of the files, as well as the name node of the HDFS cluster. Thanks for your help. On Mon, Nov 21, 2016 at 9:45 PM, Raghav wrote: > Hi > > I am extremely new to Spark. I have to read a file from HDFS, a

newbie question about RDD

2016-11-21 Thread Raghav
HDFS is as follows: UUID FirstName LastName Zip | 7462 John Doll 06903 | 5231 Brad Finley 32820. Can someone point me to how to get a JavaRDD object by reading the file in HDFS? Thanks. -- Raghav
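The parsing step for records like these can be sketched independently of Spark: each whitespace-separated line is split into its four fields. In Spark, the same function would be applied to the lines produced by sc.textFile(path) via map(). A plain-Python sketch using the two sample rows from the question (field names taken from its header):

```python
def parse_record(line):
    # Fields per the header in the question: UUID, FirstName, LastName, Zip
    uuid, first, last, zip_code = line.split()
    return (uuid, first, last, zip_code)

sample = [
    "7462 John Doll 06903",
    "5231 Brad Finley 32820",
]
records = [parse_record(line) for line in sample]
print(records[0])  # ('7462', 'John', 'Doll', '06903')
```

In the actual Spark job, parse_record would be the function passed to map(), and the result would be an RDD of tuples rather than a Python list.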

Kafka Producer within a docker Instance

2016-11-11 Thread Raghav
both Spark and Kafka, and looking for some pointers to start exploring. Thanks. -- Raghav

Re: Newbie question - Best way to bootstrap with Spark

2016-11-07 Thread Raghav
Thanks a ton, guys. On Sun, Nov 6, 2016 at 4:57 PM, raghav wrote: > I am a newbie in the world of big data analytics, and I want to teach myself > Apache Spark, and want to be able to write scripts to tinker with data. > > I have some understanding of MapReduce but have not had a c

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Raghav
and 2016 videos. Regarding practice, I would strongly suggest > Databricks cloud (or download a prebuilt version from the Spark site). You can also take > courses from edX/Berkeley, which are very good starter courses. > > On Mon, Nov 7, 2016 at 11:57 AM, raghav wrote: > >> I am a newbie in the worl

Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread raghav
some guidance for starter material, or videos. Thanks. Raghav

Re: Submitting Spark Applications using Spark Submit

2015-06-20 Thread Raghav Shankar
http://ec2-xxx.compute-1.amazonaws.com/10.165.103.16:7077 from. I never specify that in the master url command line parameter. Any ideas on what I might be doing wrong? > On Jun 19, 2015, at 7:19 PM, Andrew Or wrote: > > Hi Raghav, > > I'm assuming you're using stan

Re: Submitting Spark Applications using Spark Submit

2015-06-19 Thread Raghav Shankar
Thanks Andrew! Is this all I have to do when using the Spark EC2 script to set up a Spark cluster? It seems to be getting an assembly jar that is not from my project (perhaps from a Maven repo). Is there a way to make the EC2 script use the assembly jar that I created? Thanks, Raghav On Friday

Re: Implementing top() using treeReduce()

2015-06-17 Thread Raghav Shankar
So, I would add the assembly jar to just the master, or would I have to add it to all the slaves/workers too? Thanks, Raghav > On Jun 17, 2015, at 5:13 PM, DB Tsai wrote: > > You need to build the spark assembly with your modification and deploy > into cluster. > >

Re: Implementing top() using treeReduce()

2015-06-17 Thread Raghav Shankar
setup scripts, it sets up Spark, but I think my custom-built spark-core jar is not being used. How do I set it up on EC2 so that my custom version of spark-core is used? Thanks, Raghav > On Jun 9, 2015, at 7:41 PM, DB Tsai wrote: > > Having the following code in RDD.scala works for me. P

Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
will upload this jar to YARN cluster automatically > and then you can run your application as usual. > It does not care about which version of Spark in your YARN cluster. > > 2015-06-17 10:42 GMT+08:00 Raghav Shankar >: > >> The documentation says spark.driver.userClassPath

Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
enough to tell Spark to use that spark-core jar instead of the default? Thanks, Raghav > On Jun 16, 2015, at 7:19 PM, Will Briggs wrote: > > If this is research-only, and you don't want to have to worry about updating > the jars installed by default on the cluster, you can add
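The approach discussed in these threads — making a custom spark-core jar win over the version installed on the cluster — can be sketched as a spark-submit invocation. The paths and class name below are hypothetical placeholders; the flags themselves (--jars and the userClassPathFirst properties) are real Spark options:

```shell
spark-submit \
  --master spark://ec2-master:7077 \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars /path/to/custom-spark-core.jar \
  --class com.example.MyApp \
  /path/to/my-app-assembly.jar
```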

Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
would be very useful. Thanks, Raghav > On Jun 16, 2015, at 6:57 PM, Will Briggs wrote: > > In general, you should avoid making direct changes to the Spark source code. > If you are using Scala, you can seamlessly blend your own methods on top of > the base RDDs using impli

Re: Different Sorting RDD methods in Apache Spark

2015-06-09 Thread Raghav Shankar
entire data and collecting it on the driver node is not a typical use case? If I want to do this using sortBy(), I would first call sortBy() followed by a collect(). collect() would involve gathering all the data on a single machine as well. Thanks, Raghav On Tuesday, June 9, 2015, Mark Hamstra wrote
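The trade-off under discussion — sortBy() followed by collect() versus top(k) — can be simulated without Spark. A full sort-and-collect materializes every element on the driver, while top(k) only ever needs the k largest. A plain-Python sketch, with heapq.nlargest playing the role of top():

```python
import heapq

data = [5, 1, 9, 3, 7, 2, 8]

# sortBy + collect: the driver receives ALL elements, fully sorted.
all_sorted = sorted(data, reverse=True)

# top(k): only the k largest elements ever reach the driver.
top3 = heapq.nlargest(3, data)

print(all_sorted)  # [9, 8, 7, 5, 3, 2, 1]
print(top3)        # [9, 8, 7]
```

In Spark the difference matters because top(k) aggregates per-partition candidates of size k, whereas sortBy().collect() ships every record to the driver.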

Re: TreeReduce Functionality in Spark

2015-06-04 Thread Raghav Shankar
y, > > DB Tsai > --- > Blog: https://www.dbtsai.com > > > On Thu, Jun 4, 2015 at 10:46 AM, Raghav Shankar > wrote: > > Hey Reza, > > > > Thanks for your response! > > > > Your response clarifies some of my initi

Re: TreeReduce Functionality in Spark

2015-06-04 Thread Raghav Shankar
d you provide some insight into this? Thanks, Raghav On Thursday, June 4, 2015, Reza Zadeh wrote: > In a regular reduce, all partitions have to send their reduced value to a > single machine, and that machine can become a bottleneck. > > In a treeReduce, the partitions talk to each other
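The tree-shaped aggregation Reza describes can be simulated without Spark: instead of every partition sending its reduced value to one machine, partial results are combined pairwise in rounds, so no single node receives everything at once. A plain-Python sketch of the idea (not Spark's actual implementation):

```python
def tree_reduce(partials, combine):
    # Combine partition results pairwise, level by level, so no single
    # node has to receive all partial results at once.
    while len(partials) > 1:
        paired = []
        for i in range(0, len(partials) - 1, 2):
            paired.append(combine(partials[i], partials[i + 1]))
        if len(partials) % 2 == 1:  # odd element carries to the next round
            paired.append(partials[-1])
        partials = paired
    return partials[0]

# Eight "partition sums", merged in three rounds instead of one 8-way merge.
result = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b)
print(result)  # 36
```

With 8 partials this takes 3 pairwise rounds; a regular reduce would funnel all 8 values to the driver in one step.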

Re: Task result in Spark Worker Node

2015-04-17 Thread Raghav Shankar
.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) > On Apr 17, 2015, at 2:30 AM, Raghav Shankar wrote: > > Hey Imran, > > Thanks for the great explanation! This cleared up a lot of things for me. I > am actually trying to utilize some of the features withi

Re: Task result in Spark Worker Node

2015-04-17 Thread Raghav Shankar
I am doing wrong, or how I can properly send the serialized version of the RDD and function to my other program. My thought is that I might need to add more jars to the build path, but I have no clue if that's the issue or what jars I need to add. Thanks, Raghav > On Apr 13, 2015, at 10:22 PM

Re: Sending RDD object over the network

2015-04-06 Thread Raghav Shankar
object to my second program? Thanks, Raghav On Mon, Apr 6, 2015 at 3:08 AM, Akhil Das wrote: > Are you expecting to receive 1 to 100 values in your second program? > > RDD is just an abstraction, you would need to do like: > > num.foreach(x => send(x)) > > > Thanks >
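The pattern Akhil suggests — iterating over the RDD's elements and sending each one, rather than shipping the RDD object itself over the network — can be sketched in plain Python. Here send() is a stand-in for whatever network call the second program expects (a hypothetical name, not a Spark API):

```python
received = []

def send(x):
    # Stand-in for a real network send (socket write, HTTP POST, etc.)
    received.append(x)

num = range(1, 101)  # plays the role of the RDD of values 1 to 100
for x in num:        # the equivalent of num.foreach(x => send(x))
    send(x)

print(len(received), received[0], received[-1])  # 100 1 100
```

The point is that only the concrete values cross the network; the RDD itself is a driver-side abstraction and is not meaningful outside its SparkContext.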