Respected Sir/Madam,
I am Tarunraghav. I have a query regarding Spark on Kubernetes.
We have an EKS cluster, within which Spark is installed in the pods.
We set the executor memory to 1 GB and the executor instances to 2; I
have also enabled dynamic allocation. So when I try to read a
Sorry, I forgot to ask: how can I use the Spark context here? I have the HDFS
directory path of the files, as well as the NameNode of the HDFS cluster.
Thanks for your help.
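For the question above, a minimal sketch of how the SparkContext could be used to read from HDFS. The NameNode host, port, and directory path are placeholders (none of these values come from the thread), and on Kubernetes/EKS the master and resource settings are normally supplied by spark-submit rather than hard-coded:

```scala
import org.apache.spark.sql.SparkSession

object ReadFromHdfs {
  def main(args: Array[String]): Unit = {
    // The conf values mirror the settings described in the message;
    // in a real deployment they usually come from spark-submit.
    val spark = SparkSession.builder()
      .appName("read-from-hdfs")
      .config("spark.executor.memory", "1g")
      .config("spark.executor.instances", "2")
      .config("spark.dynamicAllocation.enabled", "true")
      .getOrCreate()

    // The SparkContext lives inside the SparkSession.
    val sc = spark.sparkContext

    // "namenode-host" and the path are placeholders -- substitute your
    // HDFS NameNode address and directory.
    val lines = sc.textFile("hdfs://namenode-host:8020/path/to/files")
    println(s"line count = ${lines.count()}")

    spark.stop()
  }
}
```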
On Mon, Nov 21, 2016 at 9:45 PM, Raghav wrote:
> Hi
>
> I am extremely new to Spark. I have to read a file from HDFS; a sample of the file in
> HDFS is as follows:
> UUID  FirstName  LastName  Zip
> 7462  John       Doll      06903
> 5231  Brad       Finley    32820
> Can someone point me to how to get a JavaRDD object by reading the
> file in HDFS?
> Thanks.
--
Raghav
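A hedged Scala sketch of reading and parsing the sample file above (the question asks for a JavaRDD; the Java API is analogous via `JavaSparkContext.textFile`, which returns a `JavaRDD<String>`). The `Person` case class and its field names are assumptions taken from the sample header, not from any real schema:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Field names follow the sample header: UUID, FirstName, LastName, Zip.
case class Person(uuid: String, firstName: String, lastName: String, zip: String)

def loadPeople(sc: SparkContext, path: String): RDD[Person] =
  sc.textFile(path)                    // one RDD element per line
    .filter(!_.startsWith("UUID"))     // skip the header row
    .map(_.trim.split("\\s+"))         // whitespace-separated columns
    .collect { case Array(id, first, last, zip) => Person(id, first, last, zip) }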
both Spark and Kafka, and looking
for some pointers to start exploring.
Thanks.
--
Raghav
Thanks a ton, guys.
On Sun, Nov 6, 2016 at 4:57 PM, raghav wrote:
> I am a newbie in the world of big data analytics, and I want to teach myself
> Apache Spark, and want to be able to write scripts to tinker with data.
>
> I have some understanding of Map Reduce but have not had a c
> and 2016 videos. Regarding practice, I would strongly suggest
> Databricks cloud (or download a prebuilt Spark from the Spark site). You can also take
> courses from edX/Berkeley, which are very good starter courses.
>
> On Mon, Nov 7, 2016 at 11:57 AM, raghav wrote:
>
>> I am a newbie in the world
some guidance for starter material, or videos.
Thanks.
Raghav
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-with-Spark-tp28032.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
<http://ec2-xxx.compute-1.amazonaws.com/10.165.103.16:7077> from. I never
specify that in the master URL command-line parameter. Any ideas on what I
might be doing wrong?
> On Jun 19, 2015, at 7:19 PM, Andrew Or wrote:
>
> Hi Raghav,
>
> I'm assuming you're using standalone mode.
Thanks Andrew! Is this all I have to do when using the Spark EC2 script to
set up a Spark cluster? It seems to be getting an assembly jar that is not
from my project (perhaps from a Maven repo). Is there a way to make the EC2
script use the assembly jar that I created?
Thanks,
Raghav
On Friday
So, would I add the assembly jar to just the master, or would I have to add
it to all the slaves/workers too?
Thanks,
Raghav
> On Jun 17, 2015, at 5:13 PM, DB Tsai wrote:
>
> You need to build the Spark assembly with your modification and deploy
> it to the cluster.
>
>
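DB Tsai's advice above refers to the Spark 1.x build (the era of this thread); a sketch of the usual build commands, assuming a checkout of the Spark source tree with the 1.x layout:

```shell
# Run from the root of the Spark source checkout (Spark 1.x layout).
# Either build tool produces the assembly jar under assembly/target/.
build/sbt assembly
# or, with Maven:
build/mvn -DskipTests clean package
```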
setup scripts, it
sets up Spark, but I think my custom-built spark-core jar is not being used.
How do I set it up on EC2 so that my custom version of spark-core is used?
Thanks,
Raghav
> On Jun 9, 2015, at 7:41 PM, DB Tsai wrote:
>
> Having the following code in RDD.scala works for me. P
> will upload this jar to the YARN cluster automatically
> and then you can run your application as usual.
> It does not care about which version of Spark is in your YARN cluster.
>
> 2015-06-17 10:42 GMT+08:00 Raghav Shankar:
>
>> The documentation says spark.driver.userClassPathFirst
enough to tell Spark to use that spark-core jar instead of the default?
Thanks,
Raghav
> On Jun 16, 2015, at 7:19 PM, Will Briggs wrote:
>
> If this is research-only, and you don't want to have to worry about updating
> the jars installed by default on the cluster, you can add
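Will's suggestion is truncated above, but it presumably concerns supplying the jar at submit time instead of replacing the cluster's installed jars. A hedged sketch of what that could look like; the class name and jar paths are illustrative placeholders:

```shell
# com.example.MyApp and the jar paths are illustrative placeholders.
spark-submit \
  --class com.example.MyApp \
  --jars /path/to/custom-spark-core.jar \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  /path/to/my-app.jar
```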
would be very useful.
Thanks,
Raghav
> On Jun 16, 2015, at 6:57 PM, Will Briggs wrote:
>
> In general, you should avoid making direct changes to the Spark source code.
> If you are using Scala, you can seamlessly blend your own methods on top of
> the base RDDs using implicits.
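The implicit-class pattern Will mentions might look like the following sketch; `sampleEveryNth` is a hypothetical helper invented for illustration, not an existing Spark method:

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object RddExtensions {
  // An implicit class adds methods to every RDD in scope without
  // modifying the Spark source tree.
  implicit class RichRDD[T: ClassTag](val rdd: RDD[T]) {
    // Hypothetical helper: keep every n-th element.
    def sampleEveryNth(n: Int): RDD[T] =
      rdd.zipWithIndex().collect { case (x, i) if i % n == 0 => x }
  }
}

// Usage: import RddExtensions._ ; then rdd.sampleEveryNth(10)
```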
entire data and
collecting it on the driver node is not a typical use case? If I want to do
this using sortBy(), I would first call sortBy() followed by a collect().
collect() would involve gathering all the data on a single machine as well.
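The two call patterns being compared, sketched for an assumed `rdd: RDD[Int]`; `takeOrdered` is the usual alternative when only the first k sorted values are actually needed:

```scala
// sortBy + collect: a full shuffle, then every row shipped to the driver.
val sortedAll: Array[Int] = rdd.sortBy(identity).collect()

// If only the smallest k values are needed, takeOrdered avoids
// materializing the entire sorted dataset on one machine.
val smallest10: Array[Int] = rdd.takeOrdered(10)
```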
Thanks,
Raghav
On Tuesday, June 9, 2015, Mark Hamstra wrote:
> Sincerely,
>
> DB Tsai
> ---
> Blog: https://www.dbtsai.com
>
>
> On Thu, Jun 4, 2015 at 10:46 AM, Raghav Shankar wrote:
> > Hey Reza,
> >
> > Thanks for your response!
> >
> > Your response clarifies some of my initial
Could you provide some insight into this?
Thanks,
Raghav
On Thursday, June 4, 2015, Reza Zadeh wrote:
> In a regular reduce, all partitions have to send their reduced value to a
> single machine, and that machine can become a bottleneck.
>
> In a treeReduce, the partitions talk to each other
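The contrast Reza describes could be sketched as follows, for an assumed `rdd: RDD[Long]`:

```scala
// Plain reduce: each partition's partial result goes straight to the
// driver, which combines them all -- the potential bottleneck.
val total: Long = rdd.reduce(_ + _)

// treeReduce: partials are first combined across executors in `depth`
// rounds, so far less data converges on any single machine.
val totalTree: Long = rdd.treeReduce(_ + _, depth = 2)
```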
.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> On Apr 17, 2015, at 2:30 AM, Raghav Shankar wrote:
>
> Hey Imran,
>
> Thanks for the great explanation! This cleared up a lot of things for me. I
> am actually trying to utilize some of the features within
I am doing wrong, or how I can
properly send the serialized version of the RDD and function to my other
program. My thought is that I might need to add more jars to the build path,
but I have no clue if that's the issue or which jars I need to add.
Thanks,
Raghav
> On Apr 13, 2015, at 10:22 PM
object to my
second program?
Thanks,
Raghav
On Mon, Apr 6, 2015 at 3:08 AM, Akhil Das wrote:
> Are you expecting to receive 1 to 100 values in your second program?
>
> RDD is just an abstraction; you would need to do something like:
>
> num.foreach(x => send(x))
>
>
> Thanks
>
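One hedged way to realize Akhil's `send(x)` idea, assuming the second program listens on a TCP socket ("consumer-host" and port 9999 are placeholders for wherever that program actually listens):

```scala
import java.io.PrintWriter
import java.net.Socket

// foreachPartition opens one connection per partition rather than one
// per element; each executor pushes its elements to the listener.
num.foreachPartition { iter =>
  val socket = new Socket("consumer-host", 9999)
  val out = new PrintWriter(socket.getOutputStream, true)
  try iter.foreach(x => out.println(x))
  finally { out.close(); socket.close() }
}
```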