Re: Inquiry about Processing Speed

2023-09-27 Thread Deepak Goel
Hi, "Processing Speed" can be addressed at the software level (code optimization) and at the hardware level (capacity planning). Deepak "The greatness of a nation can be judged by the way its animals are treated - Mahatma Gandhi" +91 73500 12833 deic...@gmail.com Facebook: https://www.facebook.com/deicool Link

Re: GC issue - Ext Root Scanning

2021-11-15 Thread Deepak Goel
How many 'hardware threads' do you have?
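One quick way to answer the question above is to ask the operating system how many logical CPUs it exposes. The following is only a minimal sketch in Python; the helper name `hardware_thread_count` is hypothetical, and the figure it returns is the number of logical CPUs the OS reports, which may differ from what a container or a JVM is actually allowed to use.

```python
import os


def hardware_thread_count() -> int:
    """Return the number of logical CPUs reported by the OS (1 if unknown)."""
    count = os.cpu_count()  # may return None when the count cannot be determined
    return count if count is not None else 1


if __name__ == "__main__":
    print(f"logical CPUs: {hardware_thread_count()}")
```

On a machine with simultaneous multithreading enabled, this number is typically larger than the number of physical cores.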

json to parquet failure

2021-10-10 Thread Deepak Goel
Hi, I am trying to convert a JSON file into Parquet format using Spark. The JSON file contains a map in which key and value are defined, and the actual key is scriptId. It fails with the exception below: java.lang.ClassCastException: optional binary scriptId (UTF8) is not a group    at org.apache.parquet.

Re: spark optimized pagination

2018-06-10 Thread Deepak Goel
I think your requirement is that of an OLTP system. Spark and Cassandra are better suited to batch-style jobs (they can be used for OLTP, but there would be a performance hit).

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Deepak Goel
AM. So, > if general guidelines are followed, **virtual memory** is moot. > > *From: *Deepak Goel > *Date: *Saturday, April 28, 2018 at 12:58 PM > *To: *Stephen Boesch > *Cc: *klrmowse , "user @spark" > *Subject: *Re: [Spark 2.x Core] .collect() size limit >

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel
ck the source code to see if there were disk backed collects actually > happening for some cases? > > 2018-04-28 9:48 GMT-07:00 Deepak Goel : > >> There is something as *virtual memory* >> >> On Sat, 28 Apr 2018, 21:19 Stephen Boesch, wrote: >> >>> Do yo

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel
There is such a thing as *virtual memory*. On Sat, 28 Apr 2018, 21:19 Stephen Boesch, wrote: > Do you have a machine with terabytes of RAM? afaik collect() requires > RAM - so that would be your limiting factor. > > 2018-04-28 8:41 GMT-07:00 klrmowse : > >> i am currently trying to find a workarou

Re: Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Deepak Goel
tain of these instances is the logging. > > Thanks, > —Ken > > On Jun 16, 2016, at 12:17 PM, Deepak Goel wrote: > > I guess what you are saying is: > > 1. The nodes work perfectly ok without io wait before Spark job. > 2. After you have run Spark job and killed it,

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Deepak Goel
It seems the executor memory is not enough for your job, so it is writing objects to disk. On Jun 17, 2016 2:25 AM, "Cassa L" wrote: > > > On Thu, Jun 16, 2016 at 5:27 AM, Deepak Goel wrote: > >> What is your hardware configuration like which you are running S

Re: Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Deepak Goel
still lost nodes. > 4. He’s currently running storage benchmarking tests, which consist mainly > of shuffles. > > Thanks! > Ken > > On Jun 16, 2016, at 8:00 AM, Deepak Goel wrote: > > I am no expert, but some naive thoughts... > > 1. How many HPC nodes do you have

Re: What is the interpretation of Cores in Spark doc

2016-06-16 Thread Deepak Goel
Just wondering: if threads were purely a hardware implementation, then if my Java application had one thread and was run on a multicore machine, that thread could be split up into small parts and run on different cores simultaneously. However, this would raise synchronization probl

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Deepak Goel
What is your hardware configuration like which you are running Spark on? Hey Namaskara~Nalama~Guten Tag~Bonjour -- Keigu Deepak 73500 12833 www.simtree.net, dee...@simtree.net deic...@gmail.com LinkedIn: www.linkedin.com/in/deicool Skype: thumsupdeicool Google talk: deicool Blog: http://lo

Re: Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Deepak Goel
I am no expert, but some naive thoughts... 1. How many HPC nodes do you have? How many of them crash (what do you mean by multiple)? Do all of them crash? 2. What are you running on Puppet? Can't you switch it off and test Spark? You can also switch off Facter. Btw, your observation that th

Re: GraphX performance and settings

2016-06-15 Thread Deepak Goel
I am not an expert but some thoughts inline On Jun 16, 2016 6:31 AM, "Maja Kabiljo" wrote: > > Hi, > > We are running some experiments with GraphX in order to compare it with other systems. There are multiple settings which significantly affect performance, and we experimented a lot in order

Re: processing 50 gb data using just one machine

2016-06-15 Thread Deepak Goel
am, i7 >> >> Will this config be able to handle the processing without my ipythin >> notebook dying? >> >> The local mode is for testing purpose. But, I do not have any cluster at >> my disposal. So can I make this work with the configuration that I have? >>

Re: Is that normal spark performance?

2016-06-15 Thread Deepak Goel
I am not an expert, but it seems all your processing is done on node1 while node2 is lying idle.

Re: Book for Machine Learning (MLIB and other libraries on Spark)

2016-06-12 Thread Deepak Goel
>>> >>> >>> On 11 June 2016 at 16:10, Ted Yu wrote: >>> >>>> >>>> https://www.amazon.com/Machine-Learning-Spark-Powerful-Algorithms/dp/1783288515/ref=sr_1_1?ie=UTF8&qid=1465657706&sr=8-1&keywords=spark+mllib >>>> >&

Book for Machine Learning (MLIB and other libraries on Spark)

2016-06-11 Thread Deepak Goel
Hey Namaskara~Nalama~Guten Tag~Bonjour I am a newbie to machine learning (MLlib and other libraries on Spark). Which would be the best book to learn from? Thanks Deepak

Performance of Spark/MapReduce

2016-06-05 Thread Deepak Goel
. Sent from my iPhone On Jun 5, 2016, at 4:37 PM, Deepak Goel wrote: Hello Sorry, I am new to Spark. Spark claims it can do all that what MapReduce can do (and more!) but 10X times faster on disk, and 100X faster in memory. Why would then I use Mapreduce at all? Thanks Deepak Hey Namaskara~N

Performance of Spark/MapReduce

2016-06-05 Thread Deepak Goel
Hello. Sorry, I am new to Spark. Spark claims it can do everything MapReduce can do (and more!), but 10X faster on disk and 100X faster in memory. Why, then, would I use MapReduce at all? Thanks Deepak