Hi
"Processing Speed" can be at a software level (Code Optimization) and at a
hardware level (Capacity Planning)
Deepak
"The greatness of a nation can be judged by the way its animals are treated
- Mahatma Gandhi"
+91 73500 12833
deic...@gmail.com
Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool
How many 'hardware threads' do you have?
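A quick way to check from the JVM, if you are unsure (a minimal Scala sketch;
availableProcessors reports logical cores, i.e. hardware threads):

    // Ask the JVM how many logical cores (hardware threads) it can use.
    val hardwareThreads = Runtime.getRuntime.availableProcessors()
    println(s"Hardware threads visible to the JVM: $hardwareThreads")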
Deepak
"The greatness of a nation can be judged by the way its animals are treated
- Mahatma Gandhi"
+91 73500 12833
deic...@gmail.com
Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool
"Plant a Tree, Go Green"
Make In Ind
Hi,
I am trying to convert a JSON file into Parquet format using Spark. The JSON
file contains a map where key and value are defined, and the actual key is
scriptId. It fails with the exception below:

java.lang.ClassCastException: optional binary scriptId (UTF8) is not a group
at org.apache.parquet.…
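One common cause of this error is a mismatch between the schema Spark infers
from the JSON and the map structure the Parquet side expects. A minimal
sketch, assuming the map sits in a field keyed by scriptId strings (the field
name "scripts" below is hypothetical, not from the original post):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("JsonToParquet").getOrCreate()

    // Declare the map explicitly instead of letting Spark infer scriptId
    // as a plain string column. The field name "scripts" is hypothetical.
    val schema = StructType(Seq(
      StructField("scripts", MapType(StringType, StringType), nullable = true)
    ))

    val df = spark.read.schema(schema).json("input.json")
    df.write.parquet("output.parquet")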
I think your requirement is that of an OLTP system. Spark and Cassandra are
better suited to batch-style jobs (they can be used for OLTP, but there would
be a performance hit).
Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by
> So, if general guidelines are followed, **virtual memory** is moot.
>
> *From: *Deepak Goel
> *Date: *Saturday, April 28, 2018 at 12:58 PM
> *To: *Stephen Boesch
> *Cc: *klrmowse , "user @spark"
> *Subject: *Re: [Spark 2.x Core] .collect() size limit
>
> …check the source code to see if there were disk-backed collects actually
> happening for some cases?
>
> 2018-04-28 9:48 GMT-07:00 Deepak Goel:
>
There is something called *virtual memory*.
On Sat, 28 Apr 2018, 21:19 Stephen Boesch wrote:
> Do you have a machine with terabytes of RAM? afaik collect() requires
> RAM - so that would be your limiting factor.
>
> 2018-04-28 8:41 GMT-07:00 klrmowse :
>
>> I am currently trying to find a workaround…
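One workaround (a sketch, not something proposed in this thread) is to stream
results to the driver one partition at a time instead of materializing
everything with collect():

    // toLocalIterator pulls one partition at a time, so the driver only
    // needs enough RAM for the largest partition, not the whole dataset.
    val numbers = spark.sparkContext.parallelize(1 to 1000000)
    numbers.toLocalIterator.foreach { n =>
      // process each element here without holding the full result set
    }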
…certain of these instances is the logging.
>
> Thanks,
> —Ken
>
> On Jun 16, 2016, at 12:17 PM, Deepak Goel wrote:
>
> I guess what you are saying is:
>
> 1. The nodes work perfectly OK without I/O wait before the Spark job.
> 2. After you have run the Spark job and killed it, …
It seems like the executor memory is not enough for your job and it is
writing objects to disk; a tuning sketch follows after the quoted thread.
On Jun 17, 2016 2:25 AM, "Cassa L" wrote:
>
>
> On Thu, Jun 16, 2016 at 5:27 AM, Deepak Goel wrote:
>
>> What is the hardware configuration of the machine you are running Spark on?
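If that diagnosis is right, the usual first step is to give the executors
more headroom. A sketch (the 8g figure is illustrative, not from this thread):

    import org.apache.spark.{SparkConf, SparkContext}

    // Request more executor memory so shuffle and cached data spill to
    // disk less often; size the value to what your nodes actually have.
    val conf = new SparkConf()
      .setAppName("MemoryTuning")
      .set("spark.executor.memory", "8g")
    val sc = new SparkContext(conf)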
> 3. …still lost nodes.
> 4. He’s currently running storage benchmarking tests, which consist mainly
> of shuffles.
>
> Thanks!
> Ken
>
> On Jun 16, 2016, at 8:00 AM, Deepak Goel wrote:
>
> I am no expert, but some naive thoughts...
>
> 1. How many HPC nodes do you have…
Just wondering: if threads were purely a hardware implementation, then even
if my application in Java had one thread, that thread could be split up into
small parts and run on different cores of a multicore machine simultaneously.
However, this would raise synchronization problems.
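Put differently: the JVM will not transparently split one thread across
cores; to occupy several cores you start several threads and let the OS
schedule them. A tiny Scala sketch:

    // One JVM thread runs on one core at a time. Starting one thread per
    // hardware thread lets the OS spread the work across all the cores.
    val workers = (1 to Runtime.getRuntime.availableProcessors()).map { i =>
      new Thread(() => println(s"worker $i on ${Thread.currentThread().getName}"))
    }
    workers.foreach(_.start())
    workers.foreach(_.join())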
What is the hardware configuration of the machine you are running Spark on?
Deepak
I am no expert, but some naive thoughts...
1. How many HPC nodes do you have? How many of them crash (what do you mean
by "multiple")? Do all of them crash?
2. What things are you running on Puppet? Can't you switch it off and test
Spark? Also, you can switch off Facter. Btw, your observation that th…
I am not an expert, but some thoughts inline.
On Jun 16, 2016 6:31 AM, "Maja Kabiljo" wrote:
>
> Hi,
>
> We are running some experiments with GraphX in order to compare it with
other systems. There are multiple settings which significantly affect
performance, and we experimented a lot in order…
>> …ram, i7
>>
>> Will this config be able to handle the processing without my IPython
>> notebook dying?
>>
>> The local mode is for testing purposes. But I do not have any cluster at
>> my disposal. So can I make this work with the configuration that I have?
>>
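Not from the thread, but for reference: in local mode the driver and
executors share one JVM, so its heap must be set when that JVM launches
(e.g. spark-submit --driver-memory 4g), not later in code. A minimal sketch:

    import org.apache.spark.{SparkConf, SparkContext}

    // "local[*]" runs Spark inside this single JVM using every core.
    // Heap size is fixed at JVM launch (e.g. via --driver-memory).
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("NotebookTest")
    val sc = new SparkContext(conf)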
I am not an expert, but it seems all your processing is done on node1 while
node2 is lying idle.
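If so, one thing to try (a sketch, assuming an RDD named "data") is to
spread the records over more partitions so the scheduler can hand tasks to
both nodes:

    // Too few partitions can pin every task to one node; repartitioning
    // up to the cluster's default parallelism spreads the work out.
    val balanced = data.repartition(sc.defaultParallelism)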
Deepak
>>>
>>>
>>> On 11 June 2016 at 16:10, Ted Yu wrote:
>>>
>>>>
>>>> https://www.amazon.com/Machine-Learning-Spark-Powerful-Algorithms/dp/1783288515/ref=sr_1_1?ie=UTF8&qid=1465657706&sr=8-1&keywords=spark+mllib
>>>>
Hey
Namaskara~Nalama~Guten Tag~Bonjour
I am a newbie to Machine Learning (MLlib and other libraries on Spark).
Which would be the best book to learn from?
Thanks
Deepak
Sent from my iPhone
On Jun 5, 2016, at 4:37 PM, Deepak Goel wrote:
Hello
Sorry, I am new to Spark.
Spark claims it can do all that MapReduce can do (and more!), but 10X faster
on disk and 100X faster in memory. Why, then, would I use MapReduce at all?
Thanks
Deepak