Re: Spark performance

santoshv98 Sun, 12 Jul 2015 06:55:11 -0700

Ravi


Spark (or, for that matter, Big Data solutions like Hive) is suited to large 
analytical loads, where “scaling up” starts to pale in comparison to 
“scaling out” with regard to performance, versatility (types of data) and cost. 
Without going into the details of the MS SQL Server architecture, there is an 
inflection point in terms of cost (licensing), performance and maintainability 
where an open source commodity platform starts to become viable, albeit 
sometimes at the expense of slower performance. With 1 million records, I am 
not sure you are reaching that point, so a Spark cluster is hard to justify. 
Why are you planning to move away from MS SQL Server to Spark as the 
destination platform?


You said Spark's performance is slow compared to MS SQL Server. What kind of 
load are you running and what kind of queries are you performing? There may be 
startup costs associated with running the map side of the query, and on a small 
dataset those fixed costs can dominate the total run time.
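To make the startup-cost and inflection-point argument concrete, here is a back-of-the-envelope model (an illustration, not a benchmark). Each system's run time is treated as a fixed overhead plus a per-record cost, and the cluster is assumed to have a larger fixed overhead but a smaller per-record cost. All four parameter values are made-up placeholders, not measurements of Spark or MS SQL Server; substitute timings from your own tests.

```python
from typing import Optional

# HYPOTHETICAL parameters in seconds; replace them with your own measurements.
SINGLE_FIXED_S = 0.01        # assumed per-query overhead on a single-node RDBMS
SINGLE_PER_RECORD_S = 1e-6   # assumed per-record cost on a single node
CLUSTER_FIXED_S = 5.0        # assumed job scheduling / task launch overhead on a cluster
CLUSTER_PER_RECORD_S = 1e-7  # assumed per-record cost once work is spread out


def estimated_time(fixed_s: float, per_record_s: float, n_records: int) -> float:
    """Linear cost model: fixed overhead plus per-record work."""
    return fixed_s + per_record_s * n_records


def break_even_records(single_fixed: float, single_per_record: float,
                       cluster_fixed: float, cluster_per_record: float) -> Optional[float]:
    """Record count at which both estimates are equal.

    Returns None when the cluster's per-record cost is not lower, because then
    its extra fixed overhead is never paid back.
    """
    if cluster_per_record >= single_per_record:
        return None
    return (cluster_fixed - single_fixed) / (single_per_record - cluster_per_record)


if __name__ == "__main__":
    for n in (50_000, 100_000, 10_000_000):
        single = estimated_time(SINGLE_FIXED_S, SINGLE_PER_RECORD_S, n)
        cluster = estimated_time(CLUSTER_FIXED_S, CLUSTER_PER_RECORD_S, n)
        print(f"{n:>12,} records: single node ~{single:.2f}s, cluster ~{cluster:.2f}s")
    crossover = break_even_records(SINGLE_FIXED_S, SINGLE_PER_RECORD_S,
                                   CLUSTER_FIXED_S, CLUSTER_PER_RECORD_S)
    print(f"break-even at ~{crossover:,.0f} records")
```

With these hypothetical numbers the single node wins at 50,000 and 100,000 records, and the cluster only pulls ahead past roughly 5.5 million records. The real crossover depends entirely on your queries, data and hardware.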


If you're testing to understand Spark, can you post what you are currently 
doing (queries, table structures, compression and storage optimizations)? That 
way, we could suggest optimizations, again not to compare with MS SQL Server, 
but to improve the Spark side of things.


Again, to quote someone who answered earlier in the thread: what is your use 
case?


-Santosh

From: Jörn Franke
Sent: Saturday, July 11, 2015 8:20 PM
To: Mohammed Guller, Ravisankar Mani, user@spark.apache.org





Honestly, you are approaching this the wrong way: you do not seem to have a 
business case for changing, so why do you want to switch?




On Sat, Jul 11, 2015 at 3:28, Mohammed Guller <moham...@glassbeam.com> wrote:





Hi Ravi,

First, neither Spark nor Spark SQL is a database. Both are compute engines, 
which need to be paired with a storage system. Second, they are designed for 
processing large distributed datasets. If you have only 100,000 records or even 
a million records, you don’t need Spark. An RDBMS will perform much better for 
that volume of data.
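For a sense of scale, the sketch below uses Python's built-in sqlite3 module as a stand-in for a single-node RDBMS. It is not MS SQL Server, and the `orders` table and its columns are hypothetical synthetic data. It loads 100,000 rows, indexes one column, and times an indexed lookup plus a full-table aggregate.

```python
import sqlite3
import time

N_RECORDS = 100_000  # upper end of the range in the original question


def build_db(n: int) -> sqlite3.Connection:
    """Create an in-memory table of n synthetic rows (hypothetical schema) with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    # Synthetic data: 1,000 distinct customers, amounts cycling through 0..96.
    rows = ((i, i % 1000, float(i % 97)) for i in range(n))
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    conn.commit()
    return conn


def main() -> None:
    start = time.perf_counter()
    conn = build_db(N_RECORDS)
    load_s = time.perf_counter() - start

    start = time.perf_counter()
    # Indexed point lookup: number of rows for one customer.
    (lookup_count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (42,)
    ).fetchone()
    # Full-table aggregate over every row.
    (total,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()
    query_s = time.perf_counter() - start

    print(f"loaded {N_RECORDS:,} rows in {load_s:.3f}s")
    print(f"lookup rows={lookup_count}, sum={total:.0f}, queries took {query_s:.4f}s")
    conn.close()


if __name__ == "__main__":
    main()
```

Timings depend on your machine; the point is that this volume fits comfortably in a single process with an index, which is the situation where an RDBMS is the natural fit.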

 

Mohammed

 

From: Ravisankar Mani [mailto:rrav...@gmail.com] 
Sent: Friday, July 10, 2015 3:50 AM
To: user@spark.apache.org
Subject: Spark performance



 





Hi everyone,


I am planning to move from MS SQL Server to Spark. I am working with around 
50,000 to 1 lakh (100,000) records.


Spark's performance is slow compared to MS SQL Server.


 

Which is the better option (Spark or SQL Server) for storing and retrieving 
around 50,000 to 1 lakh (100,000) records?

regards,

Ravi
