Hi Rohit,

I think the 3rd question on the FAQ may help you.

https://spark.apache.org/faq.html

Some other links that talk about building bigger clusters and processing
more data:

http://spark-summit.org/wp-content/uploads/2014/07/Building-1000-node-Spark-Cluster-on-EMR.pdf
http://apache-spark-user-list.1001560.n3.nabble.com/Largest-Spark-Cluster-td3782.html
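
For what it's worth, the scaling question is less about total input size
than about partitioning: Spark streams each partition through memory
(spilling to disk when needed), so the input is not bounded by the
cluster's total RAM. Below is a rough sketch of the two knobs people
usually turn first when pushing past a TB, partition count and
serializer. The paths, the partition count of 2000, and the word-count
job itself are made-up illustrations, not settings from any real
deployment:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD ops like reduceByKey (Spark 1.x)

object LargeInputSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("large-input-sketch")
      // Kryo is usually recommended over Java serialization for heavy shuffles.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // Each of the (illustrative) 2000 partitions is processed independently,
    // so only one partition at a time has to fit in an executor's memory.
    val lines = sc.textFile("hdfs:///data/big-input", 2000)

    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _, 2000) // keep the shuffle side well-partitioned too

    counts.saveAsTextFile("hdfs:///data/word-counts")
    sc.stop()
  }
}

The 1000-node EMR deck above walks through the same ideas at much larger
scale.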



Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Wed, Jul 16, 2014 at 9:17 AM, Rohit Pujari <rpuj...@hortonworks.com>
wrote:

> Hello Folks:
>
> There is a lot of buzz in the Hadoop community around Spark's inability
> to scale beyond 1 TB datasets (or 10-20 nodes). It is being regarded as
> great tech for CPU-intensive workloads on smaller data (less than a TB)
> but one that fails to scale and perform effectively on larger datasets.
> How true is this?
>
> Are there any customers who are running petabyte-scale workloads on
> Spark in production? Are there any benchmarks performed by Databricks or
> other companies to dispel this perception?
>
>  I'm a big fan of Spark. Knowing Spark is in its early stages, I'd like
> to better understand the boundaries of the tech and recommend the right
> solution for the right problem.
>
> Thanks,
> Rohit Pujari
> Solutions Engineer, Hortonworks
> rpuj...@hortonworks.com
> 716-430-6899
>
