Yup, as mentioned in the FAQ, we are aware of multiple deployments running jobs on over 1000 nodes. Some of our proofs of concept involved people running a 2000-node job on EC2.
I wouldn't confuse buzz with FUD :).

Matei

On Jul 15, 2014, at 9:17 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> Hi Rohit,
>
> I think the 3rd question on the FAQ may help you.
>
> https://spark.apache.org/faq.html
>
> Some other links that talk about building bigger clusters and processing more
> data:
>
> http://spark-summit.org/wp-content/uploads/2014/07/Building-1000-node-Spark-Cluster-on-EMR.pdf
> http://apache-spark-user-list.1001560.n3.nabble.com/Largest-Spark-Cluster-td3782.html
>
> Best Regards,
> Sonal
> Nube Technologies
>
> On Wed, Jul 16, 2014 at 9:17 AM, Rohit Pujari <rpuj...@hortonworks.com> wrote:
> Hello Folks:
>
> There is a lot of buzz in the Hadoop community around Spark's inability to
> scale beyond 1 TB datasets (or 10-20 nodes). It is being regarded as
> great tech for CPU-intensive workloads on smaller data (less than a TB) but
> fails to scale and perform effectively on larger datasets. How true is it?
>
> Are there any customers who are running petabyte-scale workloads on Spark
> in production? Are there any benchmarks performed by Databricks or other
> companies to clear this perception?
>
> I'm a big fan of Spark. Knowing Spark is in its early stages, I'd like to
> better understand the boundaries of the tech and recommend the right solution
> for the right problem.
>
> Thanks,
> Rohit Pujari
> Solutions Engineer, Hortonworks
> rpuj...@hortonworks.com
> 716-430-6899