\cc David Tompkins and Jim Donahue if they have anything to add. 
\cc My school email. Please include bamos_cmu.edu for further discussion. 

Hi Soumya,


ssimanta wrote
> The project mentions "process petabytes of data in real-time". I'm
> curious to know if the architecture implemented in the Github repo
> was used to process petabytes?
> If yes, how many nodes did you use for this, and did you use a Spark
> standalone cluster or YARN/Mesos?
> I'm also interested to know what issues you had with Spray and Akka
> working at this scale.

Great question. I've added the following paragraph to the README's
introduction to make clear that Spindle is not yet ready to process
petabytes of data in real-time.

I'd also be interested in seeing how Spray/Akka at larger scales
compares to using a job or resource manager.
We're currently running Spindle on a standalone Spark cluster.

Regards,
Brandon.

---

This repo contains the Spindle implementation and benchmarking scripts
for observing Spindle's performance while exploring Spark's tuning
options.
Spindle's goal is to process petabytes of data across thousands of
nodes, but the current implementation has not yet been tested at that
scale.
Our current experimental results use six nodes, each with 24 cores and
21 GB of Spark memory, to query 13.1 GB of analytics data.
The trends show that further Spark tuning and optimization should be
investigated before attempting larger-scale deployments.
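For context, a standalone-mode deployment of the kind described above
might be launched roughly as follows. This is only a sketch: the master
host, application jar, and main class names are placeholders I've made
up for illustration, not the project's actual values.

```shell
# Hypothetical spark-submit invocation mirroring the benchmark setup
# above: six workers, each with 24 cores and 21 GB of Spark memory,
# against a standalone master (6 nodes x 24 cores = 144 total cores).
# master-host, com.example.spindle.Main, and spindle-assembly.jar are
# placeholders.
spark-submit \
  --master spark://master-host:7077 \
  --executor-memory 21g \
  --total-executor-cores 144 \
  --class com.example.spindle.Main \
  spindle-assembly.jar
```

The same application could be pointed at YARN or Mesos by changing only
the --master argument, which is what makes the standalone-vs-resource-
manager comparison above cheap to try.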



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Open-sourcing-Spindle-by-Adobe-Research-a-web-analytics-processing-engine-in-Scala-Spark-and-Parquet-tp12203p12706.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
