Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

andy petrella Fri, 07 Aug 2015 13:30:00 -0700

Exactly!

The sharing part is used in the Spark Notebook (this one
<https://github.com/andypetrella/spark-notebook/blob/master/notebooks/Tachyon%20Test.snb>)
so we can share data between notebooks, each of which has its own SparkContext
(in a different JVM).
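
A minimal sketch of that sharing pattern, in PySpark style. It is not taken
from the linked notebook: the host "localhost", the port 19998, the path
"/shared/genomes" and the helper names tachyon_uri, publish and load are
all placeholders chosen for illustration. It also assumes the Tachyon client
is on Spark's classpath so that tachyon:// URIs resolve.

```python
# Sketch: two separate Spark applications sharing a dataset through Tachyon.
# Assumptions (not from the thread): host, port and path are placeholders,
# and the helper names below are hypothetical.


def tachyon_uri(path, host="localhost", port=19998):
    """Build a tachyon:// URI for a path on the assumed Tachyon master."""
    return "tachyon://{}:{}/{}".format(host, port, path.lstrip("/"))


def publish(rdd, path):
    """Writer side: store an RDD as text files under a Tachyon path."""
    uri = tachyon_uri(path)
    rdd.saveAsTextFile(uri)  # standard RDD method, given a tachyon:// URI
    return uri


def load(sc, path):
    """Reader side: a different SparkContext (another JVM) reads the files."""
    return sc.textFile(tachyon_uri(path))
```

Under these assumptions, one notebook would call
publish(sc.parallelize(["chr1:100-200"]), "/shared/genomes") and another
notebook, with its own SparkContext, would call load(sc, "/shared/genomes").
The data lives in Tachyon rather than in either JVM's heap, so it can outlive
the SparkContext that wrote it.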


OTOH, we have a project that creates microservices on genomics data. For
several reasons, we used Tachyon to serve genome cubes (ranges across
genomes); see here <https://github.com/med-at-scale/high-health>.

HTH
andy

On Fri, Aug 7, 2015 at 8:36 PM Calvin Jia <jia.cal...@gmail.com> wrote:

> Hi,
>
> Tachyon <http://tachyon-project.org> manages memory off-heap, which can
> help prevent long GC pauses. Also, using Tachyon will allow the data to be
> shared between Spark jobs if they use the same dataset.
>
> Here's <http://www.meetup.com/Tachyon/events/222485713/> a production use
> case where Baidu runs Tachyon to get a 30x performance improvement in their
> SparkSQL workload.
>
> Hope this helps,
> Calvin
>
> On Fri, Aug 7, 2015 at 9:42 AM, Muler <mulugeta.abe...@gmail.com> wrote:
>
>> Spark is an in-memory engine and attempts to do computation in-memory.
>> Tachyon is memory-centric distributed storage, OK, but how would that help
>> run Spark faster?
>>
>
