Re: Are tachyon and akka removed from 2.1.1 please

2017-05-23 Thread 萝卜丝炒饭
thanks Gromakowski and Chin Wei. ---Original--- From: "vincent gromakowski" Date: 2017/5/23 00:54:33 To: "Chin Wei Low"; Cc: "user";"??"<1427357...@qq.com>;"Gene Pang"; Subject: Re: Are tachyon and akka removed from 2.1.1 please

Re: Are tachyon and akka removed from 2.1.1 please

2017-05-23 Thread 萝卜丝炒饭
thanks Gene. ---Original--- From: "Gene Pang" Date: 2017/5/22 22:19:47 To: "??"<1427357...@qq.com>; Cc: "user"; Subject: Re: Are tachyon and akka removed from 2.1.1 please Hi, Tachyon has been renamed to Alluxio. Here is the documentation for runn

Re: Are tachyon and akka removed from 2.1.1 please

2017-05-22 Thread vincent gromakowski
Akka has been replaced by Netty in 1.6. On 22 May 2017 at 15:25, "Chin Wei Low" wrote: > I think akka has been removed since 2.0. > > On 22 May 2017 10:19 pm, "Gene Pang" wrote: > >> Hi, >> >> Tachyon has been renamed to Alluxio. Here is t

Re: Are tachyon and akka removed from 2.1.1 please

2017-05-22 Thread Chin Wei Low
I think akka has been removed since 2.0. On 22 May 2017 10:19 pm, "Gene Pang" wrote: > Hi, > > Tachyon has been renamed to Alluxio. Here is the documentation for > running Alluxio with Spark > <http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html&g

Re: Are tachyon and akka removed from 2.1.1 please

2017-05-22 Thread Gene Pang
Hi, Tachyon has been renamed to Alluxio. Here is the documentation for running Alluxio with Spark <http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html>. Hope this helps, Gene On Sun, May 21, 2017 at 6:15 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > HI all, > Irea
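
(A minimal sketch of the pattern the linked Alluxio documentation describes: reading and writing through an alluxio:// URI. The master host, port, and paths below are placeholders, and the Alluxio client jar is assumed to be on the Spark classpath.)

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: host, port, and paths are placeholder values.
    val sc = new SparkContext(new SparkConf().setAppName("alluxio-sketch"))

    // Read input from Alluxio, do a trivial word count, write the result back to Alluxio.
    val lines  = sc.textFile("alluxio://alluxio-master:19998/input/data.txt")
    val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("alluxio://alluxio-master:19998/output/wordcount")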

Are tachyon and akka removed from 2.1.1 please

2017-05-21 Thread 萝卜丝炒饭
Hi all, I read some papers about the source code; the papers are based on version 1.2 and they refer to Tachyon and Akka. When I read the 2.1 code, I cannot find the code about Akka and Tachyon. Are Tachyon and Akka removed from 2.1.1 please

Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Bin Fan
Hi, If you are looking for how to run Spark on Alluxio (formerly Tachyon), here is the documentation from Alluxio doc site: http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html It still works for Spark 2.x. Alluxio team also published articles on when and why running Spark (2.x

Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Sean Owen
wrote: > Here is my understanding. > > Spark used Tachyon as an off-heap solution for RDDs. In certain situations, > it would alleviate Garbage Collection of the RDDs. > > Tungsten, Spark 2’s off-heap (columnar format), is much more efficient and > used as the default. Alluxio no longe

Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Richard Catlin
Here is my understanding. Spark used Tachyon as an off-heap solution for RDDs. In certain situations, it would alleviate Garbage Collection of the RDDs. Tungsten, Spark 2’s off-heap (columnar format), is much more efficient and used as the default. Alluxio no longer makes sense for this use
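
(For reference, Spark 2's built-in Tungsten off-heap memory mentioned above is enabled purely through configuration, with no external store involved; a minimal sketch, where the 2 GB size is an arbitrary example value.)

    import org.apache.spark.sql.SparkSession

    // Sketch: enable Spark 2's Tungsten off-heap memory.
    val spark = SparkSession.builder()
      .appName("tungsten-offheap-sketch")
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", 2L * 1024 * 1024 * 1024)   // bytes (example value)
      .getOrCreate()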

off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread aka.fe2s
Hi folks, What has happened with Tachyon / Alluxio in Spark 2? The docs no longer mention it. -- Oleksiy Dyagilev

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-02-01 Thread Jia Zou
Hi, Calvin, I am running Spark KMeans on 24GB of data in a c3.2xlarge AWS instance with 30GB physical memory. Spark will cache data off-heap to Tachyon, and the input data is also stored in Tachyon. Tachyon is configured to use 15GB of memory and to use the tiered store. The Tachyon underFS is /tmp. The only

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-29 Thread cc
Hey, Jia Zou, I'm curious about this exception. The error log you showed indicates that the exception is related to unlockBlock; could you upload your full master.log and worker.log from the tachyon/logs directory? Best, Cheng On Friday, January 29, 2016 at 11:11:19 AM UTC+8, Calvin Jia wrote: > > Hi, > &g

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-28 Thread Calvin Jia
Hi, Thanks for the detailed information. How large is the dataset you are running against? Also did you change any Tachyon configurations? Thanks, Calvin - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) On Wed, Jan 27, 2016 at 5:53 AM, Jia Zou wrote: > BTW. The tachyon worker log says following: > > > > 2015-12-

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
BTW. The tachyon worker log says following: 2015-12-27 01:33:44,599 ERROR WORKER_LOGGER (WorkerBlockMasterClient.java:getId) - java.net.SocketException: Connection reset org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
BTW, the error happens when configuring Spark to read the input file from Tachyon like the following: /home/ubuntu/spark-1.6.0/bin/spark-submit --properties-file /home/ubuntu/HiBench/report/kmeans/spark/java/conf/sparkbench/spark.conf --class org.apache.spark.examples.mllib.JavaKMeans --master spark://ip

TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
Dears, I keep getting below exception when using Spark 1.6.0 on top of Tachyon 0.8.2. Tachyon is 93% used and configured as CACHE_THROUGH. Any suggestions will be appreciated, thanks! = Exception in thread "main" org.apache.spark.SparkException: J

Re: How to query data in tachyon with spark-sql

2016-01-24 Thread Gene Pang
Hi, You should be able to point Hive to Tachyon instead of HDFS, and that should allow Hive to access data in Tachyon. If Spark SQL was pointing to an HDFS file, you could instead point it to a Tachyon file, and that should work too. Hope that helps, Gene On Wed, Jan 20, 2016 at 2:06 AM, Sea
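
(A minimal sketch of the second suggestion above, pointing Spark SQL at a Tachyon path instead of an HDFS path; the host, port, and file are hypothetical, and a Spark 1.x SQLContext is assumed.)

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Point Spark SQL at a Parquet file stored in Tachyon rather than HDFS.
    val events = sqlContext.parquetFile("tachyon://tachyon-master:19998/warehouse/events.parquet")
    events.registerTempTable("events")
    sqlContext.sql("SELECT COUNT(*) FROM events").show()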

How to query data in tachyon with spark-sql

2016-01-20 Thread Sea
Hi all, I want to mount some Hive tables in Tachyon, but I don't know how to query data in Tachyon with spark-sql. Does anyone know?

Re: Saving RDDs in Tachyon

2015-12-09 Thread Calvin Jia
Hi Mark, Were you able to successfully store the RDD with Akhil's method? When you read it back as an objectFile, you will also need to specify the correct type. You can find more information about integrating Spark and Tachyon on this page: http://tachyon-project.org/documentation/Running-
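
(A minimal sketch combining Akhil's suggestion with the explicit element type Calvin mentions; the case class, host, and path are made up for illustration.)

    case class Record(id: Long, payload: String)

    // Write the RDD into Tachyon as an object file...
    val records = sc.parallelize(Seq(Record(1L, "a"), Record(2L, "b")))
    records.saveAsObjectFile("tachyon://tachyon-master:19998/rdds/records")

    // ...and read it back, specifying the element type explicitly.
    val restored = sc.objectFile[Record]("tachyon://tachyon-master:19998/rdds/records")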

how often you use Tachyon to accelerate Spark

2015-12-06 Thread Arvin
Hi all, Well, I have some questions about Tachyon and Spark. I found that the interaction between Spark and Tachyon is caching RDDs off-heap. I wonder if you use Tachyon frequently, such as caching RDDs in Tachyon? Does this action (caching RDDs in Tachyon) have a profound effect to accelerate Spark

Re: Saving RDDs in Tachyon

2015-10-30 Thread Akhil Das
I guess you can do a .saveAsObjectFile and read it back with sc.objectFile Thanks Best Regards On Fri, Oct 23, 2015 at 7:57 AM, mark wrote: > I have Avro records stored in Parquet files in HDFS. I want to read these > out as an RDD and save that RDD in Tachyon for any spark job that wan

Re: How does Spark coordinate with Tachyon wrt data locality

2015-10-23 Thread Calvin Jia
Hi Shane, Tachyon provides an api to get the block locations of the file which Spark uses when scheduling tasks. Hope this helps, Calvin On Fri, Oct 23, 2015 at 8:15 AM, Kinsella, Shane wrote: > Hi all, > > > > I am looking into how Spark handles data locality wrt Tachyon. My

How does Spark coordinate with Tachyon wrt data locality

2015-10-23 Thread Kinsella, Shane
Hi all, I am looking into how Spark handles data locality wrt Tachyon. My main concern is how this is coordinated. Will it send a task based on a file loaded from Tachyon to a node that it knows has that file locally, and how does it know which nodes have what? Kind regards, Shane This email

Saving RDDs in Tachyon

2015-10-22 Thread mark
I have Avro records stored in Parquet files in HDFS. I want to read these out as an RDD and save that RDD in Tachyon for any spark job that wants the data. How do I save the RDD in Tachyon? What format do I use? Which RDD 'saveAs...' method do I want? Thanks

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread N B
not need to >> enable Data Check pointing. >> >> From my experiments and the PR I mentioned , I configured the Meta Data >> Check Pointing in HDFS , and stored the Received Blocks OFF_HEAP. And I >> did not use any WAL . The PR I proposed would recover from Dr

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread N B
> without using any WAL-like feature because Blocks are already available in > Tachyon. The Meta Data Checkpoint helps to recover the meta data about past > received blocks. > > Now the question is, can I configure Tachyon as my Metadata Checkpoint > location? I tried that, and Stre

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread Dibyendu Bhattacharya
not use any WAL . The PR I proposed would recover from Driver fail-over without using any WAL like feature because Blocks are already available in Tachyon. The Meta Data Checkpoint helps to recover the meta data about past received blocks. Now the question is , can I configure Tachyon as my Metad

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread N B
Hi Dibyendu, I am not sure I understand completely. But are you suggesting that currently there is no way to enable Checkpoint directory to be in Tachyon? Thanks Nikunj On Fri, Sep 25, 2015 at 11:49 PM, Dibyendu Bhattacharya < dibyendu.bhattach...@gmail.com> wrote: > Hi, > >

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-25 Thread Dibyendu Bhattacharya
Hi, Recently I was working on a PR to use Tachyon as the OFF_HEAP store for Spark Streaming and make sure Spark Streaming can recover from Driver failure and recover the blocks from Tachyon. The motivation for this PR is: if the Streaming application stores the blocks OFF_HEAP, it may not need
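
(A minimal sketch of the setup described in this thread: received blocks stored OFF_HEAP, with only the metadata checkpoint in HDFS and no WAL. All hosts, paths, and the Kafka topic are hypothetical, and the receiver-based KafkaUtils.createStream API of that era, from spark-streaming-kafka, is assumed.)

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(sc, Seconds(10))

    // Metadata checkpoint in HDFS; the block data itself is kept OFF_HEAP (Tachyon-backed in Spark 1.x).
    ssc.checkpoint("hdfs://namenode:8020/streaming/checkpoints")

    val stream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "consumer-group", Map("events" -> 1),
      StorageLevel.OFF_HEAP)

    stream.count().print()
    ssc.start()
    ssc.awaitTermination()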

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-25 Thread N B
Hi Dibyendu, How does one go about configuring spark streaming to use tachyon as its place for storing checkpoints? Also, can one do this with tachyon running on a completely different node than where spark processes are running? Thanks Nikunj On Thu, May 21, 2015 at 8:35 PM, Dibyendu

Re: BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Dibyendu Bhattacharya
The URL seems to have changed .. here is the one .. http://tachyon-project.org/documentation/Tiered-Storage-on-Tachyon.html On Wed, Aug 26, 2015 at 12:32 PM, Dibyendu Bhattacharya < dibyendu.bhattach...@gmail.com> wrote: > Sometime back I was playing with Spark and Tachyon and I a

Re: BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Dibyendu Bhattacharya
Sometime back I was playing with Spark and Tachyon and I also found this issue. The issue here is that TachyonBlockManager puts the blocks in with the WriteType.TRY_CACHE configuration. Because of this, blocks are evicted from the Tachyon cache when memory is full, and when Spark tries to find the block it throws

BlockNotFoundException when running spark word count on Tachyon

2015-08-25 Thread Todd
I am using Tachyon in the Spark program below, but I encounter a BlockNotFoundException. Does someone know what's wrong, and is there a guide on how to configure Spark to work with Tachyon? Thanks! conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:
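
(A minimal sketch of the Spark 1.6-era configuration this thread is using: point the external block store at a running Tachyon master and persist the RDD OFF_HEAP. Host, port, and paths are placeholders; earlier 1.x releases used the spark.tachyonStore.url property instead.)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf()
      .setAppName("tachyon-offheap-sketch")
      .set("spark.externalBlockStore.url", "tachyon://tachyon-master:19998")
    val sc = new SparkContext(conf)

    // Persist OFF_HEAP so the RDD blocks are stored in Tachyon rather than on the JVM heap.
    val data = sc.textFile("hdfs://namenode:8020/input/data.txt")
    data.persist(StorageLevel.OFF_HEAP)
    data.count()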

Re: tachyon

2015-08-07 Thread Abhishek R. Singh
Thanks Calvin - much appreciated ! -Abhishek- On Aug 7, 2015, at 11:11 AM, Calvin Jia wrote: > Hi Abhishek, > > Here's a production use case that may interest you: > http://www.meetup.com/Tachyon/events/222485713/ > > Baidu is using Tachyon to manage more than

Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

2015-08-07 Thread andy petrella
Exactly! The sharing part is used in the Spark Notebook (this one <https://github.com/andypetrella/spark-notebook/blob/master/notebooks/Tachyon%20Test.snb>) so we can share stuff between notebooks which are different SparkContexts (in different JVMs). OTOH, we have a project that creates

Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

2015-08-07 Thread Calvin Jia
Hi, Tachyon <http://tachyon-project.org> manages memory off heap which can help prevent long GC pauses. Also, using Tachyon will allow the data to be shared between Spark jobs if they use the same dataset. Here's <http://www.meetup.com/Tachyon/events/222485713/> a production use

Re: tachyon

2015-08-07 Thread Calvin Jia
Hi Abhishek, Here's a production use case that may interest you: http://www.meetup.com/Tachyon/events/222485713/ Baidu is using

Re: tachyon

2015-08-07 Thread Ted Yu
Looks like you would get better response on Tachyon's mailing list: https://groups.google.com/forum/?fromgroups#!forum/tachyon-users Cheers On Fri, Aug 7, 2015 at 9:56 AM, Abhishek R. Singh < abhis...@tetrationanalytics.com> wrote: > Do people use Tachyon in production, or is i

tachyon

2015-08-07 Thread Abhishek R. Singh
Do people use Tachyon in production, or is it experimental grade still? Regards, Abhishek - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Spark is in-memory processing, how then can Tachyon make Spark faster?

2015-08-07 Thread Muler
Spark is an in-memory engine and attempts to do computation in-memory. Tachyon is memory-centric distributed storage, OK, but how would that help run Spark faster?

Re: How can I use Tachyon with SPARK?

2015-06-15 Thread Himanshu Mehra
Hi June, As I understand your problem, you are running Spark 1.3 and want to use Tachyon with it. What you need to do is simply build the latest Spark and Tachyon and set some configuration in Spark. In fact Spark 1.3 has "spark/core/pom.xml"; you have to find the "core" folde

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-05-21 Thread Dibyendu Bhattacharya
Hi Tathagata, Thanks for looking into this. Further investigating I found that the issue is with Tachyon does not support File Append. The streaming receiver which writes to WAL when failed, and again restarted, not able to append to same WAL file after restart. I raised this with Tachyon user

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-05-21 Thread Tathagata Das
some fault tolerant testing of Spark Streaming with Tachyon as > OFF_HEAP block store. As I said in an earlier email, I was able to solve the > BlockNotFound exception when I used the Hierarchical Storage of Tachyon, > which is good. > > I continue doing some testing around storing the S

Fast big data analytics with Spark on Tachyon in Baidu

2015-05-12 Thread Haoyuan Li
Dear all, We’re organizing a meetup <http://www.meetup.com/Tachyon/events/222485713/> on May 28th at IBM in Foster City that might be of interest to the Spark community. The focus is a production use case of Spark and Tachyon at Baidu. You can sign up here: http://www.meetup.com/Tachyon/

Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-28 Thread Calvin Jia
Hi, You can apply this patch <https://github.com/apache/spark/pull/5354> and recompile. Hope this helps, Calvin On Tue, Apr 28, 2015 at 1:19 PM, sara mustafa wrote: > Hi Zhang, > > How did you compile Spark 1.3.1 with Tachyon? When I changed the Tachyon > version > to 0.6.3

Re: tachyon on machines launched with spark-ec2 scripts

2015-04-24 Thread Haoyuan Li
Daniel, Instead of using localhost:19998, you may want to use the real IP address that TachyonMaster is configured with. You should be able to see more info in Tachyon's UI as well. More info could be found here: http://tachyon-project.org/master/Running-Tachyon-on-EC2.html Best, Haoyuan On Fri, A

tachyon on machines launched with spark-ec2 scripts

2015-04-24 Thread Daniel Mahler
I have a cluster launched with spark-ec2. I can see a TachyonMaster process running, but I do not seem to be able to use tachyon from the spark-shell. if I try rdd.saveAsTextFile("tachyon://localhost:19998/path") I get 15/04/24 19:18:31 INFO TaskSetManager: Starting task 12.2 in stag

Re: Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread Reynold Xin
ngfei wrote: > Hi, > I did some tests on Parquet files with the Spark SQL DataFrame API. > I generated 36 gzip-compressed parquet files with Spark SQL and stored them > on Tachyon. The size of each file is about 222M. Then I read them with the below > code. > val tfs > =sqlContext.parque

Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-14 Thread Cheng Lian
Would you mind opening a JIRA for this? I think your suspicion makes sense. Will have a look at this tomorrow. Thanks for reporting! Cheng On 4/13/15 7:13 PM, zhangxiongfei wrote: Hi experts, I run the below code in the Spark shell to access parquet files in Tachyon. 1. First, created a DataFrame by

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-04-01 Thread Haoyuan Li
Response inline. On Tue, Mar 31, 2015 at 10:41 PM, Sean Bigdatafun wrote: > (resending...) > > I was thinking the same setup… But the more I think of this problem, and > the more interesting this could be. > > If we allocate 50% total memory to Tachyon statically, then the M

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Sean Bigdatafun
(resending...) I was thinking of the same setup… But the more I think about this problem, the more interesting it becomes. If we allocate 50% of total memory to Tachyon statically, then the Mesos benefits of dynamically scheduling resources go away altogether. Can Tachyon be resource managed by

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Haoyuan Li
Ankur, Response inline. On Tue, Mar 31, 2015 at 4:49 PM, Ankur Chauhan wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hi Haoyuan, > > So on each mesos slave node I should allocate/section off some amount > of memory for tachyon (let's say 50% of the

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Haoyuan, So on each mesos slave node I should allocate/section off some amount of memory for tachyon (let's say 50% of the total memory) and the rest for regular mesos tasks? This means, on each slave node I would have tachyon worker (+

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Haoyuan Li
Tachyon should be co-located with Spark in this case. Best, Haoyuan On Tue, Mar 31, 2015 at 4:30 PM, Ankur Chauhan wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hi, > > I am fairly new to the spark ecosystem and I have been trying to setup > a spark on meso

deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, I am fairly new to the spark ecosystem and I have been trying to setup a spark on mesos deployment. I can't seem to figure out the "best practices" around HDFS and Tachyon. The documentation about Spark's data-locality section

Re: rdd.toDF().saveAsParquetFile("tachyon://host:19998/test")

2015-03-27 Thread Yin Huai
You are hitting https://issues.apache.org/jira/browse/SPARK-6330. It has been fixed in 1.3.1, which will be released soon. On Fri, Mar 27, 2015 at 10:42 PM, sud_self <852677...@qq.com> wrote: > spark version is 1.3.0 with tanhyon-0.6.1 > > QUESTION DESCRIPTION: rdd.saveAsObje

rdd.toDF().saveAsParquetFile("tachyon://host:19998/test")

2015-03-27 Thread sud_self
spark version is 1.3.0 with tachyon-0.6.1 QUESTION DESCRIPTION: rdd.saveAsObjectFile("tachyon://host:19998/test") and rdd.saveAsTextFile("tachyon://host:19998/test") succeed, but rdd.toDF().saveAsParquetFile("tachyon://host:19998/test&

Re: Re: Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
Thanks haoyuan. fightf...@163.com From: Haoyuan Li Date: 2015-03-16 12:59 To: fightf...@163.com CC: Shao, Saisai; user Subject: Re: RE: Building spark over specified tachyon Here is a patch: https://github.com/apache/spark/pull/4867 On Sun, Mar 15, 2015 at 8:46 PM, fightf...@163.com wrote

Re: RE: Building spark over specified tachyon

2015-03-15 Thread Haoyuan Li
Here is a patch: https://github.com/apache/spark/pull/4867 On Sun, Mar 15, 2015 at 8:46 PM, fightf...@163.com wrote: > Thanks, Jerry > I got that way. Just to make sure whether there can be some option to > directly > specifying tachyon version. > > > --

Re: RE: Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
Thanks, Jerry, I got it that way. Just to make sure whether there can be some option to directly specify the Tachyon version. fightf...@163.com From: Shao, Saisai Date: 2015-03-16 11:10 To: fightf...@163.com CC: user Subject: RE: Building spark over specified tachyon I think you could change the

RE: Building spark over specified tachyon

2015-03-15 Thread Shao, Saisai
I think you could change the pom file under the Spark project to update the Tachyon-related dependency version and rebuild it (provided the API is compatible and the behavior is the same). I'm not sure whether there is any command you can use to compile against a specific Tachyon version. Thanks Jerry From: f

Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
Hi all, Noting that the current Spark releases are built with Tachyon 0.5.0, if we want to recompile Spark with Maven targeting a specific Tachyon version (let's say the most recent 0.6.0 release), how should that be done? What should the Maven compile command look like? Thanks

Re: A way to share RDD directly using Tachyon?

2015-03-09 Thread Akhil Das
Did you try something like: myRDD.saveAsObjectFile("tachyon://localhost:19998/Y") val newRDD = sc.objectFile[MyObject]("tachyon://localhost:19998/Y") Thanks Best Regards On Sun, Mar 8, 2015 at 3:59 PM, Yijie Shen wrote: > Hi, > > I would like to share a RDD

A way to share RDD directly using Tachyon?

2015-03-08 Thread Yijie Shen
Hi, I would like to share an RDD in several Spark applications, i.e., create one in application A, publish the ID somewhere, and get the RDD back directly using the ID in application B. I know I can use Tachyon just as a filesystem and s.saveAsTextFile("tachyon://localhost:19998/Y") like this.

Store the shuffled files in memory using Tachyon

2015-03-06 Thread sara mustafa
Hi all, Is it possible to store Spark shuffled files on a distributed memory like Tachyon instead of spilling them to disk? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Store-the-shuffled-files-in-memory-using-Tachyon-tp21944.html Sent from the Apache

Re: Spark or Tachyon: capture data lineage

2015-01-02 Thread Sven Krasser
Agreed with Jerry. Aside from Tachyon, seeing this for general debugging would be very helpful. Haoyuan, is that feature you are referring to related to https://issues.apache.org/jira/browse/SPARK-975? In the interim, I've found the "toDebugString()" method useful (but it renders

Re: Spark or Tachyon: capture data lineage

2015-01-02 Thread Haoyuan Li
Jerry, Great question. Spark and Tachyon capture lineage information at different granularities. We are working on an integration between Spark/Tachyon about this. Hope to get it ready to be released soon. Best, Haoyuan On Fri, Jan 2, 2015 at 12:24 PM, Jerry Lam wrote: > Hi spark develop

Spark or Tachyon: capture data lineage

2015-01-02 Thread Jerry Lam
Hi spark developers, I was thinking it would be nice to extract the data lineage information from a data processing pipeline. I assume that spark/tachyon keeps this information somewhere. For instance, a data processing pipeline uses datasource A and B to produce C. C is then used by another

Re: UpdateStateByKey persist to Tachyon

2014-12-31 Thread amkcom
bumping this thread up -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/UpdateStateByKey-persist-to-Tachyon-tp20798p20930.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark on Tachyon

2014-12-20 Thread Peng Cheng
IMHO: cache doesn't provide redundancy, and it's in the same JVM, so it's much faster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Tachyon-tp1463p20800.html Sent from the Apache Spark User List mailing list archive at Nabbl

spark-ec2 starts hdfs1, tachyon but not spark

2014-12-17 Thread Al Thompson
myclus All seem to run OK. However, I got no web UIs for the Spark master or slave. Logging into the nodes, I see HDFS and Tachyon processes but none for Spark. The /root/tachyon folder has a full complement of files including conf, logs and so forth: $ ls /root/tachyon bin docs libexec

Re: Persist kafka streams to text file, tachyon error?

2014-11-22 Thread Haoyuan Li
StorageLevel.OFF_HEAP requires to run Tachyon: http://spark.apache.org/docs/latest/programming-guide.html If you don't know if you have tachyon or not, you probably don't :) http://tachyon-project.org/ For local testing, you can use other persist() solutions without running Tach
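
(A minimal sketch of the fallback suggested above for local testing without a Tachyon deployment: use a heap- or disk-backed storage level instead of OFF_HEAP. The path is a placeholder.)

    import org.apache.spark.storage.StorageLevel

    // Without a running Tachyon master, avoid OFF_HEAP and persist on heap/disk instead.
    val rdd = sc.textFile("hdfs://namenode:8020/input/data.txt")
    rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)   // instead of StorageLevel.OFF_HEAP
    rdd.count()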

Persist kafka streams to text file, tachyon error?

2014-11-21 Thread Joanne Contact
related to Tachyon. But I don't know if I have tachyon or not. 14/11/21 14:17:54 WARN storage.TachyonBlockManager: Attempt 1 to create tachyon dir null failed java.io.IOException: Failed to connect to master localhost/127.0.0.1:19998 after 5 attempts at tachyon.client.TachyonFS.co

Storing shuffle files on a Tachyon

2014-10-07 Thread Soumya Simanta
Is it possible to store spark shuffle files on Tachyon ?

Fwd: Second Bay Area Tachyon meetup: October 21st, hosted by Pivotal (Limited Space)

2014-10-02 Thread Haoyuan Li
-- Forwarded message -- From: Haoyuan Li Date: Thu, Oct 2, 2014 at 10:12 AM Subject: Second Bay Area Tachyon meetup: October 21st, hosted by Pivotal (Limited Space) To: tachyon-us...@googlegroups.com Hi folks, We've posted the second Tachyon meetup featuring exciting up

Re: spark-ec2 script with Tachyon

2014-09-26 Thread mrm
Hi, Did you manage to figure this out? I would appreciate if you could share the answer. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-script-with-Tachyon-tp9996p15249.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: [Tachyon] Error reading from Parquet files in HDFS

2014-08-21 Thread Evan Chan
And it worked earlier with non-parquet directory. On Thu, Aug 21, 2014 at 12:22 PM, Evan Chan wrote: > The underFS is HDFS btw. > > On Thu, Aug 21, 2014 at 12:22 PM, Evan Chan wrote: >> Spark 1.0.2, Tachyon 0.4.1, Hadoop 1.0 (standard EC2 config) >>

Re: [Tachyon] Error reading from Parquet files in HDFS

2014-08-21 Thread Evan Chan
The underFS is HDFS btw. On Thu, Aug 21, 2014 at 12:22 PM, Evan Chan wrote: > Spark 1.0.2, Tachyon 0.4.1, Hadoop 1.0 (standard EC2 config) > > scala> val gdeltT = > sqlContext.parquetFile("tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/") > 14/08/21 19:07:14

[Tachyon] Error reading from Parquet files in HDFS

2014-08-21 Thread Evan Chan
Spark 1.0.2, Tachyon 0.4.1, Hadoop 1.0 (standard EC2 config) scala> val gdeltT = sqlContext.parquetFile("tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/") 14/08/21 19:07:14 INFO : initialize(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005, Configuration: core-defa

Re: First Bay Area Tachyon meetup: August 25th, hosted by Yahoo! (Limited Space)

2014-08-19 Thread Christopher Nguyen
Fantastic! Sent while mobile. Pls excuse typos etc. On Aug 19, 2014 4:09 PM, "Haoyuan Li" wrote: > Hi folks, > > We've posted the first Tachyon meetup, which will be on August 25th and is > hosted by Yahoo! (Limited Space): > http://www.meetup.com/Tachyon/event

First Bay Area Tachyon meetup: August 25th, hosted by Yahoo! (Limited Space)

2014-08-19 Thread Haoyuan Li
Hi folks, We've posted the first Tachyon meetup, which will be on August 25th and is hosted by Yahoo! (Limited Space): http://www.meetup.com/Tachyon/events/200387252/ . Hope to see you there! Best, Haoyuan -- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Re: share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-12 Thread chutium
More interesting is that if spark-shell is started on the master node (test01), then parquetFile.saveAsParquetFile("tachyon://test01.zala:19998/parquet_tablex") 14/08/12 11:42:06 INFO : initialize(tachyon://... ... ... 14/08/12 11:42:06 INFO : File does not exist: tachyon://test01.zala:19998/parq

Re: share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-12 Thread chutium
spark.speculation was not set, any speculative execution on tachyon side? tachyon-env.sh only changed following export TACHYON_MASTER_ADDRESS=test01.zala #export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underfs export TACHYON_UNDERFS_ADDRESS=hdfs://test01.zala:8020 export TACHYON_WORKER_MEMORY_SIZE

Re: share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-11 Thread Haoyuan Li
Is the speculative execution enabled? Best, Haoyuan On Mon, Aug 11, 2014 at 8:08 AM, chutium wrote: > sharing /reusing RDDs is always useful for many use cases, is this possible > via persisting RDD on tachyon? > > such as off heap persist a named RDD into a given path

share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-11 Thread chutium
Sharing/reusing RDDs is always useful for many use cases; is this possible via persisting RDDs on Tachyon? Such as off-heap persisting a named RDD into a given path (instead of /tmp_spark_tachyon/spark-xxx-xxx-xxx), or saveAsParquetFile on Tachyon. I tried to save a SchemaRDD on Tachyon, val

Re: Spark-sql with Tachyon cache

2014-08-02 Thread Michael Armbrust
We are investigating various ways to integrate with Tachyon. I'll note that you can already use saveAsParquetFile and parquetFile(...).registerAsTable("tableName") (soon to be registerTempTable in Spark 1.1) to store data into tachyon and query it with Spark SQL. On Fri, Aug 1,

Spark-sql with Tachyon cache

2014-08-01 Thread Dariusz Kobylarz
Hi, I would like to ask whether spark-sql tables cached by Tachyon are a feature to be migrated from Shark. I imagine from the user perspective it would look like this: CREATE TABLE data TBLPROPERTIES("sparksql.cache" = "tachyon") AS SELECT a, b, c FROM data_on_disk WHERE month="May";

spark-ec2 script with Tachyon

2014-07-16 Thread nit
Hi, It seems that the spark-ec2 script deploys the Tachyon module along with the other setup. I am trying to use .persist(OFF_HEAP) for RDD persistence, but on the worker I see this error -- Failed to connect (2) to master localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused -- >F

Re: Cannot create dir in Tachyon when running Spark with OFF_HEAP caching (FileDoesNotExistException)

2014-07-08 Thread Teng Long
More updates: It seems that in TachyonBlockManager.scala (line 118) of Spark 1.1.0, the TachyonFS.mkdir() method is called, which creates a directory in Tachyon. Right after that, the TachyonFS.getFile() method is called. In all the versions of Tachyon I tried (0.4.1, 0.4.0), the second method will return a

Cannot create dir in Tachyon when running Spark with OFF_HEAP caching (FileDoesNotExistException)

2014-07-07 Thread Teng Long
Hi guys, I'm running Spark 1.0.0 with Tachyon 0.4.1, both in single-node mode. Tachyon's own tests (./bin/tachyon runTests) work fine, and manual file system operations like mkdir work well. But when I tried to run a very simple Spark task with the RDD persisted as OFF_HEAP, I got the