Re: Possible deadlock in registering applications in the recovery mode

2016-04-21 Thread Niranda Perera
Hi guys, any update on this? Best On Wed, Apr 20, 2016 at 3:00 AM, Niranda Perera wrote: > Hi Reynold, > > I have created a JIRA for this [1]. I have also created a PR for the same > issue [2]. > > Would be very grateful if you could look into this, because this is a > blocker in our spark dep

Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Denny Lee
BTW, we recently had a webinar on GraphFrames at http://go.databricks.com/graphframes-dataframe-based-graphs-for-apache-spark On Thu, Apr 21, 2016 at 14:30 Dimitris Kouzis - Loukas wrote: > This thread is good. Maybe it should make it to doc or the users group > > On Thu, Apr 21, 2016 at 9:25 PM

Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Dimitris Kouzis - Loukas
This thread is good. Maybe it should make it to doc or the users group On Thu, Apr 21, 2016 at 9:25 PM, Zhan Zhang wrote: > > You can take a look at this blog from data bricks about GraphFrames > > https://databricks.com/blog/2016/03/03/introducing-graphframes.html > > Thanks. > > Zhan Zhang > >

Re: Improving system design logging in spark

2016-04-21 Thread Ali Tootoonchian
Hi, My point for #2 is to distinguish between how long it takes each task to read data from disk and how long it takes to transfer that data over the network to the target node. As far as I know (correct me if I'm wrong), the block time to fetch data includes both the remote node reading the data and transferring it to the requested nod

Re: RFC: Remote "HBaseTest" from examples?

2016-04-21 Thread Ted Yu
Zhan: I have mentioned the JIRA numbers in the thread starting with (note the typo in subject of this thread): RFC: Remove ... On Thu, Apr 21, 2016 at 1:28 PM, Zhan Zhang wrote: > FYI: There are several pending patches for DataFrame support on top of > HBase. > > Thanks. > > Zhan Zhang > > On A

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
I created an issue in the Spark project: SPARK-14820

Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Zhan Zhang
You can take a look at this blog post from Databricks about GraphFrames: https://databricks.com/blog/2016/03/03/introducing-graphframes.html Thanks. Zhan Zhang On Apr 21, 2016, at 12:53 PM, Robin East wrote: Hi Aside from LDA, which is implemented in MLLib, GraphX

Re: RFC: Remote "HBaseTest" from examples?

2016-04-21 Thread Zhan Zhang
FYI: There are several pending patches for DataFrame support on top of HBase. Thanks. Zhan Zhang On Apr 20, 2016, at 2:43 AM, Saisai Shao wrote: +1, HBaseTest in Spark Example is quite old and obsolete, the HBase connector in HBase repo has evolved a lot, it wo

Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Robin East
Hi, aside from LDA, which is implemented in MLlib, GraphX has the following built-in algorithms: PageRank/Personalised PageRank, Connected Components, Strongly Connected Components, Triangle Count, Shortest Paths, Label Propagation. It also implements a version of the Pregel framework, a form of bulk-sync
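
The bulk-synchronous idea mentioned in the message above can be sketched in a few lines of plain Python. This is only an illustration of vertex-centric supersteps, not GraphX's Pregel API: the function name `connected_components`, the `max_supersteps` parameter, and the sample graph are all made up for this sketch. Each vertex starts with its own id as a label, and in every superstep it adopts the smallest label it received in the previous superstep and forwards it to its neighbours.

```python
def connected_components(vertices, edges, max_supersteps=100):
    """Label every vertex with the smallest vertex id in its component.

    Hypothetical helper for illustration; it mimics synchronous supersteps
    and does not use or mirror any Spark API.
    """
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    label = {v: v for v in vertices}
    # Superstep 0: every vertex sends its own label to its neighbours.
    inbox = {v: [label[n] for n in neighbors[v]] for v in vertices}

    for _ in range(max_supersteps):
        outbox = {v: [] for v in vertices}
        changed = False
        for v in vertices:
            # A vertex only sees messages sent in the previous superstep.
            if inbox[v] and min(inbox[v]) < label[v]:
                label[v] = min(inbox[v])
                changed = True
                for n in neighbors[v]:
                    outbox[n].append(label[v])
        if not changed:
            break  # no vertex updated, so the computation has converged
        inbox = outbox
    return label


# Made-up sample graph: {1, 2, 3} and {4, 5} are connected, 6 is isolated.
components = connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)])
print(components)
```

Messages produced in one superstep are delivered only in the next one, which is what makes the loop bulk-synchronous rather than an in-place propagation.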

Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Krishna Sankar
Hi, 1. Yep, GraphX is stable and would be a good choice for you to implement algorithms. For a quick intro you can refer to our Strata MLlib tutorial GraphX slides http://goo.gl/Ffq2Az 2. GraphX has implemented algorithms like PageRank & ConnectedComponents[1] 3. It also has prim

[GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread tgensol
Hi there, I am working in a group at the University of Michigan, and we are trying to build (and, first, find) some distributed graph algorithms. I know Spark, and I found GraphX. I read the docs, but the only algorithm I found working with GraphX was Latent Dirichlet Allocation, so I was wondering why?

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread Ted Yu
Interesting analysis. Can you log a JIRA? > On Apr 21, 2016, at 11:07 AM, atootoonchian wrote: > > SQL query planner can have intelligence to push down filter commands towards > the storage layer. If we optimize the query planner such that the IO to the > storage is reduced at the cost of run

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
Hi Marcin, I attached a PDF version of the issue: Reduce_Shuffle_Data_by_pushing_filter_toward_storage.pdf

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread Marcin Tustin
I think that's an important result. Could you format your email to split out your parts a little more? It all runs together for me in gmail, so it's hard to follow, and I very much would like to. On Thu, Apr 21, 2016 at 2:07 PM, atootoonchian wrote: > SQL query planner can have intelligence to p

[Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
The SQL query planner can be made intelligent enough to push filter commands down toward the storage layer. If we optimize the query planner so that the IO to the storage is reduced at the cost of running multiple filters (i.e., more compute), this should be desirable when the system is IO bound. An example to pr
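
The trade-off described in the message above can be illustrated with a small, self-contained Python sketch. It does not reproduce the author's proposal or any Spark planner code: the functions, the `orders`/`customers` tables, and the `is_large` predicate are all hypothetical. It only shows that applying a filter before the shuffle yields the same join result while moving fewer rows.

```python
from collections import defaultdict


def shuffle_by_key(rows, key):
    """Group rows by key (a stand-in for a shuffle) and count the rows moved."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups, len(rows)


def join_groups(left_groups, right_groups):
    """Inner-join two grouped inputs on their shared key."""
    return [
        (l, r)
        for k, left_rows in left_groups.items()
        for l in left_rows
        for r in right_groups.get(k, [])
    ]


def join_then_filter(left, right, key, predicate):
    """Shuffle every row, join, and only then apply the filter."""
    left_groups, moved_left = shuffle_by_key(left, key)
    right_groups, moved_right = shuffle_by_key(right, key)
    joined = join_groups(left_groups, right_groups)
    return [(l, r) for l, r in joined if predicate(l)], moved_left + moved_right


def filter_then_join(left, right, key, predicate):
    """Apply the filter at scan time, so fewer rows reach the shuffle."""
    kept = [row for row in left if predicate(row)]
    left_groups, moved_left = shuffle_by_key(kept, key)
    right_groups, moved_right = shuffle_by_key(right, key)
    return join_groups(left_groups, right_groups), moved_left + moved_right


# Hypothetical tables: 10 orders spread over 3 customers.
orders = [{"id": i, "cust": i % 3, "amount": i * 10} for i in range(10)]
customers = [{"cust": c, "name": f"customer-{c}"} for c in range(3)]


def is_large(order):
    return order["amount"] >= 70


late_rows, late_moved = join_then_filter(orders, customers, "cust", is_large)
early_rows, early_moved = filter_then_join(orders, customers, "cust", is_large)
print(late_moved, early_moved)  # prints: 13 6
```

Both plans return the same three joined rows (in a different order); the early filter just pays for the predicate before the shuffle instead of after it.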
