It will probably make its way into the query engine eventually, one way or
another. Note that in general there is a lot of lower-hanging fruit before
you have to do vectorization.
As far as I know, Hive doesn't really have vectorization because the
vectorization in Hive is simply w
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
Enjoy!
Alex
On Mon, Jan 19, 2015 at 6:44 PM, Jeff Wang
wrote:
> Hi:
>
> I would like to contribute to the code of spark. Can I join the community?
>
> Thanks,
>
> Jeff
>
Hi:
I would like to contribute to the code of spark. Can I join the community?
Thanks,
Jeff
Hi,
Correct me if I'm wrong. It looks like the current version of
Spark SQL is a *tuple-at-a-time* engine: each time, a physical
operator produces a tuple by recursively calling child.execute().
There are papers that illustrate the benefits of a vectorized query
engine. And Hiv
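To make the distinction concrete, here is a minimal sketch in plain Scala (no Spark APIs; the operator names and the batch size are made up for illustration): a tuple-at-a-time operator produces one row per next() call, while a vectorized operator hands back a whole batch per call, amortizing per-call overhead.

```scala
// Tuple-at-a-time: one virtual call per row.
class TupleScan(rows: Array[Int]) {
  private var i = 0
  def next(): Option[Int] =
    if (i < rows.length) { val r = rows(i); i += 1; Some(r) } else None
}

def tupleSum(scan: TupleScan): Long = {
  var sum = 0L
  var row = scan.next()
  while (row.isDefined) { sum += row.get; row = scan.next() }
  sum
}

// Vectorized: one call per batch, then a tight inner loop over the batch.
class VectorScan(rows: Array[Int], batchSize: Int = 1024) {
  private var i = 0
  def nextBatch(): Array[Int] = {
    val n = math.min(batchSize, rows.length - i)
    val batch = rows.slice(i, i + n)
    i += n
    batch
  }
}

def vectorSum(scan: VectorScan): Long = {
  var sum = 0L
  var batch = scan.nextBatch()
  while (batch.nonEmpty) {
    var j = 0
    while (j < batch.length) { sum += batch(j); j += 1 }
    batch = scan.nextBatch()
  }
  sum
}
```

Both compute the same result; the vectorized path just makes far fewer operator calls per row.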
GraphX ShortestPaths seems to be following edges backwards instead of forwards:
import org.apache.spark.graphx._
val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))),
  sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,""))))
lib.ShortestPaths.run(g, Array(3)).vertices.collect
res1: Array[(org.apac
But wouldn't the gain be greater under something similar to EdgePartition1D
(but perhaps better load-balanced by the number of edges per vertex), with
an algorithm that primarily follows edges in the forward direction?
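If ShortestPaths really does propagate along reversed edges, one possible workaround (a sketch, untested here; assumes a Spark shell providing `sc`) is to run it on the reversed graph via GraphX's Graph.reverse:

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ShortestPaths

// Same toy graph as above: 1 -> 2 -> 3.
val g = Graph(
  sc.makeRDD(Array((1L, ""), (2L, ""), (3L, ""))),
  sc.makeRDD(Array(Edge(1L, 2L, ""), Edge(2L, 3L, ""))))

// Graph.reverse flips every edge, so if ShortestPaths walks edges
// backwards, running it on the reversed graph should give distances
// along the original forward direction.
val forward = ShortestPaths.run(g.reverse, Seq(3L)).vertices.collect()
```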
From: Ankur Dave
To: Michael Malak
Cc: "dev@spark.apache.org"
No - the vertices are hash-partitioned onto workers independently of the
edges. It would be nice for each vertex to be on the worker with the most
adjacent edges, but we haven't done this yet since it would add a lot of
complexity to avoid load imbalance while reducing the overall communication
by
Does GraphX make an effort to co-locate vertices onto the same workers as the
majority (or even some) of its edges?
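For what it's worth, while vertex placement is not edge-aware, the edge partitioning strategy itself can be chosen explicitly. A sketch, assuming an existing `graph: Graph[VD, ED]`:

```scala
import org.apache.spark.graphx._

// EdgePartition1D hashes edges by source vertex id, so all out-edges of
// a vertex land in the same partition; vertices remain hash-partitioned
// separately and are shipped to the edge partitions that reference them.
val partitioned = graph.partitionBy(PartitionStrategy.EdgePartition1D)
```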
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache
The wiki does not seem to be operational ATM, but I will do this when
it is back up.
On Mon, Jan 19, 2015 at 12:00 PM, Patrick Wendell wrote:
> Okay - so given all this I was going to put the following on the wiki
> tentatively:
>
> ## Reviewing Code
> Community code review is Spark's fundamental
Okay - so given all this I was going to put the following on the wiki
tentatively:
## Reviewing Code
Community code review is Spark's fundamental quality assurance
process. When reviewing a patch, your goal should be to help
streamline the committing process by giving committers confidence this
pa
Definitely go for a pull request!
On Mon, Jan 19, 2015 at 10:10 AM, Mick Davies
wrote:
>
> Looking at Parquet code - it looks like hooks are already in place to
> support this.
>
> In particular PrimitiveConverter has methods hasDictionarySupport and
> addValueFromDictionary for this purpose. T
Looking at Parquet code - it looks like hooks are already in place to
support this.
In particular PrimitiveConverter has methods hasDictionarySupport and
addValueFromDictionary for this purpose. These are not used by
CatalystPrimitiveConverter.
I think that it would be pretty straightforward to
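A sketch of how a converter might use those hooks (package names follow recent parquet-mr under `org.apache.parquet`; the 2015-era releases used the bare `parquet.*` prefix, and `handle` is a hypothetical stand-in for however the surrounding converter consumes values):

```scala
import org.apache.parquet.column.Dictionary
import org.apache.parquet.io.api.{Binary, PrimitiveConverter}

// Decode each dictionary entry from Binary to String once, then serve
// dictionary-encoded pages by id, avoiding repeated UTF-8 decoding.
class DictionaryStringConverter(handle: String => Unit) extends PrimitiveConverter {
  private var decoded: Array[String] = _

  override def hasDictionarySupport(): Boolean = true

  // Decode the whole dictionary up front...
  override def setDictionary(dictionary: Dictionary): Unit = {
    decoded = new Array[String](dictionary.getMaxId + 1)
    var id = 0
    while (id <= dictionary.getMaxId) {
      decoded(id) = dictionary.decodeToBinary(id).toStringUsingUTF8
      id += 1
    }
  }

  // ...so dictionary-encoded pages only hand over ids, not bytes.
  override def addValueFromDictionary(dictionaryId: Int): Unit =
    handle(decoded(dictionaryId))

  // Fallback for pages that are not dictionary-encoded.
  override def addBinary(value: Binary): Unit =
    handle(value.toStringUsingUTF8)
}
```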
Hi Reynold.
I'll take a look.
SPARK-5300 is open for this issue.
-Ewan
On 19/01/15 08:39, Reynold Xin wrote:
Hi Ewan,
Not sure if there is a JIRA ticket (there are so many that I lose track).
I chatted briefly with Aaron on this. The way we can solve it is to
create a new FileSystem impleme
Here are some timings showing the effect of caching the last Binary->String
conversion. Query times are reduced significantly, and the variation in
timings is much smaller thanks to the reduction in garbage.
Set of sample queries selecting various columns, applying some filtering and
then aggregating
Spark 1.2.0
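A last-value cache of this kind can be sketched in a few lines of plain Scala (class name and details are illustrative, not the actual patch): if consecutive values are often identical, as with sorted or low-cardinality columns, re-use the previously decoded String when the bytes match.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Cache only the last byte[] -> String conversion; a hit allocates nothing.
class LastValueStringDecoder {
  private var lastBytes: Array[Byte] = _
  private var lastString: String = _

  def decode(bytes: Array[Byte]): String = {
    if (lastBytes != null && Arrays.equals(lastBytes, bytes)) {
      lastString // cache hit: return the same String instance
    } else {
      lastBytes = bytes.clone()
      lastString = new String(bytes, StandardCharsets.UTF_8)
      lastString
    }
  }
}
```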
"in yarn-client mode it only controls the environment of the executor
launcher"
So either you use yarn-client mode, and then your app keeps running and
controls the process; or you use yarn-cluster mode, and then you send a jar
to YARN, and that jar should have code to report the result back to
Patrick's original proposal LGTM :). However, until now I had been under the
impression that LGTM placed special emphasis on the TM part. That said, I
will be okay/happy (or responsible) for the patch if it goes in.
Prashant Sharma
On Sun, Jan 18, 2015 at 2:33 PM, Reynold Xin wrote:
> Maybe just to a
On Mon, Jan 19, 2015 at 6:29 AM, Akhil Das wrote:
> Its the executor memory (spark.executor.memory) which you can set while
> creating the spark context. By default it uses 0.6% of the executor memory
(That should be 0.6, i.e. 60%, not 0.6%.)
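For reference, both knobs can be set in spark-defaults.conf (the values below are illustrative; 0.6 is the default for spark.storage.memoryFraction):

```
# spark-defaults.conf -- illustrative values
spark.executor.memory          4g
# Fraction of executor memory reserved for cached RDDs; default 0.6 (60%).
spark.storage.memoryFraction   0.6
```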
Added a JIRA to track
https://issues.apache.org/jira/browse/SPARK-5309
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Optimize-encoding-decoding-strings-when-using-Parquet-tp10141p10189.html
Sent from the Apache Spark Developers List mailing list arch
Is there any way to support multiple users executing SQL on one thrift
server?
I think there are some problems in Spark 1.2.0, for example:
1. Start thrift server with user A
2. Connect to thrift server via beeline with user B
3. Execute “insert into table dest select … from table src”
then w