It will probably make its way into the query engine eventually, one way or
another. Note that in general there is a lot of lower-hanging fruit before
you have to do vectorization.
As far as I know, Hive doesn't really have vectorization because the
vectorization in Hive is simply w
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
Enjoy!
Alex
On Mon, Jan 19, 2015 at 6:44 PM, Jeff Wang
wrote:
> Hi:
>
> I would like to contribute to the code of spark. Can I join the community?
>
> Thanks,
>
> Jeff
>
Hi:
I would like to contribute to the code of spark. Can I join the community?
Thanks,
Jeff
Hi,
Correct me if I'm wrong. It looks like the current version of
Spark SQL is a *tuple-at-a-time* engine: each time, a physical
operator produces a tuple by recursively calling child.execute().
There are papers that illustrate the benefits of a vectorized query
engine. And Hiv
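To make the distinction concrete, here is a minimal sketch in plain Scala (no Spark APIs; the operator names and the batch size are made up for illustration): a tuple-at-a-time operator produces one row per next() call, while a vectorized operator hands back a whole batch per call, amortizing per-call overhead.

```scala
// Tuple-at-a-time: one virtual call per row.
class TupleScan(rows: Array[Int]) {
  private var i = 0
  def next(): Option[Int] =
    if (i < rows.length) { val r = rows(i); i += 1; Some(r) } else None
}

def tupleSum(scan: TupleScan): Long = {
  var sum = 0L
  var row = scan.next()
  while (row.isDefined) { sum += row.get; row = scan.next() }
  sum
}

// Vectorized: one call per batch, then a tight inner loop over the batch.
class VectorScan(rows: Array[Int], batchSize: Int = 1024) {
  private var i = 0
  def nextBatch(): Array[Int] = {
    val n = math.min(batchSize, rows.length - i)
    val batch = rows.slice(i, i + n)
    i += n
    batch
  }
}

def vectorSum(scan: VectorScan): Long = {
  var sum = 0L
  var batch = scan.nextBatch()
  while (batch.nonEmpty) {
    var j = 0
    while (j < batch.length) { sum += batch(j); j += 1 }
    batch = scan.nextBatch()
  }
  sum
}
```

Both compute the same result; the vectorized path just makes far fewer operator calls per row.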
GraphX ShortestPaths seems to be following edges backwards instead of forwards:
import org.apache.spark.graphx._
val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))),
  sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,""))))
lib.ShortestPaths.run(g, Array(3)).vertices.collect
res1: Array[(org.apac
But wouldn't the gain be greater under something similar to EdgePartition1D
(but perhaps better load-balanced by the number of edges per vertex), with
an algorithm that primarily follows edges in the forward direction?
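If ShortestPaths really does propagate along reversed edges, one possible workaround (a sketch, untested here; assumes a Spark shell providing `sc`) is to run it on the reversed graph via GraphX's Graph.reverse:

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ShortestPaths

// Same toy graph as above: 1 -> 2 -> 3.
val g = Graph(
  sc.makeRDD(Array((1L, ""), (2L, ""), (3L, ""))),
  sc.makeRDD(Array(Edge(1L, 2L, ""), Edge(2L, 3L, ""))))

// Graph.reverse flips every edge, so if ShortestPaths walks edges
// backwards, running it on the reversed graph should give distances
// along the original forward direction.
val forward = ShortestPaths.run(g.reverse, Seq(3L)).vertices.collect()
```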
From: Ankur Dave
To: Michael Malak
Cc: "dev@spark.apache.org"
No - the vertices are hash-partitioned onto workers independently of the
edges. It would be nice for each vertex to be on the worker with the most
adjacent edges, but we haven't done this yet since it would add a lot of
complexity to avoid load imbalance while reducing the overall communication
by
Does GraphX make an effort to co-locate vertices onto the same workers as the
majority (or even some) of its edges?
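For what it's worth, while vertex placement is not edge-aware, the edge partitioning strategy itself can be chosen explicitly. A sketch, assuming an existing `graph: Graph[VD, ED]`:

```scala
import org.apache.spark.graphx._

// EdgePartition1D hashes edges by source vertex id, so all out-edges of
// a vertex land in the same partition; vertices remain hash-partitioned
// separately and are shipped to the edge partitions that reference them.
val partitioned = graph.partitionBy(PartitionStrategy.EdgePartition1D)
```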
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache
The wiki does not seem to be operational ATM, but I will do this when
it is back up.
On Mon, Jan 19, 2015 at 12:00 PM, Patrick Wendell wrote:
> Okay - so given all this I was going to put the following on the wiki
> tentatively:
>
> ## Reviewing Code
> Community code review is Spark's fundamental
Okay - so given all this I was going to put the following on the wiki
tentatively:
## Reviewing Code
Community code review is Spark's fundamental quality assurance
process. When reviewing a patch, your goal should be to help
streamline the committing process by giving committers confidence this
pa
Definitely go for a pull request!
On Mon, Jan 19, 2015 at 10:10 AM, Mick Davies
wrote:
>
> Looking at Parquet code - it looks like hooks are already in place to
> support this.
>
> In particular PrimitiveConverter has methods hasDictionarySupport and
> addValueFromDictionary for this purpose. T
Looking at Parquet code - it looks like hooks are already in place to
support this.
In particular PrimitiveConverter has methods hasDictionarySupport and
addValueFromDictionary for this purpose. These are not used by
CatalystPrimitiveConverter.
I think that it would be pretty straightforward to
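A sketch of how a converter might use those hooks (package names follow recent parquet-mr under `org.apache.parquet`; the 2015-era releases used the bare `parquet.*` prefix, and `handle` is a hypothetical stand-in for however the surrounding converter consumes values):

```scala
import org.apache.parquet.column.Dictionary
import org.apache.parquet.io.api.{Binary, PrimitiveConverter}

// Decode each dictionary entry from Binary to String once, then serve
// dictionary-encoded pages by id, avoiding repeated UTF-8 decoding.
class DictionaryStringConverter(handle: String => Unit) extends PrimitiveConverter {
  private var decoded: Array[String] = _

  override def hasDictionarySupport(): Boolean = true

  // Decode the whole dictionary up front...
  override def setDictionary(dictionary: Dictionary): Unit = {
    decoded = new Array[String](dictionary.getMaxId + 1)
    var id = 0
    while (id <= dictionary.getMaxId) {
      decoded(id) = dictionary.decodeToBinary(id).toStringUsingUTF8
      id += 1
    }
  }

  // ...so dictionary-encoded pages only hand over ids, not bytes.
  override def addValueFromDictionary(dictionaryId: Int): Unit =
    handle(decoded(dictionaryId))

  // Fallback for pages that are not dictionary-encoded.
  override def addBinary(value: Binary): Unit =
    handle(value.toStringUsingUTF8)
}
```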
Hi Reynold.
I'll take a look.
SPARK-5300 is open for this issue.
-Ewan
On 19/01/15 08:39, Reynold Xin wrote:
Hi Ewan,
Not sure if there is a JIRA ticket (there are so many that I lose track).
I chatted briefly with Aaron on this. The way we can solve it is to
create a new FileSystem impleme
Here are some timings showing the effect of caching the last Binary->String
conversion. Query times are reduced significantly, and the variation in
timings is much smaller thanks to the reduction in garbage.
Set of sample queries selecting various columns, applying some filtering and
then aggregating
Spark 1.2.0
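A last-value cache of this kind can be sketched in a few lines of plain Scala (class name and details are illustrative, not the actual patch): if consecutive values are often identical, as with sorted or low-cardinality columns, re-use the previously decoded String when the bytes match.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Cache only the last byte[] -> String conversion; a hit allocates nothing.
class LastValueStringDecoder {
  private var lastBytes: Array[Byte] = _
  private var lastString: String = _

  def decode(bytes: Array[Byte]): String = {
    if (lastBytes != null && Arrays.equals(lastBytes, bytes)) {
      lastString // cache hit: return the same String instance
    } else {
      lastBytes = bytes.clone()
      lastString = new String(bytes, StandardCharsets.UTF_8)
      lastString
    }
  }
}
```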
"in yarn-client mode it only controls the environment of the executor
launcher"
So either you use yarn-client mode, and then your app keeps running and
controls the process; or you use yarn-cluster mode, and then you send a jar
to YARN, and that jar should have code to report the result back to
Patrick's original proposal LGTM :). However, until now I had been under the
impression that LGTM placed special emphasis on the TM part. That said, I
will be okay/happy (or responsible) for the patch if it goes in.
Prashant Sharma
On Sun, Jan 18, 2015 at 2:33 PM, Reynold Xin wrote:
> Maybe just to a
On Mon, Jan 19, 2015 at 6:29 AM, Akhil Das wrote:
> Its the executor memory (spark.executor.memory) which you can set while
> creating the spark context. By default it uses 0.6% of the executor memory
(That should be 0.6, i.e. 60%, not 0.6%.)
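For reference, both knobs can be set in spark-defaults.conf (the values below are illustrative; 0.6 is the default for spark.storage.memoryFraction):

```
# spark-defaults.conf -- illustrative values
spark.executor.memory          4g
# Fraction of executor memory reserved for cached RDDs; default 0.6 (60%).
spark.storage.memoryFraction   0.6
```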
Added a JIRA to track
https://issues.apache.org/jira/browse/SPARK-5309
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Optimize-encoding-decoding-strings-when-using-Parquet-tp10141p10189.html
Sent from the Apache Spark Developers List mailing list arch
Is there any way to support multiple users executing SQL on one thrift
server?
I think there are some problems in Spark 1.2.0, for example:
1. Start thrift server with user A
2. Connect to thrift server via beeline with user B
3. Execute “insert into table dest select … from table src”
then w