Congrats to Reynold et al. for leading this effort!
- Henry
On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi folks,
I interrupt your regularly scheduled user / dev list to bring you some pretty
cool news for the project, which is that we've been able to use Spark
It works fine, thanks for the help Michael.
Liancheng also told me a trick: using a subquery with LIMIT n. It works in the latest 1.2.0.
BTW, it looks like the broadcast optimization won't be recognized if I do a
left join instead of an inner join. Is that true? How can I make it work for
left joins?
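For reference, the LIMIT-subquery trick mentioned above might look like this (a sketch against a spark-shell `sqlContext`; the table and column names are made up):

```scala
// Wrapping the smaller table in a subquery with LIMIT caps its estimated size,
// so the planner can choose a broadcast join (subject to
// spark.sql.autoBroadcastJoinThreshold).
val joined = sqlContext.sql("""
  SELECT f.id, d.name
  FROM fact f
  JOIN (SELECT id, name FROM dim LIMIT 10000) d ON f.id = d.id
""")
```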
I suspect you do not actually need to change the number of partitions
dynamically.
Do you just have groupings of data to process? Use an RDD of (K, V) pairs
and operations like groupByKey. If you really have only 1000 unique keys, then yes, only
half of the 2000 workers would get data in a phase that groups by key.
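As a sketch of the grouping approach (assuming a spark-shell with `sc` in scope; the record shapes are made up):

```scala
import org.apache.spark.rdd.RDD

// Keyed records: (customerId, event). With only 1000 distinct keys, a
// groupByKey over 2000 partitions leaves at most 1000 partitions non-empty.
val events: RDD[(Int, String)] =
  sc.parallelize(Seq((1, "login"), (2, "click"), (1, "logout")))
val grouped = events.groupByKey()        // RDD[(Int, Iterable[String])]
grouped.mapValues(_.size).collect()      // per-key counts
```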
How was the table created? Would you mind sharing the related code? It
seems that the underlying type of the |customer_id| field is actually
long, but the schema says it's integer, so it's basically a type mismatch
error.
The first query succeeds because |SchemaRDD.count()| is translated to
Hmm, the details of the error didn't show in your mail...
On 10/10/14 12:25 AM, sadhan wrote:
We have a Hive deployment on which we tried running spark-sql. When we try
to run describe table_name for some of the tables, spark-sql fails with
this:
while it works for some of the other tables.
Yes, of course. If your number is 123456, then this takes 4 bytes as
an int. But as a String in a 64-bit JVM you have an 8-byte reference,
4 bytes of object overhead, a 4-byte char count, and 6 2-byte chars:
that's 8 + 4 + 4 + 12 = 28 bytes instead of 4. Maybe more I'm not thinking of.
On Sat, Oct 11, 2014 at 6:29 AM, Liam Clarke-Hutchinson
Thank you, Sean. I'll try to do it externally as you suggested; however, can
you please give me some hints on how to do that? In fact, where can I find
the 1.2 implementation you just mentioned? Thanks!
On Wed, Oct 8, 2014 at 12:58 PM, Sean Owen so...@cloudera.com wrote:
Plain old SVMs don't
-- Forwarded message --
From: Sadhan Sood sadhan.s...@gmail.com
Date: Sat, Oct 11, 2014 at 10:26 AM
Subject: Re: how to find the sources for spark-project
To: Stephen Boesch java...@gmail.com
Thanks, I still didn't find it - is it under some particular branch? More
specifically, I found this on the computer where I built Spark:
$ jar tvf /homes/hortonzy/.m2/repository//org/spark-project/hive/hive-exec/0.13.1/hive-exec-0.13.1.jar | grep ParquetHiveSerDe
2228 Mon Jun 02 12:50:16 UTC 2014 org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe$1.class
1442 Mon Jun 02
Hi,
My Spark version is v1.1.0 and Hive is 0.12.0. I need to use more than one
subquery in my Spark SQL; below are my sample table structures and a SQL
statement that contains more than one subquery.
Question 1: How to load a Hive table into Scala/Spark?
Question 2: How to implement a
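On question 1, a minimal sketch (assuming Spark was built with Hive support and a spark-shell with `sc` in scope; the table name is made up):

```scala
import org.apache.spark.sql.hive.HiveContext

// HiveContext reads table metadata from the Hive metastore,
// so existing Hive tables can be queried directly.
val hiveContext = new HiveContext(sc)
val orders = hiveContext.sql("SELECT * FROM orders")
orders.take(5)
```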
I tried even without the “T” and it still returns an empty result:
scala> val sRdd = sqlContext.sql("select a from x where ts = '2012-01-01 00:00:00'")
sRdd: org.apache.spark.sql.SchemaRDD =
SchemaRDD[35] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#0]
ExistingRdd
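One thing worth checking (a sketch; this assumes |ts| is stored as a string): string comparison is character-by-character, so the literal must match the stored format exactly, and an explicit cast makes the intent clear:

```scala
// If ts is a string like "2012-01-01 00:00:00", an equality or range literal
// must match that format character-for-character, or the predicate silently
// matches nothing. Casting both sides sidesteps the formatting issue.
val sRdd = sqlContext.sql(
  "select a from x where cast(ts as timestamp) = cast('2012-01-01 00:00:00' as timestamp)")
```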
Hi Spark!
I don't quite understand the semantics of RDDs in a streaming context
very well yet.
Are there any examples of how to implement custom InputDStreams, with
corresponding Receivers, in the docs?
I've hacked together a custom stream, which is being opened and is
consuming data
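For what it's worth, a minimal custom receiver can be sketched like this (assumptions: Spark Streaming's `Receiver` API, a made-up record source, and an `ssc` StreamingContext in scope):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch: onStart spawns a thread that pushes records into Spark via store();
// the thread polls isStopped() so onStop needs no extra cleanup here.
class MyReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  def onStart(): Unit = {
    new Thread("my-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store("record")        // hand one record to Spark
          Thread.sleep(1000)
        }
      }
    }.start()
  }
  def onStop(): Unit = { }       // receiver thread exits when isStopped() is true
}

// Usage: val stream = ssc.receiverStream(new MyReceiver)
```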
It's true that it is an implementation detail, but it's a very important
one to document because it can change results
depending on when I use take or collect. The issue I was running into was
when the executor had a different operating system than the driver, and I
was
Very cool Denny, thanks for sharing this!
Matei
On Oct 11, 2014, at 9:46 AM, Denny Lee denny.g@gmail.com wrote:
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
If you're wondering how to connect Tableau to SparkSQL, here are the steps.
Because of how closures work in Scala, there is no support for nested
map/RDD-based operations. Specifically, if you have

Context a {
  Context b {
  }
}

then operations within context b, when distributed across nodes, will no longer
have visibility of variables specific to context a because
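To illustrate the point above (a sketch, assuming a spark-shell with `sc` in scope):

```scala
val a = sc.parallelize(1 to 3)
val b = sc.parallelize(4 to 6)

// Not supported: referencing one RDD inside another RDD's closure.
// val bad = a.map(x => b.filter(_ > x).count())   // fails at runtime

// Instead, combine the two RDDs with an explicit operation such as cartesian:
val ok = a.cartesian(b).filter { case (x, y) => y > x }
```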
Created JIRA for this: https://issues.apache.org/jira/browse/SPARK-3915
On Sat, Oct 11, 2014 at 12:40 PM, Evan Samanas evan.sama...@gmail.com wrote:
It's true that it is an implementation detail, but it's a very important one
to document because it has the possibility of changing results