Hi,
you are right, that is an interesting question.
Looks like GROUP BY is doing something funny / magic here (spark-shell
3.3.1 and 3.5.0-SNAPSHOT):
With an alias, it behaves as you have pointed out:
spark.range(3).createTempView("ids_without_dots")
spark.sql("SELECT * FROM
Hi Mich.
This is a Spark user group mailing list where people can ask *any*
questions about spark.
You know SQL and streaming, but I don't think it's necessary to start a
reply with "*LOL*" to the question that's being asked.
No questions are too stupid to be asked.
Sat, Jan 28, 2023 at 09:22
LOL
First one
spark-sql> select 1 as `data.group` from abc group by data.group;
1
Time taken: 0.198 seconds, Fetched 1 row(s)
means that you are assigning the alias data.group in the SELECT, and you are
using that alias -> data.group in your GROUP BY statement
This is equivalent to
spark-sql> select 1
this SQL works
select 1 as `data.group` from tbl group by data.group
Since there's no such field as data, I thought the SQL has to look like
this
select 1 as `data.group` from tbl group by `data.group`
But that gives an error (cannot resolve '`data.group`') ... I'm no expert
in
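As a sketch, the two behaviors described above can be reproduced against the temp view from the earlier snippet (results as reported in this thread for spark-sql 3.3.1; abc stands in for that table):

```sql
-- Unquoted: data.group is parsed as a multi-part name. Since no table or
-- struct named `data` exists, resolution falls back to the SELECT alias,
-- so this query works and returns 1.
SELECT 1 AS `data.group` FROM abc GROUP BY data.group;

-- Backquoted: `data.group` is one identifier containing a literal dot,
-- which matches no input column, so this fails with
-- "cannot resolve '`data.group`'", as observed above.
SELECT 1 AS `data.group` FROM abc GROUP BY `data.group`;
```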
Cool thanks. Will take a look...
Sent from my iPhone
On Jul 13, 2015, at 6:40 PM, Michael Armbrust mich...@databricks.com wrote:
I'd look at the JDBC server (a long-running YARN job you can submit queries
to)
Hi,
I have a question for Spark SQL. Is there a way to be able to use
Spark SQL on YARN without having to submit a job?
Bottom line here is I want to be able to reduce the latency of
running queries as a job. I know that the Spark SQL default submission
is like a job, but was wondering if
Well for ad hoc queries you can use the CLI
On Mon, Jul 13, 2015 at 5:34 PM, Ron Gonzalez zlgonza...@yahoo.com.invalid
wrote:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
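As a sketch of the suggestion above: the Thrift JDBC/ODBC server from the linked docs runs as one persistent application, so individual queries skip per-job submission latency. Script names are from the Spark distribution; the port and flags below are defaults and may differ per deployment:

```shell
# Start the long-running Thrift JDBC/ODBC server (here on YARN). It stays
# up as a single application that accepts queries over JDBC.
./sbin/start-thriftserver.sh --master yarn

# Connect with the bundled beeline client and run ad hoc SQL.
./bin/beeline -u jdbc:hive2://localhost:10000
```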
On Mon, Jul 13, 2015 at 6:31 PM, Jerrick Hoang jerrickho...@gmail.com
wrote:
Subject: Re: Spark SQL question: why build hashtable for both sides in
HashOuterJoin?
Hi Haopu,
How about full outer join? One hash table may not be efficient for this case.
Liquan
On Mon, Sep 29, 2014 at 11:47 PM, Haopu Wang hw...@qilinsoft.com wrote:
From: Liquan Pei [mailto:liquan...@gmail.com]
Sent: September 30, 2014 12:31
To: Haopu Wang
Cc: d...@spark.apache.org; user
Subject: Re: Spark SQL question: why build hashtable for both sides in
HashOuterJoin?
Hi Haopu,
My understanding is that the hashtable on both left and right side is used
for including null values in the result in an efficient manner. If the hash
table is only built on one side, let's say the left side, and we perform a
left outer join, then for each row in the left side a scan over the right
side is needed.
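To illustrate the point, here is a minimal plain-Python sketch (not Spark's actual implementation; the function name and row shape are made up) of a full outer hash join. Emitting the null-padded rows for the unmatched side efficiently requires a hash table on each side; otherwise every probe degenerates into a scan:

```python
from collections import defaultdict

def full_outer_join(left, right, key):
    """Hash-join two row lists on `key`, emitting None-padded pairs for
    rows that have no match on the other side (full outer join)."""
    left_idx, right_idx = defaultdict(list), defaultdict(list)
    for row in left:
        left_idx[row[key]].append(row)   # hash table for the left side
    for row in right:
        right_idx[row[key]].append(row)  # hash table for the right side

    out = []
    for k, lrows in left_idx.items():
        matches = right_idx.get(k)
        if matches:
            out.extend((l, r) for l in lrows for r in matches)
        else:
            out.extend((l, None) for l in lrows)  # left-only rows
    # Right-only rows need the left hash table for the membership test;
    # without it, each right row would force a scan of the whole left side.
    for k, rrows in right_idx.items():
        if k not in left_idx:
            out.extend((None, r) for r in rrows)
    return out
```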
I took a look at HashOuterJoin and it's building a Hashtable for both
sides.
This consumes quite a lot of memory when the partition is big. And it
doesn't reduce the iteration on streamed relation, right?
Thanks!
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: September 26, 2014 21:24
To: Haopu Wang; user@spark.apache.org
Subject: Re: Spark SQL question: is cached SchemaRDD storage controlled by
spark.storage.memoryFraction?
Yes it is. The in-memory storage used with SchemaRDD also uses RDD.cache()
under the hood.
On 9/26/14 4:04 PM, Haopu Wang wrote:
Replicated. How can I
change the storage level? Because I have a big table there.
Thanks!
Hi, I'm querying a big table using Spark SQL. I see very long GC time in
some stages. I wonder if I can improve it by tuning the storage
parameter.
The question is: the schemaRDD has been cached with cacheTable()
function. So is the cached schemaRDD part of memory storage controlled
by the spark.storage.memoryFraction parameter?
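For reference, the caching under discussion can also be driven from SQL (big_table here is a hypothetical table name); in Spark 1.x the cached blocks count against the spark.storage.memoryFraction pool like any other RDD.cache() data:

```sql
-- Cache the table in memory (same storage pool as RDD.cache()).
CACHE TABLE big_table;

-- Subsequent scans read from the in-memory columnar store.
SELECT count(*) FROM big_table;

-- Release the memory when done.
UNCACHE TABLE big_table;
```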
-- Forwarded message --
From: Liquan Pei liquan...@gmail.com
Date: Fri, Sep 26, 2014 at 1:33 AM
Subject: Re: Spark SQL question: is cached SchemaRDD storage controlled by
spark.storage.memoryFraction?
To: Haopu Wang hw...@qilinsoft.com
Hi Haopu,
Internally, cacheTable