SQL GROUP BY alias with dots, was: Spark SQL question

2023-02-07 Thread Enrico Minack
Hi, you are right, that is an interesting question. Looks like GROUP BY is doing something funny / magic here (spark-shell 3.3.1 and 3.5.0-SNAPSHOT): With an alias, it behaves as you have pointed out: spark.range(3).createTempView("ids_without_dots") spark.sql("SELECT * FROM
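The preview cuts off mid-snippet; here is a hedged spark-shell reconstruction of the experiment being described (everything past the cut is an assumption, based on the behavior reported in the rest of this thread):

    // Reconstruction of the experiment; the statements after the truncated
    // preview above are an assumption based on what this thread reports.
    spark.range(3).createTempView("ids_without_dots")

    // Reported to work: the unquoted data.group in GROUP BY resolves to the SELECT alias.
    spark.sql("SELECT 1 AS `data.group` FROM ids_without_dots GROUP BY data.group").show()

    // Reported to fail with "cannot resolve '`data.group`'": the backticked form is
    // taken as a single column name, and no such column exists in the table.
    // spark.sql("SELECT 1 AS `data.group` FROM ids_without_dots GROUP BY `data.group`").show()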

Re: Spark SQL question

2023-01-28 Thread Bjørn Jørgensen
Hi Mich. This is a Spark user group mailing list where people can ask *any* questions about Spark. You know SQL and streaming, but I don't think it's necessary to start a reply with "*LOL*" to the question that's being asked. No question is too stupid to be asked. Sat, Jan 28, 2023 at 09:22

Re: Spark SQL question

2023-01-28 Thread Mich Talebzadeh
LOL First one spark-sql> select 1 as `data.group` from abc group by data.group; 1 Time taken: 0.198 seconds, Fetched 1 row(s) means that you are assigning the alias data.group in the SELECT and using that alias -> data.group in your GROUP BY statement. This is equivalent to spark-sql> select 1

Spark SQL question

2023-01-27 Thread Kohki Nishio
this SQL works: select 1 as `data.group` from tbl group by data.group Since there's no such field as data, I thought the SQL had to look like this: select 1 as `data.group` from tbl group by `data.group` But that gives an error (cannot resolve '`data.group`') ... I'm no expert in

Re: Basic Spark SQL question

2015-07-14 Thread Ron Gonzalez
Cool thanks. Will take a look... Sent from my iPhone On Jul 13, 2015, at 6:40 PM, Michael Armbrust mich...@databricks.com wrote: I'd look at the JDBC server (a long-running YARN job you can submit queries to)

Basic Spark SQL question

2015-07-13 Thread Ron Gonzalez
Hi, I have a question for Spark SQL. Is there a way to use Spark SQL on YARN without having to submit a job? Bottom line here is I want to reduce the latency of running queries as a job. I know that the Spark SQL default submission is like a job, but was wondering if

Re: Basic Spark SQL question

2015-07-13 Thread Jerrick Hoang
Well, for ad-hoc queries you can use the CLI On Mon, Jul 13, 2015 at 5:34 PM, Ron Gonzalez zlgonza...@yahoo.com.invalid wrote: Hi, I have a question for Spark SQL. Is there a way to use Spark SQL on YARN without having to submit a job? Bottom line here is I want to be able to
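A hedged sketch of the CLI route (the table name is illustrative; the CLI starts a fresh driver on each invocation, so it is more about convenience than per-query latency):

    # Run an ad-hoc query through the Spark SQL CLI
    ./bin/spark-sql -e "SELECT count(*) FROM my_table"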

Re: Basic Spark SQL question

2015-07-13 Thread Michael Armbrust
I'd look at the JDBC server (a long-running YARN job you can submit queries to) https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server On Mon, Jul 13, 2015 at 6:31 PM, Jerrick Hoang jerrickho...@gmail.com wrote: Well, for ad-hoc queries you can use
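A hedged sketch of the suggestion (commands per the linked guide; the master and port are illustrative defaults): start the Thrift server once as a long-running YARN application, then point any JDBC client at it, so each query reuses the warm SparkContext instead of paying job-submission latency:

    # Start the long-running Thrift JDBC/ODBC server (accepts spark-submit options)
    ./sbin/start-thriftserver.sh --master yarn-client

    # Issue queries against it with the bundled beeline client (default port 10000)
    ./bin/beeline -u jdbc:hive2://localhost:10000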

RE: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-08 Thread Haopu Wang
...@spark.apache.org; user Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin? Hi Haopu, How about full outer join? One hash table may not be efficient for this case. Liquan On Mon, Sep 29, 2014 at 11:47 PM, Haopu Wang hw...@qilinsoft.com wrote: Hi

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-08 Thread Matei Zaharia
...@spark.apache.org; user Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin? Hi Haopu, How about full outer join? One hash table may not be efficient for this case. Liquan On Mon, Sep 29, 2014 at 11:47 PM, Haopu Wang hw...@qilinsoft.com wrote: Hi, Liquan

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-08 Thread Liquan Pei
*To:* Haopu Wang *Cc:* d...@spark.apache.org; user *Subject:* Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin? Hi Haopu, How about full outer join? One hash table may not be efficient for this case. Liquan On Mon, Sep 29, 2014 at 11:47 PM, Haopu Wang hw

RE: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-30 Thread Haopu Wang
! From: Liquan Pei [mailto:liquan...@gmail.com] Sent: September 30, 2014 12:31 To: Haopu Wang Cc: d...@spark.apache.org; user Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin? Hi Haopu, My understanding is that the hashtable on both the left and right side

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-30 Thread Liquan Pei
...@spark.apache.org; user *Subject:* Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin? Hi Haopu, My understanding is that the hashtable on both the left and right side is used for including null values in the result in an efficient manner. If the hash table is only built on one

Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-29 Thread Haopu Wang
I took a look at HashOuterJoin and it's building a hashtable for both sides. This consumes quite a lot of memory when the partition is big. And it doesn't reduce the iteration on the streamed relation, right? Thanks!

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-29 Thread Liquan Pei
Hi Haopu, My understanding is that the hashtable on both the left and right side is used for including null values in the result in an efficient manner. If the hash table is only built on one side, let's say the left side, and we perform a left outer join, then for each row in the left side, a scan over the right side is
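The preview cuts off here; to make the reasoning concrete, below is a toy single-partition Scala sketch (not Spark's actual HashOuterJoin code) of a full outer join where only the right side is hashed. It shows the extra bookkeeping being alluded to: the streamed pass must remember which right-side keys matched so that unmatched right rows can still be emitted with nulls.

    // Toy full outer join with the right side hashed and the left side streamed.
    // Illustrates the bookkeeping only; not Spark's implementation.
    def fullOuterJoin[K, L, R](left: Seq[(K, L)],
                               right: Seq[(K, R)]): Seq[(K, Option[L], Option[R])] = {
      val rightTable = right.groupBy(_._1)                 // hash the right side
      val matchedRightKeys = scala.collection.mutable.Set[K]()

      // Stream the left side: emit matches, or (left, null) when nothing matches.
      val fromLeft = left.flatMap { case (k, l) =>
        rightTable.get(k) match {
          case Some(rs) =>
            matchedRightKeys += k
            rs.map { case (_, r) => (k, Some(l), Some(r)) }
          case None => Seq((k, Some(l), None))
        }
      }

      // Right rows whose keys never matched must still appear, with a null left side.
      val fromRight = right.collect {
        case (k, r) if !matchedRightKeys.contains(k) => (k, Option.empty[L], Some(r))
      }

      fromLeft ++ fromRight
    }

Hashing both sides avoids the matched-key tracking, at the memory cost Haopu observes.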

Spark SQL question: how to control the storage level of cached SchemaRDD?

2014-09-28 Thread Haopu Wang
[mailto:lian.cs@gmail.com] Sent: September 26, 2014 21:24 To: Haopu Wang; user@spark.apache.org Subject: Re: Spark SQL question: is cached SchemaRDD storage controlled by spark.storage.memoryFraction? Yes it is. The in-memory storage used with SchemaRDD also uses RDD.cache() under the hood. On 9/26/14 4

Re: Spark SQL question: how to control the storage level of cached SchemaRDD?

2014-09-28 Thread Michael Armbrust
Replicated. How can I change the storage level? Because I have a big table there. Thanks! -- *From:* Cheng Lian [mailto:lian.cs@gmail.com] *Sent:* September 26, 2014 21:24 *To:* Haopu Wang; user@spark.apache.org *Subject:* Re: Spark SQL question: is cached SchemaRDD
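The preview cuts off before the answer; as a hedged Spark 1.x-era sketch of the general technique (not necessarily what this reply recommended), one could bypass cacheTable's default storage level by persisting the SchemaRDD directly, at the cost of losing the in-memory columnar format:

    import org.apache.spark.storage.StorageLevel

    // Sketch only; "bigTable" is an illustrative table name. Persisting the
    // SchemaRDD via RDD.persist caches plain row objects rather than
    // cacheTable's columnar batches, but lets you pick the storage level.
    val schemaRdd = sqlContext.sql("SELECT * FROM bigTable")
    schemaRdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
    schemaRdd.registerTempTable("bigTableCached")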

Re: Spark SQL question: how to control the storage level of cached SchemaRDD?

2014-09-28 Thread Michael Armbrust
have a big table there. Thanks! -- *From:* Cheng Lian [mailto:lian.cs@gmail.com] *Sent:* September 26, 2014 21:24 *To:* Haopu Wang; user@spark.apache.org *Subject:* Re: Spark SQL question: is cached SchemaRDD storage controlled by spark.storage.memoryFraction

Spark SQL question: is cached SchemaRDD storage controlled by spark.storage.memoryFraction?

2014-09-26 Thread Haopu Wang
Hi, I'm querying a big table using Spark SQL. I see very long GC time in some stages. I wonder if I can improve it by tuning the storage parameter. The question is: the schemaRDD has been cached with cacheTable() function. So is the cached schemaRDD part of memory storage controlled by the

Re: Spark SQL question: is cached SchemaRDD storage controlled by spark.storage.memoryFraction?

2014-09-26 Thread Cheng Lian
Yes it is. The in-memory storage used with SchemaRDD also uses RDD.cache() under the hood. On 9/26/14 4:04 PM, Haopu Wang wrote: Hi, I'm querying a big table using Spark SQL. I see very long GC time in some stages. I wonder if I can improve it by tuning the storage parameter. The
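A short sketch of the implication (Spark 1.x-era configuration; the fraction and table name are illustrative): since cacheTable() goes through RDD.cache(), the cached columnar data counts against the storage pool sized by spark.storage.memoryFraction, so shrinking that fraction is one lever for the GC pressure Haopu describes.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val conf = new SparkConf()
      .setAppName("cache-table-sketch")
      .set("spark.storage.memoryFraction", "0.4") // default was 0.6; value illustrative
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // "bigTable" must already be registered; cached batches land in the storage pool.
    sqlContext.cacheTable("bigTable")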

Fwd: Spark SQL question: is cached SchemaRDD storage controlled by spark.storage.memoryFraction?

2014-09-26 Thread Liquan Pei
-- Forwarded message -- From: Liquan Pei liquan...@gmail.com Date: Fri, Sep 26, 2014 at 1:33 AM Subject: Re: Spark SQL question: is cached SchemaRDD storage controlled by spark.storage.memoryFraction? To: Haopu Wang hw...@qilinsoft.com Hi Haopu, Internally, cacheTable