I ran:
sqlContext.cacheTable("product")
val df = sqlContext.sql("...complex query...")
df.explain(true)
...and obtained: http://pastebin.com/k9skERsr
In that output, each "[...]" stands for a huge list of records from the
queried table (product).
The query is of the following form:
"SELECT distinct p.id, p.`aaa`, p.`bbb` FROM product p, (SELECT distinct
p1.id FROM product p1 WHERE p1.`ccc`='fff') p2, (SELECT distinct p3.id
FROM product p3 WHERE p3.`ccc`='ddd') p4 WHERE p.`eee` = '1' AND
p.id=p2.id AND p.`eee` > 137 AND p4.id=p.id UNION SELECT distinct
p.id,p.`bbb`, p.`bbb` FROM product p, (SELECT distinct p1.id FROM
product p1 WHERE p1.`ccc`='fff') p2, (SELECT distinct p5.id FROM product
p5 WHERE p5.`ccc`='ggg') p6 WHERE p.`eee` = '1' AND p.id=p2.id AND
p.`hhh` > 93 AND p6.id=p.id ORDER BY p.`bbb` LIMIT 10"
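One thing that could be tried while the cause is unknown is to run the failing call on a thread with a larger stack. This is only a sketch under an assumption the thread does not confirm: that the StackOverflowError comes from deep recursion on the calling thread. The names StackSizeSketch, depth and withStackSize are hypothetical, depth() is a stand-in for any deeply nested traversal (it is not Spark code), and 256 MB is an arbitrary value. The JVM treats the stack size passed to java.lang.Thread as a hint and may ignore it on some platforms.

```scala
// Hypothetical illustration, not Spark code.
object StackSizeSketch {
  // Deliberately non-tail-recursive: the recursion depth equals n,
  // so a large n needs a large stack.
  def depth(n: Int): Int = if (n == 0) 0 else 1 + depth(n - 1)

  // Run `body` on a new thread created with the requested stack size
  // (in bytes) and return its result, rethrowing anything it threw.
  def withStackSize[A](stackBytes: Long)(body: => A): A = {
    var result: Either[Throwable, A] =
      Left(new IllegalStateException("thread did not run"))
    val runner = new Runnable {
      def run(): Unit = {
        result = try Right(body) catch { case t: Throwable => Left(t) }
      }
    }
    // Standard constructor: Thread(ThreadGroup, Runnable, String, long stackSize)
    val thread = new Thread(null, runner, "big-stack", stackBytes)
    thread.start()
    thread.join()
    result.fold(t => throw t, identity)
  }

  def main(args: Array[String]): Unit = {
    // 256 MB is an arbitrary, generous value chosen for this sketch.
    val n = withStackSize(256L * 1024 * 1024)(depth(200000))
    println(n)
  }
}
```

In spark-shell the same helper could wrap the failing call, for example withStackSize(256L * 1024 * 1024)(sqlContext.sql("...complex query...").show()). I have not verified that this avoids the error here; it would only help if the overflow happens on the thread that issues the query.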
On 24.03.2016 22:16, Ted Yu wrote:
Can you post the output of explain(true) on the query, run after the
cacheTable() call?
Potentially related JIRA:
[SPARK-13657] [SQL] Support parsing very long AND/OR expressions
On Thu, Mar 24, 2016 at 12:55 PM, Mohamed Nadjib MAMI
<m...@iai.uni-bonn.de> wrote:
Here is the stack trace: http://pastebin.com/ueHqiznH
Here's the code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val table = sqlContext.read.parquet("hdfs://...parquet_table")
table.registerTempTable("table")
sqlContext.sql("...complex query...").show() // works (before caching)
sqlContext.cacheTable("table")
sqlContext.sql("...complex query...").show() // works (first run after caching)
sqlContext.sql("...complex query...").show() // fails (second run after caching)
On 24.03.2016 13:40, Ted Yu wrote:
Can you pastebin the stack trace?
If you can show a snippet of your code, that would give us more clues.
Thanks
On Mar 24, 2016, at 2:43 AM, Mohamed Nadjib MAMI <m...@iai.uni-bonn.de>
wrote:
Hi all,
I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing a
problem with table caching (sqlContext.cacheTable()), using the spark-shell of
Spark 1.5.1.
After I run sqlContext.cacheTable(table), the first sqlContext.sql(query) takes
longer (expected, since caching is lazy and the table is only materialized on
first use), but it finishes and returns results.
However, when I then run the same query again, I get the error:
"java.lang.StackOverflowError".
I searched for it but did not find any report of this error in connection with
table caching and querying.
Any hint is appreciated.
--
Regards, Grüße, Cordialement, Recuerdos, Saluti, προσρήσεις, 问候,
تحياتي. Mohamed Nadjib Mami
PhD Student - EIS Department - Bonn University, Germany.
Website <http://www.mohamednadjibmami.com>.
LinkedIn <http://fr.linkedin.com/in/mohamednadjibmami>.