I ran:

sqlContext.cacheTable("product")
val df = sqlContext.sql("...complex query...")
df.explain(true)

...and obtained: http://pastebin.com/k9skERsr

...where "[...]" stands for huge lists of records from the queried table (product).
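
Since the plan output is that big, a handier way to inspect it is to dump it to a file instead of copying it from the console. A minimal sketch, assuming the Spark 1.5 DataFrame API (the output path is a placeholder):

import java.io.PrintWriter

// df.queryExecution.toString carries the same parsed/analyzed/optimized/
// physical plans that explain(true) prints to the console
val writer = new PrintWriter("/tmp/plan.txt") // hypothetical output path
try writer.write(df.queryExecution.toString) finally writer.close()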

The query is of the following form:

SELECT DISTINCT p.id, p.`aaa`, p.`bbb`
FROM product p,
     (SELECT DISTINCT p1.id FROM product p1 WHERE p1.`ccc` = 'fff') p2,
     (SELECT DISTINCT p3.id FROM product p3 WHERE p3.`ccc` = 'ddd') p4
WHERE p.`eee` = '1' AND p.id = p2.id AND p.`eee` > 137 AND p4.id = p.id
UNION
SELECT DISTINCT p.id, p.`bbb`, p.`bbb`
FROM product p,
     (SELECT DISTINCT p1.id FROM product p1 WHERE p1.`ccc` = 'fff') p2,
     (SELECT DISTINCT p5.id FROM product p5 WHERE p5.`ccc` = 'ggg') p6
WHERE p.`eee` = '1' AND p.id = p2.id AND p.`hhh` > 93 AND p6.id = p.id
ORDER BY p.`bbb` LIMIT 10


On 24.03.2016 22:16, Ted Yu wrote:
Can you obtain the output of explain(true) on the query after the cacheTable() call?

Potentially related JIRA:

[SPARK-13657] [SQL] Support parsing very long AND/OR expressions
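
For reference, a hypothetical illustration of the expression shape that JIRA covers (not the exact query from this thread): a predicate with thousands of OR branches parses into a deeply nested tree, which can overflow the parser's stack.

// Hypothetical stress case in the spirit of SPARK-13657
val longPredicate = (1 to 10000).map(i => s"id = $i").mkString(" OR ")
sqlContext.sql(s"SELECT * FROM product WHERE $longPredicate").explain(true)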


On Thu, Mar 24, 2016 at 12:55 PM, Mohamed Nadjib MAMI <m...@iai.uni-bonn.de> wrote:

    Here is the stack trace: http://pastebin.com/ueHqiznH

    Here's the code:

        val sqlContext = new org.apache.spark.sql.SQLContext(sc)

        val table = sqlContext.read.parquet("hdfs://...parquet_table")
        table.registerTempTable("table")

        sqlContext.sql("...complex query...").show() // works (table not cached yet)

        sqlContext.cacheTable("table")

        sqlContext.sql("...complex query...").show() // works (first run after caching)

        sqlContext.sql("...complex query...").show() // fails with java.lang.StackOverflowError
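
    In case it helps while the root cause is unclear: a common first-aid step for a StackOverflowError thrown during query planning, offered here as an assumption rather than a verified fix, is to enlarge the driver's thread stack when launching the shell:

        spark-shell --driver-java-options "-Xss16m"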



    On 24.03.2016 13:40, Ted Yu wrote:
    Can you pastebin the stack trace?

    If you can show a snippet of your code, that would help give us more clues.

    Thanks

    On Mar 24, 2016, at 2:43 AM, Mohamed Nadjib MAMI <m...@iai.uni-bonn.de> wrote:

    Hi all,

    I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing a problem with table caching (sqlContext.cacheTable()), using the spark-shell of Spark 1.5.1.

    After I run sqlContext.cacheTable(table), the sqlContext.sql(query) takes longer the first time (expected, given the lazy execution), but it finishes and returns results. The weird thing is that when I run the same query again, I get the error "java.lang.StackOverflowError".

    I googled it but didn't find this error reported in connection with table caching and querying.
    Any hint is appreciated.


    --
    Regards, Grüße, Cordialement, Recuerdos, Saluti, προσρήσεις, 问候, تحياتي.
    Mohamed Nadjib Mami
    PhD Student - EIS Department - Bonn University, Germany.
    Website <http://www.mohamednadjibmami.com>.
    LinkedIn <http://fr.linkedin.com/in/mohamednadjibmami>.



--
Regards, Grüße, Cordialement, Recuerdos, Saluti, προσρήσεις, 问候, تحياتي. Mohamed Nadjib Mami
PhD Student - EIS Department - Bonn University, Germany.
Website <http://www.mohamednadjibmami.com>.
LinkedIn <http://fr.linkedin.com/in/mohamednadjibmami>.
