performance difference between Thrift server and SparkSQL?

2015-10-03 Thread Jeff Thompson
Hi, I'm running a simple SQL query over a ~700 million row table of the form: SELECT * FROM my_table WHERE id = '12345'; When I submit the query via beeline and the JDBC Thrift server, it returns in 35s. When I submit the exact same query using sparkSQL from a pyspark shell (sqlContext.sql("SELECT *

Re: performance difference between Thrift server and SparkSQL?

2015-10-04 Thread Jeff Thompson
des would be helpful in debugging? On Sat, Oct 3, 2015 at 1:08 PM, Jeff Thompson <jeffreykeatingthomp...@gmail.com> wrote: > Hi, I'm running a simple SQL query over a ~700 million row table of the form: SELECT * FROM

error in sparkSQL 1.5 using count(1) in nested queries

2015-10-08 Thread Jeff Thompson
After upgrading from 1.4.1 to 1.5.1 I found that some of my Spark SQL queries no longer worked. The problem seems to be related to using count(1) or count(*) in a nested query. I can reproduce the issue in a pyspark shell with the sample code below. The ‘people’ table is from spark-1.5.1-bin-hadoop2.4/examples/