Good guess, but that is not the reason. Look at this code:
scala> val data = sc.parallelize(1325548800000L::1335548800000L::Nil).map(i=>
T(i.toString, new java.sql.Timestamp(i)))
data: org.apache.spark.rdd.RDD[T] = MappedRDD[17] at map at <console>:17
scala> data.collect
res3: Array[T] = Array(T(1325548800000,2012-01-02 16:00:00.0),
T(1335548800000,2012-04-27 10:46:40.0))
scala> data.registerTempTable("x")
scala> val s = sqlContext.sql("select a from x where ts>='1970-01-01
00:00:00';")
s: org.apache.spark.sql.SchemaRDD =
SchemaRDD[20] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#2]
ExistingRdd [a#2,ts#3], MapPartitionsRDD[22] at mapPartitions at
basicOperators.scala:208
scala> s.collect
res5: Array[org.apache.spark.sql.Row] = Array()
Mohammed
From: Yin Huai [mailto:[email protected]]
Sent: Monday, October 13, 2014 7:19 AM
To: Mohammed Guller
Cc: Cheng, Hao; Cheng Lian; [email protected]
Subject: Re: Spark SQL parser bug?
Seems the reason that you got "wrong" results was caused by timezone.
The time in java.sql.Timestamp(long time) means "milliseconds since January 1,
1970, 00:00:00 GMT. A negative number is the number of milliseconds before
January 1, 1970, 00:00:00 GMT."
However, in ts>='1970-01-01 00:00:00', '1970-01-01 00:00:00' is using your
local timezone.
Thanks,
Yin
On Mon, Oct 13, 2014 at 9:58 AM, Mohammed Guller
<[email protected]<mailto:[email protected]>> wrote:
Hi Cheng,
I am using version 1.1.0.
Looks like that bug was fixed sometime after 1.1.0 was released. Interestingly,
I tried your code on 1.1.0 and it gives me a different (incorrect) result:
case class T(a:String, ts:java.sql.Timestamp)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
val data = sc.parallelize(10000::20000::Nil).map(i=> T(i.toString, new
java.sql.Timestamp(i)))
data.registerTempTable("x")
val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
scala> s.collect
res1: Array[org.apache.spark.sql.Row] = Array()
Mohammed
From: Cheng, Hao [mailto:[email protected]<mailto:[email protected]>]
Sent: Sunday, October 12, 2014 1:35 AM
To: Mohammed Guller; Cheng Lian;
[email protected]<mailto:[email protected]>
Subject: RE: Spark SQL parser bug?
Hi, I couldn’t reproduce the bug with the latest master branch. Which version
are you using? Can you also list data in the table “x”?
case class T(a:String, ts:java.sql.Timestamp)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
val data = sc.parallelize(10000::20000::Nil).map(i=> T(i.toString, new
java.sql.Timestamp(i)))
data.registerTempTable("x")
val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
s.collect
output:
res1: Array[org.apache.spark.sql.Row] = Array([10000], [20000])
Cheng Hao
From: Mohammed Guller [mailto:[email protected]]
Sent: Sunday, October 12, 2014 12:06 AM
To: Cheng Lian; [email protected]<mailto:[email protected]>
Subject: RE: Spark SQL parser bug?
I tried even without the “T” and it still returns an empty result:
scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01
00:00:00';")
sRdd: org.apache.spark.sql.SchemaRDD =
SchemaRDD[35] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#0]
ExistingRdd [a#0,ts#1], MapPartitionsRDD[37] at mapPartitions at
basicOperators.scala:208
scala> sRdd.collect
res10: Array[org.apache.spark.sql.Row] = Array()
Mohammed
From: Cheng Lian [mailto:[email protected]]
Sent: Friday, October 10, 2014 10:14 PM
To: Mohammed Guller; [email protected]<mailto:[email protected]>
Subject: Re: Spark SQL parser bug?
Hmm, there is a “T” in the timestamp string, which makes the string not a valid
timestamp string representation. Internally Spark SQL uses
java.sql.Timestamp.valueOf to cast a string to a timestamp.
On 10/11/14 2:08 AM, Mohammed Guller wrote:
scala> rdd.registerTempTable("x")
scala> val sRdd = sqlContext.sql("select a from x where ts >=
'2012-01-01T00:00:00';")
sRdd: org.apache.spark.sql.SchemaRDD =
SchemaRDD[4] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#0]
ExistingRdd [a#0,ts#1], MapPartitionsRDD[6] at mapPartitions at
basicOperators.scala:208
scala> sRdd.collect
res2: Array[org.apache.spark.sql.Row] = Array()