, Below is the query, looks like from physical plan, the query is same as
that of cqlsh,
val query = s"""(select * from model_data
where TimeStamp > \'$timeStamp+0000\' and TimeStamp <=
\'$startTS+0000\'
and MetricID = $metricID)"""
println("Model query" + query)
val df = spark.read
.format(Config.dbDriver)
.options(Map("Keyspace" -> Config.dbName, "table" ->
"ml_forecast_tbl"))
.load
val d = spark.sparkContext.cassandraTable("analytics_db",
"ml_forecast_tbl")
.where(" \"TimeStamp\" > ? and \"TimeStamp\" <= ? and \"MetricID\" =
1",
timeStamp + "+0000", startTS + "+0000")
d.createorReplaceTempView("mode_data")
val modelDF = spark.sql(query).cache.
println(spark.sql(query).queryExecution)
== Physical Plan ==
InMemoryTableScan [MetricID#9045, TimeStamp#9046, ResourceID#9047,
Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
Hi99#9052, Low85#9053, Low99#9054]
: +- InMemoryRelation [MetricID#9045, TimeStamp#9046, ResourceID#9047,
Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
Hi99#9052, Low85#9053, Low99#9054], true, 10000, StorageLevel(disk, memory,
deserialized, 1 replicas)
: : +- *Filter ((((cast(TimeStamp#9046 as string) > 2016-01-22
00:00:00+0000) && (cast(TimeStamp#9046 as string) <= 2016-01-22
00:30:00+0000)) && isnotnull(TimeStamp#9046)) && isnotnull(MetricID#9045))
: : +- *Scan
org.apache.spark.sql.cassandra.CassandraSourceRelation@40dc2ade
[MetricID#9045,TimeStamp#9046,ResourceID#9047,Forecast#9048,GlobalThresholdMax#9049,GlobalThresholdMin#9050,Hi85#9051,Hi99#9052,Low85#9053,Low99#9054]
PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID),
EqualTo(MetricID,1)], ReadSchema:
struct<MetricID:int,TimeStamp:timestamp,ResourceID:string,Forecast:double,GlobalThresholdMax:doub...
On Tue, Jun 20, 2017 at 5:13 PM, Riccardo Ferrari <[email protected]>
wrote:
> Hi,
>
> Personally I would inspect how dates are managed. How does your spark code
> looks like? What does the explain say. Does TimeStamp gets parsed the same
> way?
>
> Best,
>
> On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog <[email protected]> wrote:
>
>> Hello,
>>
>> I have a table as below
>>
>> CREATE TABLE analytics_db.ml_forecast_tbl (
>> "MetricID" int,
>> "TimeStamp" timestamp,
>> "ResourceID" timeuuid
>> "Value" double,
>> PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
>> )
>>
>> select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
>> '2016-01-22 00:00:00+0000' and "TimeStamp" <= '2016-01-22 00:30:00+0000' ;
>>
>> MetricID | TimeStamp | ResourceID
>> | Value|
>> ----------+---------------------------------+---------------
>> -----------------------+----------+
>> | 1 | 2016-01-22 00:30:00.000000+0000 |
>> 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177 |
>> | 23.74124 | 15.2371
>> 1 | 2016-01-22 00:30:00.000000+0000 |
>> 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177 |
>> | 23.74124 | 15.2371
>> 1 | 2016-01-22 00:30:00.000000+0000 |
>> 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177 |
>> | 23.74124 | 15.2371
>> 1 | 2016-01-22 00:30:00.000000+0000 |
>> 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177 |
>> | 23.74124 | 15.2371
>>
>> This query runs perfectly fine from cqlsh, but not with Spark SQL, it
>> just emits empty results,
>> Is there a catch to think about on querying timestamp ranges with
>> cassandra spark connector
>>
>> Any inputs on this ?..
>>
>>
>> Thanks,
>> Sujeet
>>
>
>