[ https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pablo Castilla updated PHOENIX-3664: ------------------------------------ Environment: Azure HDIndight (HDI 3.5) - pyspark using phoenix client. (Spark 1.6.3 - HBase 1.1.2 under HDP 2.5) (was: Azure HDIndight - pyspark using phoenix client.) > Pyspark: pushing filter by date against apache phoenix > ------------------------------------------------------ > > Key: PHOENIX-3664 > URL: https://issues.apache.org/jira/browse/PHOENIX-3664 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.7.0 > Environment: Azure HDIndight (HDI 3.5) - pyspark using phoenix > client. (Spark 1.6.3 - HBase 1.1.2 under HDP 2.5) > Reporter: Pablo Castilla > > I am trying to filter by date in apache phoenix from pyspark. The column in > phoenix is created as Date and the filter is a datetime. When I use explain I > see spark doesn't push the filter to phoenix. I have tried a lot of > combinations without luck. > Any way to do it? > df = sqlContext.read \ > .format("org.apache.phoenix.spark") \ > .option("table", "TABLENAME") \ > .option("zkUrl",zookepperServer +":2181:/hbase-unsecure" ) \ > .load() > print(df.printSchema()) > startValidation = datetime.datetime.now() > print(df.filter(df['FH'] >startValidation).explain(True)) > Results: > root > |-- METER_ID: string (nullable = true) > |-- FH: date (nullable = true) > None > == Parsed Logical Plan == > 'Filter (FH#53 > 1486726683446150) > +- > Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] > PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure) > == Analyzed Logical Plan == > METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, > ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: > int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int > Filter (cast(FH#53 as string) > cast(1486726683446150 as string)) > +- > Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] > PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure) > == Optimized Logical Plan == > Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615) > +- > Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] > PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure) > == Physical Plan == > Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615) > +- Scan > PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] > None > if I set the FH column as timestamp it pushes the filter but throws an > exception: > Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 > (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line > 1, column 219. > at > org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33) > at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111) > at > org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280) > at > org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363) > at > org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373) > at > org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368) > at > org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122) > ... 102 more > Caused by: MismatchedTokenException(106!=129) > at > org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360) > at > org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) > at > org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862) > at > org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677) > at > org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614) > at > org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579) > at > org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615) > at > org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697) > at > org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4763) > at > org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:789) > at > org.apache.phoenix.parse.PhoenixSQLParser.statement(PhoenixSQLParser.java:508) > at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:108) > ... 107 more -- This message was sent by Atlassian JIRA (v6.3.15#6346)