Thanks, I just filed a JIRA issue: https://issues.apache.org/jira/browse/PHOENIX-3506
Xindian

From: James Taylor [mailto:[email protected]]
Sent: Thursday, November 10, 2016 3:08 PM
To: user
Subject: Re: Phoenix-Spark plug in cannot select by column family name

Please file a JIRA, though, Xindian. It's a reasonable request to add the ability to prefix column references with the column family name, just like you can do in JDBC.

On Thu, Nov 10, 2016 at 12:05 PM, Chris Tarnas <[email protected]> wrote:

From my experience you will need to make sure that the column names are unique, even across families; otherwise Spark will throw errors.

Chris Tarnas
Biotique Systems, Inc
[email protected]

On Nov 10, 2016, at 10:14 AM, Long, Xindian <[email protected]> wrote:

It works with no column family, but I expect that I should not need to make sure column names are unique across different column families.

Xindian

From: James Taylor [mailto:[email protected]]
Sent: Tuesday, November 08, 2016 5:46 PM
To: user
Subject: Re: Phoenix-Spark plug in cannot select by column family name

Have you tried without the column family name? Unless the column names are not unique across all column families, you don't need to include the column family name.

Thanks,
James

On Tue, Nov 8, 2016 at 2:19 PM, Long, Xindian <[email protected]> wrote:

I have a table with multiple column families with possibly the same column names.
I want to use the phoenix-spark plug-in to select some of the fields, but it returns an AnalysisException (details in the attached file):

    public void testSpark(JavaSparkContext sc, String tableStr, String dataSrcUrl) {
        //SparkContextBuilder.buildSparkContext("Simple Application", "local");
        // One JVM can only have one Spark Context now
        Map<String, String> options = new HashMap<String, String>();
        SQLContext sqlContext = new SQLContext(sc);
        options.put("zkUrl", dataSrcUrl);
        options.put("table", tableStr);
        log.info("Phoenix DB URL: " + dataSrcUrl + " tableStr: " + tableStr);
        DataFrame df = null;
        try {
            df = sqlContext.read().format("org.apache.phoenix.spark").options(options).load();
            df.explain(true);
            df.show();
            df = df.select("I.CI", "I.FA");
            //df = df.select("\"I\".\"CI\"", "\"I\".\"FA\""); // This gives the same exception too
        } catch (Exception ex) {
            log.error("sql error: ", ex);
        }
        try {
            log.info("Count By phoenix spark plugin: " + df.count());
        } catch (Exception ex) {
            log.error("dataframe error: ", ex);
        }
    }

I can see in the log that there is something like:

10728 [INFO] main org.apache.phoenix.mapreduce.PhoenixInputFormat - Select Statement: SELECT "RID","I"."CI","I"."FA","I"."FPR","I"."FPT","I"."FR","I"."LAT","I"."LNG","I"."NCG","I"."NGPD","I"."VE","I"."VMJ","I"."VMR","I"."VP","I"."CSRE","I"."VIB","I"."IIICS","I"."LICSCD","I"."LEDC","I"."ARM","I"."FBM","I"."FTB","I"."NA2FR","I"."NA2PT","S"."AHDM","S"."ARTJ","S"."ATBM","S"."ATBMR","S"."ATBR","S"."ATBRR","S"."CS","S"."LAMT","S"."LTFCT","S"."LBMT","S"."LDTI","S"."LMT","S"."LMTN","S"."LMTR","S"."LPET","S"."LPORET","S"."LRMT","S"."LRMTP","S"."LRMTR","S"."LSRT","S"."LSST","S"."MHDMS0","S"."MHDMS1","S"."RFD","S"."RRN","S"."RRR","S"."TD","S"."TSM","S"."TC","S"."TPM","S"."LRMCT","S"."SS13FSK34","S"."LERMT","S"."LEMDMT","S"."AGTBRE","S"."SRM","S"."LTET","S"."TPMS","S"."TPMSM","S"."TM","S"."TMF","S"."TMFM","S"."NA2TLS","S"."NA2IT","S"."CWR","S"."BPR","S"."LR","S"."HLB","S"."NA2UFTBFR","S"."DT","S"."NA28ARE","S"."RM","S"."LMTB","S"."LRMTB","S"."RRB","P"."BADUC","P"."UAN","P"."BAPS","P"."BAS","P"."UAS","P"."BATBBR","P"."BBRI","P"."BLBR","P"."ULHT","P"."BLPST","P"."BLPT","P"."UTI","P"."UUC" FROM TESTING.ENDPOINTS

But obviously, the column family is left out of the DataFrame column names somewhere in the process. Is there any fix for the problem?

Thanks

Xindian
