Please file a JIRA, though, Xindian. It's a reasonable request to add the ability to prefix column references with the column family name, just as you can in JDBC.
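For reference, this kind of family-qualified reference already works through the Phoenix JDBC driver, which is the behavior the request asks to mirror. A minimal sketch, assuming the TESTING.ENDPOINTS table from the thread below and a hypothetical ZooKeeper quorum URL:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class QualifiedSelect {
        public static void main(String[] args) throws Exception {
            // Connection URL is an assumption; point it at your own quorum.
            try (Connection conn =
                         DriverManager.getConnection("jdbc:phoenix:localhost:2181");
                 Statement stmt = conn.createStatement();
                 // In SQL, "I"."CI" resolves unambiguously even if another
                 // column family also has a CI column.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT \"I\".\"CI\", \"I\".\"FA\" FROM TESTING.ENDPOINTS")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getString(2));
                }
            }
        }
    }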
On Thu, Nov 10, 2016 at 12:05 PM, Chris Tarnas <[email protected]> wrote:

> From my experience you will need to make sure that the column names are
> unique, even across families; otherwise Spark will throw errors.
>
> Chris Tarnas
> Biotique Systems, Inc
> [email protected]
>
> On Nov 10, 2016, at 10:14 AM, Long, Xindian <[email protected]> wrote:
>
> > It works with no column family, but I expect that I should not need to
> > make sure column names are unique across different column families.
> >
> > Xindian
> >
> > From: James Taylor [mailto:[email protected]]
> > Sent: Tuesday, November 08, 2016 5:46 PM
> > To: user
> > Subject: Re: Phoenix-Spark plug-in cannot select by column family name
> >
> > Have you tried without the column family name? Unless the column names
> > are not unique across all column families, you don't need to include
> > the column family name.
> >
> > Thanks,
> > James
> >
> > On Tue, Nov 8, 2016 at 2:19 PM, Long, Xindian <[email protected]> wrote:
> >
> > I have a table with multiple column families that may contain the same
> > column names. I want to use the phoenix-spark plug-in to select some of
> > the fields, but it throws an AnalysisException (details in the attached
> > file).
> >
> > public void testSpark(JavaSparkContext sc, String tableStr, String dataSrcUrl) {
> >     //SparkContextBuilder.buildSparkContext("Simple Application", "local");
> >     // One JVM can only have one Spark Context now
> >     Map<String, String> options = new HashMap<String, String>();
> >     SQLContext sqlContext = new SQLContext(sc);
> >
> >     options.put("zkUrl", dataSrcUrl);
> >     options.put("table", tableStr);
> >     log.info("Phoenix DB URL: " + dataSrcUrl + " tableStr: " + tableStr);
> >
> >     DataFrame df = null;
> >     try {
> >         df = sqlContext.read().format("org.apache.phoenix.spark")
> >                 .options(options).load();
> >         df.explain(true);
> >         df.show();
> >
> >         df = df.select("I.CI", "I.FA");
> >         //df = df.select("\"I\".\"CI\"", "\"I\".\"FA\""); // This gives
> >         // the same exception too
> >     } catch (Exception ex) {
> >         log.error("sql error: ", ex);
> >     }
> >
> >     try {
> >         log.info("Count By phoenix spark plugin: " + df.count());
> >     } catch (Exception ex) {
> >         log.error("dataframe error: ", ex);
> >     }
> > }
> >
> > I can see in the log that there is something like:
> >
> > 10728 [INFO] main org.apache.phoenix.mapreduce.PhoenixInputFormat -
> > Select Statement: SELECT "RID","I"."CI","I"."FA","I"."FPR","I"."FPT",
> > "I"."FR","I"."LAT","I"."LNG","I"."NCG","I"."NGPD","I"."VE","I"."VMJ",
> > "I"."VMR","I"."VP","I"."CSRE","I"."VIB","I"."IIICS","I"."LICSCD",
> > "I"."LEDC","I"."ARM","I"."FBM","I"."FTB","I"."NA2FR","I"."NA2PT",
> > "S"."AHDM","S"."ARTJ","S"."ATBM","S"."ATBMR","S"."ATBR","S"."ATBRR",
> > "S"."CS","S"."LAMT","S"."LTFCT","S"."LBMT","S"."LDTI","S"."LMT",
> > "S"."LMTN","S"."LMTR","S"."LPET","S"."LPORET","S"."LRMT","S"."LRMTP",
> > "S"."LRMTR","S"."LSRT","S"."LSST","S"."MHDMS0","S"."MHDMS1","S"."RFD",
> > "S"."RRN","S"."RRR","S"."TD","S"."TSM","S"."TC","S"."TPM","S"."LRMCT",
> > "S"."SS13FSK34","S"."LERMT","S"."LEMDMT","S"."AGTBRE","S"."SRM",
> > "S"."LTET","S"."TPMS","S"."TPMSM","S"."TM","S"."TMF","S"."TMFM",
> > "S"."NA2TLS","S"."NA2IT","S"."CWR","S"."BPR","S"."LR","S"."HLB",
> > "S"."NA2UFTBFR","S"."DT","S"."NA28ARE","S"."RM","S"."LMTB","S"."LRMTB",
> > "S"."RRB","P"."BADUC","P"."UAN","P"."BAPS","P"."BAS","P"."UAS",
> > "P"."BATBBR","P"."BBRI","P"."BLBR","P"."ULHT","P"."BLPST","P"."BLPT",
> > "P"."UTI","P"."UUC" FROM TESTING.ENDPOINTS
> >
> > But obviously the column family prefix is dropped from the DataFrame
> > column names somewhere in the process.
> >
> > Any fix for the problem?
> >
> > Thanks,
> >
> > Xindian
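Putting the thread's advice together, a minimal sketch of the workaround: since the Phoenix-Spark DataFrame exposes columns without the family prefix, select them unqualified. This assumes CI and FA are unique across the I, S, and P families, and it reuses the Spark 1.x API and variable names from the code above:

    // Workaround sketch: select unqualified column names.
    Map<String, String> options = new HashMap<String, String>();
    options.put("zkUrl", dataSrcUrl);          // same zkUrl as above
    options.put("table", "TESTING.ENDPOINTS");

    DataFrame df = sqlContext.read()
            .format("org.apache.phoenix.spark")
            .options(options)
            .load();

    df.printSchema();            // shows the columns without a family prefix
    df = df.select("CI", "FA");  // unqualified names resolve; "I.CI" does not

If two families really do share a column name, the options until such a JIRA is implemented appear to be renaming the columns in the Phoenix schema or pushing the qualified projection down to Phoenix itself (e.g., through JDBC as sketched above) rather than through DataFrame.select.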
