It works with no column family, but I expect that I do not need to make sure 
column names are unique  across different column families.

Xindian


From: James Taylor [mailto:[email protected]]
Sent: Tuesday, November 08, 2016 5:46 PM
To: user
Subject: Re: Phoenix-Spark plug in cannot select by column family name

Have you tried without the column family name? Unless the column names are not 
unique across all column families, you don't need to include the column family 
name.

Thanks,
James

On Tue, Nov 8, 2016 at 2:19 PM, Long, Xindian 
<[email protected]<mailto:[email protected]>> wrote:
I have a table with multiple column family with possible same column names.
I want to use phoenix-spark plug in to select some of the fields, but it 
returns a AnalysisException (details in the attached file)

public void testSpark(JavaSparkContext sc, String tableStr, String dataSrcUrl) {
    //SparkContextBuilder.buildSparkContext("Simple Application", "local");

    // One JVM can only have one Spark Context now
    Map<String, String> options = new HashMap<String, String>();
    SQLContext sqlContext = new SQLContext(sc);


    options.put("zkUrl", dataSrcUrl);
    options.put("table", tableStr);
    log.info("Phoenix DB URL: " + dataSrcUrl + " tableStr: " + tableStr);

    DataFrame df = null;
    try {
        df = 
sqlContext.read().format("org.apache.phoenix.spark").options(options).load();
        df.explain(true);
        df.show();

        df = df.select("I.CI<http://I.CI>", "I.FA");

        //df = df.select("\"I\".\"CI\"", "\"I\".\"FA\""); // This gives the 
same exception too

    } catch (Exception ex) {
        log.error("sql error: ", ex);
    }

    try {
        log.info("Count By phoenix spark plugin: " + df.count());
   } catch (Exception ex) {
        log.error("dataframe error: ", ex);
    }

}


I can see in the log that there is something like

10728 [INFO] main  org.apache.phoenix.mapreduce.PhoenixInputFormat  - Select 
Statement: SELECT 
"RID","I"."CI","I"."FA","I"."FPR","I"."FPT","I"."FR","I"."LAT","I"."LNG","I"."NCG","I"."NGPD","I"."VE","I"."VMJ","I"."VMR","I"."VP","I"."CSRE","I"."VIB","I"."IIICS","I"."LICSCD","I"."LEDC","I"."ARM","I"."FBM","I"."FTB","I"."NA2FR","I"."NA2PT","S"."AHDM","S"."ARTJ","S"."ATBM","S"."ATBMR","S"."ATBR","S"."ATBRR","S"."CS","S"."LAMT","S"."LTFCT","S"."LBMT","S"."LDTI","S"."LMT","S"."LMTN","S"."LMTR","S"."LPET","S"."LPORET","S"."LRMT","S"."LRMTP","S"."LRMTR","S"."LSRT","S"."LSST","S"."MHDMS0","S"."MHDMS1","S"."RFD","S"."RRN","S"."RRR","S"."TD","S"."TSM","S"."TC","S"."TPM","S"."LRMCT","S"."SS13FSK34","S"."LERMT","S"."LEMDMT","S"."AGTBRE","S"."SRM","S"."LTET","S"."TPMS","S"."TPMSM","S"."TM","S"."TMF","S"."TMFM","S"."NA2TLS","S"."NA2IT","S"."CWR","S"."BPR","S"."LR","S"."HLB","S"."NA2UFTBFR","S"."DT","S"."NA28ARE","S"."RM","S"."LMTB","S"."LRMTB","S"."RRB","P"."BADUC","P"."UAN","P"."BAPS","P"."BAS","P"."UAS","P"."BATBBR","P"."BBRI","P"."BLBR","P"."ULHT","P"."BLPST","P"."BLPT","P"."UTI","P"."UUC"
 FROM TESTING.ENDPOINTS

But obviously, the column family is  left out of the Dataframe column name 
somewhere in the process.
Any fix for the problem?

Thanks

Xindian


Reply via email to