Hi Ralph,

Glad that worked, at least partly. For the issue you mention, I am not sure of any easy way out, as there could be some rows with null column values.
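If some rows really do carry null key values, one possible workaround (just a sketch; the relation and field names are taken from the script quoted below) is to filter out rows whose primary-key fields are null before the STORE:

```pig
-- Sketch only: drop rows where any NOT NULL key field is missing, so
-- PhoenixHBaseStorage never sees a null primary-key value.
D_clean = FILTER D BY period IS NOT NULL AND deployment IS NOT NULL
    AND file_id IS NOT NULL AND recnum IS NOT NULL;

STORE D_clean into 'hbase://$table_name/period,deployment,file_id,recnum'
    using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');
```

You could also SPLIT the rejected rows into a side relation and dump them to a file to see which input lines are producing the nulls.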
Regards,
Ravi Magham

On Monday, December 8, 2014, Perko, Ralph J <ralph.pe...@pnnl.gov> wrote:

> Ravi,
>
> Your suggestion worked – thank you!
>
> But I am now getting an
> org.apache.phoenix.schema.ConstraintViolationException on some data files:
>
> "T1_LOG_DNS.PERIOD may not be null"
>
> However, there is no record with a null value for this field.
>
> I tried hardcoding a value in the Pig script to see if I could get past
> this error, and it just moved the error to the next field:
>
> "T1_LOG_DNS.DEPLOYMENT may not be null"
>
> This is an intermittent error: it does not happen with every file, but it
> does happen consistently with the same file.
>
> Thank you for the help,
>
> Ralph
>
> __________________________________________________
> *Ralph Perko*
> Pacific Northwest National Laboratory
> (509) 375-2272
> ralph.pe...@pnnl.gov
>
> From: Ravi Kiran <maghamraviki...@gmail.com>
> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> Date: Friday, December 5, 2014 at 3:20 PM
> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> Subject: Re: pig and phoenix
>
> Hi Ralph,
>
> Can you please try modifying the STORE command in the script to the
> following:
>
> STORE D into 'hbase://$table_name/period,deployment,file_id,recnum'
> using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize
> 1000');
>
> Primarily, Phoenix generates the default UPSERT query for the table, and
> it assumes the order to be that of the columns in your CREATE TABLE
> statement. In your case, I see you are reordering the columns during the
> STORE command.
> Hence, with the above change, Phoenix constructs the right UPSERT query
> for you, using the columns you mention after $table_name.
>
> Also, to have a look at the query Phoenix has generated, you should see a
> log entry that starts with "*Phoenix Generic Upsert Statement:*". That
> will also give insight into the UPSERT query.
>
> Happy to help!!
>
> Regards,
> Ravi
>
> On Fri, Dec 5, 2014 at 2:57 PM, Perko, Ralph J <ralph.pe...@pnnl.gov>
> wrote:
>
>> Hi, I wrote a series of Pig scripts to load data that were working well
>> with 4.0, but since upgrading to 4.2.x (4.2.1 currently) they are now
>> failing.
>>
>> Here is an example:
>>
>> Table def:
>>
>> CREATE TABLE IF NOT EXISTS t1_log_dns
>> (
>>   period BIGINT NOT NULL,
>>   deployment VARCHAR NOT NULL,
>>   file_id VARCHAR NOT NULL,
>>   recnum INTEGER NOT NULL,
>>   f1 VARCHAR,
>>   f2 VARCHAR,
>>   f3 VARCHAR,
>>   f4 BIGINT,
>>   ...
>>   CONSTRAINT pkey PRIMARY KEY (period, deployment, file_id, recnum)
>> )
>> IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>>
>> -- some index defs; the same error occurs with or without them
>>
>> Pig script:
>>
>> register $phoenix_jar;
>> register $udf_jar;
>>
>> Z = load '$data' as (
>>   file_id,
>>   recnum,
>>   period,
>>   deployment,
>>   ... more fields
>> );
>>
>> -- put it all together and generate final output!
>> D = foreach Z generate
>>   period,
>>   deployment,
>>   file_id,
>>   recnum,
>>   ...
>>   more fields;
>>
>> STORE D into 'hbase://$table_name' using
>> org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');
>>
>> Error:
>>
>> 2014-12-05 14:24:06,450 [main] ERROR
>> org.apache.pig.tools.pigstats.SimplePigStats - ERROR: Unable to process
>> column RECNUM:INTEGER, innerMessage=java.lang.String cannot be coerced to
>> INTEGER
>> 2014-12-05 14:24:06,450 [main] ERROR
>> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>> 2014-12-05 14:24:06,452 [main] INFO
>> org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
>>
>> HadoopVersion      PigVersion          UserId  StartedAt            FinishedAt           Features
>> 2.4.0.2.1.5.0-695  0.12.1.2.1.5.0-695  perko   2014-12-05 14:23:17  2014-12-05 14:24:06  UNKNOWN
>>
>> Based on the error, it would seem that some non-integer value cannot be
>> cast to an integer, but the data does not show this. Stepping through the
>> Pig script and running "dump" on each variable shows the data in the
>> right place and of the right coercible type; for example, recnum contains
>> nothing but single digits of sample data.
>>
>> I have tried to set recnum to an int in Pig, but this just pushes the
>> error up to the previous field, file_id:
>>
>> ERROR 2999: Unexpected internal error. Unable to process column
>> FILE_ID:VARCHAR, innerMessage=java.lang.Integer cannot be coerced to VARCHAR
>>
>> Other times I get a different error:
>>
>> Unable to process column _SALT:BINARY,
>> innerMessage=org.apache.phoenix.schema.TypeMismatchException: ERROR 203
>> (22005): Type mismatch. BINARY cannot be coerced to LONG
>>
>> Is there something obvious I am doing wrong? Did something significant
>> change between 4.0 and 4.2.x in this regard? I would not rule out some
>> silly user error I inadvertently introduced :-/
>>
>> Thanks for your help,
>> Ralph
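On the coercion errors in the thread above: fields loaded without a schema type in Pig default to bytearray/chararray, and the errors suggest Phoenix 4.2 is checking each Pig field type against the target column type by position. One way to rule that out (a sketch only; the Pig types shown are assumptions derived from the CREATE TABLE definition quoted above) is to declare explicit types in the load schema and keep the STORE column list in the same order as the generated fields:

```pig
-- Sketch: declare Pig types that line up with the Phoenix column types
-- (period BIGINT -> long, recnum INTEGER -> int, VARCHAR -> chararray).
Z = load '$data' as (
    file_id:chararray,
    recnum:int,
    period:long,
    deployment:chararray
    -- ... more fields
);

-- Reorder to match the column list given after $table_name below.
D = foreach Z generate period, deployment, file_id, recnum;

STORE D into 'hbase://$table_name/period,deployment,file_id,recnum'
    using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');
```

With the types declared up front, a `describe D;` should show the schema that PhoenixHBaseStorage will see, which makes mismatches like "java.lang.String cannot be coerced to INTEGER" easier to spot before running the job.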