Oops, sorry, that was my oversight. I was checking the code and was surprised that it did not work with that Pig script...
Now it works fine. Many thanks, Chad.

Have a nice day,

Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/9/3 Chad Johnston <cjohns...@megatome.com>

> You're trying to use FromCqlColumn on a tuple that has been flattened. The
> schema still thinks it's {title: chararray}, but the flattened tuple is now
> two values. I don't know how to retrieve the data values in this case.
>
> Your code will work correctly if you do this:
> values3 = FOREACH rows GENERATE FromCqlColumn(title) AS title;
> dump values3;
> describe values3;
>
> (Use FromCqlColumn on the original data, not the flattened data.)
>
> Chad
>
>
> On Mon, Sep 2, 2013 at 8:45 AM, Miguel Angel Martin junquera <
> mianmarjun.mailingl...@gmail.com> wrote:
>
>> Hi
>>
>> 1.-
>>
>> Maybe?
>>
>> -- Register the UDF
>> REGISTER /path/to/cqlstorageudf-1.0-SNAPSHOT
>>
>> -- FromCqlColumn will convert chararray, int, long, float, double
>> DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();
>>
>> -- Load data as normal
>> data_raw = LOAD 'cql://bookcrossing/books' USING CqlStorage();
>>
>> -- Use the UDF
>> data = FOREACH data_raw GENERATE
>>     FromCqlColumn(isbn) AS ISBN,
>>     FromCqlColumn(bookauthor) AS BookAuthor,
>>     FromCqlColumn(booktitle) AS BookTitle,
>>     FromCqlColumn(publisher) AS Publisher,
>>     FromCqlColumn(yearofpublication) AS YearOfPublication;
>>
>> and 2.:
>>
>> With this data in Cassandra 1.2.8, Pig 0.11.11 and CQL3:
>>
>> CREATE KEYSPACE keyspace1
>>   WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
>>   AND durable_writes = true;
>>
>> use keyspace2;
>>
>> CREATE TABLE test (
>>   id text PRIMARY KEY,
>>   title text,
>>   age int
>> ) WITH COMPACT STORAGE;
>>
>> insert into test (id, title, age) values('1', 'child', 21);
>> insert into test (id, title, age) values('2', 'support',
21);
>> insert into test (id, title, age) values('3', 'manager', 31);
>> insert into test (id, title, age) values('4', 'QA', 41);
>> insert into test (id, title, age) values('5', 'QA', 30);
>> insert into test (id, title, age) values('6', 'QA', 30);
>>
>> and script:
>>
>> register './libs/cqlstorageudf-1.0-SNAPSHOT.jar';
>> DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();
>> rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30'
>>     USING CqlStorage();
>> dump rows;
>> ILLUSTRATE rows;
>> describe rows;
>> A = FOREACH rows GENERATE FLATTEN(title);
>> dump A;
>> values3 = FOREACH A GENERATE FromCqlColumn(title) AS title;
>> dump values3;
>> describe values3;
>>
>> --
>>
>> I have this error:
>>
>> ....
>>
>> -------------------------------------------------------------
>> | rows | id:chararray | age:int   | title:chararray |
>> -------------------------------------------------------------
>> |      | (id, 5)      | (age, 30) | (title, QA)     |
>> -------------------------------------------------------------
>>
>> rows: {id: chararray,age: int,title: chararray}
>>
>> ...
>>
>> (title,QA)
>> (title,QA)
>> ..
>> 2013-09-02 16:40:52,454 [Thread-11] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
>> java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
>>     at com.megatome.pig.piggybank.tuple.ColumnBase.exec(ColumnBase.java:32)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> 2013-09-02 16:40:52,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
>>
>> 8-|
>>
>> Regards
>>
>> ...
>>
>> Miguel Angel Martín Junquera
>> Analyst Engineer.
>> miguelangel.mar...@brainsins.com
>>
>> 2013/9/2 Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>
>>
>>> Hi all,
>>>
>>> More info:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-5941
>>>
>>> I tried this (and generated Cassandra 1.2.9), but it does not work for me:
>>>
>>> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
>>> cd cassandra
>>> git checkout cassandra-1.2
>>> patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
>>> ant
>>>
>>> Miguel Angel Martín Junquera
>>> Analyst Engineer.
>>> miguelangel.mar...@brainsins.com
>>>
>>> 2013/9/2 Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>
>>>
>>>> Good, nice job!!!
>>>>
>>>> I was testing with a UDF that handled only the string schema type; this
>>>> is better and more elaborate work.
>>>>
>>>> Regards
>>>>
>>>> Miguel Angel Martín Junquera
>>>> Analyst Engineer.
>>>> miguelangel.mar...@brainsins.com
>>>>
>>>> 2013/8/31 Chad Johnston <cjohns...@megatome.com>
>>>>
>>>>> I threw together a quick UDF to work around this issue. It just
>>>>> extracts the value portion of the tuple while taking advantage of the
>>>>> CqlStorage generated schema to keep the type correct.
>>>>>
>>>>> You can get it here: https://github.com/iamthechad/cqlstorage-udf
>>>>>
>>>>> I'll see if I can find more useful information and open a defect,
>>>>> since that's what this seems to be.
>>>>>
>>>>> Chad
>>>>>
>>>>>
>>>>> On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera <
>>>>> mianmarjun.mailingl...@gmail.com> wrote:
>>>>>
>>>>>> I tried this:
>>>>>>
>>>>>> rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30'
>>>>>>     USING CqlStorage();
>>>>>> dump rows;
>>>>>> ILLUSTRATE rows;
>>>>>> describe rows;
>>>>>>
>>>>>> values2 = FOREACH rows GENERATE TOTUPLE (id) as (mycolumn:tuple(name,value));
>>>>>> dump values2;
>>>>>> describe values2;
>>>>>>
>>>>>> But I get these results:
>>>>>>
>>>>>> -------------------------------------------------------------
>>>>>> | rows | id:chararray | age:int   | title:chararray |
>>>>>> -------------------------------------------------------------
>>>>>> |      | (id, 6)      | (age, 30) | (title, QA)     |
>>>>>> -------------------------------------------------------------
>>>>>>
>>>>>> rows: {id: chararray,age: int,title: chararray}
>>>>>> 2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt
>>>>>> - ERROR 1031: Incompatable field schema: left is
>>>>>> "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is
>>>>>> "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"
>>>>>>
>>>>>> or
>>>>>>
>>>>>> ....
>>>>>>
>>>>>> values2 = FOREACH rows GENERATE TOTUPLE (id);
>>>>>> dump values2;
>>>>>> describe values2;
>>>>>>
>>>>>> and the results are:
>>>>>>
>>>>>> ...
>>>>>> (((id,6)))
>>>>>> (((id,5)))
>>>>>> values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
>>>>>>
>>>>>> Aggg!!!!!
>>>>>>
>>>>>> Miguel Angel Martín Junquera
>>>>>> Analyst Engineer.
>>>>>> miguelangel.mar...@brainsins.com
>>>>>>
>>>>>> 2013/8/26 Miguel Angel Martin junquera <
>>>>>> mianmarjun.mailingl...@gmail.com>
>>>>>>
>>>>>>> Hi Chad,
>>>>>>>
>>>>>>> I have this issue too.
>>>>>>>
>>>>>>> I sent a mail to the pig-user list, but I still cannot resolve this,
>>>>>>> and I cannot access the column values.
>>>>>>> In that mail I describe some things that I tried without success,
>>>>>>> along with information about this issue.
>>>>>>>
>>>>>>> http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E
>>>>>>>
>>>>>>> I hope someone replies with a comment, idea, or solution for this
>>>>>>> issue or bug.
>>>>>>>
>>>>>>> I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but
>>>>>>> I have not configured the environment to debug and trace this issue.
>>>>>>>
>>>>>>> I only found some comments like the following, which I do not fully
>>>>>>> understand:
>>>>>>>
>>>>>>> /**
>>>>>>>  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
>>>>>>>  *
>>>>>>>  * A row from a standard CF will be returned as nested tuples:
>>>>>>>  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
>>>>>>>  */
>>>>>>>
>>>>>>> If you find an idea or solution, please post it.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> 2013/8/23 Chad Johnston <cjohns...@megatome.com>
>>>>>>>
>>>>>>>> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>>>>>>>>
>>>>>>>> I'm loading some simple data from Cassandra into Pig using
>>>>>>>> CqlStorage. The CqlStorage loader defines a Pig schema based on the
>>>>>>>> Cassandra schema, but it seems to be wrong.
>>>>>>>>
>>>>>>>> If I do:
>>>>>>>>
>>>>>>>> data = LOAD 'cql://bookdata/books' USING CqlStorage();
>>>>>>>> DESCRIBE data;
>>>>>>>>
>>>>>>>> I get this:
>>>>>>>>
>>>>>>>> data: {isbn: chararray,bookauthor: chararray,booktitle:
>>>>>>>> chararray,publisher: chararray,yearofpublication: int}
>>>>>>>>
>>>>>>>> However, if I DUMP data, I get results like these:
>>>>>>>>
>>>>>>>> ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in
>>>>>>>> the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>>>>>>>>
>>>>>>>> Clearly the results from Cassandra are key/value pairs, as would be
>>>>>>>> expected. I don't know why the schema generated by CqlStorage() would
>>>>>>>> be so different.
>>>>>>>>
>>>>>>>> This is really causing me problems trying to access the column
>>>>>>>> values. I tried a naive approach of FLATTENing each tuple, then trying
>>>>>>>> to access the values that way:
>>>>>>>>
>>>>>>>> flattened = FOREACH data GENERATE
>>>>>>>>     FLATTEN(isbn),
>>>>>>>>     FLATTEN(booktitle),
>>>>>>>>     ...
>>>>>>>> values = FOREACH flattened GENERATE
>>>>>>>>     $1 AS ISBN,
>>>>>>>>     $3 AS BookTitle,
>>>>>>>>     ...
>>>>>>>>
>>>>>>>> As soon as I try to access field $5, Pig complains about the index
>>>>>>>> being out of bounds.
>>>>>>>>
>>>>>>>> Is there a way to solve the schema/reality mismatch? Am I doing
>>>>>>>> something wrong, or have I stumbled across a defect?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Chad
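
For later readers, here is a rough sketch of the schema/data mismatch discussed in this thread, written in plain Python rather than Pig so it can be run without a cluster. It only models the shapes seen in the DUMP output above: each field arrives as a (name, value) pair while the declared schema lists one scalar per column. The names declared_schema, flatten and from_cql_column are illustrative assumptions; from_cql_column is a hypothetical stand-in for the FromCqlColumn UDF and is not its actual implementation.

```python
# Toy model of the CqlStorage schema/data mismatch (plain Python, not Pig).
# Assumption: every field is really a (name, value) pair, as in the DUMP
# output quoted in this thread, while the schema declares one scalar each.

declared_schema = ["isbn", "bookauthor", "booktitle", "publisher", "yearofpublication"]

row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)


def flatten(fields):
    """Mimic FLATTEN on every field: each (name, value) pair becomes two columns."""
    return tuple(item for pair in fields for item in pair)


def from_cql_column(field):
    """Hypothetical stand-in for the UDF: keep only the value part of a pair."""
    if not isinstance(field, tuple) or len(field) != 2:
        # Loosely analogous to the ClassCastException seen when the UDF
        # is handed an already-flattened string instead of a tuple.
        raise TypeError("expected a (name, value) pair, got %r" % (field,))
    return field[1]


flat = flatten(row)
values = tuple(from_cql_column(field) for field in row)

print("declared columns:", len(declared_schema))
print("columns after flattening:", len(flat))
print("values taken from the original pairs:", values)
```

In this model the flattened row has twice as many columns as the schema declares, which is consistent with Pig rejecting $5 against a five-column schema. Applying the extractor to the original pairs, as Chad suggests, avoids the problem, while applying it to a flattened string fails.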