Re: CqlStorage creates wrong schema for Pig

Miguel Angel Martin junquera Mon, 02 Sep 2013 06:10:18 -0700

Hi all,

More info:

https://issues.apache.org/jira/browse/CASSANDRA-5941

I tried this (and built Cassandra 1.2.9), but it does not work for me:

git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
cd cassandra
git checkout cassandra-1.2
patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
ant



Miguel Angel Martín Junquera
Analyst Engineer.
[email protected]



2013/9/2 Miguel Angel Martin junquera <[email protected]>

> Nice job!
>
> I had been testing with a UDF that only handles the string schema type;
> this is better and more elaborate work.
>
> Regards
>
>
> Miguel Angel Martín Junquera
> Analyst Engineer.
> [email protected]
>
>
>
> 2013/8/31 Chad Johnston <[email protected]>
>
>> I threw together a quick UDF to work around this issue. It just extracts
>> the value portion of the tuple while taking advantage of the CqlStorage
>> generated schema to keep the type correct.
>>
>> You can get it here: https://github.com/iamthechad/cqlstorage-udf
>>
>> I'll see if I can find more useful information and open a defect, since
>> that's what this seems to be.
>>
>> Chad
>>
>>
>> On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera <
>> [email protected]> wrote:
>>
>>> I tried this:
>>>
>>> rows = LOAD
>>> 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
>>> CqlStorage();
>>>
>>> dump rows;
>>>
>>> ILLUSTRATE rows;
>>>
>>> describe rows;
>>>
>>> values2 = FOREACH rows GENERATE TOTUPLE(id) AS
>>> (mycolumn:tuple(name,value));
>>>
>>> dump values2;
>>>
>>> describe values2;
>>>
>>> But I get these results:
>>>
>>>
>>>
>>> -------------------------------------------------------------
>>> | rows     | id:chararray   | age:int   | title:chararray   |
>>> -------------------------------------------------------------
>>> |          | (id, 6)        | (age, 30) | (title, QA)       |
>>> -------------------------------------------------------------
>>>
>>> rows: {id: chararray,age: int,title: chararray}
>>> 2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 1031: Incompatable field schema: left is
>>> "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is
>>> "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"
>>>
>>>
>>>
>>>
>>>
>>> or
>>>
>>> ....
>>>
>>> values2 = FOREACH rows GENERATE TOTUPLE(id);
>>> dump values2;
>>> describe values2;
>>>
>>> and the results are:
>>>
>>>
>>> ...
>>> (((id,6)))
>>> (((id,5)))
>>> values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
>>>
>>>
>>>
>>> Aggg!!!!!
>>>
>>>
>>>
>>>
>>>
>>> Miguel Angel Martín Junquera
>>> Analyst Engineer.
>>> [email protected]
>>>
>>>
>>>
>>> 2013/8/26 Miguel Angel Martin junquera <[email protected]>
>>>
>>>> Hi Chad,
>>>>
>>>> I have this issue too.
>>>>
>>>> I sent a mail to the Pig user list, but I still cannot resolve this, and I
>>>> cannot access the column values.
>>>> In that mail I describe some things I tried without results, along with
>>>> information about this issue.
>>>>
>>>>
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E
>>>>
>>>>
>>>>
>>>> I hope someone replies with a comment, idea, or solution for this issue
>>>> or bug.
>>>>
>>>>
>>>> I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I
>>>> have not configured the environment to debug and trace this issue.
>>>>
>>>> I only found some comments like this one, which I do not fully understand:
>>>>
>>>>
>>>> /**
>>>>  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
>>>>  *
>>>>  * A row from a standard CF will be returned as nested tuples:
>>>>  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
>>>>  */
>>>>
>>>>
>>>> If you find an idea or solution, please post it.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 2013/8/23 Chad Johnston <[email protected]>
>>>>
>>>>> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>>>>>
>>>>> I'm loading some simple data from Cassandra into Pig using CqlStorage.
>>>>> The CqlStorage loader defines a Pig schema based on the Cassandra schema,
>>>>> but it seems to be wrong.
>>>>>
>>>>> If I do:
>>>>>
>>>>> data = LOAD 'cql://bookdata/books' USING CqlStorage();
>>>>> DESCRIBE data;
>>>>>
>>>>> I get this:
>>>>>
>>>>> data: {isbn: chararray,bookauthor: chararray,booktitle:
>>>>> chararray,publisher: chararray,yearofpublication: int}
>>>>>
>>>>> However, if I DUMP data, I get results like these:
>>>>>
>>>>> ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in
>>>>> the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>>>>>
>>>>> Clearly the results from Cassandra are key/value pairs, as would be
>>>>> expected. I don't know why the schema generated by CqlStorage() would be
>>>>> so different.
>>>>>
>>>>> This is really causing me problems trying to access the column values.
>>>>> I tried a naive approach of FLATTENing each tuple, then trying to access
>>>>> the values that way:
>>>>>
>>>>> flattened = FOREACH data GENERATE
>>>>>   FLATTEN(isbn),
>>>>>   FLATTEN(booktitle),
>>>>>   ...
>>>>> values = FOREACH flattened GENERATE
>>>>>   $1 AS ISBN,
>>>>>   $3 AS BookTitle,
>>>>>   ...
>>>>>
>>>>> As soon as I try to access field $5, Pig complains about the index
>>>>> being out of bounds.
>>>>>
>>>>> Is there a way to solve the schema/reality mismatch? Am I doing
>>>>> something wrong, or have I stumbled across a defect?
>>>>>
>>>>> Thanks,
>>>>> Chad
>>>>>
>>>>
>>>>
>>>
>>
>
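
The mismatch discussed in this thread can be sketched in plain Python (not Pig). This is only an illustration under assumptions: it takes the DESCRIBE and DUMP output quoted above at face value, so each field arrives as a (name, value) pair while the declared schema lists one scalar per column. The names `declared_schema`, `row`, and `extract_values` are hypothetical; they are not part of CqlStorage or of the cqlstorage-udf project, and the year is assumed to arrive as an integer.

```python
# Declared schema, as printed by DESCRIBE in the original report.
declared_schema = [
    ("isbn", str),
    ("bookauthor", str),
    ("booktitle", str),
    ("publisher", str),
    ("yearofpublication", int),
]

# One row shaped like the DUMP output: every field is a (name, value) pair.
# Assumption: the year is an int at runtime, matching the declared type.
row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)


def extract_values(row, schema):
    """Return the value half of each (name, value) pair, checked against the schema."""
    values = []
    for (name, value), (expected_name, expected_type) in zip(row, schema):
        if name != expected_name:
            raise ValueError(f"expected column {expected_name!r}, got {name!r}")
        if not isinstance(value, expected_type):
            raise TypeError(f"column {name!r} is not {expected_type.__name__}")
        values.append(value)
    return tuple(values)


flat = extract_values(row, declared_schema)
print(flat)
```

The idea mirrors the workaround described earlier in the thread: keep the type that the schema declares and take only the value portion of each pair, so that the data finally matches what DESCRIBE reports.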
