2) should be the right approach. More concisely, you can write:
processed = foreach flatTuples generate com.myUDF.UDF(*);

It seems something is wrong with your schema propagation. How do you generate inputBag? If it is generated by a UDF, make sure that UDF declares a proper output schema. You can check the schema with "describe".
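
For illustration, here is a sketch of the UDF side once `*` is used. Pig hands the UDF one input tuple whose fields are the expanded arguments, so with the five-field records in this thread the map sits at index 4. The sketch is hedged: it is plain Java, with `List<Object>` standing in for `org.apache.pig.data.Tuple` (which is also read positionally with `get(int)`) so it compiles without Pig on the classpath. The class and method names are hypothetical. The field order (day, age, name, address, map) comes from Deepak's sample, with chararray fields read as `String` and the map as `Map<String, Object>`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: List<Object> stands in for org.apache.pig.data.Tuple.
public class RecordUdfSketch {

    // With UDF(*), every field of the flattened record is passed, so the
    // input holds 5 fields in order: day, age, name, address, map.
    public static String exec(List<Object> input) {
        if (input == null || input.size() != 5) {
            // e.g. UDF($0) passes a single field; check the relation with describe
            throw new IllegalArgumentException(
                "expected 5 fields, got " + (input == null ? 0 : input.size()));
        }
        String day = (String) input.get(0);   // chararray -> String
        String name = (String) input.get(2);  // chararray -> String
        @SuppressWarnings("unchecked")
        Map<String, Object> props = (Map<String, Object>) input.get(4); // map -> Map
        return day + ":" + name + ":" + props.get("k1");
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("k1", "v1");
        props.put("k2", "v2");
        List<Object> record = Arrays.asList("12/2", "22", "deepak", "newyork", props);
        System.out.println(exec(record)); // prints 12/2:deepak:v1
    }
}
```

In a real UDF the same body would go into `exec(Tuple input)` of a class extending `EvalFunc`. A UDF that produces inputBag would also override `outputSchema` so that the map column survives schema propagation. That part needs Pig's schema classes and is not shown here.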

Daniel

On 03/22/2011 09:30 PM, deepak kumar v wrote:
Hi Daniel,

I have a bag of tuples
inputBag =
{ (day, age, name, address,  ['k1#v1','k2#v2']),
(12/2,22,deepak,newyork,  ['k1#v1','k2#v2']),
(12/3,22,deepak,newjersy,  ['k1#v1','k2#v2'])
}

I need to invoke a UDF for each tuple, so I have to flatten the bag, which I do as follows:

flatTuples = foreach inputBag generate FLATTEN($0);

Now I get a list of tuples:
(day, age, name, address,  ['k1#v1','k2#v2']),
(12/2,22,deepak,newyork,  ['k1#v1','k2#v2']),
(12/3,22,deepak,newjersy,  ['k1#v1','k2#v2'])

I tried a few options to invoke my UDF for each tuple:

1)
processed = foreach flatTuples generate com.myUDF.UDF($0);
I expected $0 to point to the entire tuple (day, age, name, address, ['k1#v1','k2#v2']), but $0 within my UDF returns only (day, age, name, address). *For some unknown reason the map is not passed into the UDF.*

2)
As option 1 did not work, I assumed that $0 points to item 0, $1 to item 1, and so on, of the flattened tuples, so that
processed = foreach flatTuples generate com.myUDF.UDF($0, $1, $2, $3, $4);
would pass each item of the input tuple into the UDF. But for $1 this threw the following error:
"Out of bound access. Trying to access non-existent column: 1.
Schema {bytearray} has 1 column(s)"

3)
As the above option did not work either, I tried
processed = foreach flatTuples generate com.myUDF.UDF($0.$0, $0.$1, $0.$2, $0.$3, $0.$4);
assuming $0 would point to the input tuple and each of $0, $1, ... would point to the individual items in that tuple.
But $0.$1 threw the following error:
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:389)

Regards,
Deepak


Items 0-3 are of type chararray and item 4 is a map.

I iterate through these tuples

On Fri, Mar 18, 2011 at 8:48 AM, Daniel Dai <jiany...@yahoo-inc.com> wrote:

    Hi, Deepak,
    Can you be more specific? I did some simple tests and cannot
    reproduce it. What is your query? Your UDF?

    Daniel


    On 03/16/2011 11:24 PM, deepak kumar v wrote:

        Hi,
        Below is a list of tuples generated after flattening a bag:

        (day, age, name, address,  ['k1#v1','k2#v2']),
        (12/2,22,deepak,newyork,  ['k1#v1','k2#v2']),
        (12/3,22,deepak,newjersy,  ['k1#v1','k2#v2'])

        process = foreach inputs generate com.myUDF.UDF($0);
        Here $0 somehow gets only (day, age, name, address) and the
        map is skipped.
        *How can I access the map?*
        With $1, I get "Out of bound access. Trying to access non-existent
        column: 1. Schema {bytearray} has 1 column(s)", and
        $0.$1 throws
        java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:389)

        Also, the statement
        tuples = foreach flattenedTuples generate $0;
        generates
        (day, age, name, address),
        (12/2,22,deepak,newyork),
        (12/3,22,deepak,newjersy)

        After flattening, if I dump, I see the map in the resulting
        tuples, but $0, instead of referring to the entire tuple, refers
        only to the data part (the map is skipped).
        Regards,
        Deepak



