Hi Daniel,
I have a bag of tuples:
inputBag =
{ (day, age, name, address, ['k1#v1','k2#v2']),
(12/2,22,deepak,newyork, ['k1#v1','k2#v2']),
(12/3,22,deepak,newjersy, ['k1#v1','k2#v2'])
}
I need to invoke a UDF for each tuple, so I have to flatten the bag,
which I do as follows:
flatTuples = foreach inputBag generate FLATTEN($0);
Now I get a list of tuples:
(day, age, name, address, ['k1#v1','k2#v2']),
(12/2,22,deepak,newyork, ['k1#v1','k2#v2']),
(12/3,22,deepak,newjersy, ['k1#v1','k2#v2'])
I tried a few options to invoke my UDF for each tuple:
1)
processed = foreach flatTuples generate com.myUDF.UDF($0);
I expected $0 to refer to the entire tuple (day, age, name,
address, ['k1#v1','k2#v2']), but
$0 within my UDF contains only (day, age, name, address). *For some
unknown reason the map is not passed into the UDF.*
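One way to narrow down what the UDF actually receives is to report the arity and runtime type of every argument. The sketch below is a plain-Java stand-in, not Pig code: `ArgInspector` and `describe` are hypothetical names, and a `List<Object>` plays the role of the tuple handed to the UDF. Assuming the real UDF extends `EvalFunc`, the same loop would run over the `Tuple` passed to `exec`, using its `size()` and `get(i)` methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a List<Object> stands in for the Pig Tuple passed to a UDF.
public class ArgInspector {

    // Returns a summary such as "arity=2 types=[String, Map]".
    public static String describe(List<Object> args) {
        List<String> types = new ArrayList<>();
        for (Object field : args) {
            if (field == null) {
                types.add("null");
            } else if (field instanceof Map) {
                types.add("Map");
            } else if (field instanceof List) {
                types.add("Tuple"); // a nested List models a nested tuple
            } else {
                types.add(field.getClass().getSimpleName());
            }
        }
        return "arity=" + args.size() + " types=" + types;
    }

    public static void main(String[] args) {
        // Model of one flattened row: four strings followed by a map.
        Map<String, String> map = new LinkedHashMap<>();
        map.put("k1", "v1");
        map.put("k2", "v2");

        List<Object> row = new ArrayList<>();
        row.add("12/2");
        row.add("22");
        row.add("deepak");
        row.add("newyork");
        row.add(map);

        System.out.println(describe(row));
    }
}
```

If the summary shows an arity of 1 with a single string, the UDF is being handed only the first field of the row; if it shows a nested tuple of four fields, the map was dropped before the UDF was called.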
2)
As option 1 did not work, I assumed that $0 points to item 0, $1
points to item 1, and so on, in each flattened tuple. In that case
processed = foreach flatTuples generate com.myUDF.UDF($0, $1, $2, $3, $4);
would pass each item of the input tuple into the UDF. But this
threw the following error:
for $1 I get "Out of bound access. Trying to access non-existent column: 1.
Schema {bytearray} has 1 column(s)".
3)
As option 2 did not work either, I tried
processed = foreach flatTuples generate com.myUDF.UDF($0.$0,
$0.$1, $0.$2, $0.$3, $0.$4);
assuming the outer $0 would point to the input tuple and each of $0.$0,
$0.$1, ... would point to the individual items in the tuple.
But this threw the following error:
$0.$1 throws java.lang.ClassCastException: java.lang.String cannot be cast
to org.apache.pig.data.Tuple
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:389)
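The ClassCastException above is consistent with $0 holding a plain string (the first field of the flattened row) rather than a nested tuple: a projection such as $0.$1 only makes sense when $0 is itself a tuple. The following plain-Java sketch models that situation. `NestedProjection` and `project` are hypothetical names and a `List<Object>` stands in for a Pig tuple, so this illustrates the failure mode under those assumptions rather than reproducing Pig's actual `POProject` code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a nested projection; not Pig's implementation.
public class NestedProjection {

    // Models "$outer.$inner": the outer field must itself be a tuple (here, a List).
    public static Object project(List<Object> row, int outer, int inner) {
        Object field = row.get(outer);
        if (!(field instanceof List)) {
            String type = (field == null) ? "null" : field.getClass().getName();
            throw new ClassCastException(
                type + " is not a tuple, cannot project field " + inner);
        }
        return ((List<?>) field).get(inner);
    }

    public static void main(String[] args) {
        // Flattened row: $0 is the string "12/2", so $0.$1 fails.
        List<Object> flat = Arrays.asList("12/2", "22", "deepak", "newyork");
        try {
            project(flat, 0, 1);
        } catch (ClassCastException e) {
            System.out.println("flattened row: " + e.getMessage());
        }

        // Nested row: $0 is a tuple, so $0.$1 succeeds.
        List<Object> nested = Arrays.asList((Object) Arrays.asList("12/2", "22"));
        System.out.println("nested row: " + project(nested, 0, 1));
    }
}
```

In this model the nested form succeeds only before flattening, while the row still sits inside an enclosing tuple; after flattening, each positional reference names a single field.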
Regards,
Deepak
Items 0-3 are of type chararray and item 4 is a map.
I iterate through these tuples
On Fri, Mar 18, 2011 at 8:48 AM, Daniel Dai <jiany...@yahoo-inc.com> wrote:
Hi, Deepak,
Can you be more specific? I ran some simple tests and cannot
reproduce this. What is your query? And your UDF?
Daniel
On 03/16/2011 11:24 PM, deepak kumar v wrote:
Hi,
Below is the list of tuples generated after flattening a bag:
(day, age, name, address, ['k1#v1','k2#v2']),
(12/2,22,deepak,newyork, ['k1#v1','k2#v2']),
(12/3,22,deepak,newjersy, ['k1#v1','k2#v2'])
process = foreach inputs generate com.myUDF.UDF($0);
Here $0 somehow gets only (day, age, name, address) and the
map is skipped.
*How can I access the map?*
With $1 I get "Out of bound access. Trying to access non-existent
column: 1. Schema {bytearray} has 1 column(s)", and
$0.$1 throws java.lang.ClassCastException: java.lang.String
cannot be cast to org.apache.pig.data.Tuple
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:389)
Also, the statement
tuples = foreach flattenedTuples generate $0;
generates
(day, age, name, address),
(12/2,22,deepak,newyork),
(12/3,22,deepak,newjersy)
If I dump the relation after flattening, I see the map in the resulting
tuples, but $0, instead of referring to the entire tuple, refers only to
the data part (the map is skipped).
Regards,
Deepak