Hi all,
I'm walking through a Pig script in grunt, but I'm stuck on some issues with
nested foreach. I'm using Pig version 0.9.2.
I'm trying to find the number of unique users in the 'users' bag of 'top100':
grunt> describe top100
top100: {name: chararray,licenses: long,instance: chararray,transactions: long,users: {(projected::userId: chararray)},runTimes: {(projected::runTime: double)}}
grunt> uu = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers as uniqUsers;
>> }
ERROR 1200: Pig script failed to parse:
<line 132, column 9> Invalid scalar projection: uniqUsers : A column needs
to be projected from a relation for it to be used as a scalar
I realized that I had already defined uniqUsers as a top-level alias earlier
in the session, but I didn't expect it to conflict with an alias defined
inside the nested foreach block. The schema for that earlier uniqUsers is:
grunt> describe uniqUsers
uniqUsers: {key: chararray,uniqUsers: long}
Using a different alias for the result of distinct seems to work:
grunt> uu = foreach top100 {
>> un = distinct users;
>> generate un as uniqUsers;
>> }
grunt> describe uu
uu: {un: {(projected::userId: chararray)}}
grunt> uu = foreach top100 {
>> un = distinct users;
>> generate COUNT(un) as uniqUsers;
>> }
grunt> describe uu
uu: {uniqUsers: long}
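Out of curiosity, I then reused the clashing alias inside the nested block and
projected the fields of the earlier uniqUsers relation from it. Both
statements parse, but I do not understand the results: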
grunt> u2 = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers.key;
>> }
grunt> describe u2
u2: {projected::userId: chararray}
grunt> u3 = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers.uniqUsers;
>> }
grunt> describe u3
u3: {projected::userId: chararray}
Specifically, what is actually in the result of u3? Why is it a chararray
when uniqUsers.uniqUsers is a long? Why is the alias still
projected::userId?
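For reference, here is a minimal sketch of how the contents of u3 could be inspected in the same grunt session. It assumes u3 is still defined as above; u3_sample is a hypothetical alias introduced only for this check and is not part of the original script.

```pig
-- Assumes u3 is still defined in the current grunt session.
-- u3_sample is a hypothetical alias used only for this check.
u3_sample = limit u3 5;  -- keep the output small
dump u3_sample;          -- print the sampled tuples to the console
```

Comparing those rows against a few rows of top100 should show whether the values come from the users bag or from the earlier uniqUsers relation.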
Thanks for any help!
-Chun
PS Sorry for the double post, I accidentally hit a keyboard shortcut for
Send.