I believe each expression in the GENERATE clause needs its own AS alias, so the FOREACH statement should be:

> B = FOREACH A GENERATE (long)$94 AS publisher, (chararray)$93 AS associate,
>   (long)$16 AS site, (long)$27 AS category,
>   (long)$23 AS story, (int)$2 AS hits, (int)$3 AS comments;
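
For reference, here is a rough, untested sketch of how the start of the script might look with that change applied. The alias `raw` is just a placeholder I picked for the loaded relation, and the DESCRIBE at the end is only there as a way to check that the aliases and types were picked up; the rest is taken from the original script.

```pig
-- Untested sketch; 'raw' is a placeholder alias, not from the original script.
raw = LOAD '/tmp/test.log';

-- Keep only rows that have at least 95 fields (same filter as the original).
A = FILTER raw BY SIZE(*) >= 95;

-- Each expression carries its own cast and its own AS alias.
B = FOREACH A GENERATE
      (long)$94      AS publisher,
      (chararray)$93 AS associate,
      (long)$16      AS site,
      (long)$27      AS category,
      (long)$23      AS story,
      (int)$2        AS hits,
      (int)$3        AS comments;

-- Print the schema of B to confirm the field names and types.
DESCRIBE B;
```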


Hope that helps,
Bryce

On Oct 21, 2010, at 8:15 PM, Renato Marroquín Mogrovejo wrote:

> Hi Marcos, just a quick question: have you checked whether your data
> has all the fields in every row? Maybe you are dealing with sparse data
> but, given the amount of data, you are not noticing it.
> First, what does your data look like? My choice would be to first try with a
> subset of the whole data, and then write my own UDF to parse, and retrieve
> just the values I want.
> 
> 
> Renato M.
> 
> 2010/10/20 Marcos Medrado Rubinelli <[email protected]>
> 
>> Hi everybody,
>> 
>> I'm trying to use vanilla Pig 0.7.0 to generate monthly consolidations of
>> log files with relatively long lines: 95 fields and growing, of which I'll
>> be using just 7. Just so I didn't have to declare all the fields in the LOAD
>> command, I tried to define the schema in my first FOREACH...GENERATE, so the
>> first lines of my script look like this:
>> 
>> input = LOAD '/tmp/test.log';
>> A = FILTER input BY SIZE(*) >= 95;
>> B = FOREACH A GENERATE (long)$94, (chararray)$93, (long)$16, (long)$27,
>>   (long)$23, (int)$2, (int)$3
>>   AS publisher, associate, site, category,
>>   story, hits, comments;
>> 
>> As you can guess by now, Pig complains while still parsing:
>> 
>> ERROR 1000: Error during parsing. Invalid alias: category in null
>> 
>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error
>> during parsing. Invalid alias: associate in null
>>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>>   at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>>   at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:73)
>> 
>> Am I overlooking anything? Should I give up and declare a 95-field schema?
>> Write a LOAD UDF? Or is there a simpler way to do what I want?
>> 
>> Thank you!
>> Marcos Rubinelli
>> 
