This has been fixed in Pig 0.9, which should be released in a few days.

You can also build it from svn:
svn co http://svn.apache.org/repos/asf/pig/branches/branch-0.9; cd branch-0.9; ant
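
If you need something before upgrading, one possible workaround is to
give the flattened fields an explicit schema with an AS clause, so the
parser knows about the extra columns. This is only a sketch and I have
not verified it against your version; the field names (rec_key, f0, f1,
f2) are made up for illustration, and you would list one name per
tab-separated column your data actually has.

-----------------------------------------------------------------
logs = LOAD '/data/2011-07-17/part-*' USING SequenceFileLoader;
-- Hypothetical field names; extend the AS list to match the number
-- of tab-separated columns in each value.
logs = FOREACH logs GENERATE
$0 AS rec_key,
FLATTEN(STRSPLIT($1, '\t')) AS (f0:chararray, f1:chararray, f2:chararray);

-- f2 is the field the original script referred to as $3.
opens = FILTER logs BY f2 == 'open';
-----------------------------------------------------------------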

-Thejas


On 7/18/11 12:31 PM, Jameson Lopp wrote:
I'm loading sequence files, of which each row's 'value' is a tab
delimited set of columns. I'm exploding the values out so that I can
work with them separately, but pig's syntax parser is giving me a hard
time.

-----------------------------------------------------------------
logs = LOAD '/data/2011-07-17/part-*' USING SequenceFileLoader;
logs = FOREACH logs GENERATE
$0,
FLATTEN(STRSPLIT ($1, '\t'));

opens = FILTER logs BY $3 == 'open';
-----------------------------------------------------------------

gets me a syntax error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during
parsing. Out of bound access. Trying to access non-existent column: 16.
Schema {bytearray,bytearray} has 2 column(s).

which makes sense because if I do a :
grunt> describe logs;
logs: {bytearray,bytearray}

But... I KNOW that $3 exists because I have dumped that data during my
debugging and the split / flatten are working as expected... how do I
tell pig that there are more columns?
