Schema for STRSPLIT output

Ashish Jain Wed, 25 Jun 2014 20:00:06 -0700

Hello,

Two lines from my data look as follows:
a, b, c, d, e
a, b, c


I load the contents of my file and split each line based on ','.
x = LOAD 'file.txt' as content:chararray;
y = FOREACH x GENERATE STRSPLIT(content, ',') as tuple();
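
To make the varying width concrete, here is a minimal sketch in plain Python (a stand-in, not Pig code) of what splitting the two sample lines on ',' yields. The helper name split_line is hypothetical, and Pig's STRSPLIT takes a regular expression and may differ in edge cases such as trailing empty fields.

```python
# Plain-Python stand-in for the Pig script above; this is not Pig code.
# The two sample lines from the data file.
lines = ["a, b, c, d, e", "a, b, c"]


def split_line(content, delimiter=","):
    """Split one line into a tuple of strings, one per field.

    Hypothetical helper: it only mimics splitting on a literal delimiter.
    """
    return tuple(content.split(delimiter))


rows = [split_line(line) for line in lines]
arities = [len(row) for row in rows]

# Each row is a tuple of strings, but the number of fields differs per row,
# which is why a single fixed-width tuple schema cannot describe both.
for row in rows:
    print(len(row), row)
```

Note that the fields after the first keep their leading space, because the split is on ',' alone.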

My questions are:
1) Is there any way to specify the schema of y as a tuple containing a
variable number of chararrays? Something along the lines of
y = FOREACH x GENERATE STRSPLIT(content, ',') as tuple(chararray(*))
2) If I try to do the above in a UDF, how do I create an output schema that
depends on the input? From my experiments, outputSchema() is called before
exec(), so I can't specify the number of fields in my output schema.

The reason I am trying to do this is that, once I have 'y', I want to write
it to Elasticsearch. The hadoop-elasticsearch plugin maps chararray (Pig)
directly to string (Elasticsearch).

Thanks
Ashish
