Yep, getSchemaFromString is what I was looking for, but I can't get it to generate a schema (for unit test purposes) that matches what I get inside my script during a real run.
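One workaround that might be worth trying for the unit-test case, sketched in Java and not checked against 0.8.1: let getSchemaFromString parse only a flat field list (no bag syntax, which is the part the parser rejects in the example), then wrap the result in a bag field by hand so it has the same shape as the runtime schema {infile: {name: chararray,weight: int}}. BucketsInputSchema, fromString and byHand are made-up names for illustration. The checked exception that getSchemaFromString declares differs between Pig releases, hence the blanket `throws Exception`.

```java
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;
import org.apache.pig.impl.util.Utils;

// Hypothetical helper class, for illustration only.
public class BucketsInputSchema {

    // Parse only the flat field list, then wrap it in a bag field named
    // "infile". Assumption: parsing "a: type, b: type" behaves the same on
    // 0.8 and 0.9; the bag-in-string syntax is deliberately avoided.
    static Schema fromString() throws Exception {
        Schema inner = Utils.getSchemaFromString("name: chararray, weight: int");
        return new Schema(new FieldSchema("infile", inner, DataType.BAG));
    }

    // The same shape built entirely with constructors, for comparison:
    // one bag field whose inner schema holds the two fields directly.
    static Schema byHand() throws Exception {
        Schema inner = new Schema();
        inner.add(new FieldSchema("name", DataType.CHARARRAY));
        inner.add(new FieldSchema("weight", DataType.INTEGER));
        return new Schema(new FieldSchema("infile", inner, DataType.BAG));
    }

    public static void main(String[] args) throws Exception {
        // Static Schema.equals(schema, other, relaxInner, relaxAlias);
        // false/false requests a strict comparison of inner schemas and aliases.
        boolean same = Schema.equals(fromString(), byHand(), false, false);
        System.out.println("parsed and hand-built schemas match: " + same);
        // In a real test the result of fromString() would be handed to the
        // UDF, e.g. new Buckets().outputSchema(fromString()); Buckets is the
        // UDF from this thread and is not reproduced here.
    }
}
```

This doesn't answer whether the nested bag can be written in a single string for 0.8.1; it only keeps the hand-written part down to one wrapping constructor.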
As an example, say I have a file like this:

foo\t2
bar\t3
baz\t3
marge\t4
homer\t4

and I load it like this:

infile = load 'test.txt' as (name:chararray, weight:int);
grouped = group infile all;
bucketed = foreach grouped generate flatten(Buckets(infile));

The outputSchema method of my UDF (Buckets) gets called with a schema that stringifies like so:

{infile: {name: chararray,weight: int}}

i.e. it has a single field, which is a bag, containing two elements directly (no wrapping tuple, presumably because this is Pig 0.8.1?).

(Side note: I guess the outermost {}s are a display convention, as there's only one bag there.)

When I'm unit-testing the UDF's outputSchema method, I'd like to generate exactly that schema. But if I call getSchemaFromString like this:

Utils.getSchemaFromString("B: {f1: chararray, f2: int}")

it throws a parser error:

Encountered " "{" "{ "" at line 1, column 4.
Was expecting one of:
    "int" ...
    "long" ...
    "float" ...
    "double" ...
    "chararray" ...
    "bytearray" ...
    "int" ...
    "long" ...
    "float" ...
    "double" ...
    "chararray" ...
    "bytearray" ...

Two questions, I guess:

(1) Is there a way of generating a schema like that via Utils?
(2) ... or is this schema actually wrong, and I'm looking at a symptom of https://issues.apache.org/jira/browse/PIG-767 that would behave differently if I was in Pig 0.9.0?

Many thanks,

Andrew.

On 4 October 2011 00:14, Raghu Angadi <rang...@apache.org> wrote:
> Utils.getSchemaFromString() seems like exactly what you want (
> from org.apache.pig.impl.util ).
>
> Raghu.
>
> [btw. my two previous attempts to send to the list got rejected as spam ]
>
> On Mon, Oct 3, 2011 at 3:41 PM, Andrew Clegg
> <andrew.clegg+mah...@gmail.com> wrote:
>
>> Thanks Raghu (and Dmitry).
>>
>> Could this maybe get added to the docs page on UDFs? (Apologies if
>> it's there already and I missed it.)
>>
>> Also -- it's a bit cumbersome writing all these nested Schema and
>> FieldSchema constructors, especially when you're writing tests for
>> UDFs with flexible schema support.
>>
>> I was wondering if it would be practical to reuse whatever code the
>> front-end uses to parse schema descriptions from load statements in
>> scripts. Is this a silly idea? If it isn't silly, does anyone know
>> where I need to look for that code?
>>
>>
>> On 3 October 2011 22:56, Raghu Angadi <ang...@gmail.com> wrote:
>> > my understanding is that Pig 0.8 expects the first form and Pig 0.9
>> > requires the second.
>> >
>> > Raghu.
>> >
>> > On Mon, Oct 3, 2011 at 8:27 AM, Andrew Clegg
>> > <andrew.clegg+mah...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> When you have a UDF that returns a bag, and you're writing the
>> >> outputSchema method, do you have to explicitly include the mandatory
>> >> 'container' tuple within the bag, or is this implicit?
>> >>
>> >> i.e. if I'm returning a bag of ints, do I have to do:
>> >>
>> >>   return new Schema(
>> >>       new FieldSchema(null,
>> >>           new Schema(
>> >>               new FieldSchema(null, DataType.INTEGER)), DataType.BAG));
>> >>
>> >> Or do I have to explicitly define a tuple like so:
>> >>
>> >>   return new Schema(
>> >>       new FieldSchema(null,
>> >>           new Schema(
>> >>               new FieldSchema(null,
>> >>                   new Schema(
>> >>                       new FieldSchema(null, DataType.INTEGER)), DataType.TUPLE)),
>> >>           DataType.BAG));
>> >>
>> >> The docs seem pretty vague on this, and you're allowed to do either.
>> >> My feeling would be that if the first form was illegal, you wouldn't
>> >> be allowed to create a schema like that, but this may be wishful
>> >> thinking.
>> >>
>> >> Thanks,
>> >>
>> >> Andrew.
>> >>
>> >> --
>> >>
>> >> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
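For reference, the two bag-of-ints forms from the original question can be built side by side and inspected in a test. This is a sketch, assuming the 0.8/0.9-era classes org.apache.pig.impl.logicalLayer.schema.Schema and org.apache.pig.data.DataType are on the classpath; BagOfIntsSchemas and its methods are hypothetical names. Per Raghu's reply in the thread, 0.8 expects the first form and 0.9 requires the second. The code only shows how each is constructed and what type the bag's first inner field ends up with; it doesn't settle which form a given Pig version accepts.

```java
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;

// Hypothetical helper class, for illustration only.
public class BagOfIntsSchemas {

    // First form: the bag's inner schema holds the int field directly,
    // with no tuple in between.
    static Schema withoutTuple() throws Exception {
        Schema elements = new Schema(new FieldSchema(null, DataType.INTEGER));
        return new Schema(new FieldSchema(null, elements, DataType.BAG));
    }

    // Second form: the int field sits inside an explicit tuple, and that
    // tuple is the single field of the bag's inner schema.
    static Schema withTuple() throws Exception {
        Schema tupleFields = new Schema(new FieldSchema(null, DataType.INTEGER));
        Schema elements = new Schema(new FieldSchema(null, tupleFields, DataType.TUPLE));
        return new Schema(new FieldSchema(null, elements, DataType.BAG));
    }

    // Type name of the first field inside the single bag field of s.
    static String bagElementType(Schema s) {
        FieldSchema bag = s.getFields().get(0);
        return DataType.findTypeName(bag.schema.getFields().get(0).type);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("first form:  " + bagElementType(withoutTuple()));
        System.out.println("second form: " + bagElementType(withTuple()));
    }
}
```

Checking the bag's first inner field this way is a cheap assertion to add to an outputSchema test if the tuple wrapper matters for the Pig version being targeted.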