Expanding upon this, the following use case's Schema Object can be resolved from inputs:
String string_databag = "{(a,(b,d),f)}";
String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
Schema schema = Utils.getSchemaFromString(string_schema);

The next step is to resolve a DataBag Object from String string_databag and the Schema Object.

-Dan

On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <
dan.decap...@civicscience.com> wrote:

> Thank you for your reply.
>
> The problem is I cannot find a methodology to go from a String
> representation of a complex data type to a nested Object of Pig DataTypes.
> I looked over the Pig 0.10.1 docs, but cannot find a way to go from a String
> and a Schema to a Pig DataType Object.
>
> For context, I am generating these Strings for my own JUnit testing of
> other UDFs. Currently, for complex types, I have to generate each nesting
> from the Tuple and DataBag factories, append data, and nest them manually. For
> larger unit tests, this process becomes unwieldy (hundreds of lines per
> method, non-dynamic), and it would be much simpler to go directly from a
> String and a Schema to a DataBag Object for UDF testing (few lines of code,
> easily modifiable).
>
> -Dan
>
>
> On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
>> Why not just use PigStorage? This is essentially what it does. It saves a
>> bag as text, and then loads it again.
>>
>> I suppose the question becomes: why do you need to do this?
>>
>>
>> 2013/3/18 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>>
>> > In Java, I am trying to convert a DataBag from its String representation
>> > with its schema String to a valid DataBag Object:
>> >
>> > String databag_string = "{(apples,1024)}";
>> > String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";
>> >
>> > I've tried implementing something along the lines of this, but I believe
>> > it's in the wrong direction, and then I get stuck:
>> >
>> > String[] aliases = {"b1", "t1", "a", "b"};
>> > byte[] types = {DataType.BAG, DataType.TUPLE, DataType.CHARARRAY, DataType.LONG};
>> > List<Schema.FieldSchema> fsList = new ArrayList<Schema.FieldSchema>();
>> > for (int i = 0; i < aliases.length; i++) {
>> >     fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
>> > }
>> > Schema origSchema = new Schema(fsList);
>> > ResourceSchema rsSchema = new ResourceSchema(origSchema);
>> > Schema genSchema = Schema.getPigSchema(rsSchema);
>> > ResourceSchema.ResourceFieldSchema[] rfschema = rsSchema.getFields();
>> > ... lost here, maybe Utf8StorageConverter c = new Utf8StorageConverter(); ???
>> >
>> >
>> > An ideal process would be along the lines of:
>> >
>> > DataBag d = BagFactory.getInstance().newDefaultBag();
>> > d.something(databag_string, schema_string); // ??? no idea what this process could be
>> > d.toString().equals(databag_string) == true.
>> >
>> > Thanks, -Dan
>> >
>>
>
>
>
> --
> Dan DeCapria
> CivicScience, Inc.
> Senior Informatics / DM / ML / BI Specialist
>

--
Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist
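
For the archive, here is a minimal sketch of that next step, assuming Pig 0.10.x is on the test classpath. It parses the schema string with Utils.getSchemaFromString, wraps the bag field (with its nested tuple schema) in a ResourceSchema.ResourceFieldSchema, and passes the text to Utf8StorageConverter.bytesToBag. That converter is the LoadCaster that PigStorage returns from getLoadCaster(), so this is roughly the "load" half of the round trip Jonathan suggested, without touching the filesystem. The class name BagFromString and its toBag method are hypothetical helpers for illustration. They are not part of Pig.

```java
import java.nio.charset.Charset;

import org.apache.pig.ResourceSchema.ResourceFieldSchema;
import org.apache.pig.builtin.Utf8StorageConverter;
import org.apache.pig.data.DataBag;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

/**
 * Hypothetical test helper (not Pig API): builds a DataBag from its text form
 * plus a schema string, using the same converter PigStorage uses as its caster.
 */
public class BagFromString {

    public static DataBag toBag(String bagText, String schemaText) throws Exception {
        // Parse e.g. "b1:bag{t1:tuple(a:chararray,b:long)}"; the result has one field, b1.
        Schema schema = Utils.getSchemaFromString(schemaText);

        // Wrap the bag field, including its nested tuple schema, for the converter.
        ResourceFieldSchema bagField = new ResourceFieldSchema(schema.getField(0));

        // Parse the braces/parentheses and cast each leaf to its declared type.
        Utf8StorageConverter converter = new Utf8StorageConverter();
        return converter.bytesToBag(bagText.getBytes(Charset.forName("UTF-8")), bagField);
    }

    public static void main(String[] args) throws Exception {
        DataBag d = toBag("{(apples,1024)}", "b1:bag{t1:tuple(a:chararray,b:long)}");
        System.out.println(d);
    }
}
```

Because every leaf value is cast to the type declared in the schema, the placeholder values d and f in the nested example (both declared long) are not valid longs. Expect them to come back as null, with a cast warning in the log, so real numbers work better in test fixtures.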