Expanding on this: for the following use case, the Schema Object can already
be resolved from the schema String:

        String string_databag = "{(a,(b,d),f)}";
        String string_schema =
            "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
        Schema schema = Utils.getSchemaFromString(string_schema);

The next step is to resolve a DataBag Object from the String string_databag
and the Schema Object.
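
To make that step concrete, here is a minimal sketch of the parsing half of
the problem. It is deliberately independent of Pig: the class name
BagTextParser and its methods are hypothetical, it ignores the Schema
entirely, and it produces nested java.util.List values with every leaf left
as a String rather than real Tuple and DataBag Objects. It also assumes field
values contain no commas, parentheses, or braces. Inside Pig,
Utf8StorageConverter.bytesToBag(byte[], ResourceFieldSchema) looks like the
more likely route, though that is not verified here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses Pig-style bag text into nested lists. */
public class BagTextParser {
    private final String text;
    private int pos = 0;

    private BagTextParser(String text) {
        this.text = text;
    }

    /** Parses a bag literal such as "{(a,(b,d),f)}"; leaves stay Strings. */
    public static List<Object> parseBag(String text) {
        BagTextParser p = new BagTextParser(text.trim());
        List<Object> bag = p.bag();
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Trailing input at " + p.pos);
        }
        return bag;
    }

    // bag := '{' [ tuple { ',' tuple } ] '}'
    private List<Object> bag() {
        expect('{');
        List<Object> tuples = new ArrayList<>();
        if (peek() != '}') {
            do {
                tuples.add(tuple());
            } while (accept(','));
        }
        expect('}');
        return tuples;
    }

    // tuple := '(' [ field { ',' field } ] ')'
    private List<Object> tuple() {
        expect('(');
        List<Object> fields = new ArrayList<>();
        if (peek() != ')') {
            do {
                fields.add(field());
            } while (accept(','));
        }
        expect(')');
        return fields;
    }

    // field := tuple | bag | run of characters up to the next delimiter
    private Object field() {
        char c = peek();
        if (c == '(') return tuple();
        if (c == '{') return bag();
        int start = pos;
        while (pos < text.length() && ",(){}".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos);
    }

    private char peek() {
        if (pos >= text.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return text.charAt(pos);
    }

    private boolean accept(char c) {
        if (pos < text.length() && text.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) {
            throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        }
    }
}
```

Calling BagTextParser.parseBag(string_databag) yields one outer list (the
bag) holding one list per tuple, with nested tuples as nested lists. A real
test helper would still have to walk the Schema in parallel to cast each leaf
(chararray, long, ...) and to build Tuple and DataBag Objects through the
factories instead of plain lists.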

-Dan

On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <
dan.decap...@civicscience.com> wrote:

> Thank you for your reply.
>
> The problem is I cannot find a methodology to go from a String
> representation of a complex data type to a nested Object of pig DataTypes.
> I looked over the pig 0.10.1 docs, but cannot find a way to go from String
> and Schema to pig DataType Object.
>
> For context, I am generating these Strings for my own JUnit testing of
> other UDFs.  Currently, for complex types, I have to generate each nesting
> from Tuple and DataBag factories, append data, and nest them manually.  For
> larger unit tests, this process becomes unwieldy (hundreds of lines per
> method, non-dynamic), and it would be much simpler to go directly from a
> String and a Schema to a DataBag Object for UDF testing (few lines of code,
> easily modifiable).
>
> -Dan
>
>
> On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
>> Why not just use PigStorage? This is essentially what it does. It saves a
>> bag as text, and then loads it again.
>>
>> I suppose the question becomes: why do you need to do this?
>>
>>
>> 2013/3/18 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>>
>> > In Java, I am trying to convert a DataBag from its String representation
>> > with its schema String to a valid DataBag Object:
>> >
>> > String databag_string = "{(apples,1024)}";
>> > String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";
>> >
>> > I've tried implementing something along the lines of this, but I believe
>> > it's in the wrong direction, and then I get stuck:
>> >
>> >         String[] aliases = {"b1", "t1", "a", "b"};
>> >         byte[] types = {DataType.BAG, DataType.TUPLE,
>> >                 DataType.CHARARRAY, DataType.LONG};
>> >         List<Schema.FieldSchema> fsList =
>> >                 new ArrayList<Schema.FieldSchema>();
>> >         for (int i = 0; i < aliases.length; i++) {
>> >             fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
>> >         }
>> >         Schema origSchema = new Schema(fsList);
>> >         ResourceSchema rsSchema = new ResourceSchema(origSchema);
>> >         Schema genSchema = Schema.getPigSchema(rsSchema);
>> >         ResourceSchema.ResourceFieldSchema[] rfschema =
>> >                 rsSchema.getFields();
>> >         ... lost here, maybe Utf8StorageConverter c = new
>> >         Utf8StorageConverter(); ???
>> >
>> >
>> > An ideal process would be along the lines of:
>> >
>> > DataBag d = BagFactory.getInstance().newDefaultBag();
>> > d.something(databag_string, schema_string);    // ??? no idea what this process could be
>> > d.toString().equals(databag_string) == true.
>> >
>> > Thanks, -Dan
>> >
>>
>
>
>
> --
> Dan DeCapria
> CivicScience, Inc.
> Senior Informatics / DM / ML / BI Specialist
>



-- 
Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist
