Ack, hit enter. I'd look at the LoadFunc interface and the PigStorage class,
and if you can't make it work after playing with it a little, let me know.
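
To make the idea concrete, here is a rough sketch of the kind of conversion
being discussed: a small recursive parser that turns the text form of a bag,
e.g. "{(a,(b,d),f)}", into nested objects. It is plain Java with no Pig
dependency, so BagTextParser and its methods are hypothetical names, nested
Lists stand in for Pig's DataBag and Tuple, every leaf stays a String, and the
schema is not consulted. Treat it as an illustration under those assumptions,
not as how PigStorage does it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig): parses PigStorage-style text such as
 * "{(a,(b,d),f)}" into nested lists. A bag becomes a List of tuples, a tuple
 * becomes a List of fields, and every leaf field is kept as a String.
 */
class BagTextParser {
    private final String text;
    private int pos = 0;

    private BagTextParser(String text) {
        this.text = text;
    }

    /** Parses one complete value and rejects any trailing characters. */
    static Object parse(String text) {
        BagTextParser parser = new BagTextParser(text);
        Object value = parser.parseValue();
        if (parser.pos != text.length()) {
            throw new IllegalArgumentException("Unexpected trailing input at index " + parser.pos);
        }
        return value;
    }

    /** Dispatches on the next character: '{' opens a bag, '(' opens a tuple. */
    private Object parseValue() {
        if (pos < text.length() && text.charAt(pos) == '{') {
            return parseGroup('{', '}');
        }
        if (pos < text.length() && text.charAt(pos) == '(') {
            return parseGroup('(', ')');
        }
        return parseLeaf();
    }

    /** Reads comma-separated values between the given delimiters. */
    private List<Object> parseGroup(char open, char close) {
        expect(open);
        List<Object> items = new ArrayList<>();
        if (peek() == close) {
            pos++;
            return items;
        }
        while (true) {
            items.add(parseValue());
            char c = next();
            if (c == close) {
                return items;
            }
            if (c != ',') {
                throw new IllegalArgumentException("Expected ',' or '" + close + "' at index " + (pos - 1));
            }
        }
    }

    /** Reads characters up to the next delimiter and returns them as-is. */
    private String parseLeaf() {
        int start = pos;
        while (pos < text.length() && ",(){}".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos);
    }

    private char peek() {
        if (pos >= text.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return text.charAt(pos);
    }

    private char next() {
        char c = peek();
        pos++;
        return c;
    }

    private void expect(char expected) {
        if (next() != expected) {
            throw new IllegalArgumentException("Expected '" + expected + "' at index " + (pos - 1));
        }
    }
}
```

In a JUnit test, BagTextParser.parse("{(apples,1024)}") would give a list
holding one tuple with the fields "apples" and "1024". Casting the leaves
according to the schema, and building real Tuple and DataBag objects through
the factories, would be a separate step that is not shown here.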


2013/3/19 Jonathan Coveney <jcove...@gmail.com>

> doing "new PigStorage()" is possible, but tricky. Maybe some of the other
> contributors have an easier way of doing this, but in the short term, I'd
> work on getting that to work. It's mainly just making sure you initialize
> it properly.
>
>
> 2013/3/19 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>
>> This would work, but the goal would be to *not* invoke local interactive
>> pig to execute a LOAD USING PigStorage() and pass the data into the UDF. I
>> was hoping to keep this completely in the Java and JUnit testing universe.
>>
>> Looking over the PigStorage() doc
>> <https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html>,
>> would you know how to construct this process from a baseline PigStorage
>> Object, such as:
>>
>> PigStorage pigstorage = new PigStorage();
>>
>> Any ideas?
>>
>> -Dan
>>
>> On Tue, Mar 19, 2013 at 12:08 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>>
>> > I definitely understand the benefits, I just wanted to understand your
>> > workflow so I could weigh in with what I would do.
>> >
>> > In your case, if you're going to be making these by hand, then I would
>> > mimic what PigStorage outputs, and then just load it in using PigStorage.
>> >
>> >
>> > 2013/3/19 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>> >
>> > > By hand; creating a new JUnit method to test a specific use case against a
>> > > functional requirement in the UDF.
>> > >
>> > > The UDFs I am testing are part of a larger ETL testing initiative I have
>> > > been undertaking.  To ensure that the various states of legacy data are
>> > > correctly extracted and transformed into a Pig context, I am creating
>> > > specific JUnit tests for each UDF containing specific use cases as testing
>> > > methods.
>> > >
>> > > The motivation for using String inputs for the Data Objects and Schema
>> > > Objects is to improve on the conventional approach: creating Java Objects
>> > > and adding and appending nested Objects to create the desired complex type
>> > > DataBag Object to pass to the UDF as use case input. The simpler process
>> > > I'm looking for should improve scalability and rapid prototyping within
>> > > the testing scripts.  It will also make the process more approachable for
>> > > another programmer to write additional unit tests.
>> > >
>> > > -Dan
>> > >
>> > > On Tue, Mar 19, 2013 at 11:43 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
>> > >
>> > > > How are you planning on generating these cases? By hand? Or automated?
>> > > >
>> > > >
>> > > > 2013/3/19 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>> > > >
>> > > > > String string_databag in this example was typed out by me, as the input
>> > > > > String for a JUnit test method. I am considering generating many of these
>> > > > > for case-specific unit testing of my UDFs.
>> > > > >
>> > > > > -Dan
>> > > > >
>> > > > > On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
>> > > > >
>> > > > > > how was string_databag generated?
>> > > > > >
>> > > > > >
>> > > > > > 2013/3/19 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>> > > > > >
>> > > > > > > Expanding upon this, the following use case's Schema Object can be
>> > > > > > > resolved from inputs:
>> > > > > > >
>> > > > > > >         String string_databag = "{(a,(b,d),f)}";
>> > > > > > >         String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
>> > > > > > >         Schema schema = Utils.getSchemaFromString(string_schema);
>> > > > > > >
>> > > > > > > Next step is to resolve a DataBag Object from String string_databag and
>> > > > > > > the Schema Object.
>> > > > > > >
>> > > > > > > -Dan
>> > > > > > >
>> > > > > > > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <
>> > > > > > > dan.decap...@civicscience.com> wrote:
>> > > > > > >
>> > > > > > > > Thank you for your reply.
>> > > > > > > >
>> > > > > > > > The problem is I cannot find a methodology to go from a String
>> > > > > > > > representation of a complex data type to a nested Object of pig DataTypes.
>> > > > > > > > I looked over the pig 0.10.1 docs, but cannot find a way to go from String
>> > > > > > > > and Schema to pig DataType Object.
>> > > > > > > >
>> > > > > > > > For context, I am generating these Strings for my own JUnit testing of
>> > > > > > > > other UDFs.  Currently, for complex types, I have to generate each nesting
>> > > > > > > > from Tuple and DataBag factories, append data, and nest them manually.  For
>> > > > > > > > larger unit tests, this process becomes unwieldy (hundreds of lines per
>> > > > > > > > method, non-dynamic), and it would be much simpler to go directly from a
>> > > > > > > > String and a Schema to a DataBag Object for UDF testing (few lines of code,
>> > > > > > > > easily modifiable).
>> > > > > > > >
>> > > > > > > > -Dan
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > >> Why not just use PigStorage? This is essentially what it does. It saves a
>> > > > > > > >> bag as text, and then loads it again.
>> > > > > > > >>
>> > > > > > > >> I suppose the question becomes: why do you need to do this?
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >> 2013/3/18 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>> > > > > > > >>
>> > > > > > > >> > In Java, I am trying to convert a DataBag from its String representation
>> > > > > > > >> > with its schema String to a valid DataBag Object:
>> > > > > > > >> >
>> > > > > > > >> > String databag_string = "{(apples,1024)}";
>> > > > > > > >> > String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";
>> > > > > > > >> >
>> > > > > > > >> > I've tried implementing something along the lines of this, but I believe
>> > > > > > > >> > it's in the wrong direction, and then I get stuck:
>> > > > > > > >> >
>> > > > > > > >> >         String[] aliases = {"b1", "t1", "a", "b"};
>> > > > > > > >> >         byte[] types = {DataType.BAG, DataType.TUPLE, DataType.CHARARRAY, DataType.LONG};
>> > > > > > > >> >         List<Schema.FieldSchema> fsList = new ArrayList<Schema.FieldSchema>();
>> > > > > > > >> >         for (int i = 0; i < aliases.length; i++) {
>> > > > > > > >> >             fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
>> > > > > > > >> >         }
>> > > > > > > >> >         Schema origSchema = new Schema(fsList);
>> > > > > > > >> >         ResourceSchema rsSchema = new ResourceSchema(origSchema);
>> > > > > > > >> >         Schema genSchema = Schema.getPigSchema(rsSchema);
>> > > > > > > >> >         ResourceSchema.ResourceFieldSchema[] rfschema = rsSchema.getFields();
>> > > > > > > >> >         ... lost here, maybe Utf8StorageConverter c = new Utf8StorageConverter(); ???
>> > > > > > > >> >
>> > > > > > > >> >
>> > > > > > > >> > An ideal process would be along the lines of:
>> > > > > > > >> >
>> > > > > > > >> > DataBag d = BagFactory.getInstance().newDefaultBag();
>> > > > > > > >> > d.something(databag_string, schema_string);    // ??? no idea what this process could be
>> > > > > > > >> > d.toString().equals(databag_string) == true.
>> > > > > > > >> >
>> > > > > > > >> > Thanks, -Dan
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Dan DeCapria
>> > > > > > > > CivicScience, Inc.
>> > > > > > > > Senior Informatics / DM / ML / BI Specialist
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>>
>
>
