Ack, hit enter. I'd look at the LoadFunc interface and the PigStorage class, and if you can't make it work after playing with it a little, let me know.
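For anyone following along: as far as I know, PigStorage's read side follows the LoadFunc contract. Pig calls getNext() repeatedly, each call returns one Tuple, and null signals the end of the input. By default each line is split on a tab. A library-free sketch of just that loop, reading from an in-memory String instead of HDFS, might look like this. InMemoryTextLoader is a hypothetical name, Java 11+ is assumed, and records come back as plain Strings, whereas PigStorage itself returns untyped bytearray fields for Pig to cast later.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical loader, NOT a Pig class: it only mimics LoadFunc.getNext()
// semantics (one record per call, null at end of input) over a String.
final class InMemoryTextLoader {
    private final Iterator<String> lines;
    private final Pattern delimiter;

    InMemoryTextLoader(String text, String delimiter) {
        this.lines = text.lines().iterator();   // Java 11+
        this.delimiter = Pattern.compile(Pattern.quote(delimiter));
    }

    // Tab is PigStorage's default field delimiter.
    InMemoryTextLoader(String text) {
        this(text, "\t");
    }

    // Returns the next record's fields, or null once the input is exhausted.
    List<String> getNext() {
        if (!lines.hasNext()) {
            return null;
        }
        // limit -1 keeps trailing empty fields instead of silently dropping them
        return Arrays.asList(delimiter.split(lines.next(), -1));
    }

    public static void main(String[] args) {
        InMemoryTextLoader loader = new InMemoryTextLoader("apples\t1024\npears\t7\n");
        for (List<String> rec = loader.getNext(); rec != null; rec = loader.getNext()) {
            System.out.println(rec);   // [apples, 1024] then [pears, 7]
        }
    }
}
```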
2013/3/19 Jonathan Coveney <jcove...@gmail.com>

> Doing "new PigStorage()" is possible, but tricky. Maybe some of the other
> contributors have an easier way of doing this, but in the short term, I'd
> work on getting that to work. It's mainly a matter of making sure you
> initialize it properly.
>
>
> 2013/3/19 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>
>> This would work, but the goal would be to *not* invoke local interactive
>> Pig to execute a LOAD USING PigStorage() and pass the data into the UDF.
>> I was hoping to keep this completely in the Java and JUnit testing
>> universe.
>>
>> Looking over the PigStorage() documentation
>> <https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html>,
>> would you know how to construct this process from a baseline PigStorage
>> object, such as:
>>
>>     PigStorage pigstorage = new PigStorage();
>>
>> Any ideas?
>>
>> -Dan
>>
>> On Tue, Mar 19, 2013 at 12:08 PM, Jonathan Coveney <jcove...@gmail.com>
>> wrote:
>>
>> > I definitely understand the benefits; I just wanted to understand your
>> > workflow so I could weigh in with what I would do.
>> >
>> > In your case, if you're going to be making these by hand, then I would
>> > mimic what PigStorage outputs, and then just load it in using
>> > PigStorage.
>> >
>> >
>> > 2013/3/19 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>> >
>> > > By hand: I create a new JUnit method to test a specific use case
>> > > against a functional requirement in the UDF.
>> > >
>> > > The UDFs I am testing are part of a larger ETL testing initiative I
>> > > have been undertaking. To ensure that the various states of legacy
>> > > data are correctly extracted and transformed into a Pig context, I am
>> > > creating JUnit tests for each UDF, with specific use cases as test
>> > > methods.
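Jonathan's suggestion of mimicking what PigStorage outputs can be driven from the test itself, so the fixture file never has to be maintained by hand. A minimal sketch, assuming PigStorage's default text layout (one record per line, tab-separated fields) and that complex fields are passed already written in Pig's {(...)} and (...) text form. PigStorageFixture, render() and write() are hypothetical helper names, and Java 11+ is assumed for Files.writeString.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical JUnit helper, NOT part of Pig: produces text in the layout
// PigStorage uses by default (tab-separated fields, one record per line).
final class PigStorageFixture {
    // Joins each row's fields with tabs and terminates every record with '\n'.
    static String render(List<List<String>> rows) {
        return rows.stream()
                .map(fields -> String.join("\t", fields) + "\n")
                .collect(Collectors.joining());
    }

    // Writes the rendered rows to a temp file that is deleted when the JVM exits.
    static Path write(List<List<String>> rows) {
        try {
            Path file = Files.createTempFile("udf-fixture", ".txt");
            file.toFile().deleteOnExit();
            return Files.writeString(file, render(rows), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping render() separate from write() lets a test assert on the fixture text itself without touching the filesystem; the written file is what PigStorage, with an AS schema, would then read back.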
>> > >
>> > > The motivation for using String inputs for the data and schema objects
>> > > is to improve on the conventional approach: creating Java objects and
>> > > adding and appending nested objects to build the desired complex-type
>> > > DataBag to pass to the UDF as use-case input. The simpler process I'm
>> > > looking for should improve scalability and rapid prototyping within the
>> > > testing scripts. It will also make the process more approachable for
>> > > another programmer writing additional unit tests.
>> > >
>> > > -Dan
>> > >
>> > > On Tue, Mar 19, 2013 at 11:43 AM, Jonathan Coveney <jcove...@gmail.com>
>> > > wrote:
>> > >
>> > > > How are you planning on generating these cases? By hand? Or automated?
>> > > >
>> > > >
>> > > > 2013/3/19 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>> > > >
>> > > > > The String string_databag in this example was typed out by me, as
>> > > > > the input String for a JUnit test method. I am considering
>> > > > > generating many of these for case-specific unit testing of my UDFs.
>> > > > >
>> > > > > -Dan
>> > > > >
>> > > > > On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <jcove...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > How was string_databag generated?
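On the conventional approach Dan mentions: much of the bulk in hand-built fixtures comes from the create/append/nest ceremony rather than from the data. One way to cut it down, sketched here without Pig on the classpath, is a pair of varargs factories so that each fixture becomes a single nested expression. Tuple, Bag, tuple() and bag() are hypothetical stand-ins, and their toString() only imitates how Pig prints bags and tuples. With Pig itself, the same helpers would wrap TupleFactory.getInstance().newTuple(List) and BagFactory.getInstance().newDefaultBag(List<Tuple>). Records assume Java 16+.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for org.apache.pig.data.Tuple (NOT the Pig class).
record Tuple(List<Object> fields) {
    @Override
    public String toString() {   // imitates Pig's "(f1,f2,...)" text form
        return fields.stream().map(f -> String.valueOf(f))
                .collect(Collectors.joining(",", "(", ")"));
    }
}

// Hypothetical stand-in for org.apache.pig.data.DataBag (NOT the Pig class).
record Bag(List<Tuple> tuples) {
    @Override
    public String toString() {   // imitates Pig's "{(...),(...)}" text form
        return tuples.stream().map(Tuple::toString)
                .collect(Collectors.joining(",", "{", "}"));
    }
}

final class Fixtures {
    // Varargs factories: one call per nesting level instead of
    // separate create / append / nest statements.
    static Tuple tuple(Object... fields) {
        return new Tuple(Arrays.asList(fields));
    }

    static Bag bag(Tuple... tuples) {
        return new Bag(Arrays.asList(tuples));
    }

    public static void main(String[] args) {
        Bag input = bag(tuple("a", tuple("b", "d"), "f"));
        System.out.println(input);   // {(a,(b,d),f)}
    }
}
```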
>> > > > > >
>> > > > > >
>> > > > > > 2013/3/19 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>> > > > > >
>> > > > > > > Expanding upon this, the Schema object for the following use case
>> > > > > > > can be resolved from the inputs:
>> > > > > > >
>> > > > > > >     String string_databag = "{(a,(b,d),f)}";
>> > > > > > >     String string_schema =
>> > > > > > >         "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
>> > > > > > >     Schema schema = Utils.getSchemaFromString(string_schema);
>> > > > > > >
>> > > > > > > The next step is to resolve a DataBag object from String
>> > > > > > > string_databag and the Schema object.
>> > > > > > >
>> > > > > > > -Dan
>> > > > > > >
>> > > > > > > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <
>> > > > > > > dan.decap...@civicscience.com> wrote:
>> > > > > > >
>> > > > > > > > Thank you for your reply.
>> > > > > > > >
>> > > > > > > > The problem is that I cannot find a way to go from a String
>> > > > > > > > representation of a complex data type to a nested object of
>> > > > > > > > Pig DataTypes. I looked over the Pig 0.10.1 docs, but cannot
>> > > > > > > > find a way to go from a String and a Schema to a Pig DataType
>> > > > > > > > object.
>> > > > > > > >
>> > > > > > > > For context, I am generating these Strings for my own JUnit
>> > > > > > > > testing of other UDFs. Currently, for complex types, I have to
>> > > > > > > > generate each nesting level from the Tuple and DataBag
>> > > > > > > > factories, append data, and nest them manually.
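The step Dan is after, from string_databag to an object tree, is essentially a small recursive-descent parse of Pig's text syntax: '{' opens a bag, '(' opens a tuple, and commas separate elements. As far as I know, this is what Utf8StorageConverter.bytesToBag(byte[], ResourceFieldSchema) does inside Pig (Utf8StorageConverter being the default LoadCaster handed out by LoadFunc, and so by PigStorage), so that method is worth checking in the 0.10 javadoc first. The sketch below only illustrates the parsing idea without Pig: Bag, Tuple and PigTextParser are hypothetical, leaves stay untyped Strings, and escaping, maps and nulls are not handled. Records assume Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for Pig's Tuple and DataBag (NOT the Pig classes).
record Tuple(List<Object> fields) {
    @Override
    public String toString() {
        return fields.stream().map(f -> String.valueOf(f))
                .collect(Collectors.joining(",", "(", ")"));
    }
}

record Bag(List<Tuple> tuples) {
    @Override
    public String toString() {
        return tuples.stream().map(Tuple::toString)
                .collect(Collectors.joining(",", "{", "}"));
    }
}

// Recursive-descent parser for text such as "{(a,(b,d),f)}".
// Leaves are kept as raw Strings: no typing, escaping, maps or nulls.
final class PigTextParser {
    private final String s;
    private int pos;

    private PigTextParser(String s) { this.s = s; }

    static Bag parseBag(String text) {
        PigTextParser p = new PigTextParser(text.trim());
        Bag bag = p.bag();
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("trailing input at offset " + p.pos);
        }
        return bag;
    }

    private Bag bag() {
        expect('{');
        List<Tuple> tuples = new ArrayList<>();
        if (!accept('}')) {                       // non-empty bag
            do { tuples.add(tuple()); } while (accept(','));
            expect('}');
        }
        return new Bag(tuples);
    }

    private Tuple tuple() {
        expect('(');
        List<Object> fields = new ArrayList<>();
        if (!accept(')')) {                       // non-empty tuple
            do { fields.add(field()); } while (accept(','));
            expect(')');
        }
        return new Tuple(fields);
    }

    // A field is a nested bag, a nested tuple, or a raw atom up to the next delimiter.
    private Object field() {
        if (pos < s.length() && s.charAt(pos) == '{') return bag();
        if (pos < s.length() && s.charAt(pos) == '(') return tuple();
        int start = pos;
        while (pos < s.length() && ",)}".indexOf(s.charAt(pos)) < 0) pos++;
        return s.substring(start, pos);
    }

    private boolean accept(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) throw new IllegalArgumentException("expected '" + c + "' at offset " + pos);
    }
}
```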
>> > > > > > > >
>> > > > > > > > For larger unit tests, this process becomes unwieldy (hundreds
>> > > > > > > > of lines per method, non-dynamic), and it would be much
>> > > > > > > > simpler to go directly from a String and a Schema to a DataBag
>> > > > > > > > object for UDF testing (a few lines of code, easily
>> > > > > > > > modifiable).
>> > > > > > > >
>> > > > > > > > -Dan
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <jcove...@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > >> Why not just use PigStorage? This is essentially what it does:
>> > > > > > > >> it saves a bag as text, and then loads it again.
>> > > > > > > >>
>> > > > > > > >> I suppose the question becomes: why do you need to do this?
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >> 2013/3/18 Dan DeCapria, CivicScience <dan.decap...@civicscience.com>
>> > > > > > > >>
>> > > > > > > >> > In Java, I am trying to convert a DataBag from its String
>> > > > > > > >> > representation, together with its schema String, into a
>> > > > > > > >> > valid DataBag object:
>> > > > > > > >> >
>> > > > > > > >> >     String databag_string = "{(apples,1024)}";
>> > > > > > > >> >     String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";
>> > > > > > > >> >
>> > > > > > > >> > I've tried implementing something along the lines of this,
>> > > > > > > >> > but I believe it's in the wrong direction, and then I get
>> > > > > > > >> > stuck:
>> > > > > > > >> >
>> > > > > > > >> >     String[] aliases = {"b1", "t1", "a", "b"};
>> > > > > > > >> >     byte[] types = {DataType.BAG, DataType.TUPLE,
>> > > > > > > >> >             DataType.CHARARRAY, DataType.LONG};
>> > > > > > > >> >     List<Schema.FieldSchema> fsList =
>> > > > > > > >> >             new ArrayList<Schema.FieldSchema>();
>> > > > > > > >> >     for (int i = 0; i < aliases.length; i++) {
>> > > > > > > >> >         fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
>> > > > > > > >> >     }
>> > > > > > > >> >     Schema origSchema = new Schema(fsList);
>> > > > > > > >> >     ResourceSchema rsSchema = new ResourceSchema(origSchema);
>> > > > > > > >> >     Schema genSchema = Schema.getPigSchema(rsSchema);
>> > > > > > > >> >     ResourceSchema.ResourceFieldSchema[] rfschema =
>> > > > > > > >> >             rsSchema.getFields();
>> > > > > > > >> >     // ... lost here, maybe
>> > > > > > > >> >     // Utf8StorageConverter c = new Utf8StorageConverter(); ???
>> > > > > > > >> >
>> > > > > > > >> >
>> > > > > > > >> > An ideal process would be along the lines of:
>> > > > > > > >> >
>> > > > > > > >> >     DataBag d = BagFactory.getInstance().newDefaultBag();
>> > > > > > > >> >     d.something(databag_string, schema_string); // ??? no idea
>> > > > > > > >> >                                  // what this process could be
>> > > > > > > >> >     d.toString().equals(databag_string) == true.
>> > > > > > > >> >
>> > > > > > > >> > Thanks, -Dan
>> > > > > > > >>
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Dan DeCapria
>> > > > > > > > CivicScience, Inc.
>> > > > > > > > Senior Informatics / DM / ML / BI Specialist
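Dan's "ideal process" (a bag String plus a schema String in, typed values out, and toString() round-tripping to the input) can be sketched for the simplest case from his first message: a bag of flat tuples such as b1:bag{t1:tuple(a:chararray,b:long)}. FlatBagFromString is a hypothetical helper, not a Pig API. It reads only the field types inside tuple(...), maps a few Pig scalar types to Java types, and returns plain lists rather than a DataBag. Nested tuples or bags, maps, nulls and escaped delimiters are out of scope; for those, a real caster such as Utf8StorageConverter is presumably the better route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, NOT a Pig API. Handles only a bag of flat tuples whose
// schema looks like "b1:bag{t1:tuple(a:chararray,b:long)}".
final class FlatBagFromString {
    private static final Pattern TUPLE_BODY = Pattern.compile("tuple\\((.*)\\)");
    private static final Pattern FIELD = Pattern.compile("(\\w+):(\\w+)");

    // Returns the declared type of each tuple field, in order, e.g. [chararray, long].
    static List<String> tupleTypes(String schema) {
        Matcher body = TUPLE_BODY.matcher(schema);
        if (!body.find()) {
            throw new IllegalArgumentException("no tuple(...) in schema: " + schema);
        }
        List<String> types = new ArrayList<>();
        Matcher field = FIELD.matcher(body.group(1));
        while (field.find()) {
            types.add(field.group(2));
        }
        return types;
    }

    // Converts one field's text to a Java value for a few Pig scalar types.
    static Object cast(String text, String type) {
        switch (type) {
            case "chararray": return text;
            case "int":       return Integer.valueOf(text);
            case "long":      return Long.valueOf(text);
            case "double":    return Double.valueOf(text);
            default: throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    // Parses "{(v1,v2),(v3,v4)}" into typed tuples, one List<Object> per tuple.
    static List<List<Object>> parse(String bagText, String schema) {
        List<String> types = tupleTypes(schema);
        String s = bagText.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("not a bag: " + bagText);
        }
        s = s.substring(1, s.length() - 1);              // drop the braces
        List<List<Object>> tuples = new ArrayList<>();
        if (s.isEmpty()) {
            return tuples;
        }
        if (!s.startsWith("(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("expected tuples in: " + bagText);
        }
        // Splitting on "),(" is only safe because the tuples are flat.
        for (String body : s.substring(1, s.length() - 1).split("\\),\\(", -1)) {
            String[] parts = body.split(",", -1);
            if (parts.length != types.size()) {
                throw new IllegalArgumentException(
                        "expected " + types.size() + " fields in (" + body + ")");
            }
            List<Object> tuple = new ArrayList<>();
            for (int i = 0; i < parts.length; i++) {
                tuple.add(cast(parts[i], types.get(i)));
            }
            tuples.add(tuple);
        }
        return tuples;
    }

    // Formats typed tuples back into Pig's "{(...),(...)}" text form.
    static String toPigString(List<List<Object>> tuples) {
        return tuples.stream()
                .map(t -> t.stream().map(f -> String.valueOf(f))
                        .collect(Collectors.joining(",", "(", ")")))
                .collect(Collectors.joining(",", "{", "}"));
    }
}
```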