Re: String Representation of DataBag and its Schema

2013-03-21 Thread William Oberman
We managed to piece this together. It's not fully generic (we assume a single field). But, it gets the job done for unit testing. -- package com.civicscience.util; import org.apache.pig.ResourceSchema; import org.apache.pig.builtin.Utf8StorageConverter; import org.apache.pig.impl.uti

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
I'll give it an honest try, and any additional input from the community is greatly appreciated! I've been on this idea for a few days now. I even implemented my own UDF parser by converting the input to a char[] array and pushing/popping on a Stack of Node Objects to generate the nested inner complex Da
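A minimal sketch of the stack-based idea described above, using only the JDK. The class name BagStringParser, the Node type, and the choice to keep every atom as a plain String (with no schema-driven type conversion) are assumptions made for illustration; this is not the parser from the thread and it does not use any Pig API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: parses text such as "{(apples,(banana,1024),2048)}"
// into nested Nodes. Atoms stay Strings; typing them by schema is out of scope.
class BagStringParser {

    /** A bag (kind '{') or tuple (kind '(') holding String atoms and nested Nodes. */
    public static final class Node {
        public final char kind;
        public final List<Object> children = new ArrayList<>();
        Node(char kind) { this.kind = kind; }
    }

    public static Node parse(String text) {
        Deque<Node> stack = new ArrayDeque<>();     // currently open bags/tuples
        StringBuilder token = new StringBuilder();  // atom being read
        Node root = null;
        for (char c : text.toCharArray()) {
            switch (c) {
                case '{':
                case '(': {
                    if (stack.isEmpty() && root != null) {
                        throw new IllegalArgumentException("more than one top-level value: " + text);
                    }
                    stack.push(new Node(c));
                    break;
                }
                case ',': {
                    flush(token, stack);
                    break;
                }
                case '}':
                case ')': {
                    flush(token, stack);
                    if (stack.isEmpty()) {
                        throw new IllegalArgumentException("unbalanced input: " + text);
                    }
                    Node done = stack.pop();
                    if ((c == '}') != (done.kind == '{')) {
                        throw new IllegalArgumentException("mismatched delimiters: " + text);
                    }
                    if (stack.isEmpty()) {
                        root = done;                      // outermost value closed
                    } else {
                        stack.peek().children.add(done);  // attach to parent
                    }
                    break;
                }
                default:
                    token.append(c);
            }
        }
        if (root == null || !stack.isEmpty() || token.length() > 0) {
            throw new IllegalArgumentException("malformed input: " + text);
        }
        return root;
    }

    // Adds the pending atom (if any) to the innermost open Node.
    private static void flush(StringBuilder token, Deque<Node> stack) {
        if (token.length() > 0) {
            if (stack.isEmpty()) {
                throw new IllegalArgumentException("atom outside of any bag or tuple");
            }
            stack.peek().children.add(token.toString());
            token.setLength(0);
        }
    }
}
```

Calling BagStringParser.parse("{(apples,(banana,1024),2048)}") yields one bag Node whose single tuple holds "apples", a nested tuple of "banana" and "1024", and "2048". Empty fields are silently dropped in this sketch, and mapping atoms to schema types such as long would be a separate step.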

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
Ack, hit enter. I'd look at the LoadFunc interface, the PigStorage class, and if you can't make it work without playing a little, let me know. 2013/3/19 Jonathan Coveney > doing "new PigStorage()" is possible, but tricky. Maybe some of the other > contributors have an easier way of doing this,

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
doing "new PigStorage()" is possible, but tricky. Maybe some of the other contributors have an easier way of doing this, but in the short term, I'd work on getting that to work. It's mainly just making sure you initialize it properly. 2013/3/19 Dan DeCapria, CivicScience > This would work, but

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
This would work, but the goal would be to *not* invoke local interactive pig to execute a LOAD USING PigStorage() and pass the data into the UDF. I was hoping to keep this completely in the Java and JUnit testing universe. Looking over the PigStorage() doc

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
I definitely understand the benefits, I just wanted to understand your workflow so I could weigh in with what I would do. In your case, if you're going to be making these by hand, then I would mimic what PigStorage outputs, and then just load it in using PigStorage. 2013/3/19 Dan DeCapria, CivicSc

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
By hand; creating a new JUnit method to test a specific use case against a functional requirement in the UDF. The UDFs I am testing are part of a larger ETL testing initiative I have been undertaking. To ensure that the various states of legacy data are correctly extracted and transformed into a

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
How are you planning on generating these cases? By hand? Or automated? 2013/3/19 Dan DeCapria, CivicScience > String string_databag in this example was typed out by me, as the input > String for a JUnit test method. I am considering generating many of these > for case specific unit testing of m

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Such that this string_input matches the Schema: String string_databag = "{(apples,(banana,1024),2048)}"; String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}"; Schema schema = Utils.getSchemaFromString(string_schema); LogicalSche

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
String string_databag in this example was typed out by me, as the input String for a JUnit test method. I am considering generating many of these for case specific unit testing of my UDFs. -Dan On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney wrote: > how was string_databag generated? > > > 20

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
how was string_databag generated? 2013/3/19 Dan DeCapria, CivicScience > Expanding upon this, the following use case's Schema Object can be resolved > from inputs: > > String string_databag = "{(a,(b,d),f)}"; > String string_schema = > "b1:bag{t1:tuple(a:chararray,t2:tuple(b:cha

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Expanding upon this, the following use case's Schema Object can be resolved from inputs: String string_databag = "{(a,(b,d),f)}"; String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}"; Schema schema = Utils.getSchemaFromString(string_sch

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Thank you for your reply. The problem is I cannot find a methodology to go from a String representation of a complex data type to a nested Object of pig DataTypes. I looked over the pig 0.10.1 docs, but cannot find a way to go from String and Schema to pig DataType Object. For context, I am gener

Re: String Representation of DataBag and its Schema

2013-03-18 Thread Jonathan Coveney
Why not just use PigStorage? This is essentially what it does. It saves a bag as text, and then loads it again. I suppose the question becomes: why do you need to do this? 2013/3/18 Dan DeCapria, CivicScience > In Java, I am trying to convert a DataBag from its String representation > with it

String Representation of DataBag and its Schema

2013-03-18 Thread Dan DeCapria, CivicScience
In Java, I am trying to convert a DataBag from its String representation with its schema String to a valid DataBag Object: String databag_string = "{(apples,1024)}"; String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}"; I've tried implementing something along the lines of this, but I be