Re: Failure to run pig jobs using HbaseStorage in Oozie

2013-03-19 Thread Rohini Palaniswamy
The Oozie Pig launcher log cannot have empty stdout. Can you rerun your Oozie workflow and check the stack trace in the Pig launcher stdout/stderr log? Regards, Rohini On Mon, Mar 18, 2013 at 9:16 PM, Praveen Bysani wrote: > Hi, > > When I checked in the Job Tracker UI, the job is in re

Re: SUM of project-range of fields?

2013-03-19 Thread Nathan Neff
It works. This confirms that Pig is better than Java MapReduce :-) Thanks everyone for their help. Input: Toy Story|0|0|0|0|1|1|0|0|0 GoldenEye|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0 SomeNewMovie|0|0|0|0|1|1|0|0|0|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1 S

Re: Extract Date part from ISO datetime format

2013-03-19 Thread Mason
There might be an easier way, but I bet you could whip up a solution with the helpers in Piggybank: http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/datetime/ . On Mon, Mar 18, 2013 at 1:24 PM, Mix Nin wrote: > How to get only Date p

Re: SUM of project-range of fields?

2013-03-19 Thread Abhinav Neelam
Russell's code works with a little modification. (The cast to int doesn't work though.) movies_and_genres = FOREACH movies GENERATE $0 as movie_name, (bag{tuple()})TOBAG($2 ..) AS genres: bag{genre_bit: tuple()}; foo = foreach movies_and_genres generate movie_name, (int)SUM(genres) as genre_total;

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
I'll give it an honest try, and any additional input from the community is greatly appreciated! I've been on this idea for a few days now. I even implemented my own UDF parser by converting the input to a char[] array and pushing/popping on a Stack of Node Objects to generate the nested inner complex Da
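
For readers following the thread, the stack-based parsing idea mentioned in this message can be sketched in plain Java. This is only an illustration under stated assumptions, not Dan's actual UDF parser and not a Pig API: the class name BagStringParser is hypothetical, bags and tuples are both represented as nested java.util.List values rather than Pig DataBag/Tuple objects, leaf values stay as Strings (no schema-driven typing), and empty fields or values containing commas or brackets are not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of Pig: parses text such as "{(a,(b,d),f)}"
// into nested Lists. Bags and tuples are not distinguished in the result.
class BagStringParser {

    static Object parse(String text) {
        Deque<List<Object>> stack = new ArrayDeque<>();
        List<Object> root = new ArrayList<>();
        stack.push(root);
        StringBuilder token = new StringBuilder();

        for (char c : text.toCharArray()) {
            switch (c) {
                case '{':
                case '(': {
                    // Open a nested container and make it the current one.
                    List<Object> child = new ArrayList<>();
                    stack.peek().add(child);
                    stack.push(child);
                    break;
                }
                case '}':
                case ')':
                    // Finish any pending leaf, then return to the parent.
                    flush(token, stack.peek());
                    if (stack.size() == 1) {
                        throw new IllegalArgumentException("unbalanced close: " + text);
                    }
                    stack.pop();
                    break;
                case ',':
                    flush(token, stack.peek());
                    break;
                default:
                    token.append(c);
            }
        }
        flush(token, root);
        if (stack.size() != 1) {
            throw new IllegalArgumentException("unbalanced open: " + text);
        }
        if (root.size() != 1) {
            throw new IllegalArgumentException("expected exactly one top-level value: " + text);
        }
        return root.get(0);
    }

    // Adds the pending token (if any) to the target container as a String leaf.
    private static void flush(StringBuilder token, List<Object> target) {
        if (token.length() > 0) {
            target.add(token.toString());
            token.setLength(0);
        }
    }
}
```

Calling BagStringParser.parse("{(apples,(banana,1024),2048)}") yields a list holding one inner list (the tuple) whose second element is itself a list. Converting the String leaves to chararray or long according to a Schema would be a separate step that this sketch leaves out.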

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
Ack, hit enter. I'd look at the LoadFunc interface, the PigStorage class, and if you can't make it work without playing a little, let me know. 2013/3/19 Jonathan Coveney > doing "new PigStorage()" is possible, but tricky. Maybe some of the other > contributors have an easier way of doing this,

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
doing "new PigStorage()" is possible, but tricky. Maybe some of the other contributors have an easier way of doing this, but in the short term, I'd work on getting that to work. It's mainly just making sure you initialize it properly. 2013/3/19 Dan DeCapria, CivicScience > This would work, but

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
This would work, but the goal would be to *not* invoke local interactive pig to execute a LOAD USING PigStorage() and pass the data into the UDF. I was hoping to keep this completely in the Java and JUnit testing universe. Looking over the PigStorage() doc

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
I definitely understand the benefits, I just wanted to understand your workflow so I could weigh in with what I would do. In your case, if you're going to be making these by hand, then I would mimic what PigStorage outputs, and then just load it in using PigStorage. 2013/3/19 Dan DeCapria, CivicSc

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
By hand; creating a new JUnit method to test a specific use case against a functional requirement in the UDF. The UDFs I am testing are part of a larger ETL testing initiative I have been undertaking. To ensure that the various states of legacy data are correctly extracted and transformed into a

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
How are you planning on generating these cases? By hand? Or automated? 2013/3/19 Dan DeCapria, CivicScience > String string_databag in this example was typed out by me, as the input > String for a JUnit test method. I am considering generating many of these > for case specific unit testing of m

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Such that this string_input matches the Schema: String string_databag = "{(apples,(banana,1024),2048)}"; String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}"; Schema schema = Utils.getSchemaFromString(string_schema); LogicalSche

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
String string_databag in this example was typed out by me, as the input String for a JUnit test method. I am considering generating many of these for case specific unit testing of my UDFs. -Dan On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney wrote: > how was string_databag generated? > > > 20

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Jonathan Coveney
how was string_databag generated? 2013/3/19 Dan DeCapria, CivicScience > Expanding upon this, the following use case's Schema Object can be resolved > from inputs: > > String string_databag = "{(a,(b,d),f)}"; > String string_schema = > "b1:bag{t1:tuple(a:chararray,t2:tuple(b:cha

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Expanding upon this, the following use case's Schema Object can be resolved from inputs: String string_databag = "{(a,(b,d),f)}"; String string_schema = "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}"; Schema schema = Utils.getSchemaFromString(string_sch

Re: String Representation of DataBag and its Schema

2013-03-19 Thread Dan DeCapria, CivicScience
Thank you for your reply. The problem is I cannot find a methodology to go from a String representation of a complex data type to a nested Object of pig DataTypes. I looked over the pig 0.10.1 docs, but cannot find a way to go from String and Schema to pig DataType Object. For context, I am gener

Re: nested order limit by percentage of overall records

2013-03-19 Thread Marco Cadetg
Actually, what I was looking for isn't distributed quantiles. I was looking for the share that the top x% have. E.g. in my example it could be that the top 10% of the users have 50% of the total money. So it looks like I'll need to come up with a UDF which delivers this. Cheers, -Marco On 19 Mar
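
The quantity Marco describes (the share of the total held by the top x% of users) can be sketched in plain Java as follows. This is a standalone illustration, not the UDF from the thread: the class and method names are hypothetical, the amounts are assumed to fit in memory as a long[], and the number of "top" users is assumed to be rounded up (ceiling of x% of the user count).

```java
import java.util.Arrays;

// Hypothetical helper, not a Pig UDF: share of the total held by the top percent% of entries.
class TopShare {

    static double topShare(long[] amounts, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        long[] sorted = amounts.clone();
        Arrays.sort(sorted); // ascending, so the largest values sit at the end

        int n = sorted.length;
        // Integer ceiling of n * percent / 100 (assumption: round the group size up).
        int k = (int) (((long) n * percent + 99) / 100);

        long total = 0;
        long top = 0;
        for (int i = 0; i < n; i++) {
            total += sorted[i];
            if (i >= n - k) {
                top += sorted[i]; // one of the k largest amounts
            }
        }
        return total == 0 ? 0.0 : (double) top / total;
    }
}
```

With ten users holding 50, 10, 10, 10, 5, 5, 4, 3, 2 and 1, topShare(amounts, 10) returns 0.5, matching the "top 10% have 50%" example in the message. Inside Pig the same logic would have to run over a bag that has already been grouped into a single record, which is part of why a UDF is suggested.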