RE: how to get input schema in UDF

2012-08-13 Thread Danfeng Li
Ok, I found the solution. Replacing Schema tupleSchema = new Schema(input.getFields()); with Schema tupleSchema = new Schema(input.getField(0).schema.getField(0).schema.getFields()); will do the trick. Thanks. Dan -Original Message- From: Danfeng Li [mailto:d...@operasolutions.com] Sen
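
For anyone reading the archive later, the fix above works because the bag arrives as field 0 of the input schema, and the bag's own schema holds a single tuple field whose schema carries the real columns. Below is a minimal sketch of that two-level descent in plain Java. The Field class and innerTupleFields method are hypothetical stand-ins invented for illustration; they are not Pig's Schema or FieldSchema classes, and this is not the poster's actual code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a schema tree; NOT Pig's Schema / FieldSchema.
final class NestedSchemaSketch {

    static final class Field {
        final String alias;
        final List<Field> schema; // nested fields; empty for simple types

        Field(String alias, List<Field> schema) {
            this.alias = alias;
            this.schema = schema;
        }

        // Convenience factory for a leaf field such as name or age.
        static Field simple(String alias) {
            return new Field(alias, Collections.<Field>emptyList());
        }
    }

    // input -> field 0 (the bag) -> field 0 (the tuple) -> the tuple's fields
    static List<Field> innerTupleFields(List<Field> input) {
        Field bag = input.get(0);
        Field tuple = bag.schema.get(0);
        return tuple.schema;
    }

    public static void main(String[] args) {
        // Models A: {(name, age)} as bag -> tuple -> [name, age].
        Field tuple = new Field("t",
                Arrays.asList(Field.simple("name"), Field.simple("age")));
        Field bag = new Field("A", Collections.singletonList(tuple));
        List<Field> input = Collections.singletonList(bag);

        for (Field f : innerTupleFields(input)) {
            System.out.println(f.alias);
        }
    }
}
```

Stopping at the top level (the equivalent of input.getFields()) yields only the bag itself, which matches the symptom described in this thread.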

Re: Pig HBaseStorage error

2012-08-13 Thread Bill Graham
HBase CF names are case sensitive, so your query might be off since you're using lowercase. If the problem still persists with the same case, would it be possible to see if you can reproduce against a Pig build from the trunk? On Mon, Aug 13, 2012 at 8:28 AM, Mohit Anchlia wrote: > > > O

RE: how to get input schema in UDF

2012-08-13 Thread Danfeng Li
Thanks, Robert. However, I'm still not clear on how to get the original fields for the tuple inside the bag. Following is the code to generate the schema. public Schema outputSchema(Schema input) { try{ Schema.FieldSchema counter = new Schema.FieldSchema("counter", DataType.INTEGER);

Re: Operator and Function Reference

2012-08-13 Thread Dmitriy Ryaboy
That would be quite handy I think. D On Thu, Aug 9, 2012 at 12:24 PM, Xavier Stevens wrote: > Does anyone else think it would make sense to have all operators and > functions listed on a single page somewhere as a reference? Right now they > are split up over the "Pig Latin Basics" and "Built In

Re: Distributed accumulator functions

2012-08-13 Thread Dmitriy Ryaboy
For CSV excel, check out http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html D >> Also, is PigStorage compatible with the quoting expected by excel >> tab-delimited files? AIUI that would require quoting the values with >> "value\tvalue" and escaping doub

Re: Using Distributed Cache in PIG

2012-08-13 Thread Dmitriy Ryaboy
You are talking about changing the way hadoop works; something like this would be transparent to Pig. Note that Hadoop Distributed Cache != "distributed memory cache". I suppose you could replace the value of fs.file.impl from org.apache.hadoop.fs.LocalFileSystem to something else.. might be qui

Re: Pig 0.10.0 slow startup

2012-08-13 Thread Dmitriy Ryaboy
Julien removed a dozen or so loader/storer instantiations. That can do it if you do work in constructors. D On Fri, Aug 10, 2012 at 1:15 PM, Prashant Kommireddi wrote: > Thanks Chun. > > Jon, any idea what on 0.11 might have fixed it? > > On Thu, Aug 9, 2012 at 3:32 PM, Chun Yang > wrote: > >> I

Re: Can anyone give me a hint about this column behavior?

2012-08-13 Thread Bill Graham
This seems like a bug in PigStorage. Would you mind opening a JIRA with the steps to reproduce that you've included here? thanks, Bill On Mon, Aug 13, 2012 at 3:44 PM, jeremiah rounds wrote: > Greetings pig users, > > This is regarding my previous post (in quotes below) > > > I was able to remove

Re: how to get input schema in UDF

2012-08-13 Thread Robert Yerex
Chapter 10 in Alan Gates' excellent book "Programming Pig" discusses this issue. Robert Yerex Data Scientist Civitas Learning On Mon, Aug 13, 2012 at 3:43 PM, Danfeng Li wrote: > I have a bag, e.g. A: {(name: chararray,age: int)}, I wrote a udf which > adds 1 more field in the tuple inside the ba

Re: Can anyone give me a hint about this column behavior?

2012-08-13 Thread jeremiah rounds
Greetings pig users, This is regarding my previous post (in quotes below). I was able to remove this column error by starting Pig with: pig -x local -M -t ColumnMapKeyPrune. I have no more insight than that; I only tried it because someone else reported their column oriented error went away wit

how to get input schema in UDF

2012-08-13 Thread Danfeng Li
I have a bag, e.g. A: {(name: chararray,age: int)}. I wrote a udf which adds 1 more field in the tuple inside the bag, e.g. B: {(name: chararray,age: int, rank:int)}. The number of fields in the original bag is not fixed, e.g. I can have one more field such as gender:int. In my udf, in o

Can anyone give me a hint about this column behavior?

2012-08-13 Thread jeremiah rounds
Greetings, I am new to pig. I am trying to get to know it on a laptop with hadoop 20.2 installed in local mode. I have prior experience with hadoop, but I figure my error is so weird I blew the pig install or something. Here is what I have my problem distilled down to: $ pig -x local -M gru

Using Distributed Cache in PIG

2012-08-13 Thread kapil bhosale
Hello, can we use the Distributed Cache to store intermediate results after the map phase, so that they can be read from the cache in the reduce phase and improve the performance of a Map-Reduce job? I found a paper regarding usage of a cache in Map-Reduce, http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5

Re: Distributed accumulator functions

2012-08-13 Thread Alan Gates
On Aug 13, 2012, at 9:05 AM, Benjamin Smedberg wrote: > I'm a new-ish pig user querying data on an hbase cluster. I have a question > about accumulator-style functions. > > When writing an accumulator-style UDF, is all of the data shipped to a single > machine before it is reduced/accumulated?

Distributed accumulator functions

2012-08-13 Thread Benjamin Smedberg
I'm a new-ish pig user querying data on an hbase cluster. I have a question about accumulator-style functions. When writing an accumulator-style UDF, is all of the data shipped to a single machine before it is reduced/accumulated? For example, if I were going to re-implement SUM as a UDF
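
As background to the question above: Pig's Accumulator interface feeds a UDF its bag in chunks, while the Algebraic interface is the one that lets work be split into initial, intermediate and final steps so a combiner can pre-aggregate before data is shipped. Below is a toy sketch of that split for a SUM-like function in plain Java. It does not use the Pig API; the class and method names are made up for illustration, and the three lists simply stand in for input splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of combiner-style (algebraic) aggregation in plain Java.
// None of these names come from Pig; they only mirror the idea of
// summing locally first and merging the partial results afterwards.
final class AlgebraicSumSketch {

    // "Map/combine side": each split reduces its own values to one partial sum.
    static long partialSum(List<Long> split) {
        long sum = 0L;
        for (long v : split) {
            sum += v;
        }
        return sum;
    }

    // "Reduce side": only the partial sums need to reach a single machine.
    static long combine(List<Long> partials) {
        long total = 0L;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three hypothetical splits of the same column.
        List<List<Long>> splits = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(10L, 20L),
                Arrays.asList(100L));

        List<Long> partials = new ArrayList<>();
        for (List<Long> split : splits) {
            partials.add(partialSum(split));
        }
        System.out.println(combine(partials)); // prints 136
    }
}
```

The sketch only works because addition is associative and commutative; a function without that property cannot be split this way and has to see all values for a key in one place.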

Re: Pig HBaseStorage error

2012-08-13 Thread Mohit Anchlia
On Sun, Aug 12, 2012 at 11:26 PM, Bill Graham wrote: > This seems to be the problem: > > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hbase.zookeeper.ZKConfig.parseZooCfg(ZKConfig.java:167) > > Which seems like the Conf is null, which is really odd. > > http://svn.ap