Re: Unable to disable compression of output

2012-07-30 Thread souri datta
James, I may not have understood your question fully, but did you try renaming the file without the '.gz'? In Hadoop/Pig, if the output has a .bz2 (or .gz) extension, the files get compressed. Hence, ... On Mon, Jul 30, 2012 at 11:51 PM, James Kebinger wrote: > Hello, I'm running a pretty simple pig job but despite my best effor...
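The reply's point is that Pig decides whether to compress from the output location's extension. A minimal sketch, assuming the output path is assembled as a string (for example in a Java driver) before it reaches STORE, of a hypothetical helper `stripCompressionSuffix` that removes a trailing `.gz` or `.bz2` so the output is written uncompressed (assuming no other compression properties are set for the job):

```java
/** Hypothetical helper: drop a compression-triggering extension from a Pig output path. */
final class OutputPaths {
    // Extensions that, per the reply above, make Pig/Hadoop compress the stored output.
    private static final String[] COMPRESSED_SUFFIXES = {".gz", ".bz2"};

    private OutputPaths() {}

    /** Returns the path without a trailing .gz or .bz2, or the path unchanged otherwise. */
    static String stripCompressionSuffix(String path) {
        for (String suffix : COMPRESSED_SUFFIXES) {
            if (path.endsWith(suffix)) {
                return path.substring(0, path.length() - suffix.length());
            }
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(stripCompressionSuffix("/user/james/output.gz"));
        System.out.println(stripCompressionSuffix("/user/james/output"));
    }
}
```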

Re: Early exit from Pig udf

2011-05-26 Thread souri datta
... >> What you want is an early EOF from your InputFormat. So I guess the answer is to have a custom InputFormat that monitors some object that your UDF can modify, and the InputFormat can report EOF if the condition is satisfied (or it's actually out of data). ...

Re: Early exit from Pig udf

2011-05-26 Thread souri datta
... guess the answer is to have a custom InputFormat that monitors some object that your UDF can modify, and the InputFormat can report EOF if the condition is satisfied (or it's actually out of data). > > D > > On Sun, May 1, 2011 at 11:11 PM, souri datta ...
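Dmitriy's suggestion can be sketched without Hadoop on the classpath: a reader whose `nextKeyValue()`/`getCurrentValue()` pair loosely mirrors Hadoop's `RecordReader` and reports EOF as soon as a shared flag is set. `EarlyExitReader` and the `AtomicBoolean` flag are assumptions for illustration. In a real job the UDF and the reader are created separately, so the flag would have to be reachable by both (for example through a static field), and it would only affect the task's own JVM, not the other map tasks.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of a reader that reports EOF early. Its nextKeyValue()/getCurrentValue() pair is
 * modeled loosely on Hadoop's RecordReader, but this class has no Hadoop dependency.
 */
final class EarlyExitReader<T> {
    private final Iterator<T> records;
    private final AtomicBoolean stop; // hypothetical flag the UDF sets when its condition holds
    private T current;

    EarlyExitReader(Iterator<T> records, AtomicBoolean stop) {
        this.records = records;
        this.stop = stop;
    }

    /** Returns false (EOF) if the stop flag is set or the input is really exhausted. */
    boolean nextKeyValue() {
        if (stop.get() || !records.hasNext()) {
            return false;
        }
        current = records.next();
        return true;
    }

    T getCurrentValue() {
        return current;
    }

    public static void main(String[] args) {
        AtomicBoolean stop = new AtomicBoolean(false);
        EarlyExitReader<String> reader =
            new EarlyExitReader<>(Arrays.asList("a", "b", "STOP", "c", "d").iterator(), stop);
        while (reader.nextKeyValue()) {
            String value = reader.getCurrentValue();
            System.out.println(value);
            // Stand-in for the UDF: it flips the shared flag once its condition is met.
            if (value.equals("STOP")) {
                stop.set(true);
            }
        }
        // Only a, b and STOP are printed; c and d are never read.
    }
}
```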

Re: Early exit from Pig udf

2011-05-01 Thread souri datta
...process all the input files. Thanks, Souri On Mon, May 2, 2011 at 1:46 AM, Dmitriy Ryaboy wrote: > Right. I assume there is a reason you don't want to or are unable to have your UDF check your condition and call return? > > -----Original Message----- > From: "souri dat...
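Dmitriy's suggestion amounts to short-circuiting inside the UDF. A minimal plain-Java sketch (no Pig dependency; `ShortCircuitUdf` and its `limit` criterion are hypothetical) with an `exec` method shaped like an EvalFunc's: once the criterion is met, each call returns `null` immediately. This does not stop the job from reading the remaining input, which is the concern about processing all the input files: the UDF is still invoked for every record, it just skips the work.

```java
/**
 * Sketch of the "check the condition and return" approach. exec() is shaped like an
 * EvalFunc's but takes a plain String so it runs without Pig on the classpath.
 */
final class ShortCircuitUdf {
    private final int limit; // hypothetical criterion: do real work for the first `limit` records
    private int worked = 0;  // records that received the (stand-in) expensive processing
    private int calls = 0;   // total times the framework invoked exec()

    ShortCircuitUdf(int limit) {
        this.limit = limit;
    }

    Integer exec(String record) {
        calls++;
        if (worked >= limit) {
            return null; // condition met: return immediately without doing the work
        }
        worked++;
        return record.length(); // stand-in for the real per-record work
    }

    int calls() { return calls; }
    int worked() { return worked; }

    public static void main(String[] args) {
        ShortCircuitUdf udf = new ShortCircuitUdf(2);
        for (String r : new String[] {"a", "bb", "ccc", "dddd"}) {
            System.out.println(udf.exec(r));
        }
        // exec() was still called for all 4 records; only 2 got the real work.
        System.out.println(udf.calls() + " calls, " + udf.worked() + " worked");
    }
}
```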

Re: Early exit from Pig udf

2011-05-01 Thread souri datta
Meaning it should be able to finish quickly (return from the method). On Fri, Apr 29, 2011 at 9:52 PM, Dmitriy Ryaboy wrote: > What do you mean by return? > > On Fri, Apr 29, 2011 at 5:01 AM, souri datta wrote: > > Hi, I have a Pig UDF. My requirement is ...

Early exit from Pig udf

2011-04-29 Thread souri datta
Hi, I have a Pig UDF. My requirement is that, on meeting certain criteria, I want to return from the Pig UDF. Is there any way I can exit early from a Pig UDF? Also, how can it be done in a Map/Reduce job? Thanks, Souri
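For the Map/Reduce half of the question, one common approach (not from this thread) is to override `run()` in the new-API `org.apache.hadoop.mapreduce.Mapper`, whose default implementation loops `while (context.nextKeyValue())` calling `map()`, and stop the loop once a condition holds. The sketch below simulates that loop in plain Java so it runs without Hadoop; `EarlyStopMapper` and its criterion are hypothetical. Stopping this way only ends the current map task early; other tasks still read their own splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Plain-Java simulation of overriding run() in Hadoop's new-API Mapper: the loop also checks
 * a (hypothetical) stopping criterion. The iterator stands in for context.nextKeyValue().
 */
final class EarlyStopMapper {
    private final int maxOutputs; // hypothetical criterion: stop after this many hits
    private final List<String> output = new ArrayList<>();
    private int recordsRead = 0;

    EarlyStopMapper(int maxOutputs) {
        this.maxOutputs = maxOutputs;
    }

    /** Per-record logic, the equivalent of map(key, value, context). */
    void map(String value) {
        if (value.contains("match")) {
            output.add(value);
        }
    }

    boolean shouldStop() {
        return output.size() >= maxOutputs;
    }

    /** Same shape as Mapper.run(), minus setup()/cleanup(): stop pulling records once done. */
    void run(Iterator<String> records) {
        while (!shouldStop() && records.hasNext()) {
            recordsRead++;
            map(records.next());
        }
    }

    List<String> output() { return output; }
    int recordsRead() { return recordsRead; }

    public static void main(String[] args) {
        EarlyStopMapper mapper = new EarlyStopMapper(2);
        mapper.run(Arrays.asList("match-1", "x", "match-2", "match-3").iterator());
        System.out.println(mapper.output() + " after " + mapper.recordsRead() + " records");
    }
}
```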

passing a long argument

2011-04-03 Thread souri datta
Hi, I am trying to pass a list of names to my Pig script through the command line as follows: -param names=`cat /tmp/names.txt` But when I try to start the Pig script I get the error: *Argument list too long* (the same error seen in the shell when running a command with '*', like ls * or rm *). Is there a way ...
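The error comes from the operating system's limit on the size of a command line, which the expanded `cat` output exceeds. One workaround, not taken from the thread, is to keep the value off the command line by writing it to a parameter file and passing that file with Pig's `-param_file` option. A minimal sketch of a hypothetical `ParamFileWriter` that turns a names file into a single `names = ...` line; how the joined value should be quoted depends on how `$names` is used in the script, which is an assumption left to the reader:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper: write a long parameter value into a Pig parameter file so that
 * it never appears on the shell command line.
 */
final class ParamFileWriter {
    private ParamFileWriter() {}

    /** Joins the non-blank lines of namesFile with commas and writes "paramName = value". */
    static void write(Path namesFile, String paramName, Path paramFile) throws IOException {
        List<String> names = new ArrayList<>();
        for (String line : Files.readAllLines(namesFile, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        String entry = paramName + " = " + String.join(",", names);
        Files.write(paramFile, Collections.singletonList(entry), StandardCharsets.UTF_8);
    }
}
```

The script would then be launched with something like `pig -param_file /tmp/names.params myscript.pig` (both file names are placeholders).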

Re: implementing "if" logic

2011-03-28 Thread souri datta
... SOME OPERATION > $comment result = LIMIT result $x > > On Sun, Mar 27, 2011 at 12:36 PM, souri datta wrote: >> Hi all, >> I have a problem where I need to limit the number of results generated by a Pig script based on some condition. >> ...
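The reply's trick is to prefix the LIMIT line with a `$comment` parameter: passing `--` (Pig's single-line comment marker) disables the line, and passing an empty value leaves it active. A minimal sketch of a hypothetical launcher helper, `LimitParams`, that derives that value from `x`; whether your Pig version accepts an empty `-param` value is an assumption worth checking:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher helper for the "$comment" trick: comment out the LIMIT line when x == 0. */
final class LimitParams {
    private LimitParams() {}

    /** "--" turns "$comment result = LIMIT result $x" into a Pig comment; "" leaves it active. */
    static String commentValue(int x) {
        return x == 0 ? "--" : "";
    }

    /** Builds the -param arguments for the pig command (assumes empty values are accepted). */
    static List<String> pigParams(int x) {
        List<String> args = new ArrayList<>();
        args.add("-param");
        args.add("x=" + x);
        args.add("-param");
        args.add("comment=" + commentValue(x));
        return args;
    }

    public static void main(String[] args) {
        System.out.println(pigParams(0)); // [-param, x=0, -param, comment=--]
        System.out.println(pigParams(5)); // [-param, x=5, -param, comment=]
    }
}
```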

max integer allowed?

2011-03-28 Thread souri datta
Hi All, I could not find any proper documentation for this. Can someone please let me know what the maximum integer supported in Pig scripts is (something like Integer.MAX_VALUE)? Will it be version-dependent? I found this link http://db.apache.org/derby/docs/10.1/ref/rrefsqlj30435.html and used this ...
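Pig's `int` is a signed 32-bit integer and its `long` a signed 64-bit one, backed by Java's `Integer` and `Long`, so the limits are the Java ones; since they follow from Java's types, they should not vary between Pig versions. A small sketch; `fitsInInt` is a hypothetical helper for deciding whether a value needs a Pig `long` (written with an `L` suffix in Pig Latin, e.g. `3000000000L`):

```java
/**
 * Pig's int and long are signed 32-bit and 64-bit integers (java.lang.Integer and
 * java.lang.Long), so their limits are Java's.
 */
final class PigNumericLimits {
    private PigNumericLimits() {}

    /** True if value fits in a Pig int; otherwise it needs a long. */
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("int max:  " + Integer.MAX_VALUE); // 2147483647
        System.out.println("long max: " + Long.MAX_VALUE);    // 9223372036854775807
        System.out.println(fitsInInt(3_000_000_000L));        // false
    }
}
```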

implementing "if" logic

2011-03-27 Thread souri datta
Hi all, I have a problem where I need to limit the number of results generated by a Pig script based on some condition, say: if ($x == 0), do not limit the results; else: limited_result = LIMIT results $x; (here x comes from the command line). How can I achieve this with a single Pig script? ...

Re: question about Pig UDF

2011-03-17 Thread souri datta
...on which it is in). Pig itself also constructs your UDF during planning on the machine you launch your job on. > > Alan. > > On Mar 17, 2011, at 11:12 AM, souri datta wrote: > > Hi, if in a UDF, say in the constructor of the class, I initialize a list ...
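Per Alan's reply, the UDF is constructed on the cluster and also during planning on the machine that launches the job, so a list built in the constructor is built once per instance, not once per job. A common pattern is to build such state lazily on the first `exec()` call. A plain-Java sketch with no Pig dependency; `NamesUdf`, `loadNames()` and the `loads` counter are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of a UDF-like class that builds its names list lazily on the first exec() call
 * instead of in the constructor. Plain Java; loadNames() is a hypothetical stand-in.
 */
final class NamesUdf {
    static int loads = 0; // counts how many times the list was built (demo only)

    private List<String> namesList; // built at most once per instance

    NamesUdf() {
        // Intentionally empty: the UDF may be constructed during planning and again on the
        // cluster, so expensive setup here would be repeated for every instance.
    }

    private static List<String> loadNames() {
        loads++;
        return new ArrayList<>(Arrays.asList("alice", "bob")); // stand-in for real loading
    }

    boolean exec(String name) {
        if (namesList == null) {
            namesList = loadNames(); // first call on this instance
        }
        return namesList.contains(name);
    }

    public static void main(String[] args) {
        new NamesUdf();                  // e.g. the instance built during planning; never runs exec()
        NamesUdf taskA = new NamesUdf(); // e.g. an instance on one cluster node
        NamesUdf taskB = new NamesUdf(); // e.g. an instance on another node
        taskA.exec("alice");
        taskA.exec("carol");
        taskB.exec("bob");
        System.out.println(loads); // 2: once per instance that actually ran exec()
    }
}
```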

question about Pig UDF

2011-03-17 Thread souri datta
Hi, if in a UDF, say in the constructor of the class, I initialize a list (say ArrayList namesList) of objects (say names), and in the exec() method I do some processing: when I use this UDF on a 20-node Hadoop cluster, will the list 'namesList' be instantiated multiple times or will ...

Limiting output

2011-03-09 Thread souri datta
Hi, I have a big dataset which contains mainly URLs and their HTML contents. Now, given a regular expression, I want to get 'x' number of URLs matching the regex pattern. I have written a UDF to filter URLs based on a regular expression. Is there a way in a Pig script to limit the number of results ...
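The request has the shape of a filter followed by `LIMIT`. A plain-Java sketch of that filter-then-limit shape; `RegexLimit.firstMatches` is hypothetical and uses `Matcher.find()`, i.e. a substring match, which may differ from how the original UDF matches. Unlike this sequential loop, Pig's `LIMIT` makes no promise about which matching tuples it returns unless the relation is ordered first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Plain-Java sketch of "filter by regex, then keep at most x results". */
final class RegexLimit {
    private RegexLimit() {}

    static List<String> firstMatches(List<String> urls, String regex, int x) {
        Pattern pattern = Pattern.compile(regex); // compile once, not once per URL
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (result.size() >= x) {
                break; // stop scanning once x matches have been collected
            }
            if (pattern.matcher(url).find()) {
                result.add(url);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "http://a.com/x", "http://b.org/y", "http://c.com/z", "http://d.com/w");
        System.out.println(firstMatches(urls, "\\.com/", 2));
    }
}
```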