James,
I may not have understood your question fully, but did you try renaming the
file without the '.gz'? In Hadoop/Pig, if a file has a bz2 extension, it gets
compressed. Hence,..
On Mon, Jul 30, 2012 at 11:51 PM, James Kebinger wrote:
> Hello, I'm running a pretty simple pig job but despite my best effor
>> What you want is an early EOF from your InputFormat. So I guess the answer
>> is to have a custom inputFormat that monitors some object that your UDF
>> can
>> modify, and the input format can report EOF if the condition is satisfied
>> (or it's actually out of data).
guess the answer
> is to have a custom inputFormat that monitors some object that your UDF can
> modify, and the input format can report EOF if the condition is satisfied
> (or it's actually out of data).
>
> D
>
>
> On Sun, May 1, 2011 at 11:11 PM, souri datta >
ocess all the input files.
Thanks,
Souri
On Mon, May 2, 2011 at 1:46 AM, Dmitriy Ryaboy wrote:
> Right. I assume there is a reason you don't want to or are unable to have
> your udf check your condition and call return?
>
> -Original Message-
> From: "souri dat
Meaning it should be able to finish quickly (return from the method).
On Fri, Apr 29, 2011 at 9:52 PM, Dmitriy Ryaboy wrote:
> What do you mean by return?
>
>
> On Fri, Apr 29, 2011 at 5:01 AM, souri datta wrote:
>
> > Hi,
> > I have a Pig UDF. My requirement is
Hi,
I have a Pig UDF. My requirement is that, on meeting certain criteria, I want
to return from the Pig UDF. Is there any way to exit early from a Pig UDF?
Also, how can this be done in a Map/Reduce job?
Thanks,
Souri
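A minimal sketch of the two ideas raised in this thread: returning early from the UDF's method, and having the input side report end of data once the UDF signals that its condition is met. This is plain Java with no Pig or Hadoop classes; `EarlyExitSketch`, `STOP`, `exec` and `run` are hypothetical stand-ins for a real UDF and a custom InputFormat/record reader, and the "STOP" record is an invented trigger condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class EarlyExitSketch {
    // Hypothetical shared flag: the "UDF" sets it, the "reader" polls it.
    static final AtomicBoolean STOP = new AtomicBoolean(false);

    // Stand-in for a UDF's exec(): returns immediately when there is nothing to do.
    static String exec(String input) {
        if (input == null || input.isEmpty()) {
            return null; // early return, no further work for this record
        }
        if (input.equals("STOP")) { // assumed exit condition, for illustration only
            STOP.set(true);         // ask the reader to stop supplying records
            return null;
        }
        return input.toUpperCase();
    }

    // Stand-in for the record reader: reports end of data when the flag is set
    // or when the input is actually exhausted.
    static List<String> run(List<String> records) {
        STOP.set(false);
        List<String> out = new ArrayList<>();
        Iterator<String> it = records.iterator();
        while (!STOP.get() && it.hasNext()) {
            String result = exec(it.next());
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "c" is never read because the flag is set while handling "STOP".
        System.out.println(run(Arrays.asList("a", "b", "STOP", "c"))); // prints [A, B]
    }
}
```

In a real job, a flag like this would presumably only be visible inside the task (JVM) that set it, so other tasks would keep reading their own input.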
Hi,
I am trying to pass a list of names to my Pig script through the command line
as follows:
-param names = `cat /tmp/names.txt`
But when I try to start the Pig script I get the error: *Argument list
too long*
(something seen in the shell when running a command with '*', like ls * / rm *)
Is there a way
SOME OPERATION
> $comment result = LIMIT result $x
>
>
> On Sun, Mar 27, 2011 at 12:36 PM, souri datta wrote:
>
>> Hi all,
>>
>> I have a problem where I need to limit the number of results generated by
>> pig script based on some condition.
>>
>> s
Hi All,
I could not find any proper documentation for this. Can someone please let me
know what the maximum integer supported in Pig scripts is? (something like
Integer.MAX_VALUE)
Is it version dependent?
I found this link
http://db.apache.org/derby/docs/10.1/ref/rrefsqlj30435.html
and used this
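For reference, Pig's documented `int` type is a signed 32-bit integer and `long` is a signed 64-bit integer, which matches Java's `int` and `long`. The sketch below is plain Java, not Pig code; `PigIntRangeSketch` and `fitsInInt` are hypothetical names used only to illustrate the bounds.

```java
public class PigIntRangeSketch {
    // True when the value fits in a signed 32-bit integer.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);       // 2147483647
        System.out.println(Long.MAX_VALUE);          // 9223372036854775807
        System.out.println(fitsInInt(2147483647L));  // true
        System.out.println(fitsInInt(3000000000L));  // false, would need a long
    }
}
```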
Hi all,
I have a problem where I need to limit the number of results generated by
pig script based on some condition.
say,
if ( $x == 0 )
then do not limit #results
else:
limited_result = LIMIT results $x ;
(here x comes from cmd line)
How can I achieve this with a single Pig script ?
T
on which it is in). Pig itself also constructs your UDF during
> planning on the machine you launch your job on.
>
> Alan.
>
>
> On Mar 17, 2011, at 11:12 AM, souri datta wrote:
>
> Hi,
>> If in a UDF, say in the constructor of the class, I initialize a list
>
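Since the reply above notes that Pig also constructs the UDF during planning on the launch machine, one common way to keep the constructor cheap is to build the list lazily on the first call. The sketch below is plain Java with no Pig classes; `LazyNamesSketch`, its hard-coded names and the `boolean exec(String)` signature are assumptions made for illustration, not the real UDF API.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyNamesSketch {
    // Left null by the constructor, so constructing the object stays cheap.
    private List<String> namesList;

    public LazyNamesSketch() {
        // Intentionally empty: no heavy work at construction time.
    }

    // Stand-in for exec(): builds the list on first use, then reuses it.
    public boolean exec(String name) {
        if (namesList == null) {
            namesList = new ArrayList<>();
            namesList.add("alice"); // placeholder data, for illustration only
            namesList.add("bob");
        }
        return namesList.contains(name);
    }

    public boolean isInitialized() {
        return namesList != null;
    }

    public static void main(String[] args) {
        LazyNamesSketch udf = new LazyNamesSketch();
        System.out.println(udf.isInitialized()); // false
        System.out.println(udf.exec("alice"));   // true
        System.out.println(udf.isInitialized()); // true
    }
}
```

Each instance still builds its own copy of the list, so, as far as I understand, every task that uses the UDF would initialize it separately.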
Hi,
If in a UDF, say in the constructor of the class, I initialize a list
(say ArrayList namesList) of objects (say names), and in the exec()
method I do some processing. When I am using this UDF in a 20-node Hadoop
cluster, will this list 'namesList' be instantiated multiple times or will
Hi,
I have a big dataset which contains mainly URLs and their HTML
contents. Now, given a regular expression, I want to get 'x' number of
URLs matching the regex pattern. I have written a UDF to filter out
URLs based on a regular expression. Is there a way in a Pig script to
limit the number of results