The LoadFunc refactoring was painful. I think what you are describing
absolutely needs to happen, but may need to be a 2.0 thing.
On Wed, Mar 9, 2011 at 5:06 PM, Julien Le Dem wrote:
Before moving to 1.0, I think the public APIs should be refactored a bit.
(UDFs, ...: all the classes users extend or use)
Some of the Pig APIs have grown organically and would need changes.
examples:
- inconsistencies between EvalFunc and Accumulator
- Algebraic UDFs cannot pass a FuncSpec param
Did you try checking the task logs?
There might be more details there...
Regards,
Mridul
On Wednesday 09 March 2011 04:23 AM, Kris Coward wrote:
So I queued up a batch of jobs last night to run overnight (and into the
day a bit, owing to a bottleneck on the scheduler the way that things
In which case, can't you model that as a Bag?
I imagine something like a Tuple with fields person:chararray,
books_read:bag{ (name:chararray, isbn:chararray) }, etc.?
Of course, it will work as a bag if the tuple contained within it has a
fixed schema :-) (unless you repeat this process N nu
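A minimal Pig Latin sketch of the nested-bag schema suggested above (the alias and input path are assumptions, not from the thread):

```pig
-- Hypothetical input path and alias; the nested bag holds one tuple per book read.
readers = LOAD 'readers.tsv' AS (
    person:chararray,
    books_read:bag{ t:tuple(name:chararray, isbn:chararray) }
);
-- A bag can vary in length per person, so no fixed tuple width is needed.
```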
Question: do normal map-reduce jobs run on this cluster? Like the example jar
jobs?
Guy
On Mar 9, 2011, at 2:29 PM, Kris Coward wrote:
> Also, reading some uncompressed data off the same cluster using
> PigStorage shows a failure to even read the data in the first place :|
> -K
Does anyone know of any resources other than
http://wiki.apache.org/hadoop/Support for finding consultants who'd be
able to help with pig/hadoop administration issues? With my sysadmin on
vacation, I'm just looking for someone who can get things running again
without completely displacing him, and
Also, reading some uncompressed data off the same cluster using
PigStorage shows a failure to even read the data in the first place :|
-K
On Tue, Mar 08, 2011 at 09:24:18PM -0500, Kris Coward wrote:
>
> None of the nodes have more than 20% utilization on any of their disks;
> so it must be the
Are you looking for:
udf_regex_results = my_UDF(...);
limited_regex_results = LIMIT udf_regex_results 10; -- 10 is configurable
-e
On Wed, Mar 9, 2011 at 13:58, souri datta wrote:
Hi,
I have a big dataset which contains mainly urls and their html
contents. Now given a regular expression I want to get 'x' number of
urls matching the regex pattern. I have written a UDF to filter out
urls based on regular expression. Is there a way in Pig script to
limit the number of results
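One way to sketch this in Pig Latin, combining a regex filter with LIMIT as suggested in the reply above. This uses the built-in MATCHES operator rather than the poster's UDF; the aliases, path, and pattern are illustrative:

```pig
-- Hypothetical aliases; MATCHES takes a Java regex that must match the whole string.
pages   = LOAD 'pages' AS (url:chararray, html:chararray);
matched = FILTER pages BY url MATCHES '.*\\.example\\.com.*';
limited = LIMIT matched 10;  -- the 'x' results wanted; LIMIT lets Pig stop early
DUMP limited;
```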
Begin forwarded message:
From: Mark Kerzner
Date: March 7, 2011 7:37:38 PM PST
To: Hadoop Discussion Group
Subject: First Hadoop meetup in Houston
Reply-To: "common-u...@hadoop.apache.org"
Hi,
I have just created the Houston Hadoop Meetup group, and all
suggestions are
welcome.
http
Sorry to hear that. We used it in an old project. It works well with Pig 0.6.0.
Shawn
On Tue, Mar 8, 2011 at 3:04 PM, Dexin Wang wrote:
> Unfortunately, it doesn't work.
> Seems the same problem as in https://issues.apache.org/jira/browse/PIG-1547
>
> On Tue, Mar 8, 2011 at 1:22 PM, Dexin Wang wr
It's the latter.
You can imagine my EvalFunc as
ArrayList booksRead(Person p) {}
So for a list of people I get a List of ArrayLists of different lengths.
-Original Message-
From: Jonathan Coveney [mailto:jcove...@gmail.com]
Sent: Wednesday, March 09, 2011 6:12 PM
To: user@pig.apache.or
In any given instance will the size of the tuple change, or will it change
on a row by row basis? If it's the former, you can have a constructor that
indicates how many arguments, and the outputSchema can use that.
Barring that, it is "good practice" to do so, but it's not necessary. Your
script w
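The constructor approach described above can be wired up in Pig Latin via DEFINE, which passes string arguments to the UDF's constructor. A sketch, where the class name and argument are hypothetical:

```pig
-- Hypothetical UDF class; the string '3' reaches the UDF's constructor,
-- so its outputSchema can declare a three-field tuple for this instance.
DEFINE SplitThree com.example.MyDynamicEvalFunc('3');
data = LOAD 'input' AS (line:chararray);
out  = FOREACH data GENERATE SplitThree(line);
```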
Hello,
I read that it is good practice to declare the schema in the Pig script as well
as in the UDF (by implementing outputSchema), for performance reasons.
Now in my case I have an EvalFunc that takes a chararray as input and produces
a tuple with a dynamic number of chararrays (it creates