Take a look at:
REGEX_EXTRACT -
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#REGEX_EXTRACT
and REGEX_EXTRACT_ALL:
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#REGEX_EXTRACT_ALL
You could also use SUBSTRING, but I think a regex would be more applicable
here for date/time extracti
I ended up fixing this issue. I did change it to a bag afterwards, but the main
problem was that REGEX_EXTRACT_ALL was returning everything as a string (via
group), which meant that MAX, AVG, etc. were not matched as functions
for a bag of tuples of doubles.
I ended up writing a new udf for extr
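
The string-versus-number problem described above can be illustrated outside Pig. What follows is a minimal Python sketch, not the UDF mentioned in the message. It assumes a few made-up log lines (the counter names and values are hypothetical) and reuses the pattern from the REGEX_EXTRACT_ALL call quoted in this listing. Regex groups always come back as strings, so the numeric group has to be cast before taking a max or an average.

```python
import re

# Same pattern as the Pig call, with Pig's doubled backslashes undone.
PATTERN = re.compile(r".*Output.Count\s*\-\s*([A-Za-z\.]+)\s*(\d+)")


def extract_count(line):
    """Return (name, count) with count cast to float, or None if no match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    name, raw_count = match.groups()  # both groups are strings at this point
    return name, float(raw_count)     # explicit cast so max/avg can be applied


# Hypothetical log lines, invented for this example.
lines = [
    "INFO Output.Count - reducer.records 12",
    "INFO Output.Count - mapper.records 30",
    "INFO unrelated line",
]

# Keep only the numeric field of the lines that matched.
counts = [result[1] for result in map(extract_count, lines) if result is not None]
print(max(counts), sum(counts) / len(counts))
```

In Pig itself the equivalent step would presumably be an explicit cast of the extracted field to a numeric type before aggregating.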
Hi all,
I'm getting the exception (at the end) from the following using Pig:
eLine = FOREACH logLine
GENERATE
FLATTEN(
REGEX_EXTRACT_ALL(
$0,
'.*Output.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)'
)
) AS (ename:CHARARRAY
I'm having trouble trying to flatten a bag to a tuple of ints in Pig,
e.g.
{(12),(4),(7),(190)}
to:
(12,4,7,190)
It seems like it should be trivial to do, but I'm not quite sure how to do it.
Can this be done with built-in Pig
commands or do I need a custom UDF or an exec?
Many thanks,
Jon.
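
The transformation being asked for is small enough to sketch in plain Python. The function name bag_to_tuple is made up for this illustration. It models a bag as a list of tuples and is not Pig code. Later Pig releases document a BagToTuple builtin that I believe covers this case, so check the function reference for your version before writing a UDF.

```python
def bag_to_tuple(bag):
    """Flatten a bag (a list of tuples) into a single tuple of all fields."""
    return tuple(field for tup in bag for field in tup)


# The bag from the question, modelled as a list of one-field tuples.
bag = [(12,), (4,), (7,), (190,)]
print(bag_to_tuple(bag))
```

A custom UDF would follow the same shape: iterate over the tuples in the bag and append each field to one output tuple.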
http://svn.apache.org/viewvc/pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/cont.xml?view=markup
>
> Alan.
>
>
> On Jun 20, 2011, at 8:03 AM, Jonathan Holloway wrote:
>
> Hi all,
>>
>> Does anybody have a list
Hi all,
Does anybody have a list of the features for the Pig 0.9 release? I noticed
from SVN that control flow
structures have been added. How would these work with 0.9?
Many thanks,
Jon.
but not the uncompressed version.
>
> On Jun 15, 2011, at 6:57 PM, Jonathan Holloway <
> jonathan.hollo...@gmail.com> wrote:
>
> > Hi all,
> >
> > I was wondering whether somebody could explain how Pig deals with nested
> > directories of log files,
> > Somet
Hi all,
I was wondering whether somebody could explain how Pig deals with nested
directories of log files,
something like:
/logs/2011-01-01/a.log
/logs/2011-01-01/b.log
/logs/2011-01-01/c.log
I'm pretty sure if I give a Pig script the /logs directory as input it will
successfully process all log
Hi,
This is a follow-on from another question I asked a while ago. I'm
calculating percentiles, etc. for datasets
and I wondered how I would do this with a histogram instead so it's more
efficient.
Does anybody have an example of this currently in the Pig source code or
some advice on how to go a
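
One way to read the histogram idea is to keep only per-bin counts and walk the cumulative count until the target rank is reached. The Python sketch below assumes fixed-width bins where bin i covers [i * bin_width, (i + 1) * bin_width), and it interpolates linearly inside the bin. The function name and the binning scheme are assumptions made for illustration and are not taken from the Pig source.

```python
def histogram_percentile(counts, bin_width, p):
    """Approximate the p-th percentile from a {bin_index: count} histogram."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("histogram is empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")

    target = p / 100.0 * total  # rank we are looking for
    cumulative = 0
    for bin_index in sorted(counts):
        count = counts[bin_index]
        if cumulative + count >= target:
            # Interpolate inside the bin that contains the target rank.
            fraction = (target - cumulative) / count if count else 0.0
            return (bin_index + fraction) * bin_width
        cumulative += count
    # Only reachable through floating-point rounding; return the upper edge.
    return (max(counts) + 1) * bin_width


# Hypothetical histogram: 5 values in [0, 10) and 5 values in [10, 20).
print(histogram_percentile({0: 5, 1: 5}, 10, 75))
```

The result is only as accurate as the bin width, which is the price paid for not sorting the full dataset.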
Hi all,
I have the following:
A {(3),(Log Message A)}
A {(5),(Log Message B)}
B {(8),(Log Message C)}
B {(1),(Log Message D)}
C {(2),(Log message E)}
C {(7),(Log message F)}
and I want to merge the related line letters (A, B, C) into the same bag:
A{(3),(Log M
Hi all,
I'm trying to do something with Pig and I'm not quite sure whether it's
possible
or not. Hoping somebody could provide some help on how to proceed
here.
I have a log file with a number of log lines that have relationships with
each other.
The structure of the log line is:
DATE, UUI
Hi all,
I'm working with some data at the moment, for which I needed to generate
multiple reports for a given grouped set of data by name.
I wasn't initially sure how to do this. I came across MultiStorage in
Pig contrib, but I'm a little worried about the 7k limit there at
the moment due to a b
I've got a general question surrounding the output of various Pig scripts
and generally where people are
storing that data and in what kind of format?
I read Dmitriy's article on Apache log processing and noticed that the
output of the scripts was a format more
suitable for reporting and graphing
u'll use to filter?
>
> It sounds like you'll want to write your own FilterFunc
>
> 2011/3/18 Jonathan Holloway
>
>> Hi,
>>
>> I want to iterate through the fields in a tuple and then pass each field to
>> a FILTER statement.
>> Does anybody know how I would go about doing this?
>>
>> Many thanks,
>> Jon.
>>
Hi,
I want to iterate through the fields in a tuple and then pass each field to
a FILTER statement.
Does anybody know how I would go about doing this?
Many thanks,
Jon.
Hi,
Given the following:
Group 1 - Tests Totals:
(A, 4)
(B, 30)
(C, 40)
(D, 30)
Group 2 - Tests Passed:
(A,1)
(B,30)
How would I calculate the percentage of Group 2 / Group 1 using Pig? I'm
assuming one way is to join the two datasets and calculate the
percentage that way. Another way
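
The join-based approach mentioned in the question can be sketched in plain Python with the figures given there. The function name is made up for this illustration. Treating names that are missing from Group 2 as zero passes is an assumption on my part, and it corresponds to a left outer join rather than an inner join.

```python
def pass_percentages(totals, passed):
    """Join the two groups on name and compute passed / total as a percentage."""
    return {
        name: 100.0 * passed.get(name, 0) / total  # missing name counts as 0 passed
        for name, total in totals.items()
        if total  # skip zero totals to avoid division by zero
    }


# Data from the question.
totals = {"A": 4, "B": 30, "C": 40, "D": 30}
passed = {"A": 1, "B": 30}
print(pass_percentages(totals, passed))
```

With an inner join instead, C and D would simply drop out of the result.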
I'd be
interested in hearing about it.
Cheers,
Jon.
On 10 March 2011 21:01, Jonathan Holloway wrote:
> Hey Josh,
>
> That's the path I started down today, I don't suppose the UDF you wrote is
> in the public domain
> at all - would you consider contributing it to pig
I ran into an issue tonight with parsing log lines whereby I had to generate
a schema in a user defined function.
Part of that involved converting various values into their associated data
types, but I couldn't see a way to do
it via Pig. Enclosed is a patch to convert org.apache.pig.data.DataType
pull out all the fields as double[] and pass
> into
> Percentile.
>
>
> http://commons.apache.org/math/apidocs/org/apache/commons/math/stat/descriptive/rank/Percentile.html
>
> Josh
>
>
> On 10 March 2011 19:38, Kris Coward wrote:
>
> > On Thu, Mar 10, 2011 at 0
Hi all,
Does anybody have a UDF for calculating the percentile (median, 99%) at all?
I took a look at the builtins
and the piggybank project, but couldn't seem to see anything. Is there a
reason why there isn't a builtin for
this?
Many thanks,
Jon.
Hi all,
Does anybody know if a Percentile UDF exists at all? I've searched through
the manual and the piggybank project, but can't
seem to see one there.
Many thanks,
Jon.
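
For reference, the calculation such a UDF would wrap is short. The Python sketch below uses linear interpolation between the two closest ranks, which is only one of several percentile conventions. Libraries such as Commons Math may use a different estimation rule and return slightly different values.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")

    ordered = sorted(values)
    rank = (p / 100.0) * (len(ordered) - 1)   # fractional index into ordered
    lower = int(rank)                         # index at or below the rank
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped to the end
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


print(percentile([1, 2, 3, 4], 50))  # median of an even-sized sample
```

Sorting the whole bag works for moderate sizes. For very large groups an approximate method over bin counts is the usual trade-off.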