"JamesBTobin (at) Gmail (dot) Com". Kind regards, James
I am not sure about Pig, but it's easily achievable in MapReduce. We had a
similar requirement: we had to convert logs from RFC 5424 syslog format
into JSON. We have an MR job which does this for us. The reason we chose
MR was mainly error handling, like missing fields in some records,
Dear all,
I tried the following code:
```
data = load 'xxx' as (col: String);
data = filter data by col != '';
out = foreach data generate col;
```
I don't know why Pig allows this kind of recursive value assignment, as it is confusing.
I wonder which 'data' 'out' relies on.
James
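On the question above: as far as I know, Pig simply rebinds an alias to its most recent definition, so 'out' is built from the filtered 'data'. A minimal sketch that avoids the ambiguity by giving each step its own alias. The names 'raw' and 'nonempty' are made up for illustration, and chararray is assumed in place of String, since chararray is Pig's string type:

```pig
-- 'xxx' is the placeholder path from the question above
raw      = LOAD 'xxx' AS (col: chararray);
-- keep only the rows with a non-empty value
nonempty = FILTER raw BY col != '';
-- 'out' now visibly depends on the filtered relation
out      = FOREACH nonempty GENERATE col;
```

With one alias per step there is no doubt which relation each statement reads from.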
Hello,
I am trying to use merge join to speed up a join that finds the friends
of a list of people in a social network, but in my trial I get incorrect
output.
Could you give me some advice about what I have done wrong?
Here is my code:
```
/* Find friends of sample, merge join
```
Hello,
I hope to write a Pig script executor in Python, as nohup outputs too much
useless information. My goal is to write a Python script that executes a specific
Pig script and logs the number of records stored in HDFS.
Thus I want to know what I can obtain from the result object returned by the
How about using an explicit cast, like
ordered = ORDER query BY (int)z;
Alcaid
-- Original --
From: Patcharee Thongtra;patcharee.thong...@uni.no;
Date: Fri, May 30, 2014 06:02 PM
To: user@pig.apache.org
Subject: java.lang.String cannot be cast to
Hello,
There is a similar UDF in DataFu named Enumerate.
http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/Enumerate.html
I hope it helps.
James
I have defined a Pig UDF and want to track problems using warnings like this:
warn("My warning", PigWarning.UDF_WARNING_1);
I'm testing this in local mode first, but I never see this warning anywhere.
Any help/ideas would be appreciated.
Thanks,
James
a relation that contains two bags so it can be supplied to
the DIFF function?
Any suggestions would be appreciated.
Thanks,
James
on in there, and why it is erroring out. Then I would follow that backwards
into whatever in Pig was generating that issue. That's pretty vague, but I
can't really say much else unless I knew a ton about CassandraStorage.
2012/12/10 James Schappet jschap...@gmail.com
Any thoughts on how I can start diagnosing this problem?
On 12/5/12 12:43 PM, Schappet, James C james-schap...@uiowa.edu wrote:
Hi folks,
I am new to pig, and I am trying to get the basic pig + cassandra samples
working.
I have created the PigTest Keyspace, and I am trying to run some
Hi folks,
I am new to pig, and I am trying to get the basic pig + cassandra samples
working.
I have created the PigTest Keyspace, and I am trying to run some of the commands
in test_storage.pig, but I get the following:
tsunami:pig schappetj$ bin/pig_cassandra -x local
Using
GENERATE Flatten(BagSplit(50,A));
COMPLETE_RCORDS = FOREACH SPLITS GENERATE FLATTEN(MyCustomUDF($0));
Thanks,
James
Thanks Dmitriy, all sorted now.
James
On Mon, Sep 3, 2012 at 6:21 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
That's because you used group all, which groups everything into one
group, which by definition can only go to one reducer.
What if instead you group into some large-enough number
Yes, whatever I name the output directory, the files inside are compressed
with the lzo codec.
On Mon, Jul 30, 2012 at 2:52 PM, souri datta souri.isthe...@gmail.com wrote:
James,
I may not have understood your question fully, but did you try renaming the
file w/o the '.gz' ? In hadoop/pig
Turns out I needed to set the option inside my pig script, as in:
SET mapred.output.compress false;
On Mon, Jul 30, 2012 at 2:21 PM, James Kebinger jkebin...@gmail.com wrote:
Hello, I'm running a pretty simple pig job but despite my best efforts to
disable compression, the output parts
Thanks Jonathan. That worked fine.
James
On 29 May 2012, at 08:43 PM, Jonathan Coveney jcove...@gmail.com wrote:
If you do a grouping, the ordering changes. What you want to do is:
D = FOREACH C GENERATE COUNT($1) as countd;
D1 = GROUP D ALL;
D2 = FOREACH D1 {
ord = ORDER $1 BY $0 desc
,$1;
DESCRIBE F;
F: {id: chararray,countd: long}
G = FOREACH F GENERATE BagSplit(??);
Not sure what I can put in ??
Thanks,
James
Thanks Jonathan. Great explanation.
On Mon, May 28, 2012 at 6:53 PM, Jonathan Coveney jcove...@gmail.com wrote:
Howdy James. It's important to remember that relations and bags are not the
same (though they feel pretty similar). EvalFuncs can never be run directly
on a relation, only on a bag
Given a relation that contains this:
({(11),(9)})
({(8),(7)})
Is it possible for me to SUM the contents of each bag so I get:
(20)
(15)
Thanks,
James
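One way this could be done is to apply SUM to the bag inside a FOREACH, since SUM takes a bag and is evaluated once per record. A minimal sketch, assuming each record holds a single bag of one-column integer tuples. The alias names, the field names 'vals' and 'v', and the 'input' path are hypothetical:

```pig
-- Assumed schema: one bag per record, each tuple holding a single int
A = LOAD 'input' AS (vals: bag{t: tuple(v: int)});
-- SUM runs once per record, over the values in that record's bag
B = FOREACH A GENERATE SUM(vals.v) AS total;
DUMP B;
```

Under those assumptions, each output record holds the total for one input bag, which is the shape asked for above.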
in pig, to extract only the top 3 products with the highest
counts for each city, ordered from highest to lowest?
Ideally, I would like the output to be like this:
(New York City, ((apples, 50), (oranges, 34), (pears, 23)))
(Another City, ((oranges, 52), (pears, 32), (apples, 12)))
Thanks,
James
Thanks Steve,
Yes I did discover nested foreach, but I can't get the syntax right. Can
anyone help get me started on how it's meant to look?
Regards,
James
On Wed, May 9, 2012 at 4:55 PM, Steve Bernstein steve.bernst...@deem.com wrote:
You can. Check out nested Foreach, order by then limit
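The suggestion above (nested foreach, order by, then limit) could be sketched roughly as follows. The input schema is not shown in the thread, so the relation 'sales', its fields 'city', 'product' and 'cnt', and the load path are all assumptions made for illustration:

```pig
-- Assumed input: one row per (city, product) with its count
sales   = LOAD 'sales' AS (city: chararray, product: chararray, cnt: long);
by_city = GROUP sales BY city;
top3    = FOREACH by_city {
    -- sort each city's bag from highest to lowest count
    ordered = ORDER sales BY cnt DESC;
    -- keep the first three tuples of the sorted bag
    limited = LIMIT ordered 3;
    -- drop the repeated city column from the tuples in the bag
    GENERATE group AS city, limited.(product, cnt) AS products;
};
```

Inside the nested block, 'sales' refers to the bag of rows for the current group, which is why ORDER and LIMIT act per city rather than on the whole relation.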
Ok, figured out the nested foreach. Thanks for your help.
Regards,
James
On Wed, May 9, 2012 at 5:33 PM, James Newhaven james.newha...@gmail.com wrote:
Thanks Steve,
Yes I did discover nested foreach, but I can't get the syntax right. Can
anyone help get me started on how it's meant
I have a bag of tuples like this:
{ (product, unwanted, count), (product, unwanted, count) }
Is it possible in Pig to generate a new bag with a revised tuple structure
with one of its columns removed?
The desired structure I want is:
{ (product, count), (product, count) }
Thanks,
James
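One possible approach, offered as a sketch and not as the answer given in this thread, is a bag projection, which keeps only the named columns of every tuple in the bag. The relation name, the field names 'items' and 'cnt', the column types and the 'input' path are assumptions:

```pig
-- Assumed schema: a single bag field whose tuples have three columns
A = LOAD 'input' AS (items: bag{t: tuple(product: chararray, unwanted: chararray, cnt: long)});
-- Project the bag down to the two wanted columns
B = FOREACH A GENERATE items.(product, cnt) AS items;
```

This avoids flattening and regrouping when the bag only needs to lose a column and no other reshaping is required.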
,
James
On Wed, May 9, 2012 at 7:47 PM, Steve Bernstein steve.bernst...@deem.com wrote:
FLATTEN() the bag, re-project (foreach/generate) leaving out the unwanted
items, then group back together if you like.
_
Steve Bernstein
VP, Analytics
Rearden Commerce, Inc.
+1.408.499.0961
to another name e.g.
results.csv
Thanks,
James
You could also try FirstTupleFromBag from the Pig DataFu library:
http://sna-projects.com/datafu/javadoc/0.0.4/datafu/pig/bags/FirstTupleFromBag.html
James
On Thu, Apr 12, 2012 at 8:01 AM, keeyong han keeyong...@hotmail.com wrote:
I have a bag consisting of tuples and I would
there is null prefixing group and tagcount. Can
anyone help explain what this means?
Thanks,
James
pablo.daniel.marti...@gmail.com wrote:
this is a very good explanation of flatten:
http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html#foreach_flatten
On Wed, Apr 11, 2012 at 9:46 AM, James Newhaven james.newha...@gmail.com
wrote:
Hi,
Sorry, this is a PIG newbie question
by BagSplit, I get:
{datafu.pig.bags.bagsplit_J_1979: {(data: {(A::group:chararray,A::tagcount:long)})}}
Thanks,
James
On Wed, Apr 11, 2012 at 5:11 PM, Dan Feldman hriunde...@gmail.com wrote:
Hey James,
Have you looked at linkedIn's collection of UDFs, datafu (
http://engineering.linkedin.com/open
Thanks, it must have been the lack of parentheses that did me in when I
tried the ternary expression, or some other typo. I'll use that in the
future.
On Mon, Aug 29, 2011 at 2:29 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
Hi James,
I use ternary expressions for this: foreach joined generate
that you want), but wasn't able to give a concrete example.
2011/2/16 James Kebinger jkebin...@gmail.com
Hello all, I've been scratching my head over a problem with a pig script
I'm having, and hoping another set of eyeballs will help. I'm using pig 0.8,
in local mode
Here's my simplified
https://issues.apache.org/jira/browse/PIG-1859
On Thu, Feb 17, 2011 at 12:32 PM, James Kebinger jkebin...@gmail.com wrote:
Interesting, maybe I should file a bug report then?
On Thu, Feb 17, 2011 at 10:41 AM, Jonathan Coveney jcove...@gmail.com wrote:
I am glad that you got
in the bash script that calls the pig job, replace the interpreter definition:
#!/bin/bash
with this:
#!/bin/bash -l
-Original Message-
From: Marilson Campos [mailto:mbc_act...@yahoo.com]
Sent: Wednesday, November 10, 2010 3:53 PM
To: user@pig.apache.org
Subject: Re: Encoding byte