JOB | Permanent Data Engineer (Cambridge, UK)

2017-11-06 Thread James Tobin
"JamesBTobin (at) Gmail (dot) Com". Kind regards, James

Re: how to write custom log loader and store in JSON format

2015-07-06 Thread James Bond
I am not sure about Pig, but it's easily achievable in MapReduce. We had a similar requirement: we had to convert logs from RFC syslog format (5424) into JSON. We have an MR job which does this for us. We chose MR mainly for error handling, such as missing fields in some records,
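The conversion described above can be sketched in plain Python. This is a minimal sketch, not the poster's MR job: the function name `syslog_to_json`, the output field names, and the simplified structured-data pattern (which does not handle escaped `]` inside parameter values) are all assumptions made for illustration. Malformed lines return `None` so the caller can count or route them, and the RFC 5424 nil value `-` becomes a JSON `null`.

```python
import json
import re

# Header layout per RFC 5424:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
# Assumption: the structured-data pattern is simplified and ignores escaped ']'.
SYSLOG_5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<structured_data>-|(?:\[.*?\])+)"
    r"(?: (?P<msg>.*))?$"
)


def syslog_to_json(line):
    """Return a JSON string for one RFC 5424 line, or None if it is malformed."""
    match = SYSLOG_5424.match(line.strip())
    if match is None:
        return None  # malformed record: let the caller count or route it
    fields = match.groupdict()
    # Header fields use "-" as the nil value; map it to None (JSON null).
    record = {
        name: (None if value == "-" else value)
        for name, value in fields.items()
        if name != "msg"
    }
    record["msg"] = fields["msg"]  # the free-text message is kept verbatim
    # PRI encodes facility * 8 + severity.
    record["facility"], record["severity"] = divmod(int(record.pop("pri")), 8)
    record["version"] = int(record["version"])
    return json.dumps(record, sort_keys=True)
```

In a MapReduce setting, a mapper could call such a function once per input line and increment a counter whenever it returns `None`.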

Re: Recursive value assignment

2015-02-03 Thread James
assignment, as it would be confusing. I wonder which 'data' 'out' relies on. James

Recursive value assignment

2015-01-26 Thread James
Dear all, I tried the following code:
```
data = load 'xxx' as (col: chararray);
data = filter data by col != '';
out = foreach data generate col;
```
I don't know why Pig allows recursive value assignment, as it would be confusing. I wonder which 'data' 'out' relies on. James

Merge join produce incorrect output

2014-11-12 Thread James
Hello, I am trying to use a merge join to speed up a join that finds the friends of a list of people in a social network. But in my trial I get incorrect output. Could you give me some advice about what I have done wrong? Here is my code: ``` /* Find friends of sample, merge join

What can we get from the result of embed-python script?

2014-10-24 Thread James
Hello, I hope to write a Pig script executor in Python, as nohup outputs too much useless information. My goal is to write a Python script that executes a specific Pig script and logs the number of records stored in HDFS. Thus I want to know what I can obtain from the result object returned by the

Re:java.lang.String cannot be cast to java.lang.Integer

2014-05-30 Thread James
How about using an explicit cast, like ordered = ORDER query BY (int)z; Alcaid -- Original -- From: Patcharee Thongtra patcharee.thong...@uni.no; Date: Fri, May 30, 2014 06:02 PM To: user user@pig.apache.org; Subject: java.lang.String cannot be cast to

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread James
Hello, There is a similar UDF in DataFu named Enumerate. http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/Enumerate.html I hope it helps. James

Debugging UDFs

2013-04-13 Thread James Newhaven
I have defined a Pig UDF and want to track problems using warnings like this: warn("My warning", PigWarning.UDF_WARNING_1); I'm testing this in local mode first, but I never see this warning anywhere. Any help/ideas would be appreciated. Thanks, James

Subtracting contents of two bags

2013-01-22 Thread James Newhaven
a relation that contains two bags so it can be supplied to the DIFF function? Any suggestions would be appreciated. Thanks, James

Re: ERROR 2999: Unexpected internal error. null

2012-12-11 Thread James Schappet
on in there, and why it is erroring out. Then I would follow that backwards into whatever in Pig was generating that issue. That's pretty vague but can't really say much else unless I knew a ton about CassandraStorage. 2012/12/10 James Schappet jschap...@gmail.com Any thoughts on how I can start

Re: ERROR 2999: Unexpected internal error. null

2012-12-10 Thread James Schappet
Any thoughts on how I can start diagnosing this problem? On 12/5/12 12:43 PM, Schappet, James C james-schap...@uiowa.edu wrote: Hi folks, I am new to pig, and I am trying to get the basic pig + cassandra samples working. I have created the PigTest Keyspace, and I am trying to run some

ERROR 2999: Unexpected internal error. null

2012-12-05 Thread Schappet, James C
Hi folks, I am new to Pig, and I am trying to get the basic Pig + Cassandra samples working. I have created the PigTest Keyspace, and I am trying to run some of the commands in test_storage.pig, but I get the following: tsunami:pig schappetj$ bin/pig_cassandra -x local Using

UDF Performance Problem

2012-09-03 Thread James Newhaven
GENERATE Flatten(BagSplit(50,A)); COMPLETE_RCORDS = FOREACH SPLITS GENERATE FLATTEN(MyCustomUDF($0)); Thanks, James

Re: UDF Performance Problem

2012-09-03 Thread James Newhaven
Thanks Dmitriy, all sorted now. James On Mon, Sep 3, 2012 at 6:21 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: That's cause you used group all which groups everything into one group, which by definition can only go to one reducer. What if instead you group into some large-enough number

Re: Unable to disable compression of output

2012-07-30 Thread James Kebinger
Yes, whatever I name the output directory, the files inside are compressed with the lzo codec. On Mon, Jul 30, 2012 at 2:52 PM, souri datta souri.isthe...@gmail.com wrote: James, I may not have understood your question fully, but did you try renaming the file w/o the '.gz' ? In hadoop/pig

Re: Unable to disable compression of output

2012-07-30 Thread James Kebinger
Turns out I needed to set the option inside my pig script, as in: SET mapred.output.compress false; On Mon, Jul 30, 2012 at 2:21 PM, James Kebinger jkebin...@gmail.com wrote: Hello, I'm running a pretty simple pig job but despite my best efforts to disable compression, the output parts

Re: Losing ordering after using ORDER BY

2012-05-30 Thread James Newhaven
Thanks Jonathan. That worked fine. James On 29 May 2012, at 08:43 PM, Jonathan Coveney jcove...@gmail.com wrote: If you do a grouping, the ordering changes. What you want to do is: D = FOREACH C GENERATE COUNT($1) as countd; D1 = GROUP D ALL; D2 = FOREACH D1 { ord = ORDER $1 BY $0 desc

Passing a single Bag to an Eval function

2012-05-28 Thread James Newhaven
,$1; DESCRIBE F; F: {id: chararray,countd: long} G = FOREACH F GENERATE BagSplit(??); Not sure what I can put in ?? Thanks, James

Re: Passing a single Bag to an Eval function

2012-05-28 Thread James Newhaven
Thanks Jonathan. Great explanation. On Mon, May 28, 2012 at 6:53 PM, Jonathan Coveney jcove...@gmail.com wrote: Howdy James. It's important to remember that relations and bags are not the same (though they feel pretty similar). EvalFuncs can never be run directly on a relation, only on a bag

Summing the contents of a bag

2012-05-28 Thread James Newhaven
Given a relation that contains this: ({(11),(9)}) ({(8),(7)}) Is it possible for me to SUM the contents of each bag so I get: (20) (15) Thanks, James

Ordering and limiting Tuples inside a Bag

2012-05-09 Thread James Newhaven
in pig, to extract only the top 3 products with the highest counts for each city, ordered from highest to lowest? Ideally, I would like the output to be like this: (New York City, ((apples, 50), (oranges, 34), (pears, 23))) (Another City, ((oranges, 52), (pears, 32), (apples, 12))) Thanks, James

Re: Ordering and limiting Tuples inside a Bag

2012-05-09 Thread James Newhaven
Thanks Steve, Yes I did discover nested foreach, but I can't get the syntax right. Can anyone help get me started on how it's meant to look? Regards, James On Wed, May 9, 2012 at 4:55 PM, Steve Bernstein steve.bernst...@deem.com wrote: You can. Check out nested Foreach, order by then limit

Re: Ordering and limiting Tuples inside a Bag

2012-05-09 Thread James Newhaven
Ok, figured out the nested foreach. Thanks for your help. Regards, James On Wed, May 9, 2012 at 5:33 PM, James Newhaven james.newha...@gmail.com wrote: Thanks Steve, Yes I did discover nested foreach, but I can't get the syntax right. Can anyone help get me started on how it's meant

Removing unwanted items in tuple

2012-05-09 Thread James Newhaven
I have a bag of tuples like this: { (product, unwanted, count), (product, unwanted, count) } Is it possible in Pig to generate a new bag with a revised tuple structure with one of its columns removed? The desired structure I want is: { (product, count), (product, count) } Thanks, James

Re: Removing unwanted items in tuple

2012-05-09 Thread James Newhaven
, James On Wed, May 9, 2012 at 7:47 PM, Steve Bernstein steve.bernst...@deem.com wrote: FLATTEN() the bag, re-project (foreach/generate) leaving out the unwanted items, then group back together if you like. _ Steve Bernstein VP, Analytics Rearden Commerce, Inc. +1.408.499.0961
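The three steps Steve describes (flatten, re-project, group back) can be modelled on ordinary Python data to show what each step does to the rows. This is only an illustrative sketch, not Pig code: the function name `drop_column`, the `grouped` dictionary standing in for a relation of keyed bags, and the sample key are hypothetical.

```python
from collections import defaultdict


def drop_column(grouped, drop_index):
    """Remove one column from every tuple in every bag.

    `grouped` maps a group key to a list of tuples (a stand-in for a Pig bag).
    """
    # Step 1, FLATTEN: emit one (key, tuple) row per item in each bag.
    flat = [(key, item) for key, bag in grouped.items() for item in bag]
    # Step 2, re-project: keep every field except the unwanted one.
    projected = [
        (key, tuple(v for i, v in enumerate(item) if i != drop_index))
        for key, item in flat
    ]
    # Step 3, group back together by the original key.
    regrouped = defaultdict(list)
    for key, item in projected:
        regrouped[key].append(item)
    return dict(regrouped)


# Hypothetical sample data shaped like { (product, unwanted, count), ... }.
bags = {"New York City": [("apples", "x", 50), ("pears", "y", 23)]}
result = drop_column(bags, drop_index=1)
```

The regrouping step is only needed if the bag structure matters downstream; otherwise the flattened, re-projected rows can be used directly.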

Combining pig output

2012-04-15 Thread James Newhaven
to another name e.g. results.csv Thanks, James

Re: Extracting only the first tuple out of a bag

2012-04-13 Thread James Newhaven
You could also try the Pig DataFu Library FirstTupleFromBag: http://sna-projects.com/datafu/javadoc/0.0.4/datafu/pig/bags/FirstTupleFromBag.html James On Thu, Apr 12, 2012 at 8:01 AM, keeyong han keeyong...@hotmail.com wrote: I have a bag consisting of tuples and I would

Strange behaviour when using FLATTEN

2012-04-11 Thread James Newhaven
there is null prefixing group and tagcount. Can anyone help explain what this means? Thanks, James

Re: Strange behaviour when using FLATTEN

2012-04-11 Thread James Newhaven
pablo.daniel.marti...@gmail.com wrote: this is a very good explanation of flatten: http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html#foreach_flatten On Wed, Apr 11, 2012 at 9:46 AM, James Newhaven james.newha...@gmail.com wrote: Hi, Sorry, this is a PIG newbie question

Re: Dividing a bag into smaller bags

2012-04-11 Thread James Newhaven
by BagSplit, I get: {datafu.pig.bags.bagsplit_J_1979: {(data: {(A::group:chararray,A::tagcount: long)})}} Thanks, James On Wed, Apr 11, 2012 at 5:11 PM, Dan Feldman hriunde...@gmail.com wrote: Hey James, Have you looked at linkedIn's collection of UDFs, datafu ( http://engineering.linkedin.com/open

Re: How to coalesce fields in Pig?

2011-08-29 Thread James Kebinger
Thanks, it must have been the lack of parentheses that did me in when I tried the ternary expression, or some other typo. I'll use that in the future. On Mon, Aug 29, 2011 at 2:29 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: Hi James, I use ternary expressions for this: foreach joined generate

Re: Problems with union, projection producing unexpected results

2011-02-17 Thread James Kebinger
that you want), but wasn't able to give a concrete example. 2011/2/16 James Kebinger jkebin...@gmail.com Hello all, I've been scratching my head over a problem I'm having with a Pig script, and hoping another set of eyeballs will help. I'm using Pig 0.8, in local mode. Here's my simplified

Re: Problems with union, projection producing unexpected results

2011-02-17 Thread James Kebinger
https://issues.apache.org/jira/browse/PIG-1859 On Thu, Feb 17, 2011 at 12:32 PM, James Kebinger jkebin...@gmail.com wrote: Interesting, maybe I should file a bug report then? On Thu, Feb 17, 2011 at 10:41 AM, Jonathan Coveney jcove...@gmail.com wrote: I am glad that you got

RE: Encoding byte code 254 in pig.

2010-12-06 Thread James Brown
In the bash script that calls the Pig job, replace the interpreter definition: #!/bin/bash with this: #!/bin/bash -l -----Original Message----- From: Marilson Campos [mailto:mbc_act...@yahoo.com] Sent: Wednesday, November 10, 2010 3:53 PM To: user@pig.apache.org Subject: Re: Encoding byte