Exciting time for Pig!!
2013/9/12 ajay kumar
> congratulations guys ...!
>
>
> On Thu, Sep 12, 2013 at 11:54 PM, Bill Graham
> wrote:
>
> > Congrats guys! Well deserved indeed.
> >
> >
> > On Wed, Sep 11, 2013 at 10:58 PM, Jarek Jarcec Cecho > >wrote:
> >
> > > Congratulations Rohini and Cheo
Very well deserved!!
2013/9/12 Thejas Nair
> Congrats Koji! Very well deserved!
>
>
> On Wed, Sep 11, 2013 at 9:49 AM, Daniel Dai wrote:
> > Congratulation! You are well deserved.
> >
> >
> >
> >
> > On Wed, Sep 11, 2013 at 6:33 AM, Miguel Angel Martin junquera <
> > mianmarjun.mailingl...@gma
Hello!
I implemented the SchemaTuple stuff. Glad to hear you're trying it out! I
did not test it with YARN at all. It looks like the way that the filesystem
and distributed cache work has changed. I myself am not super up on that,
but perhaps there is known documentation on how it differs? The wa
Very cool!
2013/6/17 Russell Jurney
> Awesome!
>
>
> On Sun, Jun 16, 2013 at 3:15 PM, Connor Woodson >wrote:
>
> > I mentioned a few months ago that I was interested in creating a new
> > Scripting Engine for Pig based off of the R language. I have finally
> gotten
> > that project to a point
s (since gz doesn't allow splitting).
>
> In the uncompressed case, blocks before AND AFTER the nulls were ok and
> contributed data to my COUNT(*).
>
> In the compressed case, only data before the nulls contributed data to my
> COUNT(*).
>
> will
>
>
> On Tue,
William,
It would be really awesome if you could furnish a file that replicates this
issue that we can attach to a bug in jira. A long time ago I had a very
weird issue with some gzip files and never got to the bottom of it...I'm
wondering if this could be it!
2013/6/10 Niels Basjes
> Bzip2 is
It uses the best one it can. Algebraic is generally better than
Accumulator, and if it can use Algebraic it will. If it can't use either,
it will use the default EvalFunc.
In Pig, there aren't too many cases where an Algebraic/Accumulator EvalFunc
will have to be evaluated as an Accumulator...in i
You can do this, but pig has a CROSS keyword that you can use.
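A minimal sketch of the CROSS route (relation names taken from the script below):

```pig
-- cross product of two relations, instead of the join-by-constant trick
x = CROSS a, b;
```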
2013/5/23 Mehmet Tepedelenlioglu
> Hi,
>
> I am using this:
>
> x = join a by 1, b by 1 using 'replicated';
>
> with the hope that it generates some synthetic key '1' on both a and b and
> joins it on that key, thereby, in this cas
Any chance you could replicate this for us? Ideally some dummy data and a
script?
2013/5/19 Mehmet Tepedelenlioglu
> Hi,
>
> Recently I was taking the cross product between 2 bags of tuples one of
> which has only one tuple, to append the one with one element to all the
> others (I know this is
Also, look into the TOP udf instead of doing the limit. It can potentially
be a lot faster and is cleaner, IMHO.
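As a hedged sketch (alias and field names hypothetical), TOP(n, column, bag) keeps the top n tuples of a bag ordered by the given column index:

```pig
-- top 10 rows per group by the first field, instead of order + limit
grouped = GROUP data BY key;
top10   = FOREACH grouped GENERATE FLATTEN(TOP(10, 0, data));
```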
2013/5/19 Norbert Burger
> Take a look at the PARALLEL clause:
>
> http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
>
> On Fri, May 17, 2013 at 10:48 AM, Vince
Pig Latin does not support it, but it is pretty easy to do by using the
Python control flow. This or Java is the preferred way of doing it.
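For illustration, a minimal sketch of that embedding (run via `pig script.py` under Pig's Jython runtime; file names and the filter condition are hypothetical, `Pig` is the documented scripting entry point):

```python
#!/usr/bin/python
# Loop and branch in Python, launching Pig jobs from inside the loop.
from org.apache.pig.scripting import Pig

P = Pig.compile("""
a = load '$in_path' as (x:int);
b = filter a by x > 0;
store b into '$out_path';
""")

for i in range(3):
    stats = P.bind({'in_path': 'in-%d' % i, 'out_path': 'out-%d' % i}).runSingle()
    if not stats.isSuccessful():
        break  # branch on the result of the previous job
```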
2013/5/7 yonghu
> Dear all,
>
> I wonder if someone can tell me if the current version of pig support loop
> and branching?
>
> regards!
>
> Yong
>
cdh-user to bcc
Your question doesn't make much sense...I think you may have left a piece
off?
2013/5/7 abhishek
> Hi all,
>
> In my script
>
> a = load 'data' using PigStorage();
>
> b = foreach a generate
> 342 as col1,
> substring(x,0,4) as col2,
> ;
>
> I want to use col2 later in foreach
= load 'hbase://data' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('1:*', ' -loadKey true')
> AS (id:chararray, data:map[]);
>
> Would i call the invoke after the load?
>
>
> thanks.
>
> JM
>
>
>
>
>
>
>
> ---
You could also use the following (in trunk):
https://issues.apache.org/jira/browse/PIG-3198
so you'd do: invoke&Integer.valueOf(x, 16); where x would be the hex string
2013/5/6 Alan Gates
> I am not aware of any built in or Piggybank UDF that converts Hex to Int,
> but it would be a welcome co
Are you familiar with the CUBE keyword that was relatively recently added?
This sounds like a perfect use case for it. Furthermore, how are you
splitting on activity? There is a SPLIT operator which is perfect for this,
as you can have a different relation for each one.
What I would do would be to
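A hedged sketch of the SPLIT route (relation, field, and activity values hypothetical; OTHERWISE needs a reasonably recent Pig):

```pig
-- one output relation per activity type
SPLIT events INTO
    logins IF activity == 'login',
    clicks IF activity == 'click',
    rest OTHERWISE;
```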
Why do you have an "as" statement with the store? The schema should come
down with the script. That's probably the issue.
2013/5/4 ÐΞ€ρ@Ҝ (๏̯͡๏)
> Ignore above query. Its incorrect.
>
> I have following pig script
> A = LOAD 'textinput' using PigStorage() as (a0:chararray, a1:chararray,
> a2:ch
Dan,
I implemented most of the jruby stuff. Glad to hear you're trying it out!
Please let us know what your experience is like.
I definitely had plans to upgrade to jruby 1.7, and am not sure why I never
did...hmm.
Ahh, ok, it's part of this patch...which still isn't committed...you should
bump
woops, wrong listserv :)
2013/4/5 Jonathan Coveney
> The following gist illustrates my question:
>
> https://gist.github.com/jcoveney/5320422
>
> It seems pretty surprising to me that all of these cases all return 1.0,
> at least in python (I will now do this in Java, it
The following gist illustrates my question:
https://gist.github.com/jcoveney/5320422
It seems pretty surprising to me that all of these cases all return 1.0, at
least in python (I will now do this in Java, it's just more verbose). Is
this an issue with python? Is this an issue period? Is this une
as far as when the storefunc works, it depends on whether the job is map
only or map/reduce. It'll work on the last phase. Generally this is the
reduce phase.
As far as how Pig knows where to send its output, there are keys in Pig.
Basically, a reduce job is necessary any time you have a group, j
t; > When initialized, the ParselyMetadataService creates a new Mongo and
> > Jedis
> > > instance which the EvalFunc queries using a public method fetch().
> > > Instance of ParselyMetadataService also have a close() function which
> > > simply calls:
> >
Ack, hit enter. I'd look at the LoadFunc interface, the PigStorage class,
and if you can't make it work without playing a little, let me know.
2013/3/19 Jonathan Coveney
> doing "new PigStorage()" is possible, but tricky. Maybe some of the other
> contributors have a
https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html
> >,
> would you know how to construct this process from a baseline PigStorage
> Object, such as:
>
> PigStorage pigstorage = new PigStorage();
>
> Any ideas?
>
> -Dan
>
> On Tue, Mar 1
l also make the process more approachable for
> another programmer to write additional unit tests.
>
> -Dan
>
> On Tue, Mar 19, 2013 at 11:43 AM, Jonathan Coveney >wrote:
>
> > How are you planning on generating these cases? By hand? Or automated?
> >
> >
>
esting of my UDFs.
>
> -Dan
>
> On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney >wrote:
>
> > how was string_databag generated?
> >
> >
> > 2013/3/19 Dan DeCapria, CivicScience
> >
> > > Expanding upon this, the follow
ach
> nesting
> > from Tuple and DataBag factories, append data, and next them manually.
> For
> > larger unit tests, this process becomes unwieldy (hundreds of lines per
> > method, non-dynamic), and it would be much simpler to go directly from a
> > String and a Sc
Why not just use PigStorage? This is essentially what it does. It saves a
bag as text, and then loads it again.
I suppose the question becomes: why do you need to do this?
2013/3/18 Dan DeCapria, CivicScience
> In Java, I am trying to convert a DataBag from it's String representation
> with it
andling)
>
> -Kris
>
> On Mon, Mar 18, 2013 at 11:19:17AM +0100, Jonathan Coveney wrote:
> > Absolutely.
> >
> > public class MyUdf extends EvalFunc {
> > public DataBag exec(Tuple input) throws IOException {
> > return (DataBag)input.get(0);
> > }
Absolutely.
public class MyUdf extends EvalFunc<DataBag> {
    public DataBag exec(Tuple input) throws IOException {
        return (DataBag) input.get(0);
    }
}
A dummy example, but there you go. DataBag is a valid Pig type like any
other, so you just return it like you would normally.
2013/3/18 pranjal rajpu
ystem. This time is
> passed in by the system to pig when the job is launched. Since I
> partition files by time field, a user could filter based on the result
> of this UDF.
>
>
>
> On Thu, Mar 14, 2013 at 3:15 PM, Jonathan Coveney
> wrote:
> > No, it is not.
The script you posted wouldn't have any reducers, so it wouldn't matter.
It's a map only job.
2013/3/15
> Dear Apache Pig Users,
>
> It is easy to control a number of reducers in JOIN, GROUP, COGROUP,
> etc. statements by a general "set default_parallel $NUM" command or
> "parallel $NUM" info i
No, it is not. But if it knew that, how would that filter be meaningful?
What do you have in mind?
2013/3/14 Jeff Yuan
> Rohini, I see your point.
>
> One followup question: it's possible for the result of a UDF to be
> constant and not dependent on the tuples of each record, right? Is Pig
> ab
Can you perhaps share more of your implementation? I can imagine a couple
of things which would cause errors like this. Are you making sure that each
instance of EvalFunc is dealing with a different connection?
That's what I'd take a look at first...if that isn't the issue, I can look
into how fin
ncoding, and RLE encoding of
> > > data (Cloudera and Twitter)
> > > * Further improvements to Pig support (Twitter)
> > >
> > > Company names in parenthesis indicate whose engineers signed up to do
> the
> > > work -- others can feel free to jump in too, of course.
> > >
> > > We've also heard requests to provide an Avro container layer, similar
> to
> > > what we do with Thrift. Seeking volunteers!
> > >
> > > We welcome all feedback, patches, and ideas; to foster community
> > > development, we plan to contribute Parquet to the Apache Incubator when
> > the
> > > development is farther along.
> > >
> > > Regards,
> > > Nong Li, Julien Le Dem, Marcel Kornacker, Todd Lipcon, Dmitriy Ryaboy,
> > > Jonathan Coveney, and friends.
> >
>
Can you try this on trunk and let me know if you have a similar error?
Also can you turn on DEBUG and say if it is taking forever during or after
parsing?
2013/3/6 Haitao Yao
> Hi all
> I have a big pig script running under pig-0.9.2. While upgrading
> to pig 0.11 or 0.10, the script n
There have been a number of explanations on the topic before, so I would
prefer to point at one of them (or ensure we document it better), but
basically all of the aggregation functions we use (sum, avg, etc) all
function on bags of stuff. This is actually true in SQL as well (it just
hides the "gr
If you use the alias "@", it should properly dump etc. the last alias. If
not, file a JIRA.
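A tiny sketch of that shorthand (assuming it is available in your Pig version; alias names hypothetical):

```pig
a = load 'data';
b = filter a by $0 is not null;
dump @; -- "@" refers to the most recently defined alias, here b
```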
2013/3/5 Jeff Yuan
> Thanks for your suggestions, they work very well. One follow up question:
>
> Is there a way to dynamically strip STORE and DUMP commands from a
> loaded in script? So everything work
dividends = load 'try.txt';
a = foreach dividends generate FLATTEN(TOBAG(*));
b = foreach (group a all) generate CalculateAvg($1);
I think that should work
2013/3/5 pablomar
> what is the error ?
> function not found or something like that ?
>
> what about this ?
> avg = generate myudfs.C
Why don't you want to group?
2013/3/5 Preeti Gupta
> I want to compute the Average for 1 column dataset
> 1
> 2
> 3
> 4
> 5
>
> and I am not able to do without grouping.
>
> However I got an average with
>
> avg = foreach (group dividends all) generate AVG(dividends);
>
> But
>
> avg = fo
it correctly, for the same run, all the CurrentTime()
> will return the same timestamp. I wonder if there any udf can provide
> runtime timestamp.
>
> Thanks.
> Dan
>
> -Original Message-
> From: Jonathan Coveney [mailto:jcove...@gmail.com]
> Sent: Wednesday,
This is by design, as the notion of a CurrentTime() in a Pig job is a bit
poorly specified, so we went with something "unremarkable." What do you
think it should be?
2013/2/27 Cheolsoo Park
> Hi Dan,
>
> Are you using 0.11 or trunk?
>
> If you're using trunk, please take a look at PIG-3014.
> h
What do you have in mind?
2013/2/26 Preeti Gupta
> Hello Everyone,
>
> I want to make some changes in the way Pig generates Hadoop jobs. Any one
> got some idea on how to do this?
>
> regards
>
> Preeti
me:chararray, description:chararray has to be dynamically
> created based on the parameters passed.
> Is there any way of getting this done?
>
> -Original Message-
> From: Jonathan Coveney [mailto:jcove...@gmail.com]
> Sent: Wednesday, February 20, 2013 10:48 PM
> To: us
be achieved in a pig script?
>
> Also depending on the output file format, I need to invoke the
> corresponding exporter script (html or csv) from my wrapper script. I don’t
> see any conditional operators available (if/else) in pig. Any idea how this
> can be achieved?
>
> ---
c, in my pig script I will not be able to
> > refrence the parameters as '$param1' . Is there any way to access these
> > params in the script without referring to the param name?
> >
> >
> > From: Jonathan Coveney [jcove...@
congrats :)
2013/2/20 Jarek Jarcec Cecho
> Congratulations Bill, good job!
>
> Jarcec
>
> On Tue, Feb 19, 2013 at 01:48:18PM -0800, Daniel Dai wrote:
> > Please welcome Bill Graham as our latest Pig PMC member.
> >
> > Congrats Bill!
>
Can you give an example of what you'd like this to look like?
2013/2/19 Siddhi Borkar
> Hi ,
>
> I need to pass parameters dynamically to a pig script. Is there any way to
> read the parameters passed and their corresponding values without giving
> the parameter names in the pig script?
>
> Tha
produce the compact execution plan for the whole script and not
> the several separate ones (one for each alias).
>
>
>
> On 2/18/2013 10:21 PM, Jonathan Coveney wrote:
>
>> I guess I'm confused at what you want then.
>>
>> So we have a script:
>>
>>
command
>
> $ pig -x local -e 'explain -script Temp1/TPC_test.pig -out
> explain-out9.txt'
> it will not give the same output as if we did it for each operation
> separately.
>
>
> On 2/18/2013 7:04 PM, Jonathan Coveney wrote:
>
>> Hacky way: grep
Hacky way: grep for "^\S =", pull out the names, and then do the explains.
Why is doing the progressive explains useful? It wouldn't be too hard to
build this into Pig, but the results would be pretty unwieldy: it'd be
really big and pretty redundant.
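As a sketch of the hacky route (file name hypothetical; the pattern is loosened to allow multi-character aliases):

```shell
# Write a toy Pig script, then pull out the alias names to feed to "explain"
cat > script.pig <<'EOF'
a = load 'in' as (x:int);
b = group a all;
c = foreach b generate COUNT(a);
EOF
grep -oE '^[A-Za-z_][A-Za-z0-9_]*' script.pig
```

Each extracted name can then be handed to `explain <alias>;` in the grunt shell.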
2013/2/18 Petar Jovanovic
> Hi,
> I am try
Prashant: not sure. It probably isn't. We should make a ticket for that if
it isn't.
Russell: can you close the ticket you made, with the answer?
2013/2/11 Russell Jurney
> Well, I don't know if that works? But Software/ isn't assumed either.
> Really I just need a home directory... I've set mi
Bill was right, he just forgot an escape:
%default HOME `echo \$HOME`
I believe that should work
2013/2/11 Russell Jurney
> Yes, I agree. Please edit! :)
>
>
> On Mon, Feb 11, 2013 at 1:31 AM, Prashant Kommireddi >wrote:
>
> > Only suggesting a workaround, not implying it's the best solution
Sorry, hit enter prematurely.
It's a little janky in this particular case, but you could have a helper
which takes the Thrift class, i.e. get_name(some_field, 'SomeClass'), and
could use that SomeClass to let you refer to fields by name.
2013/2/8 Jonathan Coveney
> Curren
Currently, the answer to this is no. In Javaland in 0.11.0 you can get the
schema in an EvalFunc, and it would not be hard to make this available from
a Jython UDF, though we'd need a patch.
2013/2/7 Stanley Xu
> Dear All,
>
> We are using pig with elephant-bird thrift to process structured rec
A tuple must fit in memory. That is the only bound.
2013/2/6 Dexin Wang
> I'm writing a UDF of my own that would produce tuples, each tuple has a
> string field that could be real large. I did a quick test and the current
> size of the field is 146,447 characters and it doesn't seem to have any
Similar question for Python UDF. In my Python UDF, is referencing field by
> index (instead of alias) is the only option I have?
>
>
> On Tue, Jan 15, 2013 at 2:20 PM, Jonathan Coveney >wrote:
>
> > Another way to do it would be to make a helper function that does
AFAIK, this is a Hadoop issue and not a pig issue. That said, we could make
this a configurable thing to overload and do something more reasonable.
Feel free to open a JIRA and suggest that (or else someone will see it and
say exactly that it is a Hadoop issue and not a Pig issue).
2013/2/2 Benja
Even better, push the tag_with_amenity = FILTER tag BY (tag_attr_k ==
'amenity'); as high as possible.
2013/1/31 Cheolsoo Park
> Hi Jerome,
>
> Try this:
>
> XmlTag = FOREACH xmlToTuple GENERATE FLATTEN ($0);
> XmlTag2 = FOREACH XmlTag {
> tag_with_amenity = FILTER tag BY (tag_attr_k == 'am
There was an issue in the parser that has been resolved in trunk (I forget
if it went into 0.11 or not). Can you test your script on trunk and see if
you still have the issue?
2013/1/28 Dongliang Sun
> Hi All,
>
> When there are too many nested bincond operators (more than 10), it's
> frozen th
else. However, I think
> that making the front-end thread-safe is an achievable goal.
>
> Thanks,
> Cheolsoo
>
>
>
> On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam
> wrote:
>
> > That clarifies it for me, thanks a lot.
> >
> > Regards,
> > Rama.
&g
> > pig job submitting thread waiting until the job completes?
> >
> > Is this just a shortcoming today or are there more concrete reasons
> against
> > providing with a pigserver which can submit to the cluster in mapreduce
> > mode async?
> >
> > Thanks,
by
deploying daemons that run pig jobs as local processes.
2013/1/23 Prashant Kommireddi
> Both. Think of it as an app server handling all of these requests.
>
> Sent from my iPhone
>
> On Jan 23, 2013, at 9:09 PM, Jonathan Coveney wrote:
>
> > Thousands of requests,
Thousands of requests, or thousands of Pig jobs? Or both?
2013/1/23 Prashant Kommireddi
> Did not want to have several threads launched for this. We might have
> thousands of requests coming in, and the app is doing a lot more than only
> Pig.
>
> On Wed, Jan 23, 2013 at 5:
start a separate Process which runs Pig?
2013/1/23 Prashant Kommireddi
> Hey guys,
>
> I am trying to do the following:
>
>1. Launch a pig job asynchronously via Java program
>2. Get a notification once the job is complete (something similar to
>Hadoop callback with a servlet)
>
> I
s if this
> is not possible at the moment.
>
> -Prashant
>
> On Mon, Jan 21, 2013 at 5:47 PM, Prashant Kommireddi >wrote:
>
> > At the moment, basically info on I/O paths, operators used (group by,
> > foreach ..), job level info such as number of reducers etc.
> >
At Twitter, we have a lightweight framework that handles stitching code
together... so I think with Pig, stitching stuff together in some organized
way is the current best practice.
2013/1/22 Cheolsoo Park
> Hi Eric,
>
> You can move REGISTER and SET to a properties file and DECLARE and DEFAULT
I do not believe that this is currently supported for nested projections,
though it should be. Feel free to make a JIRA ticket, I do not think it
would be hard.
2013/1/22 Uri Laserson
> I have tuple like so:
>
> (a: (b:int, c:int, d:int, e:int))
>
> I would like to call a UDF and pass a ran
What level of information would you like? IE if you do "explain relation,"
which of the three do you want to hook into?
2013/1/21 Prashant Kommireddi
> Been coding with the APIs and wondering if there is anything that allows
> you to only retrieve the operators, I/O paths etc without actually i
Another way to do it would be to make a helper function that does the
following:
input.get(getInputSchema().getPosition(alias));
Only available in 0.10 and later (I think getInputSchema is in 0.10, at
least...may only be in 0.11)
2013/1/15 Dexin Wang
> Hi,
>
> In my own UDF, is reference a fi
Can you share a script which replicates this? Ideally one that isolates the
issue, if it is quite long...
2013/1/14 abhishek
> >> Hi all,
> >>
> >> When am using JOIN operator in pig, am getting following error
> >>
> >> Pig joins inner plans can only have one output leaf?
> >>
> >> Can any one
How long is it taking?
2013/1/4 Malcolm Tye
> Hi,
>
> Any ideas on how to make Pig run quicker when running it in
> local mode ?
>
>
>
> I'm processing 3 files of about 13MB each with 3 group by statements in my
> script which seem to suck up the time. There's no joins
>
>
>
> I
a = load 'tab1' as (col1, col2, col3);
b = group a by (col1, col2, col3);
c = foreach b generate FLATTEN(group), COUNT_STAR(a);
2012/12/26 abhishek
> Hi all,
>
> How can I achieve above hive query in pig
>
> Create table x as select y.col1,y.col2,y.col3,count(*) as count from tab1
> y group by
This is a very broad question. On the Pig website you can find some papers
on how Pig was implemented, and this should give you a high level view of
what is going on.
For this code, you can use the explain command (explain in; instead of dump
in;) to see the 3 plans that this code generates (logic
> Peace be on you, Jonathan,
> It gives one leaf also
>
> --
> Regards,
> Sarah M. Hassan
>
>
>
> On Tue, Dec 18, 2012 at 4:24 AM, Jonathan Coveney >wrote:
>
> > Try it with joins, I think
> >
> >
> > 2012/12/16 Sarah Mohamed
> >
&
Try it with joins, I think
2012/12/16 Sarah Mohamed
> PhysicalPlan.getLeaves() return a list of leaves, Most of the cases it's
> only one"the root", is there any cases that the physical plan will have
> more than one leaf ?
>
> Thanks
> Sarah
>
It's a little confusing, but the following is a tuple: (key1,foo,)
It's just not the tuple you want. It is a tuple where the first field is
"key1,foo" and the second field is null. The printing makes this ambiguous.
2012/12/14 Thomas Bach
> (key1,foo,)
This is a join. This is equivalent to:
A = load 'test_data' as (value);
B = load 'filter_data' as (x:int);
C = join A by value, B by x using 'replicated';
D = foreach C generate value as value;
One thing pig does not currently do nicely is let you create a relation
from nothing (ie define the
build/lib/jars/xmlenc-0.52.jar:/Library/apache-cassand
> >r
> >a-1.1.7-src/build/apache-cassandra-1.1.7-SNAPSHOT.jar:/Library/apache-cass
> >a
> >ndra-1.1.7-src/build/apache-cassandra-clientutil-1.1.7-SNAPSHOT.jar:/Libra
> >r
> >y/apache-cassandra-1.1.7-src/build/apache
I'm a little vague on what you want to do. Can you provide an example?
2012/12/11 Prashant Kommireddi
> Here is a snippet of how schema is applied to tuples
>
> String serializedSchema = p.getProperty(signature + SCHEMA_FILE);
> if (serializedSchema != null) {
>
str),
> making your code path impossible...
>
> will
>
>
> On Tue, Dec 11, 2012 at 1:00 PM, Jonathan Coveney >wrote:
>
> > If I were debugging this (note, I know nothing about cassandra), I would
> > put a flag in my ide on cassandra storage and see what is goi
If I were debugging this (note, I know nothing about cassandra), I would
put a flag in my ide on cassandra storage and see what is going on in
there, and why it is erroring out. Then I would follow that backwards into
whatever in Pig was generating that issue. That's pretty vague but can't
really s
I did not implement those UDFs... I imagine the reason for rigorously
using UTC instead of system time is because that can introduce subtle bugs
where your servers have a different time than your client and it can be
hard to debug, etc. It would be pretty easy to add support for timezone to
those
The default loader can't handle this. You would need a custom InputFormat,
which isn't too bad.
2012/12/9 L N
> Hi,
>
>
>
> > I have an unstructured file format. Assume below is the data in a file
> >
> > >
> >
> abxcd xyxc
>
> >
> >
> >
> > I need to process the data in between < >
he physical plan as a
> tree/graph structure.
>
> What I did that I implemented the PigProgressNotificationListener interface
> and I built it myself, and this is what you mean right?
>
> Thanks you for your help.
>
> --
> Regards,
> Sarah M. Hassan
>
>
>
Can you flesh out what you want it to do a little more? Maybe some example
queries?
2012/11/26
> Hi,
>
>
> We have a scenario where we want a single Hadoop job to create/manage
> multiple mapper tasks where each mapper task will query a subset of columns
> in a relational database table. We loo
What is your goal? When you say reconstruct, do you just mean get a handle
on the physical plan? You can make your own execution flag (ie extend the
interface behind local mode etc) and that method gives you the physical
plan.
2012/11/24 Sarah Mohamed
> Peace be on you,
>
> Is there a way to re
Pig is very much not thread safe. It uses static methods to add stuff to
contexts all over the place. It would be a ton of work to fix this.
2012/11/20 Cheolsoo Park
> Hi,
>
> I actually tried to run entire unit test suite in multiple threads, and I
> used this junit extension:
> http://tempusf
In pure Pig, you wouldn't do something like this. However, Pig supports
control flow in Python (I really should get on making the JRuby wrapper,
but I digress). You can find docs for this on the pig website. Basically
the control flow is in Python, and you launch jobs from there.
2012/11/19 Sheng
Make a JIRA and attach the patch, please.
2012/11/19 pablomar
> hi all,
>
> I did it as simple as I could. What about this changes ?
>
>
> PigStorage.java
> original:
> private void readField(byte[] buf, int start, int end) {
> if (start == end) {
> // NULL value
>
be there is an easier way I am missing here. If people have any ideas
> for a more elegant solution I would be happy to contribute develop it and
> contribute the code.
>
> Martin
>
>
>
>
>
>
>
> On 15 November 2012 20:20, Jonathan Coveney wrote:
>
>
Martin,
That is a reasonable workaround. Even in Java UDFs, you can't directly
access fields by name. Tuples are indexed only by numbers. Using the Schema
is how I would do it.
2012/11/14 Martin Goodson
> Sorry to reply to my question post but I've found a workaround that I
> thought I should
This is great!
PS "I really liked the simple explanation of FLATTEN: it turns Tuples into
columns (because Tuples contain columns) and turns Bags into rows (because
Bags contain rows)." I'm so glad someone appreciated that :D I put a lot of
effort into that portion of it...
2012/11/14 Steve Bern
There are a couple of ways to shorten the time... one (super helpful one)
would be to look at tests using the MiniCluster, and convert them to use
local mode. A lot of tests are run using a full MR job when they aren't
testing a piece of Pig relevant to that interop.
Another way is to split up the
If it's a parameter, it could just be passed in as a $var
2012/11/13 Miki Tebeka
> Greetings,
>
> Is there a way to dynamically generate (maybe via UDF) the path to
> load/store data? (something like "A = LOAD InputPath() USING
> PigStorage();")
>
> Currently we calculate the load/store path ou
UDFs can only be given String arguments, period. So you can pass it a
boolean in String form and parse it.
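In the script that would look something like this (a sketch; the package name is hypothetical, and the constructor would call Boolean.parseBoolean on its argument):

```pig
DEFINE GenStartEndDate com.example.udf.GenStartEndDate('true');
```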
2012/11/8 meghana narasimhan
> Hi All,
>
> Can I pass in a boolean value to Pig UDF constructor with Pig 0.9.2?
>
> I have a constructor :
>
> public GenStartEndDate(boolean mtdNoGlob) {
I agree with Alan on all counts. I think the confusing part is that null is
overloaded. Alas.
2012/11/5 Alan Gates
> Better in terms of semantics or terms of documentation? We can't change
> the semantics of null in Pig; it's been that way the whole time. Plus this
> concept of unknown data i
Now is when the real fun starts, Cheolsoo. Congrats :)
2012/10/26 Alan Gates
> Welcome Cheolsoo, and well deserved.
>
> Alan.
>
> On Oct 26, 2012, at 2:54 PM, Julien Le Dem wrote:
>
> > All,
> >
> > Please join me in welcoming Cheolsoo Park as our newest Pig committer.
> > He's been contributing
I have not used DBStorage myself and the comments are lacking, but there is
a syntactical issue here. All store statements have to be in the following
form:
store relation into 'location' using storefunc(args);
So your case needs to be:
STORE data INTO 'location' using DBStorage ('com.mysql.jdb
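For reference, a hedged sketch of the full form (driver, URL, credentials, and insert query are all placeholders; DBStorage is the Piggybank storefunc):

```pig
STORE data INTO 'ignored' USING
    org.apache.pig.piggybank.storage.DBStorage(
        'com.mysql.jdbc.Driver',
        'jdbc:mysql://dbhost/mydb', 'user', 'pass',
        'INSERT INTO mytable (a, b) VALUES (?, ?)');
```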
Howdy Joshua. This question comes up a fair amount, in various forms, and
here is the answer: unless you can figure out a way to reduce this to an
equi-join, then it is going to be tough.
Why is that? Because of how joining in map-reduce land works. The way
joining generally works is by hashing th
M/R is a useful but sometimes leaky abstraction. :)
2012/10/15 Alberto Cordioli
> Ok, I've found
>
> I was using values that return all the same value % number of reducers.
> For an unfortunate case I tested always multiple values..Ohhh, my
> fault.
>
>
> Cheers,
> Alberto
>
>
> On 13 Oc