Re: updates using Pig

2012-08-29 Thread pablomar
now I can see it :-) very beautiful place

On Wed, Aug 29, 2012 at 5:47 AM, Srini wrote:
> Thank-you very much Jonathan...
>
> On Tue, Aug 28, 2012 at 2:47 AM, Jonathan Coveney wrote:
> > I would do this with a cogroup. Whether or not you need a UDF depends on
> > whether or not a key can appear more than once in a file.

Re: updates using Pig

2012-08-29 Thread Srini
Thank-you very much Jonathan...

On Tue, Aug 28, 2012 at 2:47 AM, Jonathan Coveney wrote:
> I would do this with a cogroup. Whether or not you need a UDF depends on
> whether or not a key can appear more than once in a file.
>
> trade-key  trade-add-date  trade-price
>
> feed_group = cogroup feed1 by trade-key, feed2 by trade-key;

Re: updates using Pig

2012-08-27 Thread Jonathan Coveney
I would do this with a cogroup. Whether or not you need a UDF depends on
whether or not a key can appear more than once in a file.

trade-key  trade-add-date  trade-price

feed_group = cogroup feed1 by trade-key, feed2 by trade-key;
feed_proj = foreach feed_group generate FLATTEN( IsEmpty(fe
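A minimal sketch of the cogroup-then-pick idiom described above, assuming
feed1 is the history file, feed2 carries the updates, both are already
loaded with matching schemas, and field names use underscores (hyphens are
not legal in Pig identifiers):

    -- assumes feed1 (history) and feed2 (updates) share the schema
    -- (trade_key, trade_add_date, trade_price)
    feed_group = cogroup feed1 by trade_key, feed2 by trade_key;

    -- per key: keep the feed2 record when one exists, otherwise fall
    -- back to the feed1 record (the bincond needs its own parentheses
    -- inside FLATTEN)
    feed_proj = foreach feed_group generate
        FLATTEN((IsEmpty(feed2) ? feed1 : feed2));

If a key can repeat within a file, each bag holds more than one record and
that is where a UDF, as in TianYi Zhu's suggestion below, comes in.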

Re: updates using Pig

2012-08-27 Thread Srini
Hello TianYi Zhu,

Thanks !! and will get back..

--> by the way, you can sort these 2 files by trade-key then merge them
using a small script, that's much faster than using pig.

... Trying out a POC on updates in hadoop

Thanks,
Srinivas

On Tue, Aug 28, 2012 at 12:55 AM, TianYi Zhu < tianyi

Re: updates using Pig

2012-08-27 Thread TianYi Zhu
Hi Srinivas,

you can write a user defined function for this:

feed = union feed1, feed2;
feed_grouped = group feed by trade-key;
output = foreach feed_grouped generate
    flatten(your_user_defined_function(feed))
    as (trade-key, trade-add-date, trade-price);

your_user_defined_function takes the one or more records that share a
trade-key and returns the latest one.
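The same pick-the-newest step can also be written in pure Pig with a nested
foreach, avoiding a custom UDF; a minimal sketch, assuming trade_add_date
sorts chronologically as a string and using underscored field names
(hyphens are not legal in Pig identifiers):

    feed = union feed1, feed2;
    feed_grouped = group feed by trade_key;
    output = foreach feed_grouped {
        -- newest record first; assumes the date field sorts chronologically
        sorted = order feed by trade_add_date desc;
        newest = limit sorted 1;
        generate flatten(newest);
    };

With MM/dd/yyyy dates the string order is not chronological, so in practice
the date would need to be reformatted or parsed first.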

updates using Pig

2012-08-27 Thread Srinivas Surasani
Hi,

I'm trying to do updates of records in hadoop using Pig (I know this is
not ideal, but trying out a POC).. the data looks like the below:

*feed1:*
--> here trade-key is unique for each order/record
--> this is the history file

trade-key  trade-add-date  trade-price
*k1  05/21/20
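For completeness, a minimal sketch of loading the two feeds used throughout
the thread; the paths, delimiter, and types here are assumptions, and the
field names use underscores since hyphens are not legal in Pig identifiers:

    -- feed1: history file, one record per trade key
    feed1 = load 'input/feed1' using PigStorage('\t')
        as (trade_key:chararray, trade_add_date:chararray, trade_price:double);

    -- feed2: incremental file carrying new and updated records
    feed2 = load 'input/feed2' using PigStorage('\t')
        as (trade_key:chararray, trade_add_date:chararray, trade_price:double);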