dividends = load 'try.txt';
a = foreach dividends generate FLATTEN(TOBAG(*));
b = foreach (group a all) generate CalculateAvg($1);

I think that should work

2013/3/5 pablomar <pablo.daniel.marti...@gmail.com>

> what is the error ?
> function not found or something like that ?
>
> what about this ?
> avg = generate myudfs.CalculateAvg(dividends);
>
>
> On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta <preetigupt...@soe.ucsc.edu> wrote:
>
> > Hello All,
> >
> > I have dataset like
> >
> > 0, 10.1, 20.1, 30, 40,
> > 50, 60, 70, 80.1, 1,
> > 2, 3, 4, 5, 6,
> > 7, 8, 9, 10, 11,
> > 12, 13, 14, 15, 16,
> > 1, 2, 3, 4, 5,
> > 56, 6, 7, 8, 9,
> > 9, 9, 9, 12, 1,
> > 3, 14, 1, 5, 6,
> > 7, 8, 8, 9, 12
> >
> > So basically comma separated values. But I want to consider this as one
> > data column and I want to calculate the average of the whole dataset.
> >
> > I believe I have to write UDF to calculate average. Pig is able to load
> > this data
> >
> > ( 0, 10.1, 20.1, 30, 40,)
> > ( 50, 60, 70, 80.1, 1,)
> > ( 2, 3, 4, 5, 6,)
> > ( 7, 8, 9, 10, 11,)
> > ( 12, 13, 14, 15, 16,)
> > ( 1, 2, 3, 4, 5,)
> > ( 56, 6, 7, 8, 9,)
> > ( 9, 9, 9, 12, 1,)
> > ( 3, 14, 1, 5, 6,)
> > ( 7, 8, 8, 9, 12 )
> >
> > and How do I invoke that UDF in my pig script? Say I implement
> > CalculateAvg function.
> >
> > REGISTER ./myudfs.jar
> > dividends = load 'try.txt';
> > dump dividends
> > --grouped = group dividends by symbol;
> > avg = generate CalculateAvg(dividends);
> > dump avg
> > --store avg into 'average_dividend';
> >
> > It fails.
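
If the goal is only the overall mean of the dataset quoted above, a UDF may not be needed at all. The following is a rough sketch, not a tested script. It assumes the built-in AVG is an acceptable stand-in for CalculateAvg, and that each line of 'try.txt' holds comma-separated numbers as shown in the original message. The alias names (lines, tokens, nums, grouped, overall) are made up for illustration.

```pig
-- Read each line of the file as a single chararray field.
lines = LOAD 'try.txt' USING TextLoader() AS (line:chararray);

-- TOKENIZE with its default delimiters splits on spaces and commas
-- (among others); FLATTEN turns the bag into one row per value.
tokens = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS tok;

-- Cast each token to a number; anything that fails to parse becomes null.
nums = FOREACH tokens GENERATE (double)tok AS val;

-- Put every value into a single group and average it.
grouped = GROUP nums ALL;
overall = FOREACH grouped GENERATE AVG(nums.val);

DUMP overall;
```

The same pattern applies with a UDF: `generate` has to appear inside a `foreach` over a grouped relation, and the UDF receives the bag (for example `CalculateAvg(nums.val)`) rather than the relation alias itself.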