If you calculate the size of the bag, you can use that value as a scalar, divide it by the number of bags you want, and round to get the size of each smaller bag.
Don't ask me to write that code though :)

Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com

On Apr 11, 2012, at 9:11 AM, Dan Feldman <[email protected]> wrote:

> Hey James,
>
> Have you looked at linkedIn's collection of UDFs, datafu (
> http://engineering.linkedin.com/open-source/introducing-datafu-open-source-collection-useful-apache-pig-udfs
> )?
>
> In particular, they have a UDF called BagSplit (
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/bags/BagSplit.java).
> It might not do exactly what you want since it splits a bag into bags of
> size n, not into 10 equal-sized bags, but it shouldn't be too hard to write
> your own UDF using BagSplit.java as a reference.
>
> Dan F.
>
>
>
> On Wed, Apr 11, 2012 at 8:53 AM, James Newhaven
> <[email protected]>wrote:
>
>> Hi,
>>
>> I need to divide a large bag into 10 smaller bags of equal size. Does
>> anyone know of a function that can do this easily? I've had a look at the
>> standard functions and the PiggyBank and can't find anything appropriate.
>>
>> Thanks,
>> James
>>
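For anyone reading this thread later, the divide-and-round arithmetic suggested at the top can be sketched roughly as follows. This is plain Python on an ordinary list, not a Pig UDF and not datafu's BagSplit; the names `chunk_sizes` and `split_evenly` are made up for illustration. It also assumes you want exactly the requested number of bags, so instead of a single rounded size it spreads the remainder over the first few bags. Simple rounding plus a fixed-size splitter can give a different count (11 items rounded up to size 2 yields 6 bags, not 10).

```python
def chunk_sizes(total, parts):
    """Return `parts` sizes that sum to `total` and differ by at most 1."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(total, parts)
    # The first `extra` chunks take one additional element each.
    return [base + 1 if i < extra else base for i in range(parts)]


def split_evenly(items, parts):
    """Split a list into `parts` consecutive chunks of near-equal size."""
    chunks, start = [], 0
    for size in chunk_sizes(len(items), parts):
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

Calling `split_evenly(list(range(95)), 10)` would give five chunks of 10 followed by five chunks of 9. A real Pig version would need to do the same bookkeeping while iterating over the bag's tuples inside a UDF.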
