Nice!

On Sun, Jan 20, 2019 at 8:28 AM Eyal Allweil <e...@apache.org> wrote:

> I wrote a blog post for the PayPal engineering blog detailing some of the
> (Pig) content I've contributed to DataFu on behalf of PayPal. The post
> contains documentation and code samples of three macros and a UDF:
>
> *dedup* - for deduplicating rows based on a key and date updated fields
>
> *sample_by_keys* - a macro for generating a sample of a table based on a
> list of unique ids
>
> *diff_macro* - for generating a human readable diff between two tables
>
> *CountDistinctUpTo* - a UDF which performs much better than pure Pig for
> cases in which you don't need the actual records, but just to verify that a
> certain amount exists
>
> https://medium.com/paypal-engineering/a-guide-to-paypals-contributions-to-apache-datafu-b30cc25e0312
>
> The blog post will be cross-posted to the Apache DataFu blog soon.
>
> Cheers,
> Eyal

--
Russell Jurney
@rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com
LI <http://linkedin.com/in/russelljurney>
FB <http://facebook.com/jurney>
datasyndrome.com