Nice!

On Sun, Jan 20, 2019 at 8:28 AM Eyal Allweil <e...@apache.org> wrote:

> I wrote a blog post for the PayPal engineering blog detailing some of the
> (Pig) content I've contributed to DataFu on behalf of PayPal. The post
> contains documentation and code samples of three macros and a UDF:
>
> *dedup* - for deduplicating rows based on key and date-updated fields
>
> *sample_by_keys* - a macro for generating a sample of a table based on a
> list of unique IDs
>
> *diff_macro* - for generating a human-readable diff between two tables
>
> *CountDistinctUpTo* - a UDF that performs much better than pure Pig in
> cases where you don't need the actual records, only to verify that a
> certain number of them exist
>
>
> https://medium.com/paypal-engineering/a-guide-to-paypals-contributions-to-apache-datafu-b30cc25e0312
>
> The blog post will be cross-posted to the Apache DataFu blog soon.
>
> Cheers,
> Eyal
>
-- 
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com
