
On Sun, Jan 20, 2019 at 8:28 AM Eyal Allweil wrote:

> I wrote a blog post for the PayPal engineering blog detailing some of the
> (Pig) content I've contributed to DataFu on behalf of PayPal. The post
> contains documentation and code samples of three macros and a UDF:
> *dedup* - a macro for deduplicating rows based on key and date-updated
> fields
> *sample_by_keys* - a macro for generating a sample of a table based on a
> list of unique ids
> *diff_macro* - a macro for generating a human-readable diff between two
> tables
> *CountDistinctUpTo* - a UDF which performs much better than pure Pig for
> cases in which you don't need the actual records, but only need to verify
> that a certain number of them exist
> The blog post will be cross-posted to the Apache DataFu blog soon.
> Cheers,
> Eyal
Russell Jurney @rjurney
