Eyal Allweil created DATAFU-127:
-----------------------------------

             Summary: New macro - sample by keys
                 Key: DATAFU-127
                 URL: https://issues.apache.org/jira/browse/DATAFU-127
             Project: DataFu
          Issue Type: New Feature
            Reporter: Eyal Allweil
            Assignee: Eyal Allweil

Two macros that return a sample of a larger table based on a list of keys, with the schema of the larger table. One of the macros filters by dates; the other doesn't.

If multiple rows have a key that appears in the key list, all of them are returned (no deduplication is done). The results are returned in a single file, ordered by the key field.

The implementation uses a replicated join for efficiency, which means the key list must be small enough to fit in memory.

The first macro's definition looks as follows:

DEFINE sample_by_keys(table, sample_set, join_key_table, join_key_sample) returns out {

 - table - table name to sample
 - sample_set - a set of keys
 - join_key_table - join column name in the table
 - join_key_sample - join column name in the sample
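
To make the described behaviour concrete, here is a minimal Python sketch of the semantics only. It is not the Pig macro and not DataFu code. The function reuses the macro's parameter names for readability, the sample rows are made up, and it assumes the key list holds distinct keys. The in-memory set stands in for the small side of the replicated join.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def sample_by_keys(
    table: Iterable[Row],
    sample_set: Iterable[Row],
    join_key_table: str,
    join_key_sample: str,
) -> List[Row]:
    """Illustrative model of the described semantics, not the Pig macro."""
    # The small side is held entirely in memory, as in a replicated join.
    keys = {row[join_key_sample] for row in sample_set}
    # Stream the large side and keep every matching row (no deduplication).
    matched = [row for row in table if row[join_key_table] in keys]
    # Return one result ordered by the key field; rows keep the table's schema.
    return sorted(matched, key=lambda row: row[join_key_table])


# Hypothetical data: key 1 appears twice in the table, key 2 is not sampled.
table = [
    {"id": 3, "value": "c"},
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 1, "value": "a2"},
]
sample = [{"key": 1}, {"key": 3}]

result = sample_by_keys(table, sample, "id", "key")
for row in result:
    print(row)
```

Both rows with key 1 come back, the row with key 2 is dropped, and the output is sorted by `id`. The memory cost depends on the size of the key list, not of the table, which is why the key list has to stay small.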