Sure, Pig is great for this sort of thing. And if there is something it has trouble doing, you can write a small custom function in Java to do it; that's called a UDF (user-defined function).

--- Eric Wadsworth

PS: I'm looking for work with Hadoop. Resume is here: http://wadhome.org/~wad/resume/hadoop/

On 08/14/2011 01:34 PM, W.P. McNeill wrote:
I have a large-scale set ordering task and am trying to determine if Pig
would be a good tool to use. I've read through the basic documentation and
played with some simple examples and so have a general idea of how Pig
works.

My task is this: I have a set of ordered sets S = {s1, s2, ... sn}. Every
element of S has its elements drawn from some vocabulary V. For example:

V = {a, b, c, d}
s1 = {d, b, c}
s2 = {d, c}
s3 = {d}
s4 = {d, c, b, a}

I want to build a frequency table for how often the vocabulary elements
appear and then reorder each set by increasing element frequency. In this
example the frequency table is:

a->1
b->2
c->3
d->4

so the sets would be reordered like so:

s1 = {b, c, d}
s2 = {c, d}
s3 = {d}
s4 = {a, b, c, d}
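
The worked example above can be checked with a small sketch outside Pig. This is plain Python, not Pig Latin, and is only meant to pin down the intended result; the function name reorder_by_frequency and the tie-breaking rule (equal frequencies fall back to the element's own sort order) are assumptions, since the example has no ties.

```python
from collections import Counter


def reorder_by_frequency(sets):
    """Return (frequency table, sets reordered by increasing frequency).

    Hypothetical helper for illustration; not part of Pig.
    """
    # Count how many sets each vocabulary element appears in.
    freq = Counter(elem for s in sets for elem in set(s))
    # Sort each set by increasing frequency. Ties are broken by the
    # element itself (an assumption; the original example has no ties).
    reordered = [sorted(s, key=lambda e: (freq[e], e)) for s in sets]
    return freq, reordered


# The sets s1..s4 from the example above.
sets = [["d", "b", "c"], ["d", "c"], ["d"], ["d", "c", "b", "a"]]
freq, reordered = reorder_by_frequency(sets)

for name, s in zip(["s1", "s2", "s3", "s4"], reordered):
    print(name, "=", s)
```

Each set is just a variable-length list here, so nothing depends on knowing the number of elements in advance.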

Is there an easy way to do this in Pig? I can see how to build the frequency
table, but I'm not sure how to use it to reorder the sets. In particular,
I'm not sure how to fit the idea of variable length ordered sets into the
Pig framework, since the core concept here is a database row whose number of
columns is known in advance.

