To clarify, here is our input:

X = LOAD 'input.txt' AS (id1:chararray, id2:chararray, id3:chararray, id4:chararray, id5:chararray);
We want to compute a relation Y with a single column that holds the set of all distinct non-null ids in X.

stan

On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg <srosenb...@proclivitysystems.com> wrote:
> I don't see how FLATTEN would help in this case.
>
> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
> <prash1...@gmail.com> wrote:
>> Hi Stan,
>>
>> Would using FLATTEN and then DISTINCT work?
>>
>> Thanks,
>> Prashant
>>
>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg <
>> srosenb...@proclivitysystems.com> wrote:
>>
>>> Hi Guys,
>>>
>>> I came across a use case that seems to require an 'explode' operation,
>>> which to my knowledge is not currently available.
>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
>>> (x), (y), (z).
>>>
>>> E.g., consider a relation that contains an arbitrary number of
>>> different identifier columns, say,
>>> social security id, student id, etc. We want to compute the set of
>>> all distinct identifiers. Assume that the number of identifier
>>> columns is large and intermingled with other
>>> columns that should be projected out; this rules out a solution
>>> using, e.g., 'SPLIT'.
>>>
>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
>>> a relation, then the answer we want is
>>> Y={2,3,4,5}.
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>>
>>> stan
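One way the FLATTEN-then-DISTINCT suggestion could be made to work is a minimal sketch in Pig Latin, assuming Pig 0.8 or later (where TOBAG is a built-in function) and the five-column schema of X given above. TOBAG packs the id columns of each row into a bag of single-field tuples, and FLATTEN then turns that bag into one output row per element, which is the (x,y,z) to (x),(y),(z) 'explode' behaviour described in the thread. The intermediate aliases B and C are illustrative names, not part of the original question.

```pig
-- Input schema as given in the thread (default PigStorage, tab-delimited).
X = LOAD 'input.txt' AS (id1:chararray, id2:chararray, id3:chararray, id4:chararray, id5:chararray);

-- 'Explode': TOBAG builds a bag {(id1),(id2),...} per row, and FLATTEN
-- emits one row per tuple in that bag, i.e. one row per id column.
B = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2, id3, id4, id5)) AS (id:chararray);

-- Keep only non-null ids (empty cells in the input).
C = FILTER B BY id IS NOT NULL;

-- Collapse repeated ids into the set Y.
Y = DISTINCT C;

DUMP Y;
```

If X also carries non-identifier columns, only the identifier columns need to be listed inside TOBAG; the FOREACH projects everything else out, so no SPLIT is required.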