To clarify, here is our input:

X = LOAD 'input.txt' AS (id1:chararray, id2:chararray,
id3:chararray, id4:chararray, id5:chararray);

We want to compute a relation Y with a single column that holds the set
of all distinct non-null ids from X.
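One way to get the 'explode' behaviour asked about in this thread is to combine Pig's built-in TOBAG function with FLATTEN and DISTINCT. The following is a minimal sketch, assuming a Pig version that ships the TOBAG builtin. The intermediate relation names exploded and non_null are illustrative only. TOBAG packs the listed id columns of each row into a bag of one-field tuples, and FLATTEN then emits one output row per bag element, which is the (x,y,z) to (x), (y), (z) expansion described in the original question.

```pig
-- Input schema from the thread.
X = LOAD 'input.txt' AS (id1:chararray, id2:chararray, id3:chararray,
                         id4:chararray, id5:chararray);

-- TOBAG wraps the id columns of each row into a bag of single-field tuples;
-- FLATTEN turns every tuple in that bag into its own output row ("explode").
exploded = FOREACH X GENERATE FLATTEN(TOBAG(id1, id2, id3, id4, id5)) AS id;

-- Drop rows whose id is null (a missing value in one of the id columns).
non_null = FILTER exploded BY id IS NOT NULL;

-- Keep each identifier exactly once.
Y = DISTINCT non_null;

DUMP Y;
```

Only the identifier columns are passed to TOBAG, so any other columns in the input are projected out without needing SPLIT; with more id columns, list them all in the TOBAG call.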

stan


On Wed, Jan 25, 2012 at 10:26 PM, Stan Rosenberg
<srosenb...@proclivitysystems.com> wrote:
> I don't see how flatten would help in this case.
>
> On Wed, Jan 25, 2012 at 10:19 PM, Prashant Kommireddi
> <prash1...@gmail.com> wrote:
>> Hi Stan,
>>
>> Would using FLATTEN and then DISTINCT work?
>>
>> Thanks,
>> Prashant
>>
>> On Wed, Jan 25, 2012 at 7:11 PM, Stan Rosenberg <
>> srosenb...@proclivitysystems.com> wrote:
>>
>>> Hi Guys,
>>>
>>> I came across a use case that seems to require an 'explode' operation,
>>> which, to my knowledge, is not currently available.
>>> That is, given a tuple (x,y,z), 'explode' would generate the tuples
>>> (x), (y), (z).
>>>
>>> E.g., consider a relation that contains an arbitrary number of
>>> different identifier columns, say,
>>> social security id, student id, etc.  We want to compute the set of
>>> all distinct identifiers.  Assume that the identifier columns are
>>> numerous and intermingled with other columns that should be
>>> projected out; this assumption rules out, e.g., a solution
>>> based on 'SPLIT'.
>>>
>>> To be concrete, if X = {(..., 2, 4, ..., 3), (..., 2,,...,5)} is such
>>> a relation, then the answer we want is
>>> Y={2,3,4,5}.
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>>
>>> stan
>>>
