Is there something else I could do to speed this process up?
Thanks,
Thibaut
--
View this message in context:
http://www.nabble.com/Finding-small-subset-in-very-large-dataset-tp21964853p22081608.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Thanks,
I didn't think about the bloom filter variant. That's the solution I was
looking for :-)
Thibaut
--
View this message in context:
http://www.nabble.com/Finding-small-subset-in-very-large-dataset-tp21964853p21977132.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
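
[Editor's sketch, not code from the thread] Roughly what the Bloom-filter variant looks like against Hadoop's old mapred API and its org.apache.hadoop.util.bloom.BloomFilter class: the wanted keys are folded into a Bloom filter once, each mapper loads the serialized filter in configure(), and any record whose key fails the membership test is dropped before it ever reaches the output collector. The class name BloomFilterMapper and the "subset.bloom.path" property are made up for the example.

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;

/**
 * Map-side filter: only records whose key might belong to the wanted key
 * set survive; everything else is dropped before it reaches the output
 * collector. The filter is a fixed-size bit vector, so it loads into every
 * mapper's memory even when the key set itself would not.
 */
public class BloomFilterMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {

  private BloomFilter filter;

  @Override
  public void configure(JobConf job) {
    try {
      // "subset.bloom.path" is an illustrative property name; it points at
      // a filter previously serialized to HDFS with BloomFilter.write().
      Path bloomPath = new Path(job.get("subset.bloom.path"));
      FSDataInputStream in = FileSystem.get(job).open(bloomPath);
      filter = new BloomFilter();   // sizing is restored by readFields()
      filter.readFields(in);
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("could not load Bloom filter", e);
    }
  }

  public void map(Text key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // A Bloom filter never reports false negatives, so no wanted record is
    // lost here; the occasional false positive can be removed afterwards,
    // e.g. in a reduce-side join against the real key set.
    byte[] raw = Arrays.copyOf(key.getBytes(), key.getLength());
    if (filter.membershipTest(new Key(raw))) {
      output.collect(key, value);
    }
  }
}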
running the map input and filtering the output collector or the input based
on the results from the reduce phase.

Or is there another faster way? Collection A could be so big that it doesn't
fit into memory. I could split collection A up into multiple smaller
collections, but that would make it more complicated, so I want to avoid
that route. (This is similar to the approach I described a
--
View this message in context:
http://www.nabble.com/Finding-small-subset-in-very-large-dataset-tp21964853p21964853.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
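
[Editor's sketch] The matching driver side, under the assumption that "collection A" above is the set of keys being searched for: the Bloom filter replaces that key set with a fixed-size bit vector, which is what sidesteps the memory problem, at the price of a small false-positive rate. SubsetFilterDriver, readSubsetKeys(), and the argument layout are illustrative, not from the thread.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;
import org.apache.hadoop.util.hash.Hash;

public class SubsetFilterDriver {

  // Illustrative helper: streams the wanted keys from a local text file,
  // one key per line. Only the keys are needed, never the whole records.
  private static List<String> readSubsetKeys(String localFile) throws IOException {
    List<String> keys = new ArrayList<String>();
    BufferedReader reader = new BufferedReader(new FileReader(localFile));
    String line;
    while ((line = reader.readLine()) != null) {
      keys.add(line.trim());
    }
    reader.close();
    return keys;
  }

  // args: <local key file> <HDFS path for the filter> <big input dir> <output dir>
  public static void main(String[] args) throws IOException {
    JobConf job = new JobConf(SubsetFilterDriver.class);
    job.setJobName("filter-very-large-dataset");

    // 1. Fold the wanted keys into a Bloom filter. 8M bits (1 MB) with
    //    5 hash functions is only an example; size it for the key count
    //    and the false-positive rate you can tolerate.
    BloomFilter filter = new BloomFilter(8 * 1024 * 1024, 5, Hash.MURMUR_HASH);
    for (String k : readSubsetKeys(args[0])) {
      filter.add(new Key(k.getBytes("UTF-8")));
    }

    // 2. Serialize the filter to HDFS so every mapper can load it.
    Path bloomPath = new Path(args[1]);
    FSDataOutputStream out = FileSystem.get(job).create(bloomPath, true);
    filter.write(out);
    out.close();
    job.set("subset.bloom.path", bloomPath.toString());

    // 3. Stream the very large dataset through the map-side filter.
    job.setInputFormat(KeyValueTextInputFormat.class);  // tab-separated key/value lines, as an example
    job.setMapperClass(BloomFilterMapper.class);
    job.setNumReduceTasks(0);   // keep reducers instead if false positives must be stripped
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job, new Path(args[2]));
    FileOutputFormat.setOutputPath(job, new Path(args[3]));
    JobClient.runJob(job);
  }
}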