Re: Finding small subset in very large dataset

2009-02-18 Thread Miles Osborne
--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Re: Finding small subset in very large dataset

2009-02-18 Thread Thibaut_
--
View this message in context: http://www.nabble.com/Finding-small-subset-in-very-large-dataset-tp21964853p22082598.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: Finding small subset in very large dataset

2009-02-18 Thread Miles Osborne
> this process up?
>
> Thanks,
> Thibaut

Re: Finding small subset in very large dataset

2009-02-18 Thread Thibaut_
Is there something else I could do to speed this process up?
Thanks,
Thibaut
--
View this message in context: http://www.nabble.com/Finding-small-subset-in-very-large-dataset-tp21964853p22081608.html

Re: Finding small subset in very large dataset

2009-02-12 Thread Miles Osborne
> lution I was looking for :-)
>
> Thibaut

Re: Finding small subset in very large dataset

2009-02-12 Thread Thibaut_
Thanks, I didn't think about the bloom filter variant. That's the solution I was looking for :-)
Thibaut
--
View this message in context: http://www.nabble.com/Finding-small-subset-in-very-large-dataset-tp21964853p21977132.html
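The snippets in this thread do not show the suggested Bloom filter approach itself, so what follows is only a rough sketch of how such a filter is commonly used for this kind of problem. It assumes the keys of interest are loaded into a compact bit array first, and that the large dataset is then scanned record by record (as a map task would), keeping only records whose key may be in the set. The names `BloomFilter`, `filter_records`, `num_bits`, and `num_hashes` are made up for illustration and are not taken from the thread or from Hadoop.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive the bit positions from two 64-bit halves of one SHA-256
        # digest (double hashing). This hash choice is an assumption.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def filter_records(records, wanted_keys, num_bits=1 << 16, num_hashes=4):
    """Keep (key, value) pairs whose key may be in wanted_keys."""
    bloom = BloomFilter(num_bits, num_hashes)
    for key in wanted_keys:
        bloom.add(key)
    # Survivors can include false positives; an exact check (for example
    # in a later join or reduce step) is still needed to drop them.
    return [(k, v) for k, v in records if bloom.might_contain(k)]
```

The filter only needs a fixed number of bits regardless of how long the keys are, which is why it can stand in for a key set that is too large to hold exactly in memory. Sizing `num_bits` and `num_hashes` trades memory against the false-positive rate.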

Re: Finding small subset in very large dataset

2009-02-11 Thread Aaron Kimball
> > running the map input and filtering the output collector or the input based
> > on the results from the reduce phase.
> >
> > Or is there another faster way? Collection A could be so big that it doesn't
> > fit into the memory. I could split collection A up in

Re: Finding small subset in very large dataset

2009-02-11 Thread Amit Chandel
> nother faster way? Collection A could be so big that it doesn't
> fit into the memory. I could split collection A up into multiple smaller
> collections, but that would make it more complicated, so I want to evade
> that route. (This is similar to the approach I described a

Finding small subset in very large dataset

2009-02-11 Thread Thibaut_
ww.nabble.com/Finding-small-subset-in-very-large-dataset-tp21964853p21964853.html