Choice of Bloom Filter implementation in Hadoop application

2009-06-05 Thread Ivan Balashov
Dear all, As part of the optimization process in our Hadoop application, we're trying to use a Bloom filter to avoid passing needless records through to the reduce stage. We've noticed that the Hadoop dev team recently introduced an implementation of BloomMapFile (https://issues.apache.org/jira/browse
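
For readers unfamiliar with the idea, here is a minimal stand-alone sketch of a Bloom filter used as a pre-reduce membership test. It is not Hadoop's BloomMapFile or any other Hadoop class: the name SimpleBloomFilter, its constructor parameters, and the FNV-1a based double hashing are all assumptions made for illustration. A mapper could hold such a filter, built from the keys that matter, and skip emitting any record whose key the filter rejects.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical stand-alone Bloom filter; not part of Hadoop. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of bit positions probed per key

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Records a key by setting hashCount bit positions derived from it. */
    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // odd step so probes differ
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /**
     * Returns false only if the key was definitely never added.
     * A true result may be a false positive.
     */
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1;
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th probe position is (h1 + i * h2) mod size.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a style hash, started from the given seed.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

In a map task, the check would look like `if (filter.mightContain(key)) { emit the record }`. Because a Bloom filter can report false positives but never false negatives, the reduce stage still has to tolerate a few unneeded records, but no needed record is dropped.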

Using multiple FileSystems in hadoop input

2009-05-06 Thread Ivan Balashov
Greetings to all, Could anyone tell me whether Paths from different FileSystems can be used as input to a Hadoop job? In particular, I'd like to find out whether Paths from HarFileSystem can be mixed with ones from DistributedFileSystem. Thanks, -- Kind regards, Ivan