Some hints:

1) For features, you could start with the unit tests that ship with
Hadoop's filesystem code. For performance, compare the results of several
benchmarks.

3) I can see at least two possible reasons. First, your filesystem may not
support data locality, so tasks are not executed on the same node as
their data.
Second, a lot is done under the hood, and your filesystem may be missing
something for a very specific use case.
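
To make the locality point concrete, here is a minimal sketch (plain Python,
not Hadoop code) that computes how many tasks ran on a node holding their
input. In Hadoop the host list would come from what your filesystem returns
from FileSystem.getFileBlockLocations; the task names, node names and both
dictionaries below are made up for illustration.

```python
def data_local_fraction(task_nodes, split_hosts):
    """Return the fraction of tasks that ran on a node holding their input split.

    task_nodes:  {task_id: node the task ran on}
    split_hosts: {task_id: hosts the filesystem reported for the task's split}
    """
    if not task_nodes:
        return 0.0
    local = sum(
        1 for task, node in task_nodes.items()
        if node in split_hosts.get(task, ())
    )
    return local / len(task_nodes)


# Hypothetical placement of four map tasks (names are illustrative only).
tasks = {"m0": "node1", "m1": "node2", "m2": "node3", "m3": "node4"}

# A filesystem that reports real hosts for each split.
with_locality = {
    "m0": ["node1", "node2"],
    "m1": ["node2"],
    "m2": ["node1"],
    "m3": ["node4"],
}
# A filesystem that only ever reports a placeholder host.
without_locality = {task: ["localhost"] for task in tasks}

print(data_local_fraction(tasks, with_locality))     # 0.75
print(data_local_fraction(tasks, without_locality))  # 0.0
```

If the second case looks like your filesystem, the scheduler has nothing to
work with and every read goes over the network.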

Choose your benchmarks very carefully to make sure they actually test what
you want to test (i.e. that they are not CPU-bound).
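
One rough way to check whether a workload is CPU-bound is to compare the CPU
time it uses with the wall-clock time it takes. The sketch below does this
for a single Python process only; the two workloads are stand-ins I made up,
and a real MapReduce job would have to be measured per task (for example
from the job counters) rather than like this.

```python
import time


def cpu_fraction(workload):
    """Run workload() once and return CPU time / wall-clock time for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def waiting_workload():
    # Stand-in for a task that spends its time blocked on I/O.
    time.sleep(0.2)


def computing_workload():
    # Stand-in for a task that spends its time computing.
    sum(i * i for i in range(2_000_000))


print(f"waiting:   {cpu_fraction(waiting_workload):.2f}")
print(f"computing: {cpu_fraction(computing_workload):.2f}")
```

A ratio close to 1 suggests the benchmark mostly measures CPU, so swapping
the filesystem underneath it will not tell you much.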

Julien

2013/2/21 Ling Kun <erlv5...@gmail.com>

> Dear all,
>     I am currently looking into other filesystem implementations, like
> lustre, gluster, or other NAS with POSIX support, and trying to replace
> HDFS with one of them.
>
>     I have implemented a filesystem class (AFS) which provides an
> interface to Hadoop MapReduce, like that of RawLocalFileSystem, and
> examples like wordcount and terasort work well.
>
>    However, I am not sure whether my implementation is correct for all the
> MapReduce applications that Hadoop MapReduce+Hadoop HDFS can run.
>
>    My questions are:
> 1. How does the Hadoop community run MapReduce regression tests for
> updates to Hadoop HDFS and Hadoop MapReduce?
>
> 2. Besides the MapReduce wordcount and terasort examples, are there any
> filesystem interfaces that MapReduce applications need and that I may be
> missing? Since the filesystem has POSIX support, hsync is also supported.
>
> 3. According to my tests, the performance is worse than HDFS+MapReduce.
> Any suggestions or hints for the performance analysis? (Without MapReduce,
> the filesystem performs better than HDFS and also the local
> filesystem.)
> 3.1 The following are the same for the performance comparison:
> 3.1.1 Architecture: 4 nodes for MR, and 4 other nodes for
> HDFS/AFS.
> 3.1.2 Application: the input size and the number of mappers and reducers
> are the same.
>
>
> Thanks.
>
> Ling Kun
>
>
