Re: everything becomes very slow when the number of writes is larger than the size of the cluster using *TestDFSIO* benchmark?

2008-05-15 Thread Samuel Guo
estDFSIO -write -nrFiles n -fileSize 2048 -bufferSize 65536 -resFile resfile *TestDFSIO* (/src/test/org/apache/hadoop/fs/TestDFSIO.java) > Raghu. > > Samuel Guo wrote: > >> Hi all, >> >> I run the *TestDFSIO* benchmark on a simple cluster of 2 nodes. >> The file size

everything becomes very slow when the number of writes is larger than the size of the cluster using *TestDFSIO* benchmark?

2008-05-13 Thread Samuel Guo
Hi all, I ran the *TestDFSIO* benchmark on a simple cluster of 2 nodes. The file size is the same in all cases: 2 GB. The numbers of files tried are 1, 2, 4, 8 (write only). The bufferSize is 65536 bytes. The file replication is 1. The results are below: files 1 2 4 8 write -- Throughput (MB/s) 52.89 52.

Replication failover of HDFS

2008-05-06 Thread Samuel Guo
Hi all, I am reading the Hadoop source code to study the design of the Hadoop distributed filesystem, and I have some questions about file replication in HDFS. I know the degree of replication in HDFS is configurable in a configuration file such as "hadoop-default.xml". The default degr

Re: Distributed indexing

2008-04-28 Thread Samuel Guo
Ted Dunning wrote: Check out the bailey and katta projects on sourceforge. I get nothing when checking out the katta project on sourceforge :( Also take a look at Nutch. Hadoop is certainly good for indexing and it isn't that hard to put distributed search alongside hadoop with indexes being p

Re: Distributed indexing

2008-04-28 Thread Samuel Guo
Map/reduce is a suitable approach for indexing large document collections, but I don't know whether it is suitable for retrieval. You can look at *Nutch* for distributed searching. Under the hadoop/contrib directory there is an *Index* package. It may be helpful :) Matt Wood wrote: Hello all, I was

java.lang.NoClassDefFoundError: org/apache/lucene/index/IndexDeletionPolicy in contrib/index

2008-04-27 Thread Samuel Guo
me known:) Thanks in advance! Best Wishes Samuel Guo

Any API used to get the last modified time of the File in HDFS?

2008-04-20 Thread Samuel Guo
Hi all, can anyone tell me: is there any API I can use to get metadata, such as the last modified time, of a file in HDFS? Thanks a lot :) Best Wishes :) Samuel Guo