from:"Samuel Guo"

Re: everything becomes very slow when the number of writes is larger than the size of the cluster using TestDFSIO benchmark?

2008-05-15 Thread Samuel Guo

estDFSIO -write -nrFiles n -fileSize 2048 -bufferSize 65536 -resFile resfile *TestDFSIO* (/src/test/org/apache/hadoop/fs/TestDFSIO.java) > Raghu. > > Samuel Guo wrote: > >> Hi all, >> >> I run the *TestDFSIO* benchmark on a simple cluster of 2 nodes. >> The file size

everything becomes very slow when the number of writes is larger than the size of the cluster using TestDFSIO benchmark?

2008-05-13 Thread Samuel Guo

Hi all, I ran the *TestDFSIO* benchmark on a simple cluster of 2 nodes. The file size is the same in all cases: 2 GB. The number of files tried is 1, 2, 4, and 8 (write only). The bufferSize is 65536 bytes. The file replication is 1. The results are below: files 1 2 4 8 write -- Throughput (MB/s) 52.89 52.

Replication failover of HDFS

2008-05-06 Thread Samuel Guo

Hi all, I am reading the Hadoop source code to study the design of the Hadoop distributed filesystem, and I have some questions about file replication in HDFS. I know the replication degree of HDFS is configurable in a configuration file such as "hadoop-default.xml". The default degr

Re: Distributed indexing

2008-04-28 Thread Samuel Guo

Ted Dunning wrote: Check out the bailey and katta projects on sourceforge. I get nothing when checking out the katta project on sourceforge :( Also take a look at Nutch. Hadoop is certainly good for indexing and it isn't that hard to put distributed search alongside hadoop with indexes being p

Re: Distributed indexing

2008-04-28 Thread Samuel Guo

Map/reduce is a suitable approach for indexing large document collections, but I don't know whether it is suitable for retrieval. You can look at *Nutch* for distributed searching. Under the hadoop/contrib directory, there is an *Index* package. It may be helpful :) Matt Wood wrote: Hello all, I was

java.lang.NoClassDefFoundError: org/apache/lucene/index/IndexDeletionPolicy in contrib/index

2008-04-27 Thread Samuel Guo

me known:) Thanks in advance! Best Wishes Samuel Guo

Any API used to get the last modified time of the File in HDFS?

2008-04-20 Thread Samuel Guo

Hi all, can anyone tell me: is there any API I can use to get metadata, such as the last modified time, of a file in HDFS? Thanks a lot :) Best Wishes :) Samuel Guo

7 matches
