how to configure a remote client? Has anyone tried this before?

2008-08-06 Thread Richard Zhang
Hi folks: How do I configure a remote client for HDFS? In a cluster with a few nodes, can one node act as the remote client without also being one of the data nodes/task nodes? What needs to change in hadoop-site.xml to configure this? Has anyone tried this before? Thanks. Richard
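A minimal sketch of one common approach, not a confirmed answer from the thread: a machine that runs no Hadoop daemons can usually act as a client if its hadoop-site.xml points at the cluster's name node and job tracker. The property names fs.default.name and mapred.job.tracker are the ones used by Hadoop releases of this period; the host names and ports below are placeholders, and the helper build_client_site_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_client_site_xml(namenode_uri, jobtracker_addr):
    """Return hadoop-site.xml text that points a client at an existing cluster."""
    root = ET.Element("configuration")
    for name, value in (
        ("fs.default.name", namenode_uri),        # where the name node listens
        ("mapred.job.tracker", jobtracker_addr),  # where the job tracker listens
    ):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder addresses: substitute the real name node and job tracker.
    print(build_client_site_xml("hdfs://namenode.example.com:9000",
                                "namenode.example.com:9001"))
```

The generated file would go in the client's conf directory. The client is also left out of the cluster's slaves file, so no data node or task tracker is started on it.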

!!!Help: Strange difference on the number of maps in HDFS and local file system

2008-07-10 Thread Richard Zhang
Hello Hadoopers: I am trying to run the same map reduce job on HDFS and on the local file system. That is, one time I run the map reduce job on HDFS, and another time I run the same job with the same input data on the local ext3 file system without using HDFS. I found that the number of maps g

Re: Help: how to check the active datanodes?

2008-07-03 Thread Richard Zhang
health. Port is specified > by > >dfs.http.address > in your hadoop-default.xml. >If the datanodes status is not as expected, you need to check log files. > They show the details of failure. > > On Fri, Jul 4, 2008 at 4:17 AM, Richard Zhang <[EMAIL PROTECTED]> >

Help: how to check the active datanodes?

2008-07-03 Thread Richard Zhang
Hi guys: I am running Hadoop on an 8-node cluster. I use start-all.sh to boot Hadoop, and it shows that all 8 data nodes are started. However, when I use bin/hadoop dfsadmin -report to check the status of the data nodes, it shows only one data node (the one on the same host as the name node) is a

Is this a design setting:hadoop can not do concurrent writing on different data nodes in the same cluster?

2008-07-03 Thread Richard Zhang
Hi Hadoop folks: I have an 8-node cluster and use copyFromLocal to write some text workload into DFS. I tried to write it from different nodes in the same cluster. That is, I log on to each machine in the cluster after Hadoop boots. On each machine, I use: bin/hadoop dfs -copyFromLocal workload.

Re: Apache Hadoop Wins Terabyte Sort Benchmark

2008-07-02 Thread Richard Zhang
Congratulations! Good news for Hadoopers. Richard On Wed, Jul 2, 2008 at 5:36 PM, Zheng Shao <[EMAIL PROTECTED]> wrote: > Congratulations! > > The note mentions "a couple of optimization patches to remove > intermediate writes to disk". Is there a jira / svn revision number for > these optimiz

randomtextwriter can not write 100GB text file on a 500GB cluster

2008-06-29 Thread Richard Zhang
Hi folks: I am trying to write a 100GB text file on a cluster with 500GB of free storage space. For smaller writes such as 100MB or 1G, it works fine. But when I ran it with 100GB, it showed errors saying the DFS client could not complete the write. Does anyone have any ideas on these types of errors or m

are there any large data set to test the map reduce program on hadoop?

2008-06-27 Thread Richard Zhang
Hello Folks: I wrote a map reduce program for analyzing text files. I would like to use a large data set of text files to test the performance of the program. Is there an available text data set that can be used to test programs on Hadoop? If you know of one, please let me know. Thanks. Richard

Re: InputSplit boundaries

2008-06-26 Thread Richard Zhang
The file system block size is the upper bound of the split size. The min split size can be set by users. On Thu, Jun 26, 2008 at 12:23 PM, Naama Kraus <[EMAIL PROTECTED]> wrote: > Thanks for the input. Naama > > On Thu, Jun 26, 2008 at 2:12 PM, Amar Kamat <[EMAIL PROTECTED]> wrote: > > > Naama
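To make the split-size rule concrete, here is a small sketch of the calculation as the file-based input formats of that era are generally understood to do it: the split size is the block size capped by a "goal" size (total input divided by the requested number of maps) and floored by the user-set minimum. The function name and the sample numbers are illustrative assumptions. Note one consequence: the block size is the upper bound only while the minimum split size does not exceed it.

```python
MB = 1024 * 1024


def compute_split_size(goal_size, min_size, block_size):
    """Block size caps the split; the user-set minimum can raise it again."""
    return max(min_size, min(goal_size, block_size))


if __name__ == "__main__":
    # 1 GB of input, 4 maps requested, 64 MB blocks, default minimum of 1 byte.
    goal = (1024 * MB) // 4
    print(compute_split_size(goal, 1, 64 * MB) // MB, "MB per split")
```

With a minimum of 128 MB and 64 MB blocks, the same function returns 128 MB, so a split would then span more than one block.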

Anyone knows how to input the cluster racks architecture information?

2008-06-26 Thread Richard Zhang
Hi Guys: I am running a map reduce job on a 20-node cluster. I know the architecture of the cluster, such as how the racks are organized. How do I input this information into Hadoop? I suppose this information should be useful to Hadoop. But, when searching the code, I did not find any class/metho
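One possible route, sketched under assumptions rather than taken from the thread: Hadoop releases that support script-based rack mapping call an external script (named by the topology.script.file.name property) with one or more host names or IP addresses as arguments, and read one rack path per argument from its output. Whether this property is available depends on the release, so check its hadoop-default.xml. The host-to-rack table below is entirely hypothetical.

```python
import sys

# Hypothetical layout: replace with the real hosts and racks of the cluster.
RACK_BY_HOST = {
    "10.0.0.11": "/rack1",
    "10.0.0.12": "/rack1",
    "10.0.0.21": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hosts arrive as command-line arguments; print the racks space-separated.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Unknown hosts fall back to a default rack so that the lookup never fails for a node missing from the table.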

Re: is there a way to to debug hadoop from Eclipse

2008-06-17 Thread Richard Zhang
> and then you'd go to Eclipse ->run->open debug dialog and set up remote > > debugging with the correct port. > > > > if you find out a way to debug the mappers/reducers on eclipse, let me > > know :D > > > > > > On Jun 16, 2008, at 3:10 PM, Richard Zha

Re: is there a way to to debug hadoop from Eclipse

2008-06-16 Thread Richard Zhang
> > Hello Hadoopers: > > Is there a way to debug the hadoop code from Eclipse IDE? I am using Eclipse > to read the source and build the project now. > How to start the hadoop jobs from Eclipse? Say if we can put the server > names, could we trace the running process through > eclipse, such as se

does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-10 Thread Richard Zhang
Hello folks: I am running several Hadoop applications on HDFS. To save the effort of issuing the set of commands every time, I am trying to use a bash script to run the applications sequentially. To let each job finish before proceeding to the next job, I am using wait in the script l
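As a point of comparison, here is a small sketch of sequential execution, written in Python rather than bash. A command started in the foreground already blocks until it exits, so no explicit wait is needed; in bash, wait matters only for jobs started in the background with &. The helper name run_sequentially and the example command lines in the comments are hypothetical.

```python
import subprocess


def run_sequentially(commands):
    """Run each command to completion, in order, stopping at the first failure.

    Returns (commands that succeeded, the failing command or None).
    """
    completed = []
    for cmd in commands:
        result = subprocess.run(cmd)  # blocks until this command exits
        if result.returncode != 0:
            return completed, cmd
        completed.append(cmd)
    return completed, None


# Example usage with hypothetical jar and path names:
# jobs = [
#     ["bin/hadoop", "jar", "first-job.jar", "input", "intermediate"],
#     ["bin/hadoop", "jar", "second-job.jar", "intermediate", "output"],
# ]
# done, failed = run_sequentially(jobs)
```

Stopping at the first non-zero exit code keeps a later job from running on the incomplete output of an earlier one.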

creating less than 10G data with RandomWriter

2008-06-02 Thread Richard Zhang
Hello Hadoopers: I am running RandomWriter on an 8-node cluster. The default setting creates 1G/mapper with 10 mappers/host; considering replication, that is essentially 30G/host. Because each node in the cluster has at most 30G, my cluster is full and cannot execute further c
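The arithmetic in this message can be checked with a short sketch. It assumes a replication factor of 3, which is what the 30G/host figure implies, and treats the per-host number as an average, since replicas are spread across the cluster. The function names are illustrative.

```python
GB = 1024 ** 3


def bytes_per_host(bytes_per_map, maps_per_host, replication):
    """Average raw disk consumed per host once all replicas are written."""
    return bytes_per_map * maps_per_host * replication


def max_bytes_per_map(disk_per_host, maps_per_host, replication):
    """Largest per-map output that still fits in the given disk per host."""
    return disk_per_host // (maps_per_host * replication)


if __name__ == "__main__":
    print(bytes_per_host(1 * GB, 10, 3) // GB, "GB per host with the defaults")
    print(max_bytes_per_map(15 * GB, 10, 3) // (1024 ** 2), "MB per map fits in 15 GB per host")
```

To leave headroom on 30G nodes, either the bytes written per map or the number of maps per host has to come down. For example, 512 MB per map would use 15G per host on average.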

create less than 10G data/host with RandomWrite

2008-06-02 Thread Richard Zhang
Hello Hadoopers: I am running RandomWriter on an 8-node cluster. The default setting creates 1G/mapper with 10 mappers/host; considering replication, that is essentially 30G/host. Because each node in the cluster has at most 30G, my cluster is full and cannot execute further c