Re: hadoop streaming using a java program as mapper

2012-05-01 Thread Boyu Zhang
code? > > thuhuang...@gmail.com > > On 2012-5-2, 1:17 PM, Boyu Zhang wrote: > > Hi All, > > I am in a little bit of a strange situation: I am using Hadoop streaming to run a bash shell script myMapper.sh, and in myMapper.sh it calls

the time format of the log files

2011-09-07 Thread Boyu Zhang
Hi all, I need to parse the _logs/history file to get the running times of the map and reduce phases. The log is in this format: Task TASKID="task_201109071413_0001_m_00" TASK_TYPE="MAP" *START_TIME="1315430092923"* SPLITS="/default-rack/dash-0-11\.sdsc\.edu" . MapAttempt TASK_TYPE="MAP" TASK
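
For anyone parsing these by hand: the START_TIME values in the history file appear to be epoch milliseconds (1315430092923 falls on 2011-09-07, matching the job ID), so a plain regex plus java.util.Date recovers wall-clock times. A minimal sketch — the sample line is abridged from the post above:

    import java.util.Date;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class HistoryTimeParser {
        // START_TIME in the job history file is epoch milliseconds
        private static final Pattern START = Pattern.compile("START_TIME=\"(\\d+)\"");

        public static void main(String[] args) {
            String line = "Task TASKID=\"task_201109071413_0001_m_00\" "
                        + "TASK_TYPE=\"MAP\" START_TIME=\"1315430092923\"";
            Matcher m = START.matcher(line);
            if (m.find()) {
                long millis = Long.parseLong(m.group(1));
                System.out.println(new Date(millis)); // wall-clock start time
            }
        }
    }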

Re: HOD exception: java.io.IOException: No valid local directories in property: mapred.local.dir

2011-04-13 Thread Boyu Zhang
Thanks a lot for the comments, but I set mapred.local.dir to /tmp, which is a directory on every local machine. Still I get the same error. With the same conf file and 3 nodes I don't have the problem (I have this problem when using 4 nodes). Any idea what the problem may be? Thanks a lot. And

HOD exception: java.io.IOException: No valid local directories in property: mapred.local.dir

2011-04-11 Thread Boyu Zhang
Hi All, I was trying to run the program using HOD on a cluster. When I allocate 5 nodes, it runs fine, but when I allocate 6 nodes, every time I try to run a program I get this error: 11/04/11 19:45:50 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usag

Corrupted input data to map

2010-10-15 Thread Boyu Zhang
Hi all, I am running a program with an input of 1 million lines of data; among the 1 million, 5 or 6 lines are corrupted. The way they are corrupted is: in a position where a float number like 3.4 is expected, there is instead something like 3.4.5.6. So when the map runs
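
One way to keep a handful of bad lines from killing a million-line job is to catch the parse failure per record and skip it, with a counter so the damage stays visible. A minimal sketch against the 0.20 API, assuming (for illustration only) one float per line; Hadoop's SkipBadRecords facility is the heavier alternative when the failure is a crash rather than a catchable exception:

    import java.io.IOException;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TolerantMapper
            extends Mapper<LongWritable, Text, Text, FloatWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            try {
                // "3.4.5.6" throws NumberFormatException here
                float f = Float.parseFloat(value.toString().trim());
                context.write(new Text("ok"), new FloatWritable(f));
            } catch (NumberFormatException e) {
                // count and skip the bad record; the job keeps running
                context.getCounter("Data", "CorruptRecords").increment(1);
            }
        }
    }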

Re: nodes with different memory sizes

2010-10-15 Thread Boyu Zhang
HOD. Refer to the documentation in http://hadoop.apache.org/common/docs/r0.20.2/hod_config_guide.html#3.2+hod+options . > Thanks > Hemanth > On Sat, Oct 9, 2010 at 12:11 AM, Boyu Zhang wrote: > Hi Pablo, > thank you for the reply. Actually I forgot

Re: nodes with different memory sizes

2010-10-08 Thread Boyu Zhang
hadoop.apache.org/common/docs/current/cluster_setup.html ) > e.g.: > mapred.child.java.opts -> -Xmx8G > I hope this helps > Yours > Pablo Cingolani > On Fri, Oct 8, 2010 at 12:17 PM, Boyu Zhang wrote: > Dear All, > I am trying to run
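
The per-job form of that setting, as a sketch (0.20-era property name; the heap value is illustrative). Note it applies to every task JVM wherever it runs; genuinely per-node tuning is usually handled in each TaskTracker's own mapred-site.xml instead, e.g. giving the 32 GB machines more task slots via mapred.tasktracker.map.tasks.maximum:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class BigHeapJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // heap for each task JVM launched for this job
            conf.set("mapred.child.java.opts", "-Xmx8g");
            Job job = new Job(conf, "memory-hungry-job");
            // ... set mapper, reducer, input and output paths, then
            // job.waitForCompletion(true) as usual ...
        }
    }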

nodes with different memory sizes

2010-10-08 Thread Boyu Zhang
Dear All, I am trying to run a memory-hungry program on a cluster with 6 nodes; among the 6 nodes, 2 have 32 GB of memory and the rest have 16 GB. I am wondering whether there is a way to configure the cluster so that the processes running on the big nodes get more memory while the processes running on

Re: Passing information to Map Reduce

2010-08-13 Thread Boyu Zhang
Hi Pete, maybe you can set a job configuration entry to the value you want, and read that entry back in the map program. Boyu On Fri, Aug 13, 2010 at 3:55 PM, Pete Tyler wrote: > When my Java based client creates a mapreduce Job instance I can set the job name, which is readable by the map an
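
A sketch of that suggestion against the 0.20 API — the driver stashes a value under an arbitrary key ("my.param" here is illustrative) and the mapper reads it back in setup():

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ParamPassing {
        public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
            private String param;

            @Override
            protected void setup(Context context) {
                // every task sees the job's Configuration
                param = context.getConfiguration().get("my.param", "default");
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(new Text(param), value);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("my.param", "some-value"); // set before the Job is created
            Job job = new Job(conf, "param-passing");
            job.setMapperClass(MyMapper.class);
            // ... input/output paths etc. ...
        }
    }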

Re: large parameter file, too many intermediate output

2010-08-13 Thread Boyu Zhang
07 PM, Harsh J wrote: > Apart from the combiner suggestion, I'd also suggest using > intermediate map-output compression always (With LZO, if possible). > Saves you some IO. > > On Fri, Aug 13, 2010 at 3:24 AM, Boyu Zhang wrote: > > Hi Steve, > > > > Thanks for t
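
Harsh's compression suggestion in code form (0.20-era property names). LZO ships separately as hadoop-lzo, so this sketch falls back to the built-in DefaultCodec, which still trades some CPU for noticeably less shuffle I/O:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.DefaultCodec;

    public class MapOutputCompression {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // compress intermediate map output before the shuffle
            conf.setBoolean("mapred.compress.map.output", true);
            // with hadoop-lzo installed this could be
            // com.hadoop.compression.lzo.LzoCodec instead
            conf.setClass("mapred.map.output.compression.codec",
                          DefaultCodec.class, CompressionCodec.class);
        }
    }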

Re: large parameter file, too many intermediate output

2010-08-12 Thread Boyu Zhang
Hi Steve, Thanks for the reply! On Thu, Aug 12, 2010 at 5:47 PM, Steve Lewis wrote: > I don't think of half a billion key-value pairs as that large a number - nor 20,000 per task - these are not atypical for hadoop tasks and many users will see these as small numbers while you might us

Re: large parameter file, too many intermediate output

2010-08-12 Thread Boyu Zhang
> Hi Himanshu, > Thanks for the reply! > On Thu, Aug 12, 2010 at 5:08 PM, Himanshu Vashishtha <vashishth...@gmail.com> wrote: >> it seems each input line is generating 500 kv pairs (which is kind of exploding the data), so chunking/not chunking will not make difference. >> I am wonde

Re: large parameter file, too many intermediate output

2010-08-12 Thread Boyu Zhang
> On Thu, Aug 12, 2010 at 12:58 PM, Boyu Zhang wrote: > > Dear All, > > I am working on this algorithm that is kind of like "clustering" a data set. > > The algorithm is like this: > > The data set is broken into N (~40

large parameter file, too many intermediate output

2010-08-12 Thread Boyu Zhang
Dear All, I am working on an algorithm that is kind of like "clustering" a data set. The algorithm is like this: the data set is broken into N (~40) chunks, and each chunk contains 2,000 lines. Each mapper retrieves a "parameter file" which contains 500 lines, and for each line read from the da
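
If the reduction applied to those per-line pairs is associative and commutative (a plain sum is used below purely for illustration), registering the reducer as a combiner collapses much of the intermediate output on the map side before it is spilled and shuffled — the main suggestion in the replies above:

    import java.io.IOException;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CombinerSetup {
        // a reducer that only sums is safe to reuse as a combiner
        public static class SumReducer
                extends Reducer<Text, FloatWritable, Text, FloatWritable> {
            @Override
            protected void reduce(Text key, Iterable<FloatWritable> values,
                                  Context context)
                    throws IOException, InterruptedException {
                float sum = 0f;
                for (FloatWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new FloatWritable(sum));
            }
        }

        static void configure(Job job) {
            job.setReducerClass(SumReducer.class);
            job.setCombinerClass(SumReducer.class); // map-side pre-aggregation
        }
    }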

HOD on Scyld

2010-04-12 Thread Boyu Zhang
Dear All, I have been trying to use HOD on Scyld as a regular user (not root), but I have had some trouble getting it to start. I am wondering whether anyone has used HOD on a Scyld cluster successfully? Any help would be appreciated, thanks! Boyu

Re: hadoop on demand setup: Failed to retrieve 'hdfs' service address

2010-04-08 Thread Boyu Zhang
... And when I check the file, the whole dir is not there. And do you know how to check the namenode/datanode logs? I can't find them anywhere. Thanks a lot! Boyu On Thu, Apr 8, 2010 at 4:58 PM, Kevin Van Workum wrote: > On Thu, Apr 8, 2010 at 2:23 PM, Boyu Zhang wrote: > > Hi Kevin

Re: hadoop on demand setup: Failed to retrieve 'hdfs' service address

2010-04-08 Thread Boyu Zhang
Hi Kevin, I am having the same error, but my critical error is: [2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be allocated because of the following errors. Hodring at n0 failed with following errors: JobTracker failed to initialise Have you solved this? Thanks! Boyu On T

HOD: JobTracker failed to initialise

2010-04-08 Thread Boyu Zhang
Dear All, I am trying to install HOD on a cluster. When I tried to allocate a new Hadoop cluster, I got the following error: [2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be allocated because of the following errors. Hodring at n0 failed with following errors: JobTracker fa

Re: HOD on Scyld

2010-03-23 Thread Boyu Zhang
> Also make sure that, in the file /path/to/hod/conf/hodrc, you set "java-home" (under both [hod] and [hodring]) to a working JRE or JDK on your system. > Does that work? > Antonio > On Tue, Mar 23, 2010 at 3:34 PM, Boyu Zhang wrote: > > t

Re: HOD on Scyld

2010-03-23 Thread Boyu Zhang
.x executable (not the directory). > Antonio > On Mon, Mar 22, 2010 at 5:07 PM, Boyu Zhang wrote: > > Update: I used the command: $ bin/hod allocate -d /home/zhang/cluster -n 4 -c /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t /home/zhang/

Re: Python problem with HOD

2010-03-22 Thread Boyu Zhang
problem. >> It works fine now. But I met the problem on the other cluster, on which I believe the Python version is 2.4.3. >> On Mon, Mar 22, 2010 at 4:35 PM, Boyu Zhang wrote: >>> Dear All, >>>

Re: 2 HOD Questions

2010-03-22 Thread Boyu Zhang
> thought I used 2.5.2, and set hodrc for it. But I didn't use the HOD_PYTHON_HOME env var. > Cheers. > Song Liu > On Mon, Mar 22, 2010 at 4:56 PM, Boyu Zhang wrote: > >> Thanks for the reply

Re: 2 HOD Questions

2010-03-22 Thread Boyu Zhang
. I thought I used 2.5.2, and set hodrc for it. But I didn't use the HOD_PYTHON_HOME env var. > Cheers. > Song Liu > On Mon, Mar 22, 2010 at 4:56 PM, Boyu Zhang wrote: > > Thanks for the reply. I use python 2.6.5. > > I overcame the problem by

Re: 2 HOD Questions

2010-03-22 Thread Boyu Zhang
On Mon, Mar 22, 2010 at 12:52 PM, Song Liu wrote: > Hi! > Unfortunately, I didn't, and am still waiting for a solution. > And which Python version are you using? > Regards > Song Liu > On Mon, Mar 22, 2010 at 3:24 PM, Boyu Zhang wrote: > > Hi, >

Python problem with HOD

2010-03-22 Thread Boyu Zhang
Dear All, sorry to bother you again. I overcame the "Uncaught Exception : need more than 2 values to unpack" by exporting HOD_PYTHON_HOME, but now I have a new error. $ bin/hod allocate -d /home/zhang/cluster -n 4 -c /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t /home/zhang/hadoop-0.20.

Re: HOD on Scyld

2010-03-22 Thread Boyu Zhang
environment variable? Thanks a lot! On Mon, Mar 22, 2010 at 11:52 AM, Boyu Zhang wrote: > Dear All, > I have been trying to get HOD working on a cluster running Scyld, but there are some problems. I configured the minimum set of options. > 1. I executed the command: >

HOD on Scyld

2010-03-22 Thread Boyu Zhang
Uncaught Exception : need more than 2 values to unpack Could anyone tell me why I am having this error? Is the problem the operating system, or Torque, or because I commented out line 576, or something else? Any comment is welcome and appreciated. Thanks a lot! Sincerely, Boyu Zhang

Re: 2 HOD Questions

2010-03-22 Thread Boyu Zhang
Hi, I am having the same exception too. Uncaught Exception : need more than 2 values to unpack Did you solve this? On Mon, Mar 15, 2010 at 10:55 AM, Song Liu wrote: > Hi all, I have two questions about HOD > 1. I configured and set up HOD on one cluster, and it works fine, but when I > finished

Installation On A Cluster With SSH Only To The Front Node

2010-03-16 Thread Boyu Zhang
, it seems that my cluster has a different architecture from the cluster it refers to. Does anyone have experience installing Hadoop on a cluster like mine? Is there any link I can refer to? Any help is appreciated, thanks a lot! Sincerely, Boyu Zhang

Re: TaskTracker: Java heap space error

2010-03-11 Thread Boyu Zhang
bumping up the heap for the map task? > Since you are setting io.sort.mb to 256M, please set heap-size to 512M at least, if not more. > mapred.child.java.opts -> -Xmx512M or -Xmx1024m > Arun > On Mar 11, 2010, at 8:24 AM, Boyu Zhang wrote: > Dear

TaskTracker: Java heap space error

2010-03-11 Thread Boyu Zhang
Dear All, I am running a Hadoop job processing data. The output of the map is really large, and it spills 15 times, so I was trying to set io.sort.mb = 256 instead of 100, leaving everything else at the default. I am using version 0.20.2. When I run the job, I get the following errors: 2010-03-11 11:
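
As Arun's reply above points out, io.sort.mb is allocated inside the task JVM's heap, so the two settings have to move together; a minimal per-job sketch with the values from this thread:

    import org.apache.hadoop.conf.Configuration;

    public class SortBufferSetup {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // the map-side sort buffer lives inside the task JVM heap,
            // so raise both or the task dies with "Java heap space"
            conf.setInt("io.sort.mb", 256);
            conf.set("mapred.child.java.opts", "-Xmx512m");
        }
    }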

Re: Re: Questions About Passing Parameters to Hado op Job

2009-11-22 Thread Boyu Zhang
Hi Karthik, Thanks a lot for the information! I will look into it and try! Boyu 2009/11/22 Karthik Kambatla > Though it is recommended for large files, DistributedCache might be a good > alternative for you. > > > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/Di
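
The DistributedCache pattern Karthik points to, sketched against the 0.20 API (the HDFS path is illustrative):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class CacheDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // ship a file already in HDFS to every task's local disk
            DistributedCache.addCacheFile(new URI("/user/boyu/params.txt"), conf);
            // ... build and submit the Job with this conf; inside a task the
            // local copies come back via
            // DistributedCache.getLocalCacheFiles(conf) ...
        }
    }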

Re: Re: Questions About Passing Parameters to Hado op Job

2009-11-22 Thread Boyu Zhang
actually how map-side join works. > Gang Luo > - > Department of Computer Science > Duke University > (919)316-0993 > gang@duke.edu > - Original Message - From: Boyu Zhang

Questions About Passing Parameters to Hadoop Job

2009-11-22 Thread Boyu Zhang
they are large, in a sense. Currently I use job.set and job.get to set and retrieve these lines as configuration entries, but it is not efficient at all! Could anyone help me with an alternative solution? Thanks a million! Boyu Zhang University of Delaware

Re: How To Pass Parameters To Mapper Through Main Method

2009-10-26 Thread Boyu Zhang
assuming the size isn't relatively large) and use configure/setup to retrieve these. Or use the distributed cache to read a file containing these lines (possibly with JVM reuse if you want that extra bit as well). > Thanks, > Amogh > On 10/26/09 6:17 AM, "Boyu

How To Pass Parameters To Mapper Through Main Method

2009-10-25 Thread Boyu Zhang
? Thanks a lot for reading my email, really appreciate any help! Boyu Zhang(Emma) University of Delaware

Hadoop Input File Directory

2009-09-11 Thread Boyu Zhang
input data to the top level? Thanks a lot! Boyu Zhang University of Delaware

Hadoop Input Files Directory

2009-09-11 Thread Boyu Zhang
/dir_0. My question is: is there any way I can process all the files in a hierarchy with the input path set to the top level? Thanks a lot for your time! Boyu Zhang University of Delaware
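
For the record, a glob input path is the usual trick here: FileInputFormat of this era does not recurse into subdirectories on its own, but globs expand them explicitly. A sketch against the 0.20 API (the directory name and depth are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class NestedInputs {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "nested-input");
            // one "/*" per directory level below the top-level dir
            FileInputFormat.addInputPaths(job, "top/*");
        }
    }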

Re: How To Run Multiple Map & Reduce Functions In One Job

2009-09-04 Thread Boyu Zhang
... > On Fri, Sep 4, 2009 at 12:22 PM, Boyu Zhang wrote: > > Yes, the output of the first iteration is the input of the second iteration. > > Actually, I am working on the PageRank problem. In the algorithm, you have to run several iterations, each u
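
The usual shape of that kind of iteration: one Job per pass, with pass i+1 reading what pass i wrote. A sketch (paths and the fixed pass count are illustrative; a real PageRank driver would test for convergence instead):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IterativeDriver {
        public static void main(String[] args) throws Exception {
            String base = "ranks"; // illustrative HDFS path prefix
            for (int i = 0; i < 10; i++) {
                Job job = new Job(new Configuration(), "pagerank-iter-" + i);
                // job.setMapperClass(...); job.setReducerClass(...);
                FileInputFormat.addInputPath(job, new Path(base + "-" + i));
                FileOutputFormat.setOutputPath(job, new Path(base + "-" + (i + 1)));
                if (!job.waitForCompletion(true)) {
                    break; // stop the chain if a pass fails
                }
            }
        }
    }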

Re: How To Run Multiple Map & Reduce Functions In One Job

2009-09-04 Thread Boyu Zhang
separate job configs for them. You can pass these different configs to the Tool object in the same parent class... But they will essentially be different jobs being called together from inside the same Java parent clas

Re: How To Run Multiple Map & Reduce Functions In One Job

2009-09-04 Thread Boyu Zhang
issue?? > Amandeep Khurana > Computer Science Graduate Student > University of California, Santa Cruz > On Fri, Sep 4, 2009 at 11:36 AM, Boyu Zhang wrote: > > Dear All, > > I am using Hadoop 0.20.0. I have an application that needs to run >

How To Run Multiple Map & Reduce Functions In One Job

2009-09-04 Thread Boyu Zhang
for your time! Boyu Zhang(Emma)

RE: Questions on How the Namenode Assign Blocks to Datanodes

2009-07-24 Thread Boyu Zhang
Thanks for the tip, I am reading the code now, thanks a lot! Boyu Zhang -Original Message- From: Hairong Kuang [mailto:hair...@yahoo-inc.com] Sent: Friday, July 24, 2009 1:55 PM To: common-user@hadoop.apache.org Subject: Re: Questions on How the Namenode Assign Blocks to Datanodes

RE: Questions on How the Namenode Assign Blocks to Datanodes

2009-07-24 Thread Boyu Zhang
Thank you for the reply. Do you by any chance remember where you read this? Thanks a lot! Boyu -Original Message- From: Hairong Kuang [mailto:hair...@yahoo-inc.com] Sent: Friday, July 24, 2009 12:54 PM To: common-user@hadoop.apache.org Subject: Re: Questions on How the Namenode A

RE: Questions on How the Namenode Assign Blocks to Datanodes

2009-07-24 Thread Boyu Zhang
replica = 1; I just use it for testing to see how HDFS works. What is the policy that decides which datanode gets which block? Thank you so much! Boyu Zhang Ph. D. Student Computer and Information Sciences Department University of Delaware (210) 274-2104 bzh...@udel.edu http://www.eecis.udel.edu

Questions on How the Namenode Assign Blocks to Datanodes

2009-07-23 Thread Boyu Zhang
). But what policies are used to assign these blocks to datanodes? In my case, machine1 got 14 blocks, machine2 got 12 blocks, and machine3 got 16 blocks. Could anyone help me with this? Or is there any documentation I can read to help me clarify it? Thanks a lot! Boyu Zhang Ph. D. Student

Re: Datanode Cannot Connect To The Server

2009-07-17 Thread Boyu Zhang
st line in /etc/hosts, that this may happen. > > On Thu, Jul 16, 2009 at 1:54 PM, Boyu Zhang wrote: > > > Thank you for your suggestion. > > > > I have done that plenty of times, and every time I delete the pids and > the > > files /tmp/hadoop-name that namenode form

Re: Datanode Cannot Connect To The Server

2009-07-16 Thread Boyu Zhang
Thank you for your suggestion. I have done that plenty of times, and every time I delete the pids and the /tmp/hadoop-name files that the namenode format generated. But I get the same error over and over. I found out that after start-dfs.sh, when I check netstat -pnl (from the master), there is a p

Re: Datanode Cannot Connect To The Server

2009-07-16 Thread Boyu Zhang
actual machine names. Thanks a lot! 2009/7/16 zjffdu > Boyu, > Can you ping the master node from the slave node? And can you open the task web UI on the slave? > Jeff Zhang > -Original Message- > From: Boyu Zhang [mailto:boyuzhan...

Re: Datanode Cannot Connect To The Server

2009-07-15 Thread Boyu Zhang
to get anything going. with > 0.20.0 i got it up and running in about 20 min > > > On Wed, Jul 15, 2009 at 4:24 PM, Boyu Zhang wrote: > > > I am using the version 0.18.3 right now, the configurations files you > > mentioned are available in the version 0.20. Anyway,

Re: Datanode Cannot Connect To The Server

2009-07-15 Thread Boyu Zhang
hadoop.apache.org/core/docs/current/cluster_setup.html#Site+Configuration > that's a link to the site that should be useful. It's the cluster setup; just follow it carefully and you should be good to go. > On Wed, Jul 15, 2009 at 4:09 PM, Boyu Zhang wrote: > > Thank you Divij, > >

Re: Datanode Cannot Connect To The Server

2009-07-15 Thread Boyu Zhang
based on the server and the port that the slave is trying to communicate on. > On Wed, Jul 15, 2009 at 3:57 PM, Boyu Zhang wrote: > > Dear all, > > I am trying to set up a cluster of two machines using Hadoop. One machine is > both n

Datanode Cannot Connect To The Server

2009-07-15 Thread Boyu Zhang
Dear all, I am trying to set up a cluster of two machines using Hadoop. One machine is both the namenode and the jobtracker; the other machine is the datanode and tasktracker. I set up password-less ssh in both directions. The commands I used are: > bin/hadoop namenode -format > bin/start-dfs.sh starti

RE: could only be replicated to 0 nodes, instead of 1

2009-07-14 Thread Boyu Zhang
corresponding directory. If your last job did not end correctly, you will have this kind of problem. Hope this helps. Boyu Zhang Ph. D. Student Computer and Information Sciences Department University of Delaware (210) 274-2104 bzh...@udel.edu http://www.eecis.udel.edu/~bzhang -Original

Questions on Hadoop On Demand (HOD)

2009-07-07 Thread Boyu Zhang
Dear all, I am a beginner to Hadoop. I am looking into how to use virtual machines with Hadoop, and I have some general questions about HOD. Can I consider HOD a virtual machine for running Hadoop jobs? (Usually VMs are really big, but this one seems lightweight.) Is the configuration
