ode?
>
> thuhuang...@gmail.com
>
>
>
> On 2012-5-2, at 1:17 PM, Boyu Zhang wrote:
>
> > Hi All,
> >
> > I am in a bit of a strange situation: I am using Hadoop streaming to run
> > a bash shell program, myMapper.sh, and in myMapper.sh, it call
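For reference, a typical streaming invocation for a shell-script mapper looks
roughly like this (the jar location and HDFS paths are illustrative; the -file
option ships myMapper.sh to each task's working directory):

 bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
     -input /user/boyu/input \
     -output /user/boyu/output \
     -mapper myMapper.sh \
     -file myMapper.sh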
Hi all,
I need to parse the _logs/history file to get the running times for the map and
reduce phases. The log is in this format:
Task TASKID="task_201109071413_0001_m_00" TASK_TYPE="MAP"
START_TIME="1315430092923" SPLITS="/default-rack/dash-0-11\.sdsc\.edu" .
MapAttempt TASK_TYPE="MAP" TASK
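For what it's worth, a minimal sketch of pulling the timestamps out of such
lines, assuming the attribute layout shown above (FINISH_TIME appears the same
way on the corresponding task-completion lines):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HistoryTimes {
    // Matches attributes like START_TIME="1315430092923"
    private static final Pattern START = Pattern.compile("START_TIME=\"(\\d+)\"");
    private static final Pattern FINISH = Pattern.compile("FINISH_TIME=\"(\\d+)\"");

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        for (String line; (line = in.readLine()) != null; ) {
            // Only look at map-task records
            if (!line.startsWith("Task ") || !line.contains("TASK_TYPE=\"MAP\"")) continue;
            Matcher s = START.matcher(line);
            Matcher f = FINISH.matcher(line);
            if (s.find()) System.out.println("map start:  " + s.group(1));
            if (f.find()) System.out.println("map finish: " + f.group(1));
        }
        in.close();
    }
}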
Thanks a lot for the comments, but I set mapred.local.dir to /tmp, which
is a directory on every local machine.
Still I get the same error. With the same conf file and 3 nodes I don't have
the problem (I have the problem when I use 4 nodes).
Any idea what the problem may be? Thanks a lot. And
Hi All,
I was trying to run the program using HOD on a cluster. When I allocate
5 nodes it runs fine, but when I allocate 6 nodes, every time I
try to run a program I get this error:
11/04/11 19:45:50 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found
in the classpath. Usag
Hi all,
I am running a program with an input of 1 million lines of data; among the 1
million, 5 or 6 lines are corrupted. The way they are corrupted is: in
a position where a float number is expected, like 3.4, instead of a float
number there is something like 3.4.5.6. So when the map run
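One way to keep those 5 or 6 bad lines from failing the whole job is to catch
the parse error in the mapper, count the record, and skip it. A sketch with the
0.20 API, assuming the float is the first whitespace-separated field:

import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SafeFloatMapper extends Mapper<LongWritable, Text, Text, FloatWritable> {
    // Arbitrary counter label for tracking skipped records
    enum BadRecords { CORRUPTED }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\s+");
        try {
            float f = Float.parseFloat(fields[0]); // assumes the float is the first field
            context.write(new Text(fields[0]), new FloatWritable(f));
        } catch (NumberFormatException e) {
            // "3.4.5.6" and friends land here: count and skip instead of failing
            context.getCounter(BadRecords.CORRUPTED).increment(1);
        }
    }
}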
D. Refer to the documentation
> in
> http://hadoop.apache.org/common/docs/r0.20.2/hod_config_guide.html#3.2+hod+options
> .
>
> Thanks
> Hemanth
>
> On Sat, Oct 9, 2010 at 12:11 AM, Boyu Zhang wrote:
> > Hi Pablo,
> >
> > thank you for the reply. Actually I forgot
.org/common/docs/current/cluster_setup.html
> )
>
> e.g.:
> mapred.child.java.opts -> -Xmx8G
>
>
> I hope this helps
> Yours
> Pablo Cingolani
>
>
>
> On Fri, Oct 8, 2010 at 12:17 PM, Boyu Zhang wrote:
> > Dear All,
> >
> > I am trying to run
Dear All,
I am trying to run a memory-hungry program on a cluster with 6 nodes; among
the 6 nodes, 2 of them have 32 GB of memory and the rest have 16 GB. I am
wondering whether there is a way of configuring the cluster so that the
processes running on the big nodes get more memory while the processes running in
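One approach, sketched under the assumption that per-node overrides are
acceptable: give each tasktracker its own local mapred-site.xml with a
node-appropriate child heap, marked final so job-level settings cannot
override it, e.g.

 on the 32 GB nodes: mapred.child.java.opts -> -Xmx2048m
 on the 16 GB nodes: mapred.child.java.opts -> -Xmx1024m

(values illustrative), and/or vary mapred.tasktracker.map.tasks.maximum per
node so the bigger machines run more concurrent tasks.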
Hi Pete,
Maybe you can set a job configuration entry to the value you want, and get
that entry value in the map program.
Boyu
On Fri, Aug 13, 2010 at 3:55 PM, Pete Tyler wrote:
> When my Java based client creates a mapreduce Job instance I can set the
> job name, which is readable by the map an
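A sketch of that approach with the 0.20 API; the key name "my.param" is made
up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ConfParamExample {
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String param;

        @Override
        protected void setup(Context context) {
            // Read the entry back inside the task
            param = context.getConfiguration().get("my.param", "default");
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(new Text(param), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("my.param", "some value"); // set before the Job is created
        Job job = new Job(conf, "conf-param-example");
        job.setMapperClass(MyMapper.class);
        // input/output paths and the rest of the job setup omitted
    }
}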
07 PM, Harsh J wrote:
> Apart from the combiner suggestion, I'd also suggest using
> intermediate map-output compression always (With LZO, if possible).
> Saves you some IO.
>
> On Fri, Aug 13, 2010 at 3:24 AM, Boyu Zhang wrote:
> > Hi Steve,
> >
> > Thanks for t
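For reference, Harsh's compression suggestion in code, using the old JobConf
API. GzipCodec stands in here because LZO ships separately (whether its codec
class is available depends on your install):

import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

public class CompressedMapOutput {
    public static JobConf configure(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        conf.setCompressMapOutput(true);                   // compress intermediate map output
        conf.setMapOutputCompressorClass(GzipCodec.class); // swap in LZO's codec if installed
        return conf;
    }
}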
Hi Steve,
Thanks for the reply!
On Thu, Aug 12, 2010 at 5:47 PM, Steve Lewis wrote:
> I don't think of half a billion key value pairs as that large a number -
> nor 20,000 per task - these are
> not atypical for hadoop tasks and many users will see these as small
> numbers
> while you might us
> Hi Himanshu,
>
> Thanks for the reply!
>
> On Thu, Aug 12, 2010 at 5:08 PM, Himanshu Vashishtha <
> vashishth...@gmail.com> wrote:
>
>> it seems each input line is generating 500 kv pairs (which is kind of
>> exploding the data), so chunking/not chunking will not make difference.
>>
>> I am wonde
>
>
> On Thu, Aug 12, 2010 at 12:58 PM, Boyu Zhang
> wrote:
>
> > Dear All,
> >
> > I am working on this algorithm that is kind of like "clustering" data
> set.
> > The algorithm is like this:
> >
> > The data set is broken into N(~40
Dear All,
I am working on an algorithm that is kind of like "clustering" a data set.
The algorithm is like this:
The data set is broken into N (~40) chunks; each chunk contains 2,000 lines.
Each mapper retrieves a "parameter file" which contains 500 lines,
and for each line read from the da
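So each input line is paired with every one of the ~500 parameter lines, which
is where the 500 output pairs per input line mentioned above come from. A
sketch of that shape, with the parameter loading elided (the distributed cache
discussed later in this thread is one way to get the file onto each node):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParamCrossMapper extends Mapper<LongWritable, Text, Text, Text> {
    private List<String> params = new ArrayList<String>();

    @Override
    protected void setup(Context context) {
        // Load the ~500 parameter lines here (e.g. from a file shipped via
        // the distributed cache); elided in this sketch.
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One output pair per (input line, parameter line) combination,
        // i.e. ~500 pairs per input line.
        for (String p : params) {
            context.write(new Text(p), value);
        }
    }
}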
Dear All,
I have been trying to use HOD on Scyld as a common user (not root), but I
am having some problems getting it to start. I am wondering whether anyone has
used HOD on a Scyld cluster successfully. Any help would be appreciated, thanks!
Boyu
...
And when I check the file, the whole dir is not there. Also, do you know how to
check the namenode/datanode logs? I can't find them anywhere. Thanks a lot!
Boyu
On Thu, Apr 8, 2010 at 4:58 PM, Kevin Van Workum wrote:
> On Thu, Apr 8, 2010 at 2:23 PM, Boyu Zhang wrote:
> > Hi Kevin
Hi Kevin,
I am having the same error, but my critical error is:
[2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be
allocated because of the following errors.
Hodring at n0 failed with following errors:
JobTracker failed to initialise
Have you solved this? Thanks!
Boyu
On T
Dear All,
I am trying to install HOD on a cluster. When I tried to allocate a new
Hadoop cluster, I got the following error:
[2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be
allocated because of the following errors.
Hodring at n0 failed with following errors:
JobTracker fa
f
> Also make sure that, in the file /path/to/hod/conf/hodrc you set
> "java-home"
> (under both [hod] and [hodring] ) to a working JRE or JDK in your system.
>
> Does that work?
>
> Antonio
>
> On Tue, Mar 23, 2010 at 3:34 PM, Boyu Zhang wrote:
>
> > t
.x executable (not the
> directory).
>
> Antonio
>
> On Mon, Mar 22, 2010 at 5:07 PM, Boyu Zhang wrote:
>
> > Update: I used the command: $ bin/hod allocate -d /home/zhang/cluster -n
> 4
> > -c /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > /home/zhang/
problem.
> >> It works fine now. But I met the problem on the other cluster in which I
> >> believe the python version is 2.4.3.
> >>
> >>
> >> On Mon, Mar 22, 2010 at 4:35 PM, Boyu Zhang wrote:
> >>
> >>> Dear All,
> >>>
thought
> > I used 2.52, and set hodrc for it. But I didn't use the HOD_PYTHON_HOME env
> > var.
> >
> > Cheers.
> > Song Liu
> >
> >
> > On Mon, Mar 22, 2010 at 4:56 PM, Boyu Zhang
> wrote:
> >
> >> Thanks for the reply
. I thought
> I used 2.52, and set hodrc for it. But I didn't use the HOD_PYTHON_HOME env
> var.
>
> Cheers.
> Song Liu
>
> On Mon, Mar 22, 2010 at 4:56 PM, Boyu Zhang wrote:
>
> > Thanks for the reply. I use python 2.6.5.
> >
> > I overcame the problem by
On Mon, Mar 22, 2010 at 12:52 PM, Song Liu wrote:
> Hi!
> Unfortunately, I didn't, and am still waiting for a solution.
> And which Python version are you using?
>
> Regards
> Song Liu
>
> On Mon, Mar 22, 2010 at 3:24 PM, Boyu Zhang wrote:
>
> > Hi,
> >
Dear All,
sorry to bother again. I overcame the
"Uncaught Exception : need more than 2 values to unpack"
error by exporting HOD_PYTHON_HOME. But now I have a new error.
$ bin/hod allocate -d /home/zhang/cluster -n 4 -c
/home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
/home/zhang/hadoop-0.20.
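For reference, the workaround was along these lines (the interpreter path is
illustrative; HOD needs a sufficiently new Python, so the variable is pointed
at one explicitly rather than relying on whatever is first on the PATH):

 export HOD_PYTHON_HOME=/usr/bin/python2.5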
environment
variable? Thanks a lot!
On Mon, Mar 22, 2010 at 11:52 AM, Boyu Zhang wrote:
> Dear All,
>
> I have been trying to get HOD working on a cluster running Scyld, but there
> are some problems. I configured the minimal configuration options.
>
> 1. I executed the command:
>
Uncaught Exception : need more than 2 values to unpack
Could anyone tell me why I am having this error? Is the problem the
operating system, or Torque, or the fact that I commented out line 576, or
something else?
Any comment is welcome and appreciated. Thanks a lot!
Sincerely,
Boyu Zhang
Hi,
I am having the same exception too. Uncaught Exception : need more than 2
values to unpack
Did you solve this?
On Mon, Mar 15, 2010 at 10:55 AM, Song Liu wrote:
> Hi all, I have two questions about HOD
>
> 1. I configured and set up HOD on one cluster; it works fine, but when I
> finished
, it seems that my
cluster has a different architecture from the cluster it refers to. Does
anyone have experience installing Hadoop on a cluster like mine? Is
there any link I can refer to? Any help is appreciated, thanks a lot!
Sincerely,
Boyu Zhang
bumping up the heap for the map task?
>
> Since you are setting io.sort.mb to 256M, pls set heap-size to 512M at
> least, if not more.
>
> mapred.child.java.opts -> -Xmx512M or -Xmx1024m
>
> Arun
>
>
> On Mar 11, 2010, at 8:24 AM, Boyu Zhang wrote:
>
> Dear
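Arun's suggestion in code, old API; the heap figure follows his rule of thumb
that the child heap must comfortably hold the io.sort.mb buffer:

import org.apache.hadoop.mapred.JobConf;

public class SortBufferConfig {
    public static void apply(JobConf conf) {
        conf.setInt("io.sort.mb", 256);                 // larger in-memory sort buffer
        conf.set("mapred.child.java.opts", "-Xmx512m"); // heap must hold the buffer plus everything else
    }
}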
Dear All,
I am running a Hadoop job processing data. The output of the map is really
large, and it spills 15 times. So I tried setting io.sort.mb = 256
instead of 100, and left everything else at the default. I am using version
0.20.2. When I run the job, I get the following errors:
2010-03-11 11:
Hi Karthik,
Thanks a lot for the information! I will look into it and try!
Boyu
2009/11/22 Karthik Kambatla
> Though it is recommended for large files, DistributedCache might be a good
> alternative for you.
>
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/Di
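A minimal sketch of the DistributedCache route with the 0.20 API (the HDFS
path is illustrative):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheSetup {
    // Driver side: register the parameter file before submitting the job
    public static void addParamFile(Configuration conf) throws IOException {
        DistributedCache.addCacheFile(URI.create("/user/boyu/params.txt"), conf);
    }

    // Task side (e.g. in Mapper.setup): open the local copy
    public static BufferedReader openParamFile(Configuration conf) throws IOException {
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        return new BufferedReader(new FileReader(cached[0].toString()));
    }
}

The file is localized on each tasktracker once per job, instead of every task
re-reading it from HDFS.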
actually how map-side join works.
> >
> >
> > Gang Luo
> > -
> > Department of Computer Science
> > Duke University
> > (919)316-0993
> > gang@duke.edu
> >
> >
> >
> > - Original Message -
> > From: Boyu Zhang
they
are large, in a sense.
My current approach is to use job.set and job.get to store and retrieve
these lines as configuration entries, but it is not efficient at all!
Could anyone help me with an alternative solution? Thanks a million!
Boyu Zhang
University of Delaware
assuming the size isn't
> relatively large ) and use configure / setup to retrieve these.. Or use
> distributed cache to read a file containing these lines ( possibly with jvm
> reuse if you want that extra bit as well. )
>
> Thanks,
> Amogh
>
> On 10/26/09 6:17 AM, "Boyu
?
Thanks a lot for reading my email, really appreciate any help!
Boyu Zhang (Emma)
University of Delaware
input data to the top level?
Thanks a lot!
Boyu Zhang
University of Delaware
/dir_0.
My question is: is there any way I can process all the files in a
hierarchy with the input path set to the top level?
Thanks a lot for the time!
Boyu Zhang
University of Delaware
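In 0.20, FileInputFormat does not descend into subdirectories by itself, so
one workaround is to walk the tree in the driver and add each file explicitly
(a fixed-depth glob such as /top/*/* also works). A sketch:

import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputs {
    public static void addRecursively(Job job, Path dir) throws IOException {
        FileSystem fs = dir.getFileSystem(job.getConfiguration());
        for (FileStatus s : fs.listStatus(dir)) {
            if (s.isDir()) {
                addRecursively(job, s.getPath()); // descend into the subdirectory
            } else {
                FileInputFormat.addInputPath(job, s.getPath());
            }
        }
    }
}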
...
>
> On Fri, Sep 4, 2009 at 12:22 PM, Boyu Zhang wrote:
>
> > Yes, the output of the first iteration is the input of the second
> > iteration.
> > Actually, I am trying the page ranking problem. In the algorithm, you have
> > to run several iterations, each u
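For the iteration structure, the driver usually ends up as a plain loop in
which each pass reads the previous pass's output; a sketch with illustrative
paths and the per-job setup elided:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path("/ranks/iter0");       // illustrative paths
        for (int i = 1; i <= 10; i++) {              // fixed iteration count for the sketch
            Job job = new Job(conf, "pagerank-iter-" + i);
            // mapper/reducer/output types elided
            FileInputFormat.setInputPaths(job, input);
            Path output = new Path("/ranks/iter" + i);
            FileOutputFormat.setOutputPath(job, output);
            if (!job.waitForCompletion(true)) break; // stop on failure
            input = output;                          // this output feeds the next pass
        }
    }
}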
separate
> > > job configs for them. You can pass these different configs to the Tool
> > > object in the same parent class... But they will essentially be different
> > > jobs being called together from inside the same java parent clas
ssue??
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Fri, Sep 4, 2009 at 11:36 AM, Boyu Zhang wrote:
>
> > Dear All,
> >
> > I am using Hadoop 0.20.0. I have an application that needs to run
>
for your time!
Boyu Zhang (Emma)
Thanks for the tip, I am reading the code now, thanks a lot!
Boyu Zhang
-Original Message-
From: Hairong Kuang [mailto:hair...@yahoo-inc.com]
Sent: Friday, July 24, 2009 1:55 PM
To: common-user@hadoop.apache.org
Subject: Re: Questions on How the Namenode Assign Blocks to Datanodes
Thank you for the reply. Do you by any chance remember where you read
this? Thanks a lot!
Boyu
-Original Message-
From: Hairong Kuang [mailto:hair...@yahoo-inc.com]
Sent: Friday, July 24, 2009 12:54 PM
To: common-user@hadoop.apache.org
Subject: Re: Questions on How the Namenode A
replica = 1,
I just use it for testing to see how HDFS works. What is the policy for
deciding which datanode gets which block? Thank you so much!
Boyu Zhang
Ph. D. Student
Computer and Information Sciences Department
University of Delaware
(210) 274-2104
bzh...@udel.edu
http://www.eecis.udel.edu
). But what policies
are used to assign these blocks to datanodes? In my case, machine1 got 14
blocks, machine2 got 12 blocks and machine3 got 16 blocks.
Could anyone help me with this? Or is there any documentation I can read
to help me clarify this?
Thanks a lot!
Boyu Zhang
Ph. D. Student
st line in /etc/hosts, that this may happen.
>
> On Thu, Jul 16, 2009 at 1:54 PM, Boyu Zhang wrote:
>
> > Thank you for your suggestion.
> >
> > I have done that plenty of times, and every time I delete the pids and the
> > files /tmp/hadoop-name that namenode form
Thank you for your suggestion.
I have done that plenty of times, and every time I delete the pids and the
files in /tmp/hadoop-name that the namenode format generated. But I get the same
error over and over.
I found out that after I run start-dfs.sh, when I check netstat -pnl (from the
master), there is a p
actual machine names.
Thanks a lot!
2009/7/16 zjffdu
> Boyu,
>
> Can you ping the master node from the slave node? And can you open the task
> web UI on the slave?
>
>
> Jeff Zhang
>
>
>
> -Original Message-
> From: Boyu Zhang [mailto:boyuzhan...
to get anything going. With
> 0.20.0 I got it up and running in about 20 min
>
>
> On Wed, Jul 15, 2009 at 4:24 PM, Boyu Zhang wrote:
>
> > I am using version 0.18.3 right now; the configuration files you
> > mentioned are available in version 0.20. Anyway,
org/core/docs/current/cluster_setup.html#Site+Configuration
>
> That's a link to the site that should be useful. It's the cluster setup; just
> follow it carefully and you should be good to go.
>
> On Wed, Jul 15, 2009 at 4:09 PM, Boyu Zhang wrote:
>
> > Thank you Divij,
> >
sed on the server
> and the port that the slave is trying to communicate on.
>
> On Wed, Jul 15, 2009 at 3:57 PM, Boyu Zhang wrote:
>
> > Dear all,
> >
> > I am trying to set up a cluster of two machines using Hadoop. One machine is
> > both n
Dear all,
I am trying to set up a cluster of two machines using Hadoop. One machine is
both the namenode and jobtracker; the other machine is the datanode and
tasktracker.
I set up password-less ssh in both directions.
The commands I used are:
> bin/hadoop namenode -format
> bin/start-dfs.sh
starti
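For a two-machine setup like this, the minimal site configuration (hostname
and ports purely illustrative; property names as in 0.18/0.20) usually amounts
to:

 fs.default.name -> hdfs://master:9000
 mapred.job.tracker -> master:9001
 dfs.replication -> 1

in conf/hadoop-site.xml on both machines, with the slave's hostname listed in
conf/slaves on the master.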
corresponding directory.
If your last job did not end correctly, you will have this kind of problem.
Hope this helps.
Boyu Zhang
Ph. D. Student
Computer and Information Sciences Department
University of Delaware
(210) 274-2104
bzh...@udel.edu
http://www.eecis.udel.edu/~bzhang
-Original
Dear all,
I am a beginner to Hadoop. I am looking into how to use virtual machines
with Hadoop, and I have some general questions about HOD.
Can I consider HOD a virtual machine for running Hadoop jobs? (Usually VMs
are really big, but this one seems lightweight.)
Is the configuration