Bryan,
Not sure you should be concerned with whether the output is on local vs.
HDFS. I wouldn't think there would be much of a performance difference if
you are doing streaming output (append) in both cases. Hadoop already uses
local storage wherever possible (including for the task working d
Keep two configuration directories with different slaves files (say
conf.dfs/ and conf.mr/) and use `hadoop-daemons.sh --config {conf dir
path} start {daemon}` to start up DN/TT daemons.
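A minimal sketch of what that looks like, assuming the conf.dfs/ and conf.mr/
names above and that each directory holds a full copy of the configuration
plus its own slaves file (paths are placeholders):

    # conf.dfs/slaves lists the DataNode hosts, conf.mr/slaves the TaskTracker hosts
    bin/hadoop-daemons.sh --config /path/to/conf.dfs start datanode
    bin/hadoop-daemons.sh --config /path/to/conf.mr start tasktracker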
On Thu, May 5, 2011 at 8:06 AM, Matthew John wrote:
> Hi all,
>
> I see that there is an option to provide a s
Hi all,
I went down the path of figuring out how the JobTracker, JobInProgress and
TaskScheduler together work out the problem of assigning a Task (corresponding
to an InputSplit) to a Node (corresponding to a TaskTracker on the node). I
understand that a set of methods in JobInProgress like obtain
Hi all,
I see that there is an option to provide a slaves_file as input to
bin/start-dfs.sh and bin/start-mapred.sh so that slaves are parsed from this
input file rather than the default conf/slaves.
Can someone please help me with the syntax for this? I am not able to figure
it out.
Thanks,
M
Hey Matt,
we are using the same Dell boxes, and we can get 2 GB/s per node (read and
write) without problems.
On Wed, May 4, 2011 at 8:43 AM, Matt Goeke wrote:
> I have been reviewing quite a few presentations on the web from
> various businesses, in addition to the ones I watched first hand
Got it. Thank you, Harsh. BTW:
It is `hadoop dfs -Ddfs.blocksize=size -put file file`. No dot between
"block" and "size"
On Wed, May 4, 2011 at 3:18 PM, He Chen wrote:
> Tried second solution. Does not work, still 2 64M blocks. h
>
>
> On Wed, May 4, 2011 at 3:16 PM, He Chen wrote:
>
>> Hi Har
Hello
I'm using a small fully distributed Hadoop cluster. All Hadoop daemons run under
"hadoop" users, and I submit jobs as
"user".
I ran into a couple of problems when I set mapred.output.dir to an (NFS)
file:// location.
1. The output dir gets created, but it belongs to "hadoop".
It sort o
Exactly what I want. Thanks Harsh J.
-Gang
----- Original Message -----
From: Harsh J
To: common-user@hadoop.apache.org
Sent: 2011/5/4 (Wed) 4:03:35 PM
Subject: Re: don't want to output anything
Hello Gang,
On Thu, May 5, 2011 at 1:22 AM, Gang Luo wrote:
>
>
> Hi,
>
> I use MapReduce to process and output
Tried second solution. Does not work, still 2 64M blocks. h
On Wed, May 4, 2011 at 3:16 PM, He Chen wrote:
> Hi Harsh
>
> Thank you for the reply.
>
> Actually, the Hadoop directory is on my NFS server; every node reads the
> same file from the NFS server. I think this is not a problem.
>
> I li
Hi Harsh
Thank you for the reply.
Actually, the Hadoop directory is on my NFS server; every node reads the
same file from the NFS server. I think this is not a problem.
I like your second solution. But I am not sure whether the namenode
will divide those 128MB blocks into smaller ones in the future or
Am I mistaken or are side-effect files on HDFS? I need my temp files to be on
the local filesystem. Also, the Java working directory is not the reducer's
local processing directory, thus "./tmp" doesn't get me what I'm after. As it
stands now I'm using java.io.tmpdir, which is not a long-term sol
Hello Gang,
On Thu, May 5, 2011 at 1:22 AM, Gang Luo wrote:
>
>
> Hi,
>
> I use MapReduce to process and output my own stuff, in a customized way. I
> don't
> use context.write to output anything, and thus I don't want the empty files
> part-r-x on my fs. Is there some way to eliminate the ou
Your client (put) machine must have the same block size configuration
during upload as well.
Alternatively, you may do something explicit like `hadoop dfs
-Ddfs.block.size=size -put file file`
On Thu, May 5, 2011 at 12:59 AM, He Chen wrote:
> Hi all
>
> I met a problem about changing block size
Hi,
I use MapReduce to process and output my own stuff, in a customized way. I
don't
use context.write to output anything, and thus I don't want the empty files
part-r-x on my fs. Is there some way to eliminate the output?
Thanks.
-Gang
Hi Bryan,
These are called side effect files, and I use them extensively:
O'Reilly Hadoop, 2nd Edition, p. 187
Pro Hadoop, p. 279
You get the path to save the file(s) under using:
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath%28org
Bryan,
The relative ./tmp directory is under the work-directory of the task's
attempt on a node. I believe it should be sufficient to use that as a
base to create directories or files under. Or are you simply looking
for a way to expand this relative directory name to a fully absolute
one (which
Hi all
I met a problem with changing the block size from 64M to 128M. I am sure I
modified the correct configuration file, hdfs-site.xml, because I can change
the replication factor correctly. However, it does not work for changing the
block size.
For example:
I change the dfs.block.size to 134217728 byt
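One way to verify whether an uploaded file actually took the new block size is
to list its blocks with fsck (the path below is a placeholder); a 128 MB file
written with a 128 MB block size should report one block instead of two:

    # shows the blocks that make up the file
    hadoop fsck /user/chen/file -files -blocks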
Hey!
Sorry my math is off. I keep thinking in terms of TB per core and not drives.
:-)
To be honest I don't know if I would recommend 6-core CPUs.
We're running on what is now considered 'old hardware' (Intel Xeon E5500
series).
Yes, we saw that with 8 cores and 4 drives, we were limited by the
Right. What I am struggling with is how to retrieve the path/drive that the
reducer is using, so I can use the same path for local temp files.
On May 4, 2011, at 9:03 AM, Robert Evans wrote:
> Bryan,
>
> I believe that map/reduce gives you a single drive to write to so that your
> reducer has
Mike,
Thanks for the response. It looks like this discussion forked on the CDH
list so I have two different conversations now. Also, you're dead on
that one of the presentations I was referencing was Ravi's.
With your setup I agree that it would have made no sense to go the 2.5"
drive route given
Hi Matt.
I think you attended Ravi's presentation.
One of the reasons we used 4 drives per node is that our nodes are in 1U boxes
and you can only fit 4 3.5" SATA drives in those boxes. Could we have gone for
more drives using 2.5" SATA drives? Yes, but then you will reduce the amount of
I have been reviewing quite a few presentations on the web from
various businesses, in addition to the ones I watched first hand at
the cloudera data summit last week, and I am curious as to others
thoughts around hard drive ratios. Various sources including Cloudera
have cited 1 HDD x 2 cores x 4
Bryan,
I believe that map/reduce gives you a single drive to write to so that your
reducer has less of an impact on other reducers/mappers running on the same
box. If you want to write to more drives I thought the idea would then be to
increase the number of reducers you have and let mapred as
I only wanted the context as a way of getting at the configuration, so
making the class implement Configurable will solve my problem.
On Tue, May 3, 2011 at 11:35 PM, Harsh J wrote:
> Hello,
>
> On Wed, May 4, 2011 at 5:13 AM, W.P. McNeill wrote:
> > I have a new-API
> > Partitioner<
> http://h
I too am looking for the best place to put local temp files I create during
reduce processing. I am hoping there is a variable or property someplace that
defines a per-reducer temp directory. The "mapred.child.tmp" property is by
default simply the relative directory "./tmp" so it isn't useful o
Hello Hadoop users,
I came across some Map-Reduce examples on Google Code.
Here is the link:
http://code.google.com/p/hadoop-map-reduce-examples/wiki/Wikipedia_GeoLocation
In the Mapper class, the writer has used GEO_RSS_URI.
If someone has used this code, can anyone help me in figuring out w
Hello,
Can anyone please explain the use of the transaction log on the NameNode?
AFAIK, it logs details of the files created and deleted in the Hadoop cluster.
Thanks.
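For what it's worth, a quick way to see the transaction log on disk, assuming
a 0.20-era layout where the NameNode metadata lives wherever dfs.name.dir
points (the path below is a placeholder):

    # fsimage is the checkpointed namespace; edits is the transaction log of
    # namespace changes (file/directory creates, deletes, renames, etc.)
    ls /path/to/dfs/name/current/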
hey folks,
BerlinBuzzwords 2011 is close: only 33 days are left until the big Search,
Store and Scale open-source crowd gathers in Berlin on June 6th/7th 2011.
The conference again focuses on the topics of search, data analysis and NoSQL.
We are looking f