number of reducers to the desired value. This involves two steps that are
highly inefficient for the task - sorting and fetching. Is there a way to get
around that?
Ideally I'd want all mapper outputs to be written to one file, one record
per line.
Thanks.
---
Dmitry Pushkarev
+1-650-644-8988
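One way around the sort and fetch is a map-only job: with the number of
reducers set to zero, each mapper writes its output directly to the output
directory and the sort/shuffle phase is skipped entirely. A minimal sketch
against the modern mapreduce API (the class name and identity mapper are
illustrative, not the poster's code):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class MapOnlyJob {
      public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "map-only");
          job.setJarByClass(MapOnlyJob.class);
          job.setMapperClass(Mapper.class);  // identity mapper, for illustration
          job.setNumReduceTasks(0);          // zero reducers: no sort, no shuffle
          job.setOutputKeyClass(LongWritable.class);
          job.setOutputValueClass(Text.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }

The caveat is that this yields one part file per mapper rather than a single
file; if one file is a hard requirement, hadoop fs -getmerge can concatenate
the parts afterwards.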
on DP alignment, whereas navigation in seed space and an N*log(N) sort
require only a fraction of that time - that was my experience applying a
Hadoop cluster to sequencing human genomes.
---
Dmitry Pushkarev
+1-650-644-8988
-Original Message-
From: michael.sch...@gmail.com [mailto:michael.sch
everything abnormal for /dev/sdaX. But if you have a better solution, I'd
appreciate it if you shared it.
---
Dmitry Pushkarev
+1-650-644-8988
Definitely not.
You should be looking at expandable Ethernet storage that can be extended by
connecting additional SAS arrays (like Dell PowerVault and similar offerings
from other companies).
600 MB is just about 6 seconds over a gigabit network (roughly 100 MB/s in
practice)...
---
Dmitry Pushkarev
-Original Message-
From
I'm wondering if there is any way to make Hadoop give larger chunks of data
to map tasks? (The trivial way, of course, would be to split the dataset into
N files and have it feed one file at a time, but is there any standard
solution for that?)
Thanks.
---
Dmitry Pushkarev
+1-650-644-8988
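If the goal is simply fewer, larger map tasks, one knob worth knowing is the
minimum split size, which makes FileInputFormat hand each mapper more than
one block's worth of data. A hedged sketch with the modern API (the 512 MB
figure is arbitrary); for streaming jobs the equivalent would be passing
-D mapreduce.input.fileinputformat.split.minsize=... on the command line:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

  public class BigSplits {
      public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "big-splits");
          // ask for at least 512 MB per split instead of the default block size
          FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);
          // ... mapper, input/output paths, etc. configured as usual
      }
  }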
we're running a very small cluster (15 nodes, 120 cores) specifically built
for the task.
---
Dmitry Pushkarev
+1-650-644-8988
-Original Message-
From: Delip Rao [mailto:delip...@gmail.com]
Sent: Tuesday, January 20, 2009 6:19 PM
To: core-user@hadoop.apache.org
Subject: Re: streaming split sizes
Hi
Hi.
Can anyone share experience of successfully parallelizing MATLAB tasks using
Hadoop?
We have implemented this with Python (in the form of a simple module that
takes a serialized function and a data array and runs the function on the
cluster), but we really have no clue how to do that in
Dear Hadoop users,
I'm lucky to work in an academic environment where information security is
not an issue. However, I'm sure that most Hadoop users aren't so lucky.
Here is the question: how secure is Hadoop? (or, let's say, how foolproof?)
Here is the answer:
Hi.
I have a strange problem with Hadoop when I run jobs under Windows (my
laptop runs XP, but all cluster machines, including the namenode, run
Ubuntu). I run a job (which runs perfectly under Linux, and all configs and
Java versions are the same), all mappers finish successfully, and so does
Hi. I'm writing a MapReduce task in Jython, and I can't launch
ToolRunner.run; Jython says TypeError: integer required on the ToolRunner.run
line, and I can't get a more detailed explanation. I guess the error is
either in ToolRunner or in setConf:
What am I doing wrong? :)
And can anyone share a
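For context on the error: ToolRunner.run expects a Tool whose run(String[])
method returns an int, and itself returns that int. A Jython run() that
returns nothing (None) instead of an integer is one plausible cause of
"TypeError: integer required". The canonical Java shape, which Jython code
would mirror (class name illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyJob extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
          // job setup and submission would go here; must return an int exit code
          return 0;
      }

      public static void main(String[] args) throws Exception {
          // ToolRunner parses generic options (-D, -conf, ...) and invokes run()
          System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
      }
  }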
Awesome, wish I had it a couple of weeks ago.
By the way, can someone give me Jython code that interacts with HBase?
I want to learn to write simple mapreducers (for example, to go over all rows
in a given column, compute something, and put the result into another
column).
If it works I promise to write
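Not Jython, but since Jython calls Java APIs directly, a plain Java sketch of
the scan-compute-put pattern translates almost line for line. The table and
column names below are made up, the "compute" step is a stand-in, and this
uses the HBase client API directly rather than a MapReduce job (for the
MapReduce version, TableMapReduceUtil is the usual entry point):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ColumnCompute {
      public static void main(String[] args) throws Exception {
          byte[] cf = Bytes.toBytes("cf");
          byte[] in = Bytes.toBytes("input");
          byte[] out = Bytes.toBytes("output");
          try (Connection conn =
                   ConnectionFactory.createConnection(HBaseConfiguration.create());
               Table table = conn.getTable(TableName.valueOf("mytable"))) {
              Scan scan = new Scan().addColumn(cf, in);
              try (ResultScanner scanner = table.getScanner(scan)) {
                  for (Result r : scanner) {
                      // assumes the cell was written with Bytes.toBytes(long)
                      long v = Bytes.toLong(r.getValue(cf, in));
                      Put p = new Put(r.getRow());
                      p.addColumn(cf, out, Bytes.toBytes(v * 2));  // "compute something"
                      table.put(p);
                  }
              }
          }
      }
  }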
Why not use HAR over HDFS? The idea being that if you don't do too much
writing, compacting files into HAR archives (which are stored in 64 MB
slices) might be a good answer.
Hence the question for the Hadoop developers: is Hadoop HAR-aware?
In two senses:
1. Whether it tries to assign tasks
because of a global lock, unfortunately. The other CPUs would still be used
to some extent by network I/O and other threads. Usually we don't see just
one CPU at 100% while the others sit idle.
What kind of load do you have?
Raghu.
Dmitry Pushkarev wrote:
Hi.
My namenode runs
Hi.
My namenode runs on an 8-core server with lots of RAM, but it only uses one
core (100%).
Is it possible to tell the namenode to use all available cores?
Thanks.
This will effectively ruin the system at large scale, since you will have to
update all blocks whenever you touch the metadata...
-Original Message-
From: 叶双明 [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 10, 2008 12:06 AM
To: core-user@hadoop.apache.org
Subject: Re: Thinking about
Hi.
How can a node find out how many tasks are being run on it at a given time?
I want tasktracker nodes (which are allocated from Amazon EC2) to shut down
if nothing has been run for some period of time, but I don't yet see the
right way of implementing this.
on that machine is running slower than the others), speculative execution,
if enabled, can help a lot. Also, implicitly, faster/better machines get
more work than the slower machines.
On 9/8/08 3:27 AM, Dmitry Pushkarev [EMAIL PROTECTED] wrote:
Dear Hadoop users,
Is it possible without
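Speculative execution, mentioned above, is just a pair of boolean job
settings. A hedged sketch with the modern property names (the 0.18-era keys
were mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class SpeculativeDemo {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          conf.setBoolean("mapreduce.map.speculative", true);     // re-run slow map attempts
          conf.setBoolean("mapreduce.reduce.speculative", true);  // re-run slow reduce attempts
          Job job = Job.getInstance(conf, "speculative-demo");
          // ... remaining job setup as usual
      }
  }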
Dear Hadoop users,
Is it possible, without using Java, to manage task assignment and implement
some simple rules? Like: do not launch more than one instance of a crawling
task on a machine, do not run data-intensive tasks on remote machines, and
do not run computationally intensive tasks on
Hi,
I'd check the installed Java version; that was the problem in my case, and
surprisingly there was no output from Hadoop. If it helps - could you submit
a bug report? :)
-Original Message-
From: Shirley Cohen [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 04, 2008 10:07 AM
To:
/docs/r0.18.0/hadoop_archives.html
On 9/3/08 3:21 PM, Dmitry Pushkarev [EMAIL PROTECTED] wrote:
Does anyone have a har/unhar utility?
Or at least a format description? It looks pretty obvious, but just in case.
Thanks
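For what it's worth, archives are created with the built-in command
hadoop archive -archiveName name.har -p <parent> <src> <dest>, and their
contents are addressed through the har:// filesystem, so a separate unhar
step is rarely needed. A small sketch of listing the contents of an archive
(the path is hypothetical):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HarList {
      public static void main(String[] args) throws Exception {
          // har:// paths address files inside an archive
          Path harRoot = new Path("har:///user/dmitry/files.har");
          FileSystem fs = harRoot.getFileSystem(new Configuration());
          for (FileStatus status : fs.listStatus(harRoot)) {
              System.out.println(status.getPath());  // entries stored in the archive
          }
      }
  }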
03, 2008 4:00 AM
To: core-user@hadoop.apache.org
Subject: Re: har/unhar utility
You could create a har archive of the small files and then pass the
corresponding har filesystem as input to your mapreduce job. Would that
work?
On 9/3/08 4:24 PM, Dmitry Pushkarev [EMAIL PROTECTED] wrote:
Dear Hadoop users,
Our lab is slowly switching from SGE to Hadoop; however, not everything
seems to be easy and obvious. We are in no way computer scientists; we're
just physicists, biologists, and a couple of statisticians trying to solve
our computational problems, so please take this into