On Mon, Jul 5, 2010 at 07:47, Bardia Afshin brandon...@gmail.com wrote:
What's the unsubscribe link?
To unsubscribe, send mail to
general-unsubscr...@hadoop.apache.org
Many Apache MLs have an unsubscribe footer.
Anyone volunteering to make this happen for this list, too?
Bernd
If you set the number of reduce tasks to zero, the outputs of the mappers
will be sent directly to the OutputFormat. You can debug the map phase of a
job by disabling the reducer and inspecting the mapper outputs, and then
re-enable the reducer once you've got the mapping part of the job running.
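The map-only setup described above can be sketched as a driver using the new (org.apache.hadoop.mapreduce) API; the job name and paths are illustrative, and the base Mapper class is used as an identity mapper:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDebug {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "map-only debug");
    job.setJarByClass(MapOnlyDebug.class);
    // The base Mapper is an identity mapper; substitute your own class here.
    job.setMapperClass(Mapper.class);
    // Zero reducers: mapper output goes straight to the OutputFormat.
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With no reduce phase, the output files contain exactly what the mappers emitted, which is what makes them useful for inspection.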
There are a number of different versions and distributions of Hadoop which,
as far as I understand, all differ from each other. I know that in the
0.20-append branch, files in HDFS can be appended, and that the Y!
distribution (0.20.S) implements security features through Kerberos. And
then there
Hi,
Is org.apache.hadoop.mapred.lib.MultipleOutputFormat deprecated? I did not
find any @deprecated comments in the source file in 0.20.2.
But I cannot use the following:
job.setOutputFormatClass(org.apache.hadoop.mapred.lib.MultipleOutputFormat.class)
The type does not match.
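The type mismatch is most likely an old-API/new-API clash: Job.setOutputFormatClass expects a subclass of the new-API org.apache.hadoop.mapreduce.OutputFormat, while mapred.lib.MultipleOutputFormat extends the old-API org.apache.hadoop.mapred.OutputFormat. A sketch of using it with the old API instead (the driver class name is illustrative; MultipleTextOutputFormat is a concrete subclass, since MultipleOutputFormat itself is abstract):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class OldApiDriver {
  public static JobConf configure() {
    JobConf conf = new JobConf(OldApiDriver.class);
    // Old-API MultipleOutputFormat subclasses plug into JobConf.setOutputFormat,
    // not into the new-API Job.setOutputFormatClass.
    conf.setOutputFormat(MultipleTextOutputFormat.class);
    return conf;
  }
}
```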
Segel, Jay
Thanks for the reply!
Your parallelism comes from multiple tasks running on different nodes
within the cluster. By default you get one map task per block. You can
write your own splitter to increase this and then get more parallelism.
sounds like an elegant solution. We can modify the
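The per-block split arithmetic behind this advice can be sketched in plain Java (this loosely mirrors what an InputFormat does when carving a file into splits; it is not Hadoop's actual code, and the split size is a parameter you choose):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCalc {
  // Compute (offset, length) pairs that carve a file of fileLen bytes
  // into splits of at most splitSize bytes each.
  public static List<long[]> splits(long fileLen, long splitSize) {
    List<long[]> out = new ArrayList<>();
    for (long off = 0; off < fileLen; off += splitSize) {
      out.add(new long[] { off, Math.min(splitSize, fileLen - off) });
    }
    return out;
  }
}
```

Choosing a split size smaller than the block size yields more map tasks over the same file, which is where the extra parallelism comes from.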
On Mon, Jul 5, 2010 at 5:08 AM, elton sky eltonsky9...@gmail.com wrote:
Segel, Jay
Thanks for the reply!
Your parallelism comes from multiple tasks running on different nodes
within the cluster. By default you get one map task per block. You can
write your own splitter to increase this and
On Mon, Jul 5, 2010 at 1:12 AM, Evert Lammerts evert.lamme...@sara.nlwrote:
There are a number of different versions and distributions of Hadoop
which, as far as I understand, all differ from each other. I know that in
the 0.20-append branch, files in HDFS can be appended, and that the Y!
There's actually an open ticket somewhere to make distcp do this using the
new concat() API in the NameNode.
Where can I find that open ticket?
concat() allows several files to be combined into one file at the metadata
level, so long as a number of
restrictions are met. The work hasn't been done
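In later Hadoop releases this capability surfaced as FileSystem.concat, which merges source files into a target at the metadata level by moving their blocks rather than copying bytes. A hedged sketch assuming that API (paths are illustrative; at the time of this thread the work had not been done, as noted above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Merge the source files into the target at the metadata level;
    // blocks are re-linked under the target, not copied.
    fs.concat(new Path("/data/part-all"),
              new Path[] { new Path("/data/part-0"), new Path("/data/part-1") });
  }
}
```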
On Jul 5, 2010, at 5:01 PM, elton sky wrote:
Well, this sounds good when you have many small files: you concat() them
into a big one. But I am talking about splitting a big file into blocks and
copying those blocks in parallel.
Basically, your point is that hadoop dfs -cp is relatively slow and could
be made faster. If HDFS had a more multi-threaded design, it would make cp
operations faster.
What I mean is, if we know the size of a file, we can parallelize the copy by
calculating its block offsets. Otherwise we couldn't.
On Tue, Jul 6, 2010
Hi,
I have written a custom partitioner for partitioning datasets. I want to
partition two datasets using the same partitioner, and then, in the next
MapReduce job, I want each mapper to handle the same partition from the two
sources and perform some function such as a join. How can I
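For this co-partitioned join to work, both jobs must send equal keys to the same partition and use the same number of partitions. A minimal sketch of such a deterministic partition function in plain Java (mirroring HashPartitioner's logic; the class name is illustrative, not the poster's actual partitioner):

```java
public class JoinPartitioner {
  // Deterministic: the same key always lands in the same partition, so two
  // datasets partitioned with this function and the same numPartitions can
  // be joined partition-by-partition in a later job.
  public static int partition(String key, int numPartitions) {
    // Mask off the sign bit so negative hash codes still map to [0, numPartitions).
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}
```

Inside a Hadoop Partitioner subclass the same expression would go in getPartition, applied to the join key of both datasets.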